[dpdk-dev] tcpdump support in DPDK 2.3
    Morten Brørup 
    mb at smartsharesystems.com
       
    Wed Dec 16 23:45:46 CET 2015
    
    
  
Bruce,
Matthew presented a very important point a few hours ago: We don't need tcpdump support for debugging the application in a lab; we already have plenty of other tools for debugging what we are developing. We need tcpdump support for debugging network issues in a production network.
In my "hardened network appliance" world, a solution designed purely for legacy applications (tcpdump, Wireshark etc.) is useless because the network technician doesn't have access to these applications on the appliance.
While a PC system running a DPDK based application might have plenty of spare lcores for filtering, the SmartShare appliances are already using all lcores for dedicated purposes, so the runtime filtering has to be done by the IO lcores (otherwise we would have to rehash everything and reallocate some lcores for mirroring, which I strongly oppose). Our non-DPDK firmware has also always been filtering directly in the fast path.
If the filter is so complex that it unexpectedly degrades the normal traffic forwarding performance, the mirror still reflects all the forwarded network traffic, not just some of it. In many real life network debugging scenarios this is better than the alternative: keeping the traffic forwarding up at full performance and having a network technician trying to understand a mirror output where some of the relevant packets are unexpectedly missing.
Although it is generally considered bad design if a system's behavior (or performance) changes unexpectedly when debugging features are being used, experienced network technicians have already grown accustomed to the performance of most non-trivial network equipment depending on the number of features enabled and how it is configured, so reality might beat theory here. (Still, other companies might prefer to keep their fast path performance unaffected and dedicate/reallocate some lcores for filtering.)
I am probably repeating myself here, but I would prefer if DPDK provided the packet capturing framework in the form of a set of efficient libraries for 1. BPF filtering (e.g. a simple BPF interpreter or a DPDK variant of bpfjit), 2. scalable packet queueing for the mirrored packets (probably multi producer, single or multi consumer), as well as 3. high resolution time stamping (preferably easily convertible to the pcap file packet timestamp format). Then the DPDK application can take care of interfacing to the attached application and outputting the mirrored packets to the appropriate destination, e.g. a pcap file, a Wireshark extcap named pipe, a dedicated RSPAN VLAN, or an ERSPAN tunnel. And an example application should show how to bind all this together in a tcpdump-like scenario for debugging a production network.
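To make the "simple BPF interpreter" idea concrete, here is a minimal sketch of what an IO lcore could run per packet. It is not an existing DPDK API: the names cbpf_insn and cbpf_run are hypothetical, and only four classic BPF opcodes are handled (the numeric opcode values are the standard classic BPF encodings). A real library would need the full instruction set, program validation and the ancillary data discussed further down the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field layout as a classic BPF instruction (code, jt, jf, k). */
struct cbpf_insn {
    uint16_t code;
    uint8_t  jt;
    uint8_t  jf;
    uint32_t k;
};

/* The only opcodes this sketch understands (classic BPF encodings). */
enum {
    CBPF_LD_H_ABS = 0x28, /* A = 16-bit big-endian value at pkt[k] */
    CBPF_LD_B_ABS = 0x30, /* A = pkt[k] */
    CBPF_JEQ_K    = 0x15, /* skip jt instructions if A == k, else jf */
    CBPF_RET_K    = 0x06  /* return k: 0 rejects, otherwise snap length */
};

/*
 * Run the filter over one packet held in a contiguous buffer.
 * Returns the number of bytes to capture, or 0 to reject the packet.
 * Out-of-bounds loads, unknown opcodes and falling off the end of the
 * program all reject the packet.
 */
static uint32_t
cbpf_run(const struct cbpf_insn *prog, size_t n_insns,
         const uint8_t *pkt, size_t pkt_len)
{
    uint32_t a = 0;   /* accumulator */
    size_t pc = 0;    /* index of the next instruction */

    while (pc < n_insns) {
        const struct cbpf_insn *in = &prog[pc++];

        switch (in->code) {
        case CBPF_LD_H_ABS:
            if (pkt_len < 2 || in->k > pkt_len - 2)
                return 0;
            a = ((uint32_t)pkt[in->k] << 8) | pkt[in->k + 1];
            break;
        case CBPF_LD_B_ABS:
            if (in->k >= pkt_len)
                return 0;
            a = pkt[in->k];
            break;
        case CBPF_JEQ_K:
            pc += (a == in->k) ? in->jt : in->jf;
            break;
        case CBPF_RET_K:
            return in->k;
        default:
            return 0;
        }
    }
    return 0;
}
```

For example, the four-instruction program {0x28,0,0,12}, {0x15,0,1,0x0800}, {0x06,0,0,262144}, {0x06,0,0,0} loads the EtherType at offset 12 and accepts only IPv4 frames. Rejecting on any error keeps the fast path from ever mirroring a packet the filter could not evaluate.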
A note about timestamps: In theory, the captured packets should be time stamped as early as possible. In practice though, it is probably sufficiently accurate to time stamp the accepted packets after filtering, especially if they are filtered by an IO lcore. Alternatively, they can be time stamped when consumed from the mirror output queue.
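As a rough illustration of "easily convertible to the pcap file packet timestamp format", the sketch below turns a free-running cycle counter value into the seconds/microseconds pair used in a pcap per-packet record header. The names pcap_ts, ts_base and cycles_to_pcap_ts are made up for this example. It assumes the application samples the wall clock once together with the cycle counter (in DPDK presumably via rte_get_tsc_cycles() and rte_get_tsc_hz()) and that the counter runs at a constant rate.

```c
#include <stdint.h>

/* Timestamp fields of a pcap per-packet record header (microsecond variant). */
struct pcap_ts {
    uint32_t ts_sec;
    uint32_t ts_usec;
};

/* Calibration point, sampled once: wall-clock time at a known cycle count. */
struct ts_base {
    uint64_t cycles;      /* cycle counter value at calibration */
    uint64_t hz;          /* cycle counter frequency in Hz */
    uint64_t epoch_sec;   /* wall-clock seconds at calibration */
    uint32_t epoch_usec;  /* wall-clock microseconds at calibration */
};

/* Convert a later cycle counter reading into a pcap timestamp. */
static struct pcap_ts
cycles_to_pcap_ts(const struct ts_base *base, uint64_t cycles)
{
    uint64_t delta = cycles - base->cycles;
    uint64_t sec = delta / base->hz;
    uint64_t rem = delta % base->hz;
    /* rem < hz, so rem * 1000000 cannot overflow for realistic frequencies. */
    uint64_t usec = rem * 1000000u / base->hz + base->epoch_usec;
    struct pcap_ts ts;

    sec += base->epoch_sec + usec / 1000000u;  /* carry whole seconds */
    usec %= 1000000u;
    ts.ts_sec = (uint32_t)sec;
    ts.ts_usec = (uint32_t)usec;
    return ts;
}
```

Because the conversion is only a division and a multiplication, it could just as well be deferred to the consumer of the mirror output queue, leaving the IO lcore to store nothing but the raw cycle count.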
A note about packet ordering: Mirrored packets belonging to different flows are probably out of order because of RSS, where multiple lcores contribute to the mirror output. This packet ordering inaccuracy could also serve as a reason for not being too strict about the accuracy of the timestamps on the mirrored packets.
Med venlig hilsen / kind regards
- Morten Brørup
-----Original Message-----
From: Bruce Richardson [mailto:bruce.richardson at intel.com] 
Sent: 16. december 2015 14:13
To: Morten Brørup
Cc: Matthew Hall; Kyle Larose; dev at dpdk.org
Subject: Re: [dpdk-dev] tcpdump support in DPDK 2.3
On Wed, Dec 16, 2015 at 01:26:11PM +0100, Morten Brørup wrote:
> Bruce,
> 
> Please note that tcpdump is a stupid name for a packet capture application that supports much more than just TCP.
> 
> I had missed the point about ethdev supporting virtual interfaces, so thank you for pointing that out. That covers my concerns about capturing packets inside tunnels.
> 
> I will gladly admit that you Intel guys are probably much more competent in the field of DPDK performance and scalability than I am. So Matthew and I have been asking you to kindly ensure that your solution scales well at very high packet rates too, and pointing out that filtering before copying is probably cheaper than copying before filtering. You mention that it leads to an important choice about which lcores get to do the work of filtering the packets, so that might be worth some discussion.
> 
> :-)
> 
> Med venlig hilsen / kind regards
> - Morten Brørup
> 
Thanks for your support.
We may look at having a certain amount of flexibility in the configuration of the setup, so as to avoid limiting the use of the functionality.
For scalability at very high packet rates, it's something we'll need you guys to give us pointers on too - what's acceptable or not inside an app, and what level of scalability is needed. I'd admit that most of our initial thinking in this area was for debugging apps at less than line rate, i.e. for functional testing.
For full line rate introspection, we'll have to see when we get some working code.
/Bruce
> 
> -----Original Message-----
> From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> Sent: 16. december 2015 12:56
> To: Morten Brørup
> Cc: Matthew Hall; Kyle Larose; dev at dpdk.org
> Subject: Re: [dpdk-dev] tcpdump support in DPDK 2.3
> 
> On Wed, Dec 16, 2015 at 12:40:43PM +0100, Morten Brørup wrote:
> > Bruce,
> > 
> > This doesn't really sound like tcpdump to me; it sounds like port mirroring.
> 
> It's actually a bit of both, in my opinion. It's designed to allow basic mirroring of traffic on a port so that the traffic can be sent to a tcpdump destination.
> By going with a more generic approach, we hope to enable more possible use cases than just focusing on TCP.
> 
> 
> > 
> > Your suggestion is limited to physical ports only, and cannot be attached further inside the application, e.g. for mirroring packets related to a specific VLAN.
> 
> Yes, the lack of attachment inside the app is a limitation. There are two types of scenarios that could be considered for packet capture:
> * ones where the application can be modified to do its own filtering and capturing.
> * ones where you want a generic capture mechanism which can be used on any application without modification.
> We have chosen to focus more on the second one, as that is where a 
> generic solution for DPDK is likely to lie. For the first case, the 
> application writer himself knows the type of traffic and how best to 
> capture and filter it, so I don't think a generic one-size-fits-all 
> solution is possible. [Though a couple of helper libraries may be of 
> use]
> 
> As for physical ports, the scheme should work for any ethdev - why do you see it only being limited to physical ports? What would you want to see monitored that we are missing?
> 
> > 
> > Furthermore, it doesn't sound like the filtering part scales well. Consider a fully loaded 40 Gbit/s port. You would need to copy all packets into a single rte_ring to the attached filtering process, which would then require its own set of lcores to probably discard most of these packets when filtering. I agree with Matthew that the filtering needs to happen as close to the source as possible, and must be scalable to multiple lcores.
> 
> Without modifying the application itself to do its own filtering, I suspect scalability is always going to be a problem. That being said, there is no particular reason why a single rte_ring needs to be used - we could allow one ring per NIC queue, for instance. The trouble with filtering at the source itself is that you put extra load on the IO cores. By using a ring, we put the filtering load on extra cores in a secondary process which can be scaled by the user without touching the main app.
> 
> > 
> > On the positive side, your idea has the advantage that the filter can be any application, and is not limited to BPF. However if the purpose is "tcpdump", we should probably consider BPF, which is the type of filtering offered by tcpdump.
> 
> Having this work with any application is one of our primary targets here. The app author should not have to worry too much about getting basic debug support.
> Even if it doesn't work at 40G small packet rates, you can get a lot of benefit from a scheme that provides functional debugging for an app. Obviously, though, we aim to make this as scalable as possible, which is why we want to allow filtering in userspace before sending packets externally to DPDK.
> 
> > 
> > I would prefer having a BPF library available that the application can use at any point, either at the lowest level (when receiving/transmitting Ethernet packets) or at a higher level (e.g. when working with packets that go into or come out of a tunnel). The BPF library should implement packet length and relevant ancillary data, such as SKF_AD_VLAN_TAG etc. based on metadata in the mbuf.
> > 
> > Transferring a BPF filter from an outside application could be done by using a simple text format, e.g. the output format of "tcpdump -ddd". This also opens an easy roadmap for Wireshark integration by simply extending extcap to include such a BPF filter format.
> > 
> > 
> > Lots of negativity above. I very much like the idea of attaching the secondary process and going through an rte_ring. This allows the secondary process to pass the filtered and captured packets on in any format it likes to any destination it likes.
> 
> Good, so we're not completely off-base here. :-)
> 
> /Bruce
> 
> > 
> > 
> > Med venlig hilsen / kind regards
> > - Morten Brørup
> > 
> > -----Original Message-----
> > From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> > Sent: 16. december 2015 11:45
> > 
> > Hi,
> > 
> > we are currently doing some investigation and prototyping for this feature.
> > Our current thinking is the following:
> > * to allow dynamic control of the filtering, we are thinking of making use of
> >   the multi-process infrastructure in DPDK. A secondary process can attach to a
> >   primary at runtime and provide the packet filtering and dumping capability.
> > * ideally we want to create a generic packet mirroring callback inside the EAL,
> >   that can be set up to mirror packets going through Rx/Tx on an ethdev.
> > * using this, packets being received on the port to be monitored are sent via
> >   an rte_ring (ring ethdev) to the secondary process which takes those packets
> >   and does any filtering on them. [This would be where BPF could fit into
> >   things, but it's not something we have looked at yet.]
> > * initially we plan to have the secondary process then write packets to a pcap
> >   file using a pcap PMD, but down the road if we get other PMDs, like a KNI PMD
> >   or a TAP device PMD, those could be used as targets instead.
> > 
> > This implementation we hope should provide enough hooks to enable the standard tools to be used for monitoring and capturing packets. We will send out draft implementation code for various parts of this as soon as we have it.
> > 
> > Additional feedback welcome, as always. :-)
> > 
> > Regards,
> > /Bruce
> > 
> > 
> 
    
    