[dpdk-dev] tcpdump support in DPDK 2.3

Bruce Richardson bruce.richardson at intel.com
Wed Dec 16 12:56:11 CET 2015
Previous message: [dpdk-dev] tcpdump support in DPDK 2.3
Next message: [dpdk-dev] tcpdump support in DPDK 2.3
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Wed, Dec 16, 2015 at 12:40:43PM +0100, Morten Brørup wrote:
> Bruce,
> 
> This doesn't really sound like tcpdump to me; it sounds like port mirroring.

It's actually a bit of both, in my opinion, it's designed to allow basic mirroring
of traffic on a port to allow that traffic to be sent to a tcpdump destination.
By going with a more generic approach, we hope to enable more possible use
cases than just focusing on TCP.


> 
> Your suggestion is limited to physical ports only, and cannot be attached further inside the application, e.g. for mirroring packets related to a specific VLAN.

Yes, the lack of attachment inside the app is a limitation. There are two types
of scenarios that could be considered for packet capture:
* ones where the application can be modified to do it's own filtering and
capturing.
* ones where you want a generic capture mechanism which can be used on any
application without modification.
We have chosen to focus more on the second one, as that is where a generic
solution for DPDK is likely to lie. For the first case, the application writer
himself knows the type of traffic and how best to capture and filter it, so I
don't think a generic one-size-fits-all solution is possible. [Though a couple
of helper libraries may be of use]

As for physical ports, the scheme should work for any ethdev - why do you see
it only being limited to physical ports? What would you want to see monitored
that we are missing.

> 
> Furthermore, it doesn't sound like the filtering part scales well. Consider a fully loaded 40 Gbit/s port. You would need to copy all packets into a single rte_ring to the attached filtering process, which would then require its own set of lcores to probably discard most of these packets when filtering. I agree with Matthew that the filtering needs to happen as close to the source as possible, and must be scalable to multiple lcores.

Without modifying the application itself to do it's own filtering I suspect
scalability is always going to be a problem. That being said, there is no
particular reason why a single rte_ring needs to be used - we could allow one
ring per NIC queue for instance. The trouble with filtering at the source itself
is that you put extra load on the IO cores. By using a ring, we put the filtering
load on extra cores in a secondary process which can be scaled by the user without
touching the main app.

> 
> On the positive side, your idea has the advantage that the filter can be any application, and is not limited to BPF. However if the purpose is "tcpdump", we should probably consider BPF, which is the type of filtering offered by tcpdump.

Having this work with any application is one of our primary targets here. The
app author should not have to worry too much about getting basic debug support.
Even if it doesn't work at 40G small packet rates, you can get a lot of benefit
from a scheme that provides functional debugging for an app. Obviously, though
we aim to make this as scalable as possible, which is why we want to allow fitlering
in userspace before sending packets externally to DPDK.

> 
> I would prefer having a BPF library available that the application can use at any point, either at the lowest level (when receiving/transmitting Ethernet packets) or at a higher level (e.g. when working with packets that go into or come out of a tunnel). The BPF library should implement packet length and relevant ancillary data, such as SKF_AD_VLAN_TAG etc. based on metadata in the mbuf.
> 
> Transferring a BPF filter from an outside application could be done by using a simple text format, e.g. the output format of "tcpdump -ddd". This also opens an easy roadmap for Wireshark integration by simply extending excap to include such a BPF filter format.
> 
> 
> Lots of negativity above. I very much like the idea of attaching the secondary process and going through an rte_ring. This allows the secondary process to pass the filtered and captured packets on in any format it likes to any destination it likes.

Good, so we're not completely off-base here. :-)

/Bruce

> 
> 
> Med venlig hilsen / kind regards
> - Morten Brørup
> 
> -----Original Message-----
> From: Bruce Richardson [mailto:bruce.richardson at intel.com] 
> Sent: 16. december 2015 11:45
> 
> Hi,
> 
> we are currently doing some investigation and prototyping for this feature.
> Our current thinking is the following:
> * to allow dynamic control of the filtering, we are thinking of making use of
>   the multi-process infrastructure in DPDK. A secondary process can attach to a
>   primary at runtime and provide the packet filtering and dumping capability.
> * ideally we want to create a generic packet mirroring callback inside the EAL,
>   that can be set up to mirror packets going through Rx/Tx on an ethdev.
> * using this, packets being received on the port to be monitored are sent via
>   an rte_ring (ring ethdev) to the secondary process which takes those packets
>   and does any filtering on them. [This would be where BPF could fit into
>   things, but it's not something we have looked at yet.]
> * initially we plan to have the secondary process then write packets to a pcap
>   file using a pcap PMD, but down the road if we get other PMDs, like a KNI PMD
>   or a TAP device PMD, those could be used as targets instead.
> 
> This implementation we hope should provide enough hooks to enable the standard tools to be used for monitoring and capturing packets. We will send out draft implementation code for various parts of this as soon as we have it.
> 
> Additional feedback welcome, as always. :-)
> 
> Regards,
> /Bruce
> 
>
Previous message: [dpdk-dev] tcpdump support in DPDK 2.3
Next message: [dpdk-dev] tcpdump support in DPDK 2.3
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the dev mailing list