High packet capturing rate in DPDK enabled port

Stephen Hemminger stephen at networkplumber.org
Mon May 6 19:59:59 CEST 2024


On Mon, 6 May 2024 02:15:10 +0600
Fuji Nafiul <nafiul.fuji at gmail.com> wrote:

> I understand that I will need more cores and an SSD, which I have. The
> question is: is there any existing project that exposes parameters to dump
> at the highest possible rate with the available resources, or do I have to
> use the pdump framework and implement it myself? I previously wrote dumping
> code integrated with my DPDK media application, which was able to dump
> around 0.5 Gbit/s (one big rte_ring and 2 cores, not much optimized), then
> found out that the pdump framework does a similar kind of thing, just with
> a secondary process intercepting rx/tx. But I need to modify it to scale,
> which is why I was wondering whether there is already a project that aims
> to dump at the highest rate possible on a DPDK port; otherwise, I will
> start modifying it myself. I haven't looked into the "dpdkcap" code, but it
> says that it aims to dump around 10 Gbit/s if resources are available. Has
> anyone used or tested this project, or tried to modify the pdump code to
> scale?

The things that could speed up dpdk-dumpcap are:

1. Use Linux async I/O via io_uring. But that creates work around supporting
   older distros. I would not make it an option; if io_uring works it should
   be used. That is easier now that RHEL/CentOS 7 is end of life and no
   longer needs to be supported.

2. Get rid of the copy on the pdump side by using reference counts. But this
   exposes potential issues with drivers and applications that don't handle
   mbufs with refcount > 1.  It means that if refcount > 1 the application
   can not overwrite the buffer.  On the Tx side, that makes handling VLANs
   more complicated.  On the Rx side, it needs to be an option, since most
   applications (especially 3rd party) can't handle refcounts.

3. Get rid of the callback and just put the mbuf into the ring directly.
   Indirect calls slow things down and introduce bugs when the secondary
   process is doing rx/tx.

4. Have dumpcap use multiple threads (one per queue) when doing ring -> write.

These are in order of complexity and performance gain; rough sketches of each
idea follow below.
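
For item 1, here is a minimal sketch of what an io_uring write path could look
like, using liburing. The queue depth, explicit file offset tracking, and the
helper names are assumptions for illustration, not the current dpdk-dumpcap
code:

#include <stdint.h>
#include <liburing.h>

#define WRITE_QUEUE_DEPTH 64            /* assumed queue depth */

static struct io_uring wring;

/* Set up the submission/completion rings once at startup. */
static int capture_writer_init(void)
{
        return io_uring_queue_init(WRITE_QUEUE_DEPTH, &wring, 0);
}

/* Queue one capture record for asynchronous write instead of blocking in
 * write().  The record buffer must stay valid until its completion is seen. */
static int capture_write_async(int fd, const void *rec, unsigned int len,
                               uint64_t file_off)
{
        struct io_uring_sqe *sqe = io_uring_get_sqe(&wring);

        if (sqe == NULL)
                return -1;              /* queue full: reap/flush and retry */

        io_uring_prep_write(sqe, fd, rec, len, file_off);
        io_uring_sqe_set_data(sqe, (void *)(uintptr_t)rec);
        return io_uring_submit(&wring);
}

/* Reap completions so finished record buffers can be reused. */
static void capture_write_reap(void)
{
        struct io_uring_cqe *cqe;

        while (io_uring_peek_cqe(&wring, &cqe) == 0) {
                /* cqe->res: bytes written, or -errno on failure */
                io_uring_cqe_seen(&wring, cqe);
        }
}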
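
For item 2, a sketch of the refcount idea on the capture side, using the
standard mbuf and ring APIs. The helper name and the capture ring are made up
for illustration; today the pdump path copies the packet instead:

#include <rte_mbuf.h>
#include <rte_ring.h>

/* Instead of copying each packet, take an extra reference on every segment
 * and enqueue the original mbuf into the capture ring.  The writer core
 * later drops that reference with rte_pktmbuf_free(), so the data is freed
 * only after both the datapath and the capturer are done with it.
 * Caveat from above: once refcnt > 1, nothing may overwrite the buffer. */
static uint16_t
capture_enqueue_norefcopy(struct rte_ring *cap_ring,
                          struct rte_mbuf **pkts, uint16_t nb_pkts)
{
        unsigned int i, n;

        for (i = 0; i < nb_pkts; i++)
                rte_pktmbuf_refcnt_update(pkts[i], 1);  /* whole chain */

        n = rte_ring_enqueue_burst(cap_ring, (void **)pkts, nb_pkts, NULL);

        /* Undo the extra reference on anything the ring could not take. */
        for (i = n; i < nb_pkts; i++)
                rte_pktmbuf_refcnt_update(pkts[i], -1);

        return n;
}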
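
For item 3, the idea is to skip the rte_eth_add_rx_callback() indirection and
feed the capture ring straight from the rx path. A sketch, reusing the helper
from the previous snippet (the port, queue, and burst size are assumptions):

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define BURST 32

/* Capture from the rx loop itself: no registered callback, so there is no
 * indirect call on the hot path and nothing special happens when a secondary
 * process is the one doing rx/tx. */
static void
rx_loop(uint16_t port, uint16_t queue, struct rte_ring *cap_ring)
{
        struct rte_mbuf *pkts[BURST];
        uint16_t nb;

        for (;;) {
                nb = rte_eth_rx_burst(port, queue, pkts, BURST);
                if (nb == 0)
                        continue;

                if (cap_ring != NULL)
                        capture_enqueue_norefcopy(cap_ring, pkts, nb);

                /* ... normal forwarding / processing of pkts ... */
        }
}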
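
For item 4, a sketch of one writer thread per queue, each draining its own ring
to its own file. It is deliberately simplified: raw packet bytes only (no
pcapng record framing), single-segment mbufs assumed, and the struct and field
names are illustrative:

#include <unistd.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

struct writer_args {
        struct rte_ring *ring;          /* this queue's capture ring */
        int fd;                         /* this queue's output file */
};

/* One writer per capture queue, each launched on its own lcore with
 * rte_eal_remote_launch(capture_writer, &args[q], lcore_q), so the
 * ring -> write path scales instead of funneling through one core. */
static int
capture_writer(void *arg)
{
        struct writer_args *wa = arg;
        struct rte_mbuf *pkts[32];
        unsigned int i, n;
        ssize_t ret;

        for (;;) {
                n = rte_ring_dequeue_burst(wa->ring, (void **)pkts, 32, NULL);
                for (i = 0; i < n; i++) {
                        ret = write(wa->fd, rte_pktmbuf_mtod(pkts[i], void *),
                                    rte_pktmbuf_data_len(pkts[i]));
                        (void)ret;
                        rte_pktmbuf_free(pkts[i]);
                }
        }
        return 0;
}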

I haven't done them because I don't work full time on this, and it would
require a lot of testing effort as well.

