[dpdk-users] Query on handling packets

Kyle Larose eomereadig at gmail.com
Sat Nov 17 23:05:11 CET 2018


On Sat, Nov 17, 2018 at 5:22 AM Harsh Patel <thadodaharsh10 at gmail.com> wrote:
>
> Hello,
> Thanks a lot for going through the code and providing us with so much
> information.
> We removed all the memcpy/malloc from the data path as you suggested and
...
> After removing this, we are able to see a performance gain but not as good
> as raw socket.
>

You're using an unordered_map to map your buffer pointers back to the
mbufs. While it may not do a memcpy all the time, it will likely end
up doing a malloc or free when you insert or remove entries from the
map. If it needs to resize the table, it'll be even worse. You may
want to consider using librte_hash instead:
https://doc.dpdk.org/api/rte__hash_8h.html. Or, even better, see if
you can design the system to avoid needing a lookup like this at all.
Can you return a handle with the mbuf pointer and the data together?
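
As a rough sketch of what such a handle could look like (not taken
from your code): FakeMbuf is a hypothetical stand-in for struct
rte_mbuf so the example compiles without DPDK, and PacketHandle,
make_handle and owner_of are names invented for illustration. With
real mbufs, the data pointer and length would presumably come from
rte_pktmbuf_mtod() and rte_pktmbuf_data_len().

```cpp
#include <cstdint>

// Hypothetical stand-in for struct rte_mbuf, only so this compiles
// without DPDK headers.
struct FakeMbuf {
    std::uint8_t  payload[2048];
    std::uint16_t data_len;
};

// Handle given to the upper layer instead of a bare buffer pointer.
// Carrying the owning mbuf alongside the data removes the need for a
// pointer -> mbuf map on the data path.
struct PacketHandle {
    FakeMbuf*     mbuf;  // owner, needed later to transmit or free
    std::uint8_t* data;  // first byte of packet data inside the mbuf
    std::uint16_t len;   // number of valid bytes at data
};

// Build a handle for a received buffer. No allocation, no lookup.
inline PacketHandle make_handle(FakeMbuf* m) {
    return PacketHandle{m, m->payload, m->data_len};
}

// Recover the owning buffer in O(1) when the packet is sent or freed.
inline FakeMbuf* owner_of(const PacketHandle& h) {
    return h.mbuf;
}
```

The handle is three words passed by value, so getting back to the
mbuf is a field read rather than a hash lookup.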

You're also using floating point math where it's unnecessary (the
timing check). Just multiply the numerator by 1000000 prior to doing
the division. I doubt you'll overflow a uint64_t with that. Floating
point is not as efficient as integer math, though I'm not sure offhand
it'd cause a major perf problem.
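
A minimal sketch of the integer version, assuming the timing check
divides a tick delta by a timer frequency in Hz to get microseconds
(the function and parameter names here are made up, not from your
code):

```cpp
#include <cstdint>

// Convert a tick delta to microseconds using integer math only.
// Multiplying before dividing keeps the sub-second precision that a
// plain integer ticks / hz would throw away.
inline std::uint64_t ticks_to_us(std::uint64_t ticks, std::uint64_t hz) {
    return ticks * 1000000ULL / hz;
}
```

The multiply only wraps once ticks exceeds 2^64 / 10^6, which is about
1.8 * 10^13 ticks. With a 2 GHz counter that is roughly 9000 seconds,
so a short timing interval is nowhere near it.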

One final thing: with a raw socket, the kernel takes over transmitting
to and receiving from the NIC itself. That means it is free to use
multiple CPUs for the rx and tx. I notice that you only have one rx/tx
queue, meaning at most one CPU can send and receive packets.
When running your performance test with the raw socket, you may want
to see how busy the system is doing packet sends and receives. Is it
using more than one CPU's worth of processing? Is it using less, but
when combined with your main application's usage, the overall system
is still using more than one?

