allocating a mempool w/ rte_pktmbuf_pool_create()
Dmitry Kozlyuk
dmitry.kozliuk at gmail.com
Sun Jan 30 12:32:47 CET 2022
Hi,
2022-01-29 21:33 (UTC-0500), fwefew 4t4tg:
[...]
> > The other crucial insight is: so long as memory is allocated on the same
> > NUMA node as the RXQ/TXQ runs that ultimately uses it, there is only marginal
> > performance advantage to having per-core caching of mbufs in a mempool
> > as provided by the cache_size formal argument in rte_mempool_create() here:
> >
> > https://doc.dpdk.org/api/rte__mempool_8h.html#a503f2f889043a48ca9995878846db2fd
> >
> > In fact the API doc should really point out the advantage; perhaps it
> > eliminates some cache sloshing to get the last few percent of performance.
Note: "cache sloshing", a.k.a. "false sharing", is not what is at play here.
There is true, not false, sharing of the mempool ring
when multiple lcores use one mempool (see below for why you may want this).
The colloquial term is "contention"; per-lcore caching reduces it.
Later you are talking about the case where a mempool is created per queue.
The potential issue with this approach is that one queue may quickly deplete
its mempool; say, if it does IPv4 reassembly and holds fragments for long.
To counter this, each per-queue mempool must be large, which wastes memory.
This is why one mempool is often created for a set of queues
(at least those processed on lcores of a single NUMA node).
If one queue consumes more mbufs than the others, it is no longer a problem,
as long as the mempool as a whole is not depleted.
Per-lcore caching optimizes exactly this case of many lcores accessing
one mempool. It may be less relevant to yours.
You can run the "mempool_perf_autotest" command of the app/test/dpdk-test
binary to see how the cache influences performance.
See also:
https://doc.dpdk.org/guides/prog_guide/mempool_lib.html#mempool-local-cache
[...]
> > Let's turn then to a larger issue: what happens if different RXQ/TXQs have
> > radically different needs?
> >
> > As the code above illustrates, one merely allocates a size appropriate to
> > an individual RXQ/TXQ by changing the count and size of mbufs,
> > which is as simple as it can get.
Correct.
As explained above, it can also be one mempool per queue group.
What do you think is missing here for your use case?