[PATCH 3/7] net/bonding: change mbuf pool and ring allocation
Min Hu (Connor)
humin29 at huawei.com
Tue Dec 21 03:01:32 CET 2021
Hi, Sanford,
On 2021/12/21 0:47, Sanford, Robert wrote:
> Hello Connor,
>
> Please see responses inline.
>
> On 12/17/21, 10:44 PM, "Min Hu (Connor)" <humin29 at huawei.com> wrote:
>
>>> When the number of used tx-descs (0..255) + number of mbufs in the
>>> cache (0..47) reaches 257, then allocation fails.
>>>
>>> If I understand the LACP tx-burst code correctly, it would be
>>> worse if nb_tx_queues > 1, because (assuming multiple tx-cores)
>>> any queue/lcore could xmit an LACPDU. Thus, up to nb_tx_queues *
>>> 47 mbufs could be cached, and not accessible from tx_machine().
>>>
>>> You would not see this problem if the app xmits other (non-LACP)
>>> mbufs on a regular basis, to expedite the clean-up of tx-descs
>>> including LACPDU mbufs (unless nb_tx_queues tx-core caches
>>> could hold all LACPDU mbufs).
>>>
>> I think we do not see this problem only because the mempool can
>> offer many more mbufs than the cache size in the non-LACP case.
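A rough sketch of the worst case described above; the numbers 256 and 47
come from this thread, while the helper name and queue count are only
examples, not driver code:

/* Worst case with multiple tx lcores: LACPDU mbufs sit in the tx ring as
 * used descriptors, and every tx lcore's private mempool cache may hide a
 * few more, none of which are reachable from tx_machine(). */
static unsigned int
lacpdu_mbufs_unavailable(unsigned int used_tx_descs,    /* up to 256 */
                         unsigned int cached_per_lcore, /* up to 47  */
                         unsigned int nb_tx_queues)
{
        return used_tx_descs + cached_per_lcore * nb_tx_queues;
}
/* Once this reaches the per-slave pool size, allocation fails. */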
>>
>>> If we make mempool's cache size 0, then allocation will not fail.
>> How about enlarging the size of the mempool, e.g., up to 4096? I think
>> that could also avoid this bug.
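For reference, a minimal sketch of the two options on the table, written
against the public rte_pktmbuf_pool_create() API; the pool size, cache
value and data-room size below are placeholders, not the bonding driver's
actual code:

#include <rte_mbuf.h>
#include <rte_mempool.h>

static struct rte_mempool *
create_lacp_pool(const char *name, unsigned int pool_size, int socket_id,
                 int with_cache)
{
        if (!with_cache)
                /* cache size 0: every freed LACPDU mbuf is immediately
                 * visible to the interrupt thread running tx_machine() */
                return rte_pktmbuf_pool_create(name, pool_size, 0, 0,
                                RTE_MBUF_DEFAULT_BUF_SIZE, socket_id);

        /* alternative: keep a cache but oversize the pool (e.g. 4096
         * elements) so cached mbufs cannot starve the allocator */
        return rte_pktmbuf_pool_create(name, 4096, 32, 0,
                        RTE_MBUF_DEFAULT_BUF_SIZE, socket_id);
}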
>>>
>>> A mempool cache for LACPDUs does not offer much additional speed:
>>> during alloc, the intr thread does not have default mempool caches
>> Why? As far as I know, every core has its own default mempool cache?
>
> These are private mbuf pools that we use *only* for LACPDUs, *one*
> mbuf per second, at most. (When LACP link peer selects long timeouts,
> we get/put one mbuf every 30 seconds.)
>
> There is *NO* benefit for the consumer thread (interrupt thread
> executing tx_machine()) to have caches on per-slave LACPDU pools.
> The interrupt thread is a control thread, i.e., a non-EAL thread.
> Its lcore_id is LCORE_ID_ANY, so it has no "default cache" in any
> mempool.
Well, sorry, I forgot that the interrupt thread is a non-EAL thread.
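A small illustration of that point, using the public mempool API (the
function and variable names here are made up):

#include <stdio.h>
#include <rte_lcore.h>
#include <rte_mempool.h>

static void
show_default_cache(struct rte_mempool *mp)
{
        unsigned int lcore = rte_lcore_id();  /* LCORE_ID_ANY on a control thread */
        struct rte_mempool_cache *cache =
                rte_mempool_default_cache(mp, lcore); /* NULL when lcore >= RTE_MAX_LCORE */

        printf("lcore_id=%u default_cache=%p\n", lcore, (void *)cache);
}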
>
> There is little or no benefit for active data-plane threads to have
> caches on per-slave LACPDU pools, because on each pool, the producer
> thread puts back, at most, one mbuf per second. There is not much
> contention with the consumer (interrupt thread).
>
> I contend that caches are not necessary for these private LACPDU
> mbuf pools, and that they just waste RAM and CPU cache. If we still
> insist on creating them *with* caches, then we should add at least
> (cache-size x 1.5 x nb-tx-queues) mbufs per pool.
I agree with you.
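A sketch of that sizing rule (the helper name is made up; the 1.5 factor
matches the mempool cache flush threshold, which is why a cache can hold
up to roughly 1.5 x its nominal size):

static unsigned int
lacp_pool_size(unsigned int total_tx_desc, unsigned int cache_size,
               unsigned int nb_tx_queues)
{
        /* tx descriptors in flight, plus room for every tx queue's lcore
         * cache to fill to its flush threshold (cache-size x 1.5) */
        return total_tx_desc + (cache_size * 3 / 2) * nb_tx_queues;
}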
>
>
>>> Q: Why reserve one additional slot in the rx and tx rings?
>>>
>>> A: rte_ring_create() requires the ring size N, to be a power of 2,
>>> but it can only store N-1 items. Thus, if we want to store X items,
>> Hi Robert, could you explain this to me?
>> I cannot understand why it can
>> "only store N-1 items". I checked the source code; it says:
>> "The real usable ring size is *count-1* instead of *count* to
>> differentiate a free ring from an empty ring."
>> But I still cannot understand what it means.
>
> I believe there is a mistake in the ring comments (in 3 places).
> It would be better if they replace "free" with "full":
> "... to differentiate a *full* ring from an empty ring."
>
Well, I still cannot understand it. I think if the ring size is N, it
should store N items; why only "N - 1" items?
Hoping for your explanation, thanks.
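For what it's worth, a toy sketch, not the DPDK implementation, of the
usual reason a head/tail ring indexed modulo N keeps one slot free:

/* With indexes taken modulo N, head == tail must mean "empty". If all N
 * slots could be filled, a full ring would also end up with head == tail,
 * so the two states would be indistinguishable; leaving one slot unused
 * removes the ambiguity, hence the usable size of N-1. */
struct toy_ring {
        unsigned int head;   /* next slot to write */
        unsigned int tail;   /* next slot to read  */
        unsigned int size;   /* N, a power of two  */
        void *objs[];
};

static int toy_empty(const struct toy_ring *r)
{
        return r->head == r->tail;
}

static int toy_full(const struct toy_ring *r)
{
        /* full one step early, i.e. at most N-1 objects are stored */
        return ((r->head + 1) & (r->size - 1)) == r->tail;
}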
>
>>> we need to ask for (at least) X+1. Original code fails when the real
>>> desired size is a power of 2, because in such a case, align32pow2
>>> does not round up.
>>>
>>> For example, say we want a ring to hold 4:
>>>
>>> rte_ring_create(... rte_align32pow2(4) ...)
>>>
>>> rte_align32pow2(4) returns 4, and we end up with a ring that only
>>> stores 3 items.
>>>
>>> rte_ring_create(... rte_align32pow2(4+1) ...)
>>>
>>> rte_align32pow2(5) returns 8, and we end up with a ring that
>>> stores up to 7 items, more than we need, but acceptable.
>> To fix the bug, how about just setting the flag "RING_F_EXACT_SZ"?
>
> Yes, this is a good idea. I will look for examples or test code that
> use this flag.
Yes, if it is fixed that way, it looks good to me.
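A minimal sketch of that approach; the ring name, entry count and sync
flags below are placeholders, not the bonding driver's actual parameters:

#include <rte_ring.h>

static struct rte_ring *
create_exact_ring(unsigned int nb_entries, int socket_id)
{
        /* with RING_F_EXACT_SZ the ring really holds nb_entries objects;
         * the library still rounds the internal size up to a power of two */
        return rte_ring_create("slave0_lacp_rx", nb_entries, socket_id,
                               RING_F_SP_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ);
}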
>
> --
> Regards,
> Robert Sanford
>
>