[dpdk-dev] [PATCH 15/15] mbuf: move pool pointer in hotter first half

Thomas Monjalon thomas at monjalon.net
Sat Oct 31 21:40:46 CET 2020


Thanks for the thoughts, Morten.
I believe we need benchmarks of different scenarios with different drivers.


31/10/2020 19:20, Morten Brørup:
> Thomas,
> 
> Adding my thoughts to the already detailed feedback on this important patch...
> 
> The first cache line is not inherently "hotter" than the second. The hotness depends on their usage.
> 
> The mbuf cacheline1 marker has the following comment:
> /* second cache line - fields only used in slow path or on TX */
> 
> In other words, the second cache line is intended not to be touched in fast path RX.
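> 
> For reference, roughly how the markers sit today, before this patch (heavily abridged sketch, not the exact field order):
> 
>     struct rte_mbuf {
>             RTE_MARKER cacheline0;
>             void *buf_addr;             /* RX hot: segment buffer address */
>             /* ... rearm_data, data_off, refcnt, nb_segs, port, ol_flags ... */
>             /* ... packet_type, pkt_len, data_len, vlan_tci, hash ... */
> 
>             /* second cache line - fields only used in slow path or on TX */
>             RTE_MARKER cacheline1 __rte_cache_min_aligned;
>             struct rte_mempool *pool;   /* what this patch moves into the first half */
>             struct rte_mbuf *next;      /* next segment of scattered packet */
>             uint64_t tx_offload;        /* union with l2_len/l3_len/..., simplified here */
>             /* ... */
>     } __rte_cache_aligned;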
> 
> I do not think this is true anymore. Not even with simple non-scattered RX. And regression testing probably didn't catch this, because the tests perform TX after RX, so the cache miss moved from TX to RX and became a cache hit in TX instead. (I may be wrong about this claim, but it's not important for the discussion.)
> 
> I think the right question for this patch is: Can we achieve this - not using the second cache line for fast path RX - again by putting the right fields in the first cache line?
> 
> Probably not in all cases, but perhaps for some...
> 
> Consider the application scenarios.
> 
> When a packet is received, one of three things happens to it:
> 1. It is immediately transmitted on one or more ports.
> 2. It is immediately discarded, e.g. by a firewall rule.
> 3. It is put in some sort of queue, e.g. a ring for the next pipeline stage, or in a QoS queue.
> 
> 1. If the packet is immediately transmitted, the m->tx_offload field in the second cache line will be touched by the application and TX function anyway, so we don't need to optimize the mbuf layout for this scenario.
> 
> 2. The second scenario touches m->pool no matter how it is implemented. The application can avoid touching m->next by using rte_mbuf_raw_free(), knowing that the mbuf came directly from RX and thus no other fields have been touched. In this scenario, we want m->pool in the first cache line.
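> 
> A minimal sketch of such a drop path (hypothetical helper name; assumes the mbuf comes straight from RX, i.e. refcnt == 1, nb_segs == 1, next == NULL):
> 
>     #include <rte_mbuf.h>
> 
>     static inline void
>     drop_rx_mbuf(struct rte_mbuf *m)
>     {
>             /* rte_pktmbuf_free(m) would walk the chain via m->next;
>              * the raw free only reads m->pool to return the buffer. */
>             rte_mbuf_raw_free(m);
>     }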
> 
> 3. Now, let's consider the third scenario, where RX is followed by enqueue into a ring. If the application does nothing but put the packet into a ring, we don't need to move anything into the first cache line. But applications usually do more... So it is application specific what would be good to move to the first cache line:
> 
> A. If the application does not use segmented mbufs, and performs analysis and preparation for transmission in the initial pipeline stages, and only the last pipeline stage performs TX, we could move m->tx_offload to the first cache line, which would keep the second cache line cold until the actual TX happens in the last pipeline stage - maybe even after the packet has waited in a QoS queue for a long time, and its cache lines have gone cold.
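> 
> For illustration, the kind of early-stage preparation I mean (hypothetical helper; assumes plain IPv4/TCP with checksum offload, pseudo-header checksum setup omitted):
> 
>     #include <rte_mbuf.h>
>     #include <rte_ether.h>
>     #include <rte_ip.h>
> 
>     /* ol_flags is in the first cache line, but l2_len/l3_len are part of
>      * the m->tx_offload union, which sits in the second cache line today. */
>     static inline void
>     prepare_tx_offload(struct rte_mbuf *m)
>     {
>             m->l2_len = sizeof(struct rte_ether_hdr);
>             m->l3_len = sizeof(struct rte_ipv4_hdr);
>             m->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;
>     }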
> 
> B. If the application uses segmented mbufs on RX, it might make sense to move m->next to the first cache line. (We don't use segmented mbufs, so I'm not sure about this.)
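> 
> Even trivial per-segment work in an early stage then has to chase m->next; a minimal sketch (hypothetical helper):
> 
>     #include <rte_mbuf.h>
> 
>     static inline uint32_t
>     count_seg_bytes(const struct rte_mbuf *m)
>     {
>             const struct rte_mbuf *seg;
>             uint32_t bytes = 0;
> 
>             for (seg = m; seg != NULL; seg = seg->next)
>                     bytes += rte_pktmbuf_data_len(seg);
>             return bytes;
>     }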
> 
> 
> However, reality perhaps beats theory:
> 
> Looking at the E1000 PMD, it seems like even its non-scattered RX function, eth_igb_recv_pkts(), sets m->next. If it only kept its own free pool pre-initialized instead... I haven't investigated other PMDs, except briefly looking at the mlx5 PMD, and it seems like it doesn't touch m->next in RX.
> 
> I haven't looked deeper into how m->pool is being used by RX in PMDs, but I suppose that it isn't touched in RX.
> 
> <rant on>
> If only we had a performance test where RX was not immediately followed by TX, but where the packets passed through a large queue in-between. Then RX cache misses would no longer be free of charge, as they are today, where they merely turn would-be TX cache misses into TX cache hits...
> <rant off>
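> 
> Such a test could be as simple as splitting RX and TX across two lcores with a large ring in-between (sketch only; BURST, big_ring, port and queue ids are made up, error handling reduced to freeing whatever doesn't fit):
> 
>     #include <rte_ethdev.h>
>     #include <rte_mbuf.h>
>     #include <rte_ring.h>
> 
>     #define BURST 32
> 
>     /* lcore A: receive and park the packets in a big ring. */
>     static void
>     rx_stage(uint16_t port, struct rte_ring *big_ring)
>     {
>             struct rte_mbuf *pkts[BURST];
>             uint16_t nb_rx = rte_eth_rx_burst(port, 0, pkts, BURST);
>             unsigned int nb_q = rte_ring_sp_enqueue_burst(big_ring,
>                             (void **)pkts, nb_rx, NULL);
>             rte_pktmbuf_free_bulk(&pkts[nb_q], nb_rx - nb_q);
>     }
> 
>     /* lcore B: drain the ring and transmit, long after RX touched the
>      * mbufs, so their cache lines have gone cold again. */
>     static void
>     tx_stage(uint16_t port, struct rte_ring *big_ring)
>     {
>             struct rte_mbuf *pkts[BURST];
>             unsigned int nb = rte_ring_sc_dequeue_burst(big_ring,
>                             (void **)pkts, BURST, NULL);
>             uint16_t nb_tx = rte_eth_tx_burst(port, 0, pkts, nb);
>             rte_pktmbuf_free_bulk(&pkts[nb_tx], nb - nb_tx);
>     }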
> 
> Whatever you choose, I am sure that most applications will find it more useful than the timestamp. :-)
