[dpdk-dev] [PATCH 15/15] mbuf: move pool pointer in hotterfirst half

Morten Brørup mb at smartsharesystems.com
Sat Oct 31 19:20:44 CET 2020


Thomas,

Adding my thoughts to the already detailed feedback on this important patch...

The first cache line is not inherently "hotter" than the second. The hotness depends on their usage.

The mbuf cacheline1 marker has the following comment:
/* second cache line - fields only used in slow path or on TX */

In other words, the second cache line is intended not to be touched in fast path RX.
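For reference, a quick way to check which cache line a field actually lands on (my own little check, not part of the patch, and assuming 64 byte cache lines):

#include <stdio.h>
#include <stddef.h>
#include <rte_mbuf.h>

int main(void)
{
        /* Anything at offset >= 64 is in the second cache line. */
        printf("pool:       %zu\n", offsetof(struct rte_mbuf, pool));
        printf("next:       %zu\n", offsetof(struct rte_mbuf, next));
        printf("tx_offload: %zu\n", offsetof(struct rte_mbuf, tx_offload));
        return 0;
}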

I do not think this is true anymore. Not even with simple non-scattered RX. And regression testing probably didn't catch this, because the tests perform TX after RX, so the cache miss moved from TX to RX and became a cache hit in TX instead. (I may be wrong about this claim, but it's not important for the discussion.)

I think the right question for this patch is: Can we achieve this - not using the second cache line for fast path RX - again by putting the right fields in the first cache line?

Probably not in all cases, but perhaps for some...

Consider the application scenarios.

When a packet is received, one of three things happens to it:
1. It is immediately transmitted on one or more ports.
2. It is immediately discarded, e.g. by a firewall rule.
3. It is put in some sort of queue, e.g. a ring for the next pipeline stage, or in a QoS queue.

1. If the packet is immediately transmitted, the m->tx_offload field in the second cache line will be touched by the application and TX function anyway, so we don't need to optimize the mbuf layout for this scenario.
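E.g. something like this (just a sketch, with made-up port numbers, IPv4 checksum offload as an example, and freeing of unsent packets omitted) - writing l2_len/l3_len is writing m->tx_offload:

#include <rte_ethdev.h>
#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_mbuf.h>

static void
forward_burst(uint16_t rx_port, uint16_t tx_port)
{
        struct rte_mbuf *pkts[32];
        uint16_t i, nb = rte_eth_rx_burst(rx_port, 0, pkts, 32);

        for (i = 0; i < nb; i++) {
                struct rte_mbuf *m = pkts[i];

                m->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM; /* 1st cache line */
                m->l2_len = sizeof(struct rte_ether_hdr);     /* 2nd cache line */
                m->l3_len = sizeof(struct rte_ipv4_hdr);      /* 2nd cache line */
        }
        rte_eth_tx_burst(tx_port, 0, pkts, nb);
}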

2. The second scenario touches m->pool no matter how it is implemented. The application can avoid touching m->next by using rte_mbuf_raw_free(), knowing that the mbuf came directly from RX and thus no other fields have been touched. In this scenario, we want m->pool in the first cache line.
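E.g. the drop path could look like this (assuming, exactly because the mbuf came straight from RX, that it is direct, not segmented, and has a reference count of 1):

#include <rte_mbuf.h>

/* Drop an mbuf received directly from RX: only m->pool is needed. */
static inline void
drop_rx_mbuf(struct rte_mbuf *m)
{
        /* No need to read m->next the way rte_pktmbuf_free() would. */
        rte_mbuf_raw_free(m);
}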

3. Now, let's consider the third scenario, where RX is followed by enqueue into a ring. If the application does nothing but put the packet into a ring, we don't need to move anything into the first cache line. But applications usually do more... So it is application specific what would be good to move to the first cache line (see the sketch after the two cases below):

A. If the application does not use segmented mbufs, and performs analysis and preparation for transmission in the initial pipeline stages, and only the last pipeline stage performs TX, we could move m->tx_offload to the first cache line, which would keep the second cache line cold until the actual TX happens in the last pipeline stage - maybe even after the packet has waited in a QoS queue for a long time, and its cache lines have gone cold.

B. If the application uses segmented mbufs on RX, it might make sense to move m->next to the first cache line. (We don't use segmented mbufs, so I'm not sure about this.)
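For illustration, a first pipeline stage along these lines (ring name and burst size are arbitrary; whatever fields this stage touches, besides the enqueue itself, is what decides what should be in the first cache line):

#include <rte_ethdev.h>
#include <rte_ring.h>
#include <rte_mbuf.h>

static void
rx_stage(uint16_t port, struct rte_ring *to_next_stage)
{
        struct rte_mbuf *pkts[32];
        uint16_t nb = rte_eth_rx_burst(port, 0, pkts, 32);
        unsigned int i, enq;

        /* Analysis/preparation goes here; in case A it could fill in
         * m->tx_offload now, so the last stage never has to warm the
         * second cache line again. */

        enq = rte_ring_enqueue_burst(to_next_stage, (void **)pkts, nb, NULL);
        for (i = enq; i < nb; i++)
                rte_pktmbuf_free(pkts[i]);
}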


However, reality perhaps beats theory:

Looking at the E1000 PMD, it seems like even its non-scattered RX function, eth_igb_recv_pkts(), sets m->next. If only it kept its own free pool pre-initialized instead... I haven't investigated other PMDs, except briefly looking at the mlx5 PMD, and it seems like it doesn't touch m->next in RX.
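What I mean by pre-initialized: if every mbuf handed out by the pool is known to already have next == NULL (which the normal free path is supposed to guarantee), a non-scattered RX fill could look roughly like this and never touch the second cache line (a sketch only, not the actual igb code):

#include <rte_mbuf.h>

static inline void
rx_fill_mbuf(struct rte_mbuf *rxm, uint16_t pkt_len)
{
        rxm->data_off = RTE_PKTMBUF_HEADROOM; /* 1st cache line */
        rxm->nb_segs = 1;                     /* 1st cache line */
        rxm->pkt_len = pkt_len;               /* 1st cache line */
        rxm->data_len = pkt_len;              /* 1st cache line */
        /* rxm->next = NULL; */               /* 2nd cache line - the store
                                               * this would avoid */
}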

I haven't looked deeper into how m->pool is being used by RX in PMDs, but I suppose that it isn't touched in RX.

<rant on>
If only we had a performance test where RX was not immediately followed by TX, but the packets were passed through a large queue in between, so the RX cache misses would not appear free of charge simply because they turn would-be TX cache misses into cache hits...
<rant off>
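Roughly something like this, split over two lcores with a big ring in between, so the mbufs have gone cold again before TX touches them (port numbers, ring size and error handling are all hand-waved):

#include <rte_ethdev.h>
#include <rte_ring.h>

/* lcore 0: receive and park the packets in a big ring, e.g. 64K entries */
static int
rx_lcore(void *arg)
{
        struct rte_ring *big_ring = arg;
        struct rte_mbuf *pkts[32];

        for (;;) {
                uint16_t nb = rte_eth_rx_burst(0, 0, pkts, 32);
                rte_ring_enqueue_burst(big_ring, (void **)pkts, nb, NULL);
        }
        return 0;
}

/* lcore 1: transmit from the ring, long after RX filled the mbufs */
static int
tx_lcore(void *arg)
{
        struct rte_ring *big_ring = arg;
        struct rte_mbuf *pkts[32];

        for (;;) {
                unsigned int nb = rte_ring_dequeue_burst(big_ring,
                                (void **)pkts, 32, NULL);
                rte_eth_tx_burst(1, 0, pkts, nb);
        }
        return 0;
}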

Whatever you choose, I am sure that most applications will find it more useful than the timestamp. :-)


Med venlig hilsen / kind regards
- Morten Brørup


