[PATCH v1 2/3] net/af_packet: RX/TX rte_memcpy, bulk free, prefetch
Stephen Hemminger
stephen at networkplumber.org
Tue Jan 27 19:54:40 CET 2026
On Tue, 27 Jan 2026 10:13:54 -0800
scott.k.mitch1 at gmail.com wrote:
> From: Scott Mitchell <scott.k.mitch1 at gmail.com>
>
> - Add rte_prefetch0() to prefetch next frame/mbuf while processing
> current packet, reducing cache miss latency
Makes sense. If you really want to dive deeper, there are more
unrolled loop patterns possible; fd.io uses a multi-step unrolled
loop pattern. The reason is that the first prefetch is usually
useless, while prefetching farther ahead helps.
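A rough sketch of the prefetch-ahead idea, in plain C so it stands
alone. This is not the fd.io code or the af_packet driver:
struct frame, process_frames() and PREFETCH_AHEAD are made up for
illustration, and __builtin_prefetch() (GCC/Clang) stands in for
rte_prefetch0(). The distance of 4 is an assumption that would need
measuring.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame descriptor; not the driver's real layout. */
struct frame {
	uint32_t len;
	uint8_t data[60];
};

/* Assumed prefetch distance; the right value has to be measured. */
#define PREFETCH_AHEAD 4

/* Sum frame lengths while prefetching PREFETCH_AHEAD frames in front. */
static uint64_t
process_frames(struct frame *const *frames, size_t n)
{
	uint64_t total = 0;
	size_t i;

	/* Prime the pipeline with the first few frames. */
	for (i = 0; i < n && i < PREFETCH_AHEAD; i++)
		__builtin_prefetch(frames[i], 0, 3);

	for (i = 0; i < n; i++) {
		/* Request a frame several iterations away, not the next one. */
		if (i + PREFETCH_AHEAD < n)
			__builtin_prefetch(frames[i + PREFETCH_AHEAD], 0, 3);

		total += frames[i]->len;
	}
	return total;
}
```

The priming loop is what keeps the first few packets from missing;
after that, each iteration touches data that was requested several
iterations earlier.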
> - Replace memcpy() with rte_memcpy() for optimized copy operations
There is no good reason that rte_memcpy() should be faster than memcpy().
Some cases where it was faster were observed with virtio, but my hunch
is that this is because the two routines make different alignment
assumptions.
> - Use rte_pktmbuf_free_bulk() in TX path instead of individual
> rte_pktmbuf_free() calls for better batch efficiency
Makes sense.
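To illustrate why batching helps, here is a toy sketch in plain C.
It is not the DPDK mempool: struct pool, pool_put() and
pool_put_bulk() are hypothetical, and the ops counter only models
"trips to the shared pool", which is the cost that a bulk free such
as rte_pktmbuf_free_bulk() is meant to amortize.

```c
#include <stddef.h>

#define POOL_CAP 64

/* Toy object pool (hypothetical); ops counts trips to the pool. */
struct pool {
	void *slots[POOL_CAP];
	size_t count;
	unsigned int ops;
};

/* Return one object: one pool operation per object. */
static void
pool_put(struct pool *p, void *obj)
{
	p->ops++;
	if (p->count < POOL_CAP)
		p->slots[p->count++] = obj;
}

/* Return n objects: one pool operation for the whole batch. */
static void
pool_put_bulk(struct pool *p, void *const *objs, size_t n)
{
	size_t i;

	p->ops++;
	for (i = 0; i < n && p->count < POOL_CAP; i++)
		p->slots[p->count++] = objs[i];
}
```

Freeing 8 objects one at a time costs 8 pool operations in this
model; the bulk call costs 1.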
> - Add unlikely() hints for error paths (oversized packets, VLAN
> insertion failures, sendto errors) to optimize branch prediction
Also makes sense.
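For reference, a minimal standalone sketch of such a hint on an
error path. The unlikely() macro is defined locally on top of
__builtin_expect() (GCC/Clang) so the example compiles without DPDK
headers; FRAME_MAX and count_sendable() are hypothetical and not
taken from the patch.

```c
#include <stdint.h>

/* Local definition so the sketch builds without DPDK headers. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Assumed frame size limit, for illustration only. */
#define FRAME_MAX 2048

/* Count packets that fit in a frame; oversized ones are skipped. */
static unsigned int
count_sendable(const uint32_t *pkt_len, unsigned int nb_pkts)
{
	unsigned int i, sent = 0;

	for (i = 0; i < nb_pkts; i++) {
		/* Error path: tell the compiler this is the rare case. */
		if (unlikely(pkt_len[i] > FRAME_MAX))
			continue;
		sent++;
	}
	return sent;
}
```

The hint does not change the result, only how the compiler lays out
the branch.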
> - Remove unnecessary early nb_pkts == 0 when loop handles this
> and app may never call with 0 frames.
Yes, calling tx/rx burst with nb_pkts == 0 only needs to work;
it does not need a short circuit.
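A small sketch of what "only needs to work" means here, using a
hypothetical toy_tx_burst() rather than the driver code: the loop
bound already covers the empty case, so no early return is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical burst routine with no "if (nb_pkts == 0) return 0;". */
static uint16_t
toy_tx_burst(const uint32_t *pkt_len, uint16_t nb_pkts, uint64_t *bytes)
{
	uint16_t i;

	/* With nb_pkts == 0 the body never runs and pkt_len is not read. */
	for (i = 0; i < nb_pkts; i++)
		*bytes += pkt_len[i];

	return i;
}
```

A call with nb_pkts == 0 returns 0 and leaves *bytes untouched, the
same result an explicit check would give, without an extra branch on
every non-empty call.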
> Signed-off-by: Scott Mitchell <scott.k.mitch1 at gmail.com>