[dpdk-dev] rte_prefetch0() is effective?

Bruce Richardson bruce.richardson at intel.com
Wed Jan 13 12:34:33 CET 2016


On Thu, Dec 24, 2015 at 03:35:14PM +0900, Moon-Sang Lee wrote:
> I see code like the example below in the examples directory, and I wonder
> whether it is effective.
> Modern architectures use coherent I/O,
> so I think the DMA initiated by rte_eth_rx_burst() might already fill
> the cache lines of the RX buffers.
> Do I really need to call rte_prefetchX()?
> 
>             nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
>                     MAX_PKT_BURST);
>             ...
>             /* Prefetch and forward already prefetched packets */
>             for (j = 0; j < (nb_rx - PREFETCH_OFFSET); j++) {
>                 rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[
>                         j + PREFETCH_OFFSET], void *));
>                 l3fwd_simple_forward(pkts_burst[j], portid,
>                     qconf);
>             }
> 

Good question.
When the first example apps using this style of prefetch were originally written,
yes, there was a noticeable performance increase from using the prefetch.
Since then, I'm not sure that anyone has checked on each new generation of
platforms whether the prefetches are still necessary or how much they help, but
I suspect that they still help a bit and don't hurt performance.
It would be an interesting exercise to check whether the prefetch offsets used
in code like the above can be adjusted to give better performance on our latest
supported platforms.
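
For anyone who wants to try that, here is a minimal sketch of the same
loop structure with the offset passed in as a parameter, so that different
values can be compared. It is not the l3fwd code: struct pkt,
process_pkt() and process_burst() are hypothetical names made up for
illustration, and prefetch0() uses the GCC/Clang builtin
__builtin_prefetch as a stand-in for rte_prefetch0() so that it builds
without DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf: only a data pointer. */
struct pkt {
    uint8_t *data;
};

/* Stand-in for rte_prefetch0(), assuming a GCC or Clang compiler.
 * Arguments: address, 0 = prefetch for read, 3 = keep in all cache levels. */
static inline void prefetch0(const void *p)
{
    __builtin_prefetch(p, 0, 3);
}

/* Hypothetical per-packet work: read the first byte of the packet data,
 * which is the access the prefetch is meant to make cheap. */
static unsigned long process_pkt(const struct pkt *p)
{
    return p->data[0];
}

/* Process a burst while prefetching `offset` packets ahead.
 * Returns the sum of the first bytes so the result can be checked. */
static unsigned long
process_burst(struct pkt **pkts, int nb_rx, int offset)
{
    unsigned long sum = 0;
    int j;

    assert(pkts != NULL && nb_rx >= 0 && offset >= 0);

    /* Prefetch the first `offset` packets (fewer if the burst is short). */
    for (j = 0; j < offset && j < nb_rx; j++)
        prefetch0(pkts[j]->data);

    /* Prefetch ahead and process already prefetched packets. */
    for (j = 0; j < nb_rx - offset; j++) {
        prefetch0(pkts[j + offset]->data);
        sum += process_pkt(pkts[j]);
    }

    /* Process the remaining packets; nothing is left to prefetch. */
    for (; j < nb_rx; j++)
        sum += process_pkt(pkts[j]);

    return sum;
}
```

Timing process_burst() over the same traffic with a few different offset
values, and with the prefetch0() calls removed entirely, would be one way
to see whether the prefetch still pays off on a given platform. The
result does not depend on the offset, only the timing should.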

/Bruce

