[dpdk-dev] [PATCH] eal: add new prefetch0_write variant

Van Haaren, Harry harry.van.haaren at intel.com
Mon Sep 14 17:10:16 CEST 2020


> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula at marvell.com>
> Sent: Monday, September 14, 2020 11:39 AM
> To: Van Haaren, Harry <harry.van.haaren at intel.com>; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] eal: add new prefetch0_write variant
> 
> >> >This commit adds a new rte_prefetch0_write() variant, which suggests to
> >> >the compiler to use a prefetch instruction with intention to write. As a
> >> >compiler builtin, the compiler can choose based on compilation target
> >> >what the best implementation for this instruction is.
> >>
> >> Why not have the other variants too, i.e. L2/L3/temporal store prefetches?
> >
> >Hi Pavan,
> >
> Hi Harry,
> (LTNS)
> 
> >Are there architectures that actually implement those? Usually for a WB
> >mem store to complete, the data must be present in L1 cache (on x86 at
> >least), and that's what the patch below with write0 achieves.
> 
> ARM64 does support all modes of store prefetch:
> "
> <type> is one of:
> PLD Prefetch for load, encoded in the "Rt<4:3>" field as 0b00.
> PLI Preload instructions, encoded in the "Rt<4:3>" field as 0b01.
> PST Prefetch for store, encoded in the "Rt<4:3>" field as 0b10.
> <target> is one of:
> L1 Level 1 cache, encoded in the "Rt<2:1>" field as 0b00.
> L2 Level 2 cache, encoded in the "Rt<2:1>" field as 0b01.
> L3 Level 3 cache, encoded in the "Rt<2:1>" field as 0b10.
> <policy> is one of:
> KEEP Retained or temporal prefetch, allocated in the cache normally. Encoded in
> the "Rt<0>" field as 0.
> STRM Streaming or non-temporal prefetch, for data that is used only once. Encoded
> in the "Rt<0>" field as 1.
> For more information on these prefetch
> "
> 
> >
> >I'm against adding all the variants "just in case"; it leads to API bloat
> >and increases cognitive load on the programmer. My expectation is that in
> >99% of usage the prefetch write instruction should target L1.
> >
> 
> There is a use case when the cache mode is write-through and the application
> is pipelining work across cores sharing the same L2 cluster.
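
For reference, the operand encoding quoted above can be sketched in a few lines of C. This is only an illustration of how the three fields pack into the 5-bit Rt value; the enum and function names are made up for this mail and are not taken from any Arm header or from DPDK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; values follow the field encodings quoted above. */
enum prfm_type   { PRFM_PLD = 0, PRFM_PLI = 1, PRFM_PST = 2 }; /* Rt<4:3> */
enum prfm_target { PRFM_L1 = 0, PRFM_L2 = 1, PRFM_L3 = 2 };    /* Rt<2:1> */
enum prfm_policy { PRFM_KEEP = 0, PRFM_STRM = 1 };             /* Rt<0>   */

/* Pack type, target and policy into the 5-bit Rt operand value. */
static inline uint8_t prfm_rt(enum prfm_type type, enum prfm_target target,
                              enum prfm_policy policy)
{
    /* Reject values outside the ranges listed in the quoted text. */
    assert((unsigned)type <= 2u);
    assert((unsigned)target <= 2u);
    assert((unsigned)policy <= 1u);

    return (uint8_t)(((unsigned)type << 3) | ((unsigned)target << 1) |
                     (unsigned)policy);
}
```

With this packing, a store prefetch into L2 with the KEEP policy (the case relevant to the L2-cluster pipelining use case) comes out as 0b10010.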

OK - v2 sent: http://patches.dpdk.org/patch/77632/

The new APIs match the existing prefetch APIs:
rte_prefetch0_write()  prefetch for write into L1 and all levels below
rte_prefetch1_write()  prefetch for write into L2 and all levels below
rte_prefetch2_write()  prefetch for write into L3
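
A rough sketch of how such variants can be built on the GCC/Clang builtin, for anyone following along. The sketch_* names are hypothetical and this is not the code from the patch. It assumes __builtin_prefetch(addr, rw, locality) with rw = 1 for write intent, and assumes the compiler maps locality 3/2/1 to L1/L2/L3 style hints, which is a target-dependent choice rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not the DPDK API. rw = 1 requests write intent;
 * the locality argument (3 = highest temporal locality) is only a hint. */
static inline void sketch_prefetch0_write(const void *p)
{
    __builtin_prefetch(p, 1, 3); /* assumed: closest cache level (L1) */
}

static inline void sketch_prefetch1_write(const void *p)
{
    __builtin_prefetch(p, 1, 2); /* assumed: L2 and below */
}

static inline void sketch_prefetch2_write(const void *p)
{
    __builtin_prefetch(p, 1, 1); /* assumed: L3 */
}

/* Fill dst[0..n) with value, hinting a write prefetch a few elements
 * ahead of the store. Returns the number of elements written. */
static size_t sketch_fill(uint32_t *dst, size_t n, uint32_t value)
{
    assert(dst != NULL || n == 0);

    /* Prefetch distance in elements; 16 is an arbitrary illustrative value. */
    const size_t ahead = 16;

    for (size_t i = 0; i < n; i++) {
        if (i + ahead < n)
            sketch_prefetch0_write(&dst[i + ahead]);
        dst[i] = value;
    }
    return n;
}
```

The prefetch is purely a hint, so the result of sketch_fill() is the same with or without it; whether it helps depends on the target and the access pattern.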

Cheers, -Harry


