[dpdk-dev] [PATCH v4 2/2] lib/eal: add temporal store memcpy support for AMD platform
Van Haaren, Harry
harry.van.haaren at intel.com
Wed Oct 27 14:15:36 CEST 2021
> -----Original Message-----
> From: Mattias Rönnblom <mattias.ronnblom at ericsson.com>
> Sent: Wednesday, October 27, 2021 12:42 PM
> To: Van Haaren, Harry <harry.van.haaren at intel.com>; Thomas Monjalon
> <thomas at monjalon.net>; Aman Kumar <aman.kumar at vvdntech.in>
> Cc: dev at dpdk.org; viacheslavo at nvidia.com; Burakov, Anatoly
> <anatoly.burakov at intel.com>; Song, Keesang <Keesang.Song at amd.com>;
> jerinjacobk at gmail.com; Ananyev, Konstantin <konstantin.ananyev at intel.com>;
> Richardson, Bruce <bruce.richardson at intel.com>;
> honnappa.nagarahalli at arm.com; Ruifeng Wang <ruifeng.wang at arm.com>;
> David Christensen <drc at linux.vnet.ibm.com>; david.marchand at redhat.com;
> stephen at networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH v4 2/2] lib/eal: add temporal store memcpy
> support for AMD platform
>
> On 2021-10-27 13:03, Van Haaren, Harry wrote:
> >> -----Original Message-----
<snip>
Hi Mattias,
> > 6) What is the use-case for this? When would a user *want* to use this instead
> > of rte_memcpy()?
> > If the data being loaded is relevant to datapath/packets, presumably other
> > packets might require the loaded data, so temporal (normal) loads should be
> > used to cache the source data?
>
>
> I'm not sure if your first question is rhetorical or not, but a memcpy()
> in an NT variant is certainly useful. One use case for a memcpy() with
> temporal loads and non-temporal stores is if you need to archive packet
> payload for (distant, potential) future use, and want to avoid causing
> unnecessary LLC evictions while doing so.
Yes I agree that there are certainly benefits in using cache-locality hints.
There is an open question as to whether the src, the dst, or both should be non-temporal.
In the implementation in this patch, the NT/T choice is the reverse of your use-case:
1) Loads are NT (so loaded data is not cached for future packets)
2) Stores are T (so copied/dst data is now resident in L1/L2)
In theory there might even be valid uses for this type of memcpy, where loaded
data is not needed again soon but stored data is referenced again soon,
although I cannot think of any while typing this mail.
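To make the two directions concrete, here is a rough x86 sketch. It is not the patch's code, and the helper names are made up for illustration. It assumes 16-byte aligned buffers, a length that is a multiple of 16, a GCC/Clang toolchain and an SSE4.1-capable CPU; on other targets both helpers just fall back to memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <emmintrin.h> /* SSE2: _mm_load_si128, _mm_store_si128, _mm_stream_si128 */
#include <smmintrin.h> /* SSE4.1: _mm_stream_load_si128 (MOVNTDQA) */
#define HAVE_X86_NT 1
/* Compile only this function for SSE4.1. Assumes the CPU has it; real code
 * would check at runtime, e.g. with __builtin_cpu_supports("sse4.1"). */
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET /* non-x86: plain memcpy() fallback below */
#endif

/* Hypothetical helper: temporal loads + non-temporal stores (the archiving
 * use-case): src stays cache-resident, dst is written around the cache. */
static void copy_t_load_nt_store(void *dst, const void *src, size_t len)
{
    /* Precondition: dst, src and len are all multiples of 16. */
    assert(((uintptr_t)dst | (uintptr_t)src | len) % 16 == 0);
#ifdef HAVE_X86_NT
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < len / 16; i++)
        _mm_stream_si128(&d[i], _mm_load_si128(&s[i])); /* MOVNTDQ store */
    _mm_sfence(); /* NT stores are weakly ordered; fence before publishing dst */
#else
    memcpy(dst, src, len);
#endif
}

/* Hypothetical helper: non-temporal loads + temporal stores (the direction
 * taken by this patch): src is read with a streaming hint, dst is cached.
 * Whether MOVNTDQA bypasses the cache on ordinary write-back memory is
 * implementation-specific. */
static SSE41_TARGET void copy_nt_load_t_store(void *dst, const void *src, size_t len)
{
    assert(((uintptr_t)dst | (uintptr_t)src | len) % 16 == 0);
#ifdef HAVE_X86_NT
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < len / 16; i++)
        _mm_store_si128(&d[i], _mm_stream_load_si128((__m128i *)&s[i]));
#else
    memcpy(dst, src, len);
#endif
}
```

Both helpers produce identical copies. The only difference is which side carries the streaming hint (MOVNTDQA on the load side versus MOVNTDQ on the store side), and that is exactly the choice the API documentation would need to explain.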
I think some use-case examples, and clear documentation on when/how to choose
between rte_memcpy() and any (potential future) rte_memcpy_nt() variants, are required
to progress this patch.
Assuming a strong use-case exists, and it can be clearly indicated to users of DPDK APIs which
rte_memcpy() variant to use, we can look at the technical details of enabling the implementation.
-Harry
<snip remaining points>