[dpdk-dev] [PATCH v3 3/3] lib/eal: add temporal store memcpy support on AMD platform
Thomas Monjalon
thomas at monjalon.net
Wed Oct 27 09:59:59 CEST 2021
27/10/2021 08:34, Aman Kumar:
> On Tue, Oct 26, 2021 at 9:44 PM Thomas Monjalon <thomas at monjalon.net> wrote:
>
> > 26/10/2021 17:56, Aman Kumar:
> > > This patch provides a rte_memcpy* call with temporal stores.
> > > Use -Dcpu_instruction_set=znverX with build to enable this API.
> > >
> > > Signed-off-by: Aman Kumar <aman.kumar at vvdntech.in>
> > > ---
> > > config/x86/meson.build | 2 +
> > > lib/eal/x86/include/rte_memcpy.h | 114 +++++++++++++++++++++++++++++++
> >
> > It looks better as C code.
> > Do you achieve the same performance as the asm version?
> >
>
> In a few corner cases assembly performed better, but overall we have very
> similar perf observations.
>
> > > +#if defined RTE_MEMCPY_AMDEPYC
> > [...]
> > > +static __rte_always_inline void *
> > > +rte_memcpy_aligned_tstore16_generic(void *dst, void *src, int len)
> >
> > So to be clear, an application will benefit from this optimization if
> > 1/ DPDK is specifically compiled for AMD
> > 2/ the application is compiled with the above DPDK build (because of
> > inlining)
> >
> > I guess there is no good way to benefit from the optimization
> > without specific compilation, because of the inlining constraint.
> > Another design, with fewer constraints but less performance,
> > would be to have a function pointer assigned at runtime based on the CPU.
> >
>
> You're right. We need to build DPDK and apps with this flag enabled to get
> the benefit.

So the x86 packages, as in Linux distributions, won't have this optimization.

> In future versions, we will try to adapt in a more dynamic way. Thanks.

No, I was trying to say that unfortunately there is probably no solution.
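
For reference, the runtime alternative mentioned above (a function pointer
assigned once, based on the CPU) could look roughly like the sketch below.
All names in it are hypothetical and are not part of the patch or of DPDK.
The AVX2 check through the GCC/Clang builtin __builtin_cpu_supports() is only
a stand-in for whatever CPU test would really be used, and both copy paths
simply call memcpy() so that the sketch stays portable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; none of these are DPDK APIs. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t len);

/* Baseline path, usable on any CPU. */
static void *copy_generic(void *dst, const void *src, size_t len)
{
	return memcpy(dst, src, len);
}

/*
 * Placeholder for a CPU-specific path. A real version would hold the
 * specialised store loop; memcpy() is used here only to keep the
 * sketch buildable everywhere.
 */
static void *copy_cpu_specific(void *dst, const void *src, size_t len)
{
	return memcpy(dst, src, len);
}

/* Assumed detection rule: AVX2 stands in for the real CPU check. */
static int cpu_has_fast_path(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && \
	(defined(__GNUC__) || defined(__clang__))
	__builtin_cpu_init();
	return __builtin_cpu_supports("avx2") != 0;
#else
	return 0;
#endif
}

/* Defaults to the generic path until the selection has run. */
static copy_fn copy_impl = copy_generic;

/* Call once at startup to bind the implementation for this CPU. */
static void copy_select_impl(void)
{
	copy_impl = cpu_has_fast_path() ? copy_cpu_specific : copy_generic;
}

/* Every copy goes through one indirect call. */
static void *copy_dispatch(void *dst, const void *src, size_t len)
{
	assert(copy_impl != NULL);
	return copy_impl(dst, src, len);
}
```

The indirect call through copy_impl cannot be inlined into the caller, which
is the performance cost noted above. In exchange, a single generic x86 build
could pick the path at startup.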