[RFC v2] non-temporal memcpy
Honnappa Nagarahalli
Honnappa.Nagarahalli at arm.com
Wed Jul 27 19:37:50 CEST 2022
<snip>
>
> > From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli at arm.com]
> > Sent: Monday, 25 July 2022 03.18
> >
>
> [...]
>
> > > Yes, x86 needs 16B alignment for NT load/stores. But that's supposed to be
> > > an arch-specific limitation that we probably want to hide, no?
>
> Correct. However, optional hints for optimization purposes will be available.
> And it is up to the architecture specific implementation to make the best use
> of these hints, or just ignore them.
>
> > > Inside, the function can check the alignment of both src and dst and
> > > decide whether it should use NT load/store instructions or just do a
> > > normal copy.
> > IMO, the normal copy should not be done by this API under any
> > conditions. Why not let the application call memcpy/rte_memcpy when
> > the NT copy is not applicable? It helps the programmer understand
> > and debug issues much more easily.
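
To make the "no hidden normal copy" position concrete, here is a minimal sketch, not the RFC's code. All names (example_is_nt_aligned, example_memcpy_nt_strict, EXAMPLE_NT_ALIGN) are hypothetical, the 16-byte value is simply the x86 requirement quoted above, and memcpy() stands in for the real NT load/store loop so the sketch builds anywhere.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 16 bytes, the x86 NT load/store alignment mentioned above. */
#define EXAMPLE_NT_ALIGN ((uintptr_t)16)

/* True if the address is a multiple of EXAMPLE_NT_ALIGN. */
static bool example_is_nt_aligned(const void *p)
{
    return ((uintptr_t)p & (EXAMPLE_NT_ALIGN - 1)) == 0;
}

/*
 * Hypothetical strict copy: returns 0 on success, or -1 when the NT path
 * is not applicable. It never falls back to a normal copy by itself.
 */
static int example_memcpy_nt_strict(void *dst, const void *src, size_t len)
{
    if (!example_is_nt_aligned(dst) || !example_is_nt_aligned(src) ||
        (len % EXAMPLE_NT_ALIGN) != 0)
        return -1;

    /* Placeholder: real code would use NT load/store instructions here. */
    memcpy(dst, src, len);
    return 0;
}
```

A caller that gets -1 back can then call memcpy() or rte_memcpy() explicitly, so the choice of a normal copy stays visible in the application code.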
>
> Yes, the programmer must choose between normal memcpy() and non-
> temporal rte_memcpy_nt(). I am offering new functions, not modifying
> memcpy() or rte_memcpy().
>
> And rte_memcpy_nt() will silently fall back to normal memcpy() if non-
> temporal copying is unavailable, e.g. on POWER and RISC-V architectures,
> which don't have NT load/store instructions.
I am talking about a scenario where the application is being ported between architectures. Not everyone knows the capabilities of each architecture. It is better to indicate upfront (e.g. through a compilation failure) that a certain feature is not supported on the target architecture, rather than leaving the user to discover it through painful debugging.
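
A rough sketch of what I mean by failing at compile time, assuming GCC/Clang predefined architecture macros. The names EXAMPLE_HAVE_NT_COPY and example_memcpy_nt are hypothetical, the list of architectures is only an assumption for illustration, and memcpy() is a placeholder for the real NT instructions.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical feature macro. Assumption: only x86 and AArch64 are treated
 * as having NT load/store instructions in this sketch.
 */
#if defined(__x86_64__) || defined(__i386__) || defined(__aarch64__)
#define EXAMPLE_HAVE_NT_COPY 1
#else
#define EXAMPLE_HAVE_NT_COPY 0
#endif

#if !EXAMPLE_HAVE_NT_COPY
/* Porting to e.g. POWER or RISC-V stops here, at build time. */
#error "example_memcpy_nt(): no non-temporal load/store on this architecture"
#endif

/* Hypothetical NT copy entry point; only exists where NT copy is available. */
static inline void example_memcpy_nt(void *dst, const void *src, size_t len)
{
    /* Placeholder: real code would use NT load/store instructions here. */
    memcpy(dst, src, len);
}
```

With this shape, an application ported to an architecture without NT support fails to build at the point where it uses the function, instead of silently getting a normal copy.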