[dpdk-dev] rte_memcpy
Manish Sharma
manish.sharmajee75 at gmail.com
Mon May 24 10:49:21 CEST 2021
I am looking at the source for rte_memcpy (this discussion is only about
x86-64).
For one of the cases, when aligned correctly, it uses:
/**
 * Copy 64 bytes from one location to another,
 * locations should not overlap.
 */
static __rte_always_inline void
rte_mov64(uint8_t *dst, const uint8_t *src)
{
	__m512i zmm0;

	zmm0 = _mm512_loadu_si512((const void *)src);
	_mm512_storeu_si512((void *)dst, zmm0);
}
I have some questions about this:
1. I don't see any use of x86 fence (rmb, wmb) instructions. Are they not
required in this case, and if not, why aren't they needed?
2. Are _mm512_loadu_si512 and _mm512_storeu_si512 non-temporal?
3. Why isn't the code using the stream variants, _mm512_stream_load_si512 and
friends?
4. Does _mm512_stream_load_si512 need fence instructions?
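
To make questions 3 and 4 concrete, here is a minimal sketch of the kind of
streaming copy I have in mind. It is not DPDK code: stream_mov64 is a
hypothetical name, and it uses the 128-bit SSE2 intrinsics rather than the
AVX-512 ones so that it builds on any x86-64 compiler without extra flags. It
assumes dst is 16-byte aligned. As far as I understand, non-temporal stores
are weakly ordered, which is why the sketch ends with an sfence.

```c
#include <emmintrin.h> /* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#include <stdint.h>

/*
 * Hypothetical helper, not part of DPDK.
 * Copy 64 bytes using non-temporal (streaming) stores.
 * Assumption: dst is 16-byte aligned; src may be unaligned.
 */
static inline void
stream_mov64(uint8_t *dst, const uint8_t *src)
{
	int i;

	for (i = 0; i < 4; i++) {
		/* Ordinary (temporal) unaligned 16-byte load. */
		__m128i x = _mm_loadu_si128((const __m128i *)(src + 16 * i));

		/* Non-temporal store: hints that the line should bypass the cache. */
		_mm_stream_si128((__m128i *)(dst + 16 * i), x);
	}

	/* Order the streaming stores before any later stores. */
	_mm_sfence();
}
```

With rte_mov64 as it stands today, there is no such fence after the store,
which is what prompted question 1.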
TIA,
Manish