[dpdk-dev] rte_memcpy

Manish Sharma manish.sharmajee75 at gmail.com
Mon May 24 10:49:21 CEST 2021


I am looking at the source for rte_memcpy (this discussion is only about
x86-64).

For one of the cases, when aligned correctly, it uses:

/**
 * Copy 64 bytes from one location to another,
 * locations should not overlap.
 */
static __rte_always_inline void
rte_mov64(uint8_t *dst, const uint8_t *src)
{
        __m512i zmm0;

        zmm0 = _mm512_loadu_si512((const void *)src);
        _mm512_storeu_si512((void *)dst, zmm0);
}

I had some questions about this:

1. What I don't see is any use of x86 fence (rmb, wmb) instructions. Are they
not required in this case and, if not, why aren't they needed?

2. Are _mm512_loadu_si512 and _mm512_storeu_si512 non-temporal?

3. Why isn't the code using the stream variants, _mm512_stream_load_si512 and
friends?

4. Does _mm512_stream_load_si512 need fence instructions?
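
To make questions 3 and 4 concrete, here is a rough sketch of the kind of
streaming copy I have in mind. It is not from the DPDK source: mov64_stream is
a hypothetical name, and I have used the 128-bit SSE2 intrinsics
(_mm_stream_si128, _mm_sfence) rather than the AVX-512 ones so that it builds
on any x86-64 compiler without extra flags. It assumes dst is 16-byte aligned.
My understanding is that streaming stores are weakly ordered, which is why I
put an sfence after them; whether that is actually required is part of what I
am asking.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/*
 * Hypothetical helper, not part of DPDK: copy 64 bytes using
 * non-temporal (streaming) stores. dst must be 16-byte aligned;
 * src may be unaligned. Locations should not overlap.
 */
static inline void
mov64_stream(uint8_t *dst, const uint8_t *src)
{
#if defined(__SSE2__)
        int i;

        for (i = 0; i < 64; i += 16) {
                /* Regular (temporal) unaligned load of 16 bytes. */
                __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
                /* Non-temporal store, hinting the data should bypass the cache. */
                _mm_stream_si128((__m128i *)(dst + i), v);
        }
        /* Assumption: order the streaming stores before any later stores. */
        _mm_sfence();
#else
        /* Fallback so the sketch still builds on non-x86 targets. */
        memcpy(dst, src, 64);
#endif
}
```

The loads here are ordinary loads; only the stores are the streaming kind.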

TIA,
Manish

