[RFC v2] non-temporal memcpy

Mattias Rönnblom hofors at lysator.liu.se
Thu Aug 11 13:53:02 CEST 2022


On 2022-08-10 23:20, Honnappa Nagarahalli wrote:
> <snip>
> 
>>
>>> From: Mattias Rönnblom [mailto:hofors at lysator.liu.se]
>>> Sent: Wednesday, 10 August 2022 13.56
>>>
>>> On 2022-08-09 17:26, Stephen Hemminger wrote:
>>
>> [...]
>>
>>>
>>> Alignment seems like a non-issue to me. An NT-store memcpy() can be
>>> made free of alignment requirements, incurring only a very slight cost
>>> for the always-aligned case (who has their data always 16-byte aligned
>>> anyway?).
>>>
>>> The memory barrier required on x86 seems like a bigger issue.
>>>
>>>> Maybe rte_non_cache_copy()?
>>>>
>>>
>>> rte_memcpy_nt_weakly_ordered(), or rte_memcpy_nt_weak(). And a
>>> rte_memcpy_nt() with the sfence in place, which the user hopefully
>>> will find first? I don't know. I would prefer not having the weak
>>> variant at all.
> I think providing a weakly ordered version is required to offset the cost of the barriers. One might be able to copy multiple packets and then issue a barrier.
> 

On what architecture?

I assumed that only x86 had the peculiar property of having different
memory models for regular and NT loads/stores.
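
For reference, on x86 the pattern you describe (several weakly ordered
copies, then one barrier) could look roughly like the sketch below. It
also uses the flags idea quoted further down. All names here
(nt_memcpy(), NT_F_WEAK, nt_memcpy_fence(), copy_burst()) are made up
for illustration and are not the API proposed in the RFC. The sketch
assumes SSE2 and falls back to plain memcpy() elsewhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>	/* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Hypothetical flag: the caller accepts weak ordering (no sfence). */
#define NT_F_WEAK (1u << 0)

/* Hypothetical sketch only; not the implementation from the RFC. */
static inline void
nt_memcpy(void *dst, const void *src, size_t len, unsigned int flags)
{
#if defined(__SSE2__)
	uint8_t *d = dst;
	const uint8_t *s = src;
	/* MOVNTDQ needs a 16-byte aligned destination, so the unaligned
	 * head (if any) is copied with regular stores. */
	size_t head = (16 - ((uintptr_t)d & 15)) & 15;

	if (head > len)
		head = len;
	memcpy(d, s, head);
	d += head;
	s += head;
	len -= head;

	/* Unaligned load, aligned non-temporal store. */
	for (; len >= 16; d += 16, s += 16, len -= 16)
		_mm_stream_si128((__m128i *)d,
				 _mm_loadu_si128((const __m128i *)s));

	memcpy(d, s, len);	/* tail, regular stores */

	/* Default is safe: order the NT stores before any later store. */
	if (!(flags & NT_F_WEAK))
		_mm_sfence();
#else
	(void)flags;
	memcpy(dst, src, len);	/* stand-in where no NT path is sketched */
#endif
}

/* Orders all earlier weakly ordered NT stores before later stores. */
static inline void
nt_memcpy_fence(void)
{
#if defined(__SSE2__)
	_mm_sfence();
#endif
}

/* Batch use: weak copies for every buffer, then a single barrier. */
static inline void
copy_burst(void *const dst[], const void *const src[],
	   const size_t len[], unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		nt_memcpy(dst[i], src[i], len[i], NT_F_WEAK);
	nt_memcpy_fence();
}
```

With flags == 0 every call pays for its own sfence, which is the
memcpy()-like default. copy_burst() pays for one sfence per burst
instead, at the price of the caller having to know about the barrier.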

>>>
>>> Accepting weak memory ordering (i.e., no sfence) could also be one of
>>> the flags, assuming rte_memcpy_nt() would have a flags parameter.
>>> Default is safe (=memcpy() semantics), but potentially slower.
>>
>> Excellent idea!
>>
>>>
>>>> Want to avoid the naive user just doing s/memcpy/rte_memcpy_nt/ and
>>>> expecting everything to work.
> 

