[dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.

Wang Dong dong.wang.pro at hotmail.com
Tue May 12 17:23:19 CEST 2015



> Hi Dong,
>
>> -----Original Message-----
>> From: Wang Dong [mailto:dong.wang.pro at hotmail.com]
>> Sent: Saturday, May 09, 2015 11:24 AM
>> To: Ananyev, Konstantin; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
>>
>> Hi Konstantin,
>>
>>>
>>> Hi Dong,
>>>
>>>> -----Original Message-----
>>>> From: Wang Dong [mailto:dong.wang.pro at hotmail.com]
>>>> Sent: Thursday, May 07, 2015 4:28 PM
>>>> To: Ananyev, Konstantin; dev at dpdk.org
>>>> Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
>>>>
>>>> Hi Konstantin,
>>>>
>>>>> Hi Dong,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of WangDong
>>>>>> Sent: Tuesday, May 05, 2015 4:38 PM
>>>>>> To: dev at dpdk.org
>>>>>> Subject: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for IA processor's rte_wmb/rte_rmb.
>>>>>>
>>>>>> The current implementation of rte_wmb/rte_rmb for x86 is using processor memory barrier. It's unnessary for IA processor,
>>>> compiler
>>>>>> memory barrier is enough.
>>>>>
>>>>> I wouldn't say they are 'unnecessary'.
>>>>> There are situations, even on IA, when you need _fence_ isntructions.
>>>>> So, please leave rte_*mb() macros unmodified.
>>>> OK, leave them unmodified, but I really can't find a situation to use
>>>> sfence and lfence instructions.
>>>
>>> For example:
>>> http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
>>> http://dpdk.org/ml/archives/dev/2014-May/002613.html
>>>
>>>>
>>>>
>>>>> I still think that we need to create a new set of architecture dependent macros, as what discussed before.
>>>>> Probably by analogy with linux kernel rte_smp_*mb() is a good name for them.
>>>>> Though if you have some better name in mind, I am open to suggestions here.
>>>> What abount rte_dma_*mb()? I find dma_*mb() in linux-4.0.1, it looks good~~
>>>
>>> Hmm, but why _dma_?
>>> We need same thing for multi-core communication too.
>>> If rte_smp_ is not good enough, might be: rte_arch_?
>> I want these two macro only used in PMD, so I think _dma_ is better. The
>> memory barrier of processor-processor maybe more complex, and I'm not
>> familiar with it... Someone can add rte_smp_*mb for multi-core.
>
> Sorry, what you are talking about?
> At the end, it will use same instructions, whateve we'll name it: _dma_, _smp_, _arch_.
> Konstantin

Hi Konstantin,

In previous mail, I want to say, both rte_smp_*mb() and rte_dma_*mb() 
can be added, but the context of rte_smp_*mb() is different from 
rte_dma_*mb(), maybe rte_smp_*mb() is for thread and rte_dma_*mb() is 
for PMD. I'm not sure how to implement rte_smp_*mb(), hope it can be 
implemented by other developer. In Linux, I find it same as _dma_, if 
so, they will use same instructions.

linux-4.0.1/arch/x86/include/asm/barrier.h, line 27:
#ifdef CONFIG_X86_PPRO_FENCE
#define dma_rmb()       rmb()
#else
#define dma_rmb()       barrier()
#endif
#define dma_wmb()       barrier()

#ifdef CONFIG_SMP
#define smp_mb()        mb()
#define smp_rmb()       dma_rmb()
#define smp_wmb()       barrier()
#define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
#else /* !SMP */
#define smp_mb()        barrier()
#define smp_rmb()       barrier()
#define smp_wmb()       barrier()
#define set_mb(var, value) do { var = value; barrier(); } while (0)
#endif /* SMP */

Dong
>
>>
>> I think _arch_ is means nothing here, because rte_*mb is already for
>> architectures that dpdk supported, they are redefined in these architecture.
>>
>>>
>>>>
>>>>>
>>>>>> But if dpdk runing on a AMD processor, maybe we should use processor memory barrier.
>>>>>
>>>>> As far as I remember, amd has the same memory ordering model.
>>>> It's too hard to find a AMD's software developer manual.....
>>>
>>> There for example:
>>> http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf
>>> ?
>> Search such document on AMD offical website for a long time, this manual
>> is what I want, thanks very much!!!
>>
>> Dong
>>
>>>
>>> Konstantin
>>>
>>>>
>>>> Dong
>>>>
>>>>> So, I don't think we need  #ifdef RTE_ARCH_X86_IA here.
>>>>>
>>>>> Konstantin
>>>>>
>>>>>> I add a macro to distinguish them, if we compile DPDK for IA processor, add the macro (RTE_ARCH_X86_IA) can improve
>>>> performance
>>>>>> with compiler memory barrier. Or we can add RTE_ARCH_X86_AMD for using processor memory barrier, in this case, if didn't
>> add
>>>> the
>>>>>> macro, the memory ordering will not be guaranteed. Which macro is better?
>>>>>> If this patch applied, the PMD's old implementation of compiler memory barrier (some volatile variable) can be fixed with
>>>> rte_rmb()
>>>>>> and rte_wmb() for any architecture.
>>>>>>
>>>>>> ---
>>>>>>     lib/librte_eal/common/include/arch/x86/rte_atomic.h | 10 ++++++++++
>>>>>>     1 file changed, 10 insertions(+)
>>>>>>
>>>>>> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>>>>>> index e93e8ee..52b1e81 100644
>>>>>> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>>>>>> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>>>>>> @@ -49,10 +49,20 @@ extern "C" {
>>>>>>
>>>>>>     #define	rte_mb() _mm_mfence()
>>>>>>
>>>>>> +#ifdef RTE_ARCH_X86_IA
>>>>>> +
>>>>>> +#define rte_wmb() rte_compiler_barrier()
>>>>>> +
>>>>>> +#define rte_rmb() rte_compiler_barrier()
>>>>>> +
>>>>>> +#else
>>>>>> +
>>>>>>     #define	rte_wmb() _mm_sfence()
>>>>>>
>>>>>>     #define	rte_rmb() _mm_lfence()
>>>>>>
>>>>>> +#endif
>>>>>> +
>>>>>>     /*------------------------- 16 bit atomic operations -------------------------*/
>>>>>>
>>>>>>     #ifndef RTE_FORCE_INTRINSICS
>>>>>> --
>>>>>> 1.9.1
>>>>>


More information about the dev mailing list