[dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod loading when doing enqueue/dequeue

Jia He hejianet at gmail.com
Thu Oct 26 04:27:01 CEST 2017


Hi Jerin


On 10/25/2017 9:26 PM, Jerin Jacob Wrote:
> -----Original Message-----
>> Date: Tue, 24 Oct 2017 10:04:26 +0800
>> From: Jia He <hejianet at gmail.com>
>> To: Jerin Jacob <jerin.jacob at caviumnetworks.com>
>> Cc: "Ananyev, Konstantin" <konstantin.ananyev at intel.com>, "Zhao, Bing"
>>   <ilovethull at 163.com>, Olivier MATZ <olivier.matz at 6wind.com>,
>>   "dev at dpdk.org" <dev at dpdk.org>, "jia.he at hxt-semitech.com"
>>   <jia.he at hxt-semitech.com>, "jie2.liu at hxt-semitech.com"
>>   <jie2.liu at hxt-semitech.com>, "bing.zhao at hxt-semitech.com"
>>   <bing.zhao at hxt-semitech.com>, "Richardson, Bruce"
>>   <bruce.richardson at intel.com>
>> Subject: Re: [dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod
>>   loading when doing enqueue/dequeue
>> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101
>>   Thunderbird/52.4.0
>>
>> Hi Jerin
> Hi Jia,
>
>
>>> example:
>>> ./build/app/test -c 0xff -n 4
>>>>> ring_perf_autotest
>> It seems that on our arm64 server, ring_perf_autotest finishes in a few
>> seconds:
> Yes. It just needs a few seconds.
>
>> Anything wrong about configuration or environment setup?
> By default, arm64+dpdk uses the el0 counter to measure cycles. I
> think on your SoC it runs at 50MHz or 100MHz. So you can
> follow the scheme below to get accurate cycle measurements:
>
> See: http://dpdk.org/doc/guides/prog_guide/profile_app.html
> check: 44.2.2. High-resolution cycle counter
Thank you for the suggestions.
I tried the ko module you pointed to, which enables accurate cycle
measurement in user space, but the test results are as follows:

root@nfv-demo01:~/dpdk/build/build/test/test# lsmod |grep pmu
pmu_el0_cycle_counter   262144  0
[old code, without any patches]
============================================
RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 0
MP/MC single enq/dequeue: 0
SP/SC burst enq/dequeue (size: 8): 0
MP/MC burst enq/dequeue (size: 8): 0
SP/SC burst enq/dequeue (size: 32): 0
MP/MC burst enq/dequeue (size: 32): 0

### Testing empty dequeue ###
SC empty dequeue: 0.00
MC empty dequeue: 0.00

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 0.00
MP/MC bulk enq/dequeue (size: 8): 0.00
SP/SC bulk enq/dequeue (size: 32): 0.00
MP/MC bulk enq/dequeue (size: 32): 0.00

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 0.00
MP/MC bulk enq/dequeue (size: 8): 0.00
SP/SC bulk enq/dequeue (size: 32): 0.00
MP/MC bulk enq/dequeue (size: 32): 0.00
Test OK

[with full rte_smp_rmb barrier patch]
======================================
RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 0
MP/MC single enq/dequeue: 0
SP/SC burst enq/dequeue (size: 8): 0
MP/MC burst enq/dequeue (size: 8): 0
SP/SC burst enq/dequeue (size: 32): 0
MP/MC burst enq/dequeue (size: 32): 0

### Testing empty dequeue ###
SC empty dequeue: 0.00
MC empty dequeue: 0.00

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 0.00
MP/MC bulk enq/dequeue (size: 8): 0.00
SP/SC bulk enq/dequeue (size: 32): 0.00
MP/MC bulk enq/dequeue (size: 32): 0.00

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 0.00
MP/MC bulk enq/dequeue (size: 8): 0.00
SP/SC bulk enq/dequeue (size: 32): 0.00
MP/MC bulk enq/dequeue (size: 32): 0.00
Test OK
RTE>>

No difference: all the times are 0?

If I rmmod pmu_el0_cycle_counter and edit ./build/.config to
comment out the config line
#CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU=y

then the times are greater than 0.

>> root@ubuntu:/home/hj/dpdk/build/build/test/test# ./test -c 0xff -n 4
>> EAL: Detected 44 lcore(s)
>> EAL: Probing VFIO support...
>> APP: HPET is not enabled, using TSC as default timer
>> RTE>>per_lcore_autotest
>> RTE>>ring_perf_autotest
>> ### Testing single element and burst enq/deq ###
>> SP/SC single enq/dequeue: 0
>> MP/MC single enq/dequeue: 2
>> SP/SC burst enq/dequeue (size: 8): 0
> If you follow the above link, the value '0' will be replaced with more meaningful data.
>
>> MP/MC burst enq/dequeue (size: 8): 0
>> SP/SC burst enq/dequeue (size: 32): 0
>> MP/MC burst enq/dequeue (size: 32): 0
>>
>> ### Testing empty dequeue ###
>> SC empty dequeue: 0.02
>> MC empty dequeue: 0.04
>>
>> ### Testing using a single lcore ###
>> SP/SC bulk enq/dequeue (size: 8): 0.12
>> MP/MC bulk enq/dequeue (size: 8): 0.31
>> SP/SC bulk enq/dequeue (size: 32): 0.05
>> MP/MC bulk enq/dequeue (size: 32): 0.09
>>
>> ### Testing using two hyperthreads ###
>> SP/SC bulk enq/dequeue (size: 8): 0.12
>> MP/MC bulk enq/dequeue (size: 8): 0.39
>> SP/SC bulk enq/dequeue (size: 32): 0.04
>> MP/MC bulk enq/dequeue (size: 32): 0.12
>>
>> ### Testing using two physical cores ###
>> SP/SC bulk enq/dequeue (size: 8): 0.37
>> MP/MC bulk enq/dequeue (size: 8): 0.92
>> SP/SC bulk enq/dequeue (size: 32): 0.12
>> MP/MC bulk enq/dequeue (size: 32): 0.26
>> Test OK
>> RTE>>
>>
>> Cheers,
>> Jia

-- 
Cheers,
Jia


