[dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod loading when doing enqueue/dequeue

Jerin Jacob jerin.jacob at caviumnetworks.com
Fri Nov 3 13:47:41 CET 2017


-----Original Message-----
> Date: Fri, 3 Nov 2017 10:55:40 +0800
> From: Jia He <hejianet at gmail.com>
> To: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> Cc: "Ananyev, Konstantin" <konstantin.ananyev at intel.com>, "Zhao, Bing"
>  <ilovethull at 163.com>, Olivier MATZ <olivier.matz at 6wind.com>,
>  "dev at dpdk.org" <dev at dpdk.org>, "jia.he at hxt-semitech.com"
>  <jia.he at hxt-semitech.com>, "jie2.liu at hxt-semitech.com"
>  <jie2.liu at hxt-semitech.com>, "bing.zhao at hxt-semitech.com"
>  <bing.zhao at hxt-semitech.com>, "Richardson, Bruce"
>  <bruce.richardson at intel.com>, jianbo.liu at arm.com, hemant.agrawal at nxp.com
> Subject: Re: [dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod
>  loading when doing enqueue/dequeue
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101
>  Thunderbird/52.4.0
> 
> Hi Jerin
> 
> 
> On 11/2/2017 4:57 PM, Jia He Wrote:
> > 
> > Hi, Jerin
> > please see my performance test below
> > On 11/2/2017 3:04 AM, Jerin Jacob Wrote:
> > [...]
> > > Should it be like this instead?
> > > 
> > > +#else
> > > +        *old_head = __atomic_load_n(&r->cons.head, __ATOMIC_ACQUIRE);
> > > +        const uint32_t prod_tail = __atomic_load_n(&r->prod.tail,
> > > __ATOMIC_ACQUIRE);
> > > It would be nice to see how much overhead this adds, i.e. two back-to-back
> > > __ATOMIC_ACQUIRE loads.
> > I cannot run ring_perf_autotest on our server because something is wrong
> > with the PMU counter.
> > rte_rdtsc always returns 0, with and without the ko module you provided.
> > I am still investigating the reason.
> > 
> 
> Hi Jerin
> 
> As for the root cause of the rte_rdtsc issue, it might be that the PMU
> counter frequency is too low on our arm64 server ("Amberwing" from Qualcomm):
> 
> [586990.057779] arch_timer_get_cntfrq()=20000000
> 
> That is only 20 MHz instead of 100 MHz or 200 MHz, and CNTFRQ_EL0 is not even
> writable in kernel space.

That may not be true; I guess Linux 'perf' writes those registers in kernel
space. If it is true, another option could be to write the register from the
ATF/secure boot loader.

> 
> Maybe the code in ring_perf_autotest needs to be changed?

Increase the "iterations" to measure @ 200MHz.
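
To make the effect of the counter frequency concrete, here is a minimal
sketch. It is not code from ring_perf_autotest: the helper names
tick_period_ns and min_iterations are hypothetical, and the 5 ns
per-operation cost used in the note below is an assumed figure, not a
measurement. It only shows how the tick period follows from the reported
frequency and how many iterations a loop needs to span a given number of
ticks.

```c
#include <stdint.h>

/* Hypothetical helper: nanoseconds covered by one counter tick at freq_hz. */
static double tick_period_ns(uint64_t freq_hz)
{
    return 1e9 / (double)freq_hz;
}

/*
 * Hypothetical helper: smallest iteration count for which a loop costing
 * cost_ns per iteration (an assumed figure) spans at least min_ticks ticks.
 */
static uint64_t min_iterations(uint64_t freq_hz, double cost_ns,
                               uint64_t min_ticks)
{
    double total_ns = (double)min_ticks * tick_period_ns(freq_hz);
    uint64_t n = (uint64_t)(total_ns / cost_ns);

    /* Round up if truncation left the loop short of the target duration. */
    if ((double)n * cost_ns < total_ns)
        n++;
    return n;
}
```

With the 20000000 Hz value reported above, tick_period_ns gives 50 ns per
tick. Under the assumed 5 ns cost, spanning 1000 ticks takes 10000
iterations at 20 MHz but only 1000 at 200 MHz.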

> 
> e.g.
> 
>     printf("SC empty dequeue: %.2F\n",
>             (double)(sc_end-sc_start) / iterations);
>     printf("MC empty dequeue: %.2F\n",
>             (double)(mc_end-mc_start) / iterations);
> 
> Otherwise the result is always 0 when the time difference is divided by
> iterations.
> 
> 
> -- 
> Cheers,
> Jia
> 
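
The difference between integer and floating-point averaging discussed in
the quoted text can be sketched as follows. This is an illustration under
stated assumptions, not the test's actual code: avg_ticks_int and
avg_ticks_fp are hypothetical names, and the tick values in the note below
are made up.

```c
#include <stdint.h>

/* Integer average: truncates to 0 whenever the delta is below iterations. */
static uint64_t avg_ticks_int(uint64_t start, uint64_t end,
                              uint64_t iterations)
{
    return (end - start) / iterations;
}

/* Floating-point average: keeps the fractional ticks per iteration. */
static double avg_ticks_fp(uint64_t start, uint64_t end,
                           uint64_t iterations)
{
    return (double)(end - start) / (double)iterations;
}
```

For a delta of 1000 ticks over 4000 iterations, avg_ticks_int returns 0
while avg_ticks_fp returns 0.25, which is why a slow counter needs the cast
to double before dividing.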

