[dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod loading when doing enqueue/dequeue
jerin.jacob at caviumnetworks.com
Fri Nov 3 13:47:41 CET 2017
> Date: Fri, 3 Nov 2017 10:55:40 +0800
> From: Jia He <hejianet at gmail.com>
> To: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> Cc: "Ananyev, Konstantin" <konstantin.ananyev at intel.com>, "Zhao, Bing"
> <ilovethull at 163.com>, Olivier MATZ <olivier.matz at 6wind.com>,
> "dev at dpdk.org" <dev at dpdk.org>, "jia.he at hxt-semitech.com"
> <jia.he at hxt-semitech.com>, "jie2.liu at hxt-semitech.com"
> <jie2.liu at hxt-semitech.com>, "bing.zhao at hxt-semitech.com"
> <bing.zhao at hxt-semitech.com>, "Richardson, Bruce"
> <bruce.richardson at intel.com>, jianbo.liu at arm.com, hemant.agrawal at nxp.com
> Subject: Re: [dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod
> loading when doing enqueue/dequeue
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101
> Hi Jerin
> On 11/2/2017 4:57 PM, Jia He Wrote:
> > Hi, Jerin
> > please see my performance test below
> > On 11/2/2017 3:04 AM, Jerin Jacob Wrote:
> > [...]
> > > Should it be like this instead?
> > >
> > > +#else
> > > + *old_head = __atomic_load_n(&r->cons.head, __ATOMIC_ACQUIRE);
> > > + const uint32_t prod_tail = __atomic_load_n(&r->prod.tail,
> > > __ATOMIC_ACQUIRE);
> > > It would be nice to see how much overhead it adds, i.e. back-to-back
> > > __ATOMIC_ACQUIRE loads.
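For readers following the thread, the two back-to-back acquire loads suggested above could be sketched roughly as follows. This is only an illustration and not the actual rte_ring code: struct ring_sketch and avail_entries() are hypothetical names, and only the two __atomic_load_n() calls mirror the quoted snippet.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the ring's head/tail pairs. */
struct headtail_sketch {
	uint32_t head;
	uint32_t tail;
};

struct ring_sketch {
	struct headtail_sketch prod;
	struct headtail_sketch cons;
};

/*
 * Dequeue side: load cons.head first, then prod.tail, both with acquire
 * semantics. An acquire load keeps later memory accesses from being
 * reordered before it, so the prod.tail load cannot be satisfied ahead
 * of the cons.head load.
 */
static uint32_t
avail_entries(const struct ring_sketch *r, uint32_t *old_head)
{
	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_ACQUIRE);
	const uint32_t prod_tail = __atomic_load_n(&r->prod.tail,
						   __ATOMIC_ACQUIRE);

	/* Unsigned subtraction stays correct across index wraparound. */
	return prod_tail - *old_head;
}
```

The overhead question is then whether the second acquire load costs measurably more than a plain load on a given arm64 core.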
> > I cannot run ring_perf_autotest on our server because something is wrong
> > with the PMU counter.
> > rte_rdtsc always returns 0, with or without the ko module you provided.
> > I am still
> > investigating the reason.
> Hi Jerin
> As for the root cause of the rte_rdtsc issue, it might be that the PMU counter
> frequency is too low
> on our arm64 server ("Amberwing" from Qualcomm):
> [586990.057779] arch_timer_get_cntfrq()=20000000
> Only 20 MHz instead of 100/200 MHz, and CNTFRQ_EL0 is not even writable in
> kernel space.
That may not be true; as far as I can guess, Linux 'perf' writes those registers
in kernel space. If that is the case, another option could be to write it from
the ATF/secure boot loader.
> Maybe the code in ring_perf_autotest needs to be changed?
Increase the "iterations" to measure @ 200MHz.
> printf("SC empty dequeue: %.2F\n",
> (double)(sc_end-sc_start) / iterations);
> printf("MC empty dequeue: %.2F\n",
> (double)(mc_end-mc_start) / iterations);
> Otherwise the result is always 0 when the time difference is divided by iterations.
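To make the resolution point concrete: a 20 MHz counter advances once every 50 ns, so a timed loop has to span many ticks before the per-iteration average is meaningful. Below is a minimal sketch of that arithmetic. Both helpers are hypothetical (they are not part of ring_perf_autotest), and the 5 ns per-iteration estimate and the 1000-tick target used in the note after the code are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: average cost of one iteration in nanoseconds,
 * given the start/end timer readings and the timer frequency in Hz.
 */
static double
ns_per_iteration(uint64_t start, uint64_t end, uint64_t timer_hz,
		 uint64_t iterations)
{
	assert(timer_hz != 0 && iterations != 0);
	return (double)(end - start) * 1e9 / (double)timer_hz /
	       (double)iterations;
}

/*
 * Hypothetical helper: smallest power-of-two iteration count for which
 * the whole timed loop is expected to span at least min_ticks ticks of
 * the timer, given a rough estimate of the per-iteration cost.
 */
static uint64_t
iterations_for_resolution(uint64_t timer_hz, double est_ns_per_iter,
			  uint64_t min_ticks)
{
	assert(timer_hz != 0 && est_ns_per_iter > 0.0);

	/* Total run time needed so that the loop covers min_ticks ticks. */
	const double needed_ns = (double)min_ticks * 1e9 / (double)timer_hz;
	uint64_t iters = 1;

	while ((double)iters * est_ns_per_iter < needed_ns &&
	       iters < (UINT64_C(1) << 62))
		iters <<= 1;
	return iters;
}
```

With the 20 MHz value reported by arch_timer_get_cntfrq(), an assumed cost of 5 ns per iteration and a target of 1000 ticks, iterations_for_resolution(20000000, 5.0, 1000) gives 16384 iterations. Reporting through ns_per_iteration() rather than raw ticks also keeps results comparable between machines with different counter frequencies.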