[dpdk-dev] [PATCH v3 1/3] ring: read tail using atomic load

Honnappa Nagarahalli Honnappa.Nagarahalli at arm.com
Wed Oct 10 21:26:55 CEST 2018


> 
> Hi Jerin,
> 
> Following the guide to use the PMU counters (KO inserted and DPDK
> recompiled), the numbers increased more than 10-fold (do bigger numbers
> here mean more precision?). Is this valid and expected?
That is correct: bigger numbers mean more precise (finer-grained) results.
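
To make the granularity point concrete, here is a rough sketch of the arithmetic. The 100 MHz counter frequency is the one mentioned further down in this thread; the 2 GHz core clock is only an assumed, illustrative value and not a measured figure for this machine.

```python
# Resolution of a fixed-frequency counter versus a per-cycle PMU counter.
# GENERIC_COUNTER_HZ comes from this thread; ASSUMED_CPU_HZ is hypothetical.

GENERIC_COUNTER_HZ = 100_000_000   # 100 MHz counter quoted in the thread
ASSUMED_CPU_HZ = 2_000_000_000     # assumption: 2 GHz core clock, for illustration


def ns_per_tick(counter_hz):
    """Wall-clock time covered by one counter tick, in nanoseconds."""
    return 1e9 / counter_hz


def cpu_cycles_per_tick(counter_hz, cpu_hz):
    """How many CPU cycles elapse during one tick of the counter."""
    return cpu_hz / counter_hz


tick_ns = ns_per_tick(GENERIC_COUNTER_HZ)
cycles = cpu_cycles_per_tick(GENERIC_COUNTER_HZ, ASSUMED_CPU_HZ)
print(f"one tick = {tick_ns} ns = {cycles} CPU cycles (at the assumed clock)")
```

Under that assumption a single tick of the 100 MHz counter spans many CPU cycles, so short operations get rounded to a handful of ticks, while a cycle counter reports proportionally larger and finer-grained values.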

> No significant difference was seen.
This is what we are interested in. Do you have numbers from before and after this change?

> 
> gavin at net-arm-thunderx2:~/community/dpdk$ sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 --socket-mem=1024  -- -i
> RTE>>ring_perf_autotest (#1 run w/o the patch)
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 103
> MP/MC single enq/dequeue: 130
> SP/SC burst enq/dequeue (size: 8): 18
> MP/MC burst enq/dequeue (size: 8): 21
> SP/SC burst enq/dequeue (size: 32): 7
> MP/MC burst enq/dequeue (size: 32): 8
> 
> ### Testing empty dequeue ###
> SC empty dequeue: 3.00
> MC empty dequeue: 3.00
> 
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 17.48
> MP/MC bulk enq/dequeue (size: 8): 21.77
> SP/SC bulk enq/dequeue (size: 32): 7.39
> MP/MC bulk enq/dequeue (size: 32): 8.52
> 
> ### Testing using two hyperthreads ###
> SP/SC bulk enq/dequeue (size: 8): 31.32
> MP/MC bulk enq/dequeue (size: 8): 38.52
> SP/SC bulk enq/dequeue (size: 32): 13.39
> MP/MC bulk enq/dequeue (size: 32): 14.15
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 75.00
> MP/MC bulk enq/dequeue (size: 8): 141.97
> SP/SC bulk enq/dequeue (size: 32): 23.85
> MP/MC bulk enq/dequeue (size: 32): 36.13
> Test OK
> RTE>>ring_perf_autotest (#2 run w/o the patch)
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 103
> MP/MC single enq/dequeue: 130
> SP/SC burst enq/dequeue (size: 8): 18
> MP/MC burst enq/dequeue (size: 8): 21
> SP/SC burst enq/dequeue (size: 32): 7
> MP/MC burst enq/dequeue (size: 32): 8
> 
> ### Testing empty dequeue ###
> SC empty dequeue: 3.00
> MC empty dequeue: 3.00
> 
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 17.48
> MP/MC bulk enq/dequeue (size: 8): 21.77
> SP/SC bulk enq/dequeue (size: 32): 7.38
> MP/MC bulk enq/dequeue (size: 32): 8.52
> 
> ### Testing using two hyperthreads ###
> SP/SC bulk enq/dequeue (size: 8): 31.31
> MP/MC bulk enq/dequeue (size: 8): 38.52
> SP/SC bulk enq/dequeue (size: 32): 13.33
> MP/MC bulk enq/dequeue (size: 32): 14.16
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 75.74
> MP/MC bulk enq/dequeue (size: 8): 147.33
> SP/SC bulk enq/dequeue (size: 32): 24.79
> MP/MC bulk enq/dequeue (size: 32): 40.09
> Test OK
> 
> RTE>>ring_perf_autotest (#1 run w/ the patch)
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 103
> MP/MC single enq/dequeue: 129
> SP/SC burst enq/dequeue (size: 8): 18
> MP/MC burst enq/dequeue (size: 8): 22
> SP/SC burst enq/dequeue (size: 32): 7
> MP/MC burst enq/dequeue (size: 32): 8
> 
> ### Testing empty dequeue ###
> SC empty dequeue: 3.00
> MC empty dequeue: 4.00
> 
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 17.89
> MP/MC bulk enq/dequeue (size: 8): 21.77
> SP/SC bulk enq/dequeue (size: 32): 7.50
> MP/MC bulk enq/dequeue (size: 32): 8.52
> 
> ### Testing using two hyperthreads ###
> SP/SC bulk enq/dequeue (size: 8): 31.24
> MP/MC bulk enq/dequeue (size: 8): 38.14
> SP/SC bulk enq/dequeue (size: 32): 13.24
> MP/MC bulk enq/dequeue (size: 32): 14.69
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 74.63
> MP/MC bulk enq/dequeue (size: 8): 137.61
> SP/SC bulk enq/dequeue (size: 32): 24.82
> MP/MC bulk enq/dequeue (size: 32): 36.64
> Test OK
> RTE>>ring_perf_autotest (#2 run w/ the patch)
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 103
> MP/MC single enq/dequeue: 129
> SP/SC burst enq/dequeue (size: 8): 18
> MP/MC burst enq/dequeue (size: 8): 22
> SP/SC burst enq/dequeue (size: 32): 7
> MP/MC burst enq/dequeue (size: 32): 8
> 
> ### Testing empty dequeue ###
> SC empty dequeue: 3.00
> MC empty dequeue: 4.00
> 
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 17.89
> MP/MC bulk enq/dequeue (size: 8): 21.77
> SP/SC bulk enq/dequeue (size: 32): 7.50
> MP/MC bulk enq/dequeue (size: 32): 8.52
> 
> ### Testing using two hyperthreads ###
> SP/SC bulk enq/dequeue (size: 8): 31.53
> MP/MC bulk enq/dequeue (size: 8): 38.59
> SP/SC bulk enq/dequeue (size: 32): 13.24
> MP/MC bulk enq/dequeue (size: 32): 14.69
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 75.60
> MP/MC bulk enq/dequeue (size: 8): 149.14
> SP/SC bulk enq/dequeue (size: 32): 25.13
> MP/MC bulk enq/dequeue (size: 32): 40.60
> Test OK
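
One way to read "no significant difference" from the log above is to compare the patch-induced change against the run-to-run spread. The sketch below does this for the two-physical-cores MP/MC bulk (size: 8) case only, using the four values quoted above; the helper names are made up for illustration, and two runs per configuration is too few for a real statistical test.

```python
from statistics import mean

# Two-physical-cores, MP/MC bulk enq/dequeue (size: 8), copied from the log above.
without_patch = [141.97, 147.33]
with_patch = [137.61, 149.14]


def relative_change(before, after):
    """Fractional change from 'before' to 'after' (negative means lower)."""
    return (after - before) / before


def relative_spread(samples):
    """(max - min) / mean: a crude measure of run-to-run variation."""
    return (max(samples) - min(samples)) / mean(samples)


change = relative_change(mean(without_patch), mean(with_patch))
noise_before = relative_spread(without_patch)
noise_after = relative_spread(with_patch)

print(f"mean change with patch: {change:+.2%}")
print(f"run-to-run spread w/o patch: {noise_before:.2%}, w/ patch: {noise_after:.2%}")
```

If the mean change is smaller than the spread between runs of the same build, as it appears to be here, the difference cannot be attributed to the patch from this data alone.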
> 
> 
> > -----Original Message-----
> > From: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> > Sent: Monday, October 8, 2018 6:50 PM
> > To: Gavin Hu (Arm Technology China) <Gavin.Hu at arm.com>
> > Cc: Ola Liljedahl <Ola.Liljedahl at arm.com>; dev at dpdk.org; Honnappa
> > Nagarahalli <Honnappa.Nagarahalli at arm.com>; Ananyev, Konstantin
> > <konstantin.ananyev at intel.com>; Steve Capper <Steve.Capper at arm.com>;
> > nd <nd at arm.com>; stable at dpdk.org
> > Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load
> >
> > -----Original Message-----
> > > Date: Mon, 8 Oct 2018 10:33:43 +0000
> > > From: "Gavin Hu (Arm Technology China)" <Gavin.Hu at arm.com>
> > > To: Ola Liljedahl <Ola.Liljedahl at arm.com>, Jerin Jacob
> > > <jerin.jacob at caviumnetworks.com>
> > > CC: "dev at dpdk.org" <dev at dpdk.org>, Honnappa Nagarahalli
> > > <Honnappa.Nagarahalli at arm.com>, "Ananyev, Konstantin"
> > > <konstantin.ananyev at intel.com>, Steve Capper <Steve.Capper at arm.com>,
> > > nd <nd at arm.com>, "stable at dpdk.org" <stable at dpdk.org>
> > > Subject: RE: [PATCH v3 1/3] ring: read tail using atomic load
> > >
> > >
> > > I did benchmarking w/o and w/ the patch, it did not show any
> > > noticeable differences in terms of latency.
> > > Here is the full log (3 runs w/o the patch and 2 runs w/ the patch).
> > >
> > > sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4
> > > --socket-mem=1024  -- -i
> >
> > These counters are running at 100MHz. Use PMU counters to get more
> > accurate results.
> >
> > https://doc.dpdk.org/guides/prog_guide/profile_app.html
> > See: 55.2. Profiling on ARM64
> >

