[dpdk-dev] [PATCH v3 1/3] ring: read tail using atomic load

Gavin Hu (Arm Technology China) Gavin.Hu at arm.com
Wed Oct 10 08:28:36 CEST 2018


Hi Jerin,

Following the guide to use the PMU counters (kernel module inserted and DPDK recompiled), the numbers increased more than tenfold (do bigger numbers here mean more precise?). Is this valid and expected?
No significant difference was seen between the runs w/o and w/ the patch.
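
A rough sketch of why the reported numbers could grow by this much, assuming the earlier runs counted ticks of the 100 MHz counter mentioned in the quoted reply and the new runs count CPU cycles through the PMU. The core clock below is a hypothetical value chosen for illustration; it is not stated anywhere in this thread.

```python
# Counter frequency taken from the quoted reply ("running at 100MHz").
GENERIC_TIMER_HZ = 100_000_000

# Hypothetical core clock, used only for illustration (an assumption).
ASSUMED_CPU_HZ = 2_000_000_000


def tick_ns(hz):
    """Duration of one counter tick in nanoseconds."""
    return 1e9 / hz


def expected_scale(cpu_hz, timer_hz=GENERIC_TIMER_HZ):
    """Factor by which a count grows when moving from timer ticks to CPU cycles."""
    return cpu_hz / timer_hz


print(f"timer tick: {tick_ns(GENERIC_TIMER_HZ):.1f} ns")
print(f"cycle tick: {tick_ns(ASSUMED_CPU_HZ):.2f} ns")
print(f"expected scale: {expected_scale(ASSUMED_CPU_HZ):.0f}x")
```

Under that assumption the same interval is reported as a larger count simply because each tick is shorter, so a bigger number would indicate finer resolution rather than slower code.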

gavin at net-arm-thunderx2:~/community/dpdk$ sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 --socket-mem=1024  -- -i
RTE>>ring_perf_autotest (#1 run w/o the patch)
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 103
MP/MC single enq/dequeue: 130
SP/SC burst enq/dequeue (size: 8): 18
MP/MC burst enq/dequeue (size: 8): 21
SP/SC burst enq/dequeue (size: 32): 7
MP/MC burst enq/dequeue (size: 32): 8

### Testing empty dequeue ###
SC empty dequeue: 3.00
MC empty dequeue: 3.00

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 17.48
MP/MC bulk enq/dequeue (size: 8): 21.77
SP/SC bulk enq/dequeue (size: 32): 7.39
MP/MC bulk enq/dequeue (size: 32): 8.52

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 31.32
MP/MC bulk enq/dequeue (size: 8): 38.52
SP/SC bulk enq/dequeue (size: 32): 13.39
MP/MC bulk enq/dequeue (size: 32): 14.15

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 75.00
MP/MC bulk enq/dequeue (size: 8): 141.97
SP/SC bulk enq/dequeue (size: 32): 23.85
MP/MC bulk enq/dequeue (size: 32): 36.13
Test OK
RTE>>ring_perf_autotest (#2 run w/o the patch)
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 103
MP/MC single enq/dequeue: 130
SP/SC burst enq/dequeue (size: 8): 18
MP/MC burst enq/dequeue (size: 8): 21
SP/SC burst enq/dequeue (size: 32): 7
MP/MC burst enq/dequeue (size: 32): 8

### Testing empty dequeue ###
SC empty dequeue: 3.00
MC empty dequeue: 3.00

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 17.48
MP/MC bulk enq/dequeue (size: 8): 21.77
SP/SC bulk enq/dequeue (size: 32): 7.38
MP/MC bulk enq/dequeue (size: 32): 8.52

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 31.31
MP/MC bulk enq/dequeue (size: 8): 38.52
SP/SC bulk enq/dequeue (size: 32): 13.33
MP/MC bulk enq/dequeue (size: 32): 14.16

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 75.74
MP/MC bulk enq/dequeue (size: 8): 147.33
SP/SC bulk enq/dequeue (size: 32): 24.79
MP/MC bulk enq/dequeue (size: 32): 40.09
Test OK

RTE>>ring_perf_autotest (#1 run w/ the patch)
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 103
MP/MC single enq/dequeue: 129
SP/SC burst enq/dequeue (size: 8): 18
MP/MC burst enq/dequeue (size: 8): 22
SP/SC burst enq/dequeue (size: 32): 7
MP/MC burst enq/dequeue (size: 32): 8

### Testing empty dequeue ###
SC empty dequeue: 3.00
MC empty dequeue: 4.00

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 17.89
MP/MC bulk enq/dequeue (size: 8): 21.77
SP/SC bulk enq/dequeue (size: 32): 7.50
MP/MC bulk enq/dequeue (size: 32): 8.52

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 31.24
MP/MC bulk enq/dequeue (size: 8): 38.14
SP/SC bulk enq/dequeue (size: 32): 13.24
MP/MC bulk enq/dequeue (size: 32): 14.69

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 74.63
MP/MC bulk enq/dequeue (size: 8): 137.61
SP/SC bulk enq/dequeue (size: 32): 24.82
MP/MC bulk enq/dequeue (size: 32): 36.64
Test OK
RTE>>ring_perf_autotest (#2 run w/ the patch)
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 103
MP/MC single enq/dequeue: 129
SP/SC burst enq/dequeue (size: 8): 18
MP/MC burst enq/dequeue (size: 8): 22
SP/SC burst enq/dequeue (size: 32): 7
MP/MC burst enq/dequeue (size: 32): 8

### Testing empty dequeue ###
SC empty dequeue: 3.00
MC empty dequeue: 4.00

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 17.89
MP/MC bulk enq/dequeue (size: 8): 21.77
SP/SC bulk enq/dequeue (size: 32): 7.50
MP/MC bulk enq/dequeue (size: 32): 8.52

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 31.53
MP/MC bulk enq/dequeue (size: 8): 38.59
SP/SC bulk enq/dequeue (size: 32): 13.24
MP/MC bulk enq/dequeue (size: 32): 14.69

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 75.60
MP/MC bulk enq/dequeue (size: 8): 149.14
SP/SC bulk enq/dequeue (size: 32): 25.13
MP/MC bulk enq/dequeue (size: 32): 40.60
Test OK
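
To put a number on "no significant difference", here is a minimal sketch that averages the two runs of each configuration and prints the relative change. The values are copied from the "two physical cores" sections of the logs; the dictionary keys and the helper name are only illustrative.

```python
from statistics import mean

# Cycle counts from the "Testing using two physical cores" sections.
runs_without = {
    "SP/SC bulk (size: 8)": [75.00, 75.74],
    "MP/MC bulk (size: 8)": [141.97, 147.33],
    "SP/SC bulk (size: 32)": [23.85, 24.79],
    "MP/MC bulk (size: 32)": [36.13, 40.09],
}
runs_with = {
    "SP/SC bulk (size: 8)": [74.63, 75.60],
    "MP/MC bulk (size: 8)": [137.61, 149.14],
    "SP/SC bulk (size: 32)": [24.82, 25.13],
    "MP/MC bulk (size: 32)": [36.64, 40.60],
}


def percent_change(before, after):
    """Relative change of the mean, in percent (positive means more cycles)."""
    return (mean(after) - mean(before)) / mean(before) * 100.0


changes = {
    name: percent_change(runs_without[name], runs_with[name])
    for name in runs_without
}

for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

With only two runs per configuration this is an indication rather than a statistical test; for these figures the means differ by less than 3% in every case, which is comparable to the run-to-run spread.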


> -----Original Message-----
> From: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> Sent: Monday, October 8, 2018 6:50 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu at arm.com>
> Cc: Ola Liljedahl <Ola.Liljedahl at arm.com>; dev at dpdk.org; Honnappa
> Nagarahalli <Honnappa.Nagarahalli at arm.com>; Ananyev, Konstantin
> <konstantin.ananyev at intel.com>; Steve Capper <Steve.Capper at arm.com>;
> nd <nd at arm.com>; stable at dpdk.org
> Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load
> 
> -----Original Message-----
> > Date: Mon, 8 Oct 2018 10:33:43 +0000
> > From: "Gavin Hu (Arm Technology China)" <Gavin.Hu at arm.com>
> > To: Ola Liljedahl <Ola.Liljedahl at arm.com>, Jerin Jacob
> > <jerin.jacob at caviumnetworks.com>
> > CC: "dev at dpdk.org" <dev at dpdk.org>, Honnappa Nagarahalli
> > <Honnappa.Nagarahalli at arm.com>, "Ananyev, Konstantin"
> >  <konstantin.ananyev at intel.com>, Steve Capper
> <Steve.Capper at arm.com>,
> > nd  <nd at arm.com>, "stable at dpdk.org" <stable at dpdk.org>
> > Subject: RE: [PATCH v3 1/3] ring: read tail using atomic load
> >
> >
> > I did benchmarking w/o and w/ the patch, it did not show any noticeable
> differences in terms of latency.
> > Here is the full log( 3 runs w/o the patch and 2 runs w/ the patch).
> >
> > sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4
> > --socket-mem=1024  -- -i
> 
> These counters are running at 100MHz. Use PMU counters to get more
> accurate results.
> 
> https://doc.dpdk.org/guides/prog_guide/profile_app.html
> See: 55.2. Profiling on ARM64
> 

