[dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size

Honnappa Nagarahalli Honnappa.Nagarahalli at arm.com
Wed Oct 23 20:58:50 CEST 2019


<snip>
> >
> > > > I have applied your suggestion in 6/6 in v6 along with my corrections.
> > > > The rte_ring_elem test cases are added in 3/6. I have verified that they
> > > > are running fine (they are done for 64b alone, will add more). Hopefully,
> > > > there are no more errors.
> >
> > Applied v6 and re-run the tests.
> > Functional test passes ok on my boxes.
> > Perf-test numbers below.
> > As I can see pretty much same pattern as in v5 remains:
> > MP/MC on 2 different cores
> 
> Forgot to add: for 8 elems, for 32 - new ones always better.
> 
> > and SP/SC single enq/deq
> > show lower numbers for _elem_.
> > For others _elem_ numbers are about the same or higher.
> > Personally, I am ok to go ahead with these changes.
> > Konstantin
> >
> > A - ring_perf_autotest
> > B - ring_perf_elem_autotest
> >
> >  ### Testing single element and burst enq/deq ###	A	B
> > SP/SC single enq/dequeue: 				8.27	10.94
> > MP/MC single enq/dequeue: 				56.11	47.43
> > SP/SC burst enq/dequeue (size: 8): 			4.20	3.50
> > MP/MC burst enq/dequeue (size: 8): 			9.93	9.29
> > SP/SC burst enq/dequeue (size: 32): 			2.93	1.94
> > MP/MC burst enq/dequeue (size: 32): 			4.10	3.35
> >
> > ### Testing empty dequeue ###
> > SC empty dequeue: 					2.00	3.00
> > MC empty dequeue: 					3.00	2.00
> >
> > ### Testing using a single lcore ###
> > SP/SC bulk enq/dequeue (size: 8): 			4.06	3.30
> > MP/MC bulk enq/dequeue (size: 8): 			9.84	9.28
> > SP/SC bulk enq/dequeue (size: 32): 			2.93	1.88
> > MP/MC bulk enq/dequeue (size: 32): 			4.10	3.32
> >
> > ### Testing using two hyperthreads ###
> > SP/SC bulk enq/dequeue (size: 8): 			9.22	8.83
> > MP/MC bulk enq/dequeue (size: 8): 			15.73	15.86
> > SP/SC bulk enq/dequeue (size: 32): 			5.78	3.83
> > MP/MC bulk enq/dequeue (size: 32): 			6.33	4.53
> >
> > ### Testing using two physical cores ###
> > SP/SC bulk enq/dequeue (size: 8): 			23.78	19.32
> > MP/MC bulk enq/dequeue (size: 8): 			68.54	71.97
> > SP/SC bulk enq/dequeue (size: 32): 			11.99	10.77
> > MP/MC bulk enq/dequeue (size: 32): 			21.96	18.66
> >
> > ### Testing using two NUMA nodes ###
> > SP/SC bulk enq/dequeue (size: 8): 			50.13	33.92
> > MP/MC bulk enq/dequeue (size: 8): 			177.98	195.87
> > SP/SC bulk enq/dequeue (size: 32): 			32.98	23.12
> > MP/MC bulk enq/dequeue (size: 32): 			55.86	48.76

Thanks Konstantin. The performance of 5/6 is mostly worse than 6/6. So we should not consider 5/6 (it will not be included in future versions).
A - ring_perf_autotest (existing code)
B - ring_perf_elem_autotest (6/6)

Numbers from my side:
On one Arm platform:
### Testing single element and burst enq/deq ###	A	B
SP/SC single enq/dequeue:				1.04	1.06 (1.92)
MP/MC single enq/dequeue: 				1.46	1.51 (3.42)
SP/SC burst enq/dequeue (size: 8): 			0.18	0.17 (-5.55)
MP/MC burst enq/dequeue (size: 8): 			0.23	0.22 (-4.34)
SP/SC burst enq/dequeue (size: 32): 			0.05	0.05 (0)
MP/MC burst enq/dequeue (size: 32): 			0.07	0.06 (-14.28)
	
### Testing empty dequeue ###	
SC empty dequeue: 					0.27	0.27 (0)
MC empty dequeue: 					0.27	0.27 (0)
	
### Testing using a single lcore ###	
SP/SC bulk enq/dequeue (size: 8): 			0.18	0.17 (-5.55)
MP/MC bulk enq/dequeue (size: 8): 			0.23	0.23 (0)
SP/SC bulk enq/dequeue (size: 32): 			0.05	0.05 (0)
MP/MC bulk enq/dequeue (size: 32): 			0.07	0.06 (0)
	
### Testing using two physical cores ###	
SP/SC bulk enq/dequeue (size: 8): 			0.79	0.79 (0)
MP/MC bulk enq/dequeue (size: 8): 			1.42	1.37 (-3.52)
SP/SC bulk enq/dequeue (size: 32): 			0.20	0.20 (0)
MP/MC bulk enq/dequeue (size: 32): 			0.33	0.35 (6.06)

On another Arm platform:

### Testing single element and burst enq/deq ###	A	B	
SP/SC single enq/dequeue:				11.54	11.79 (2.16)
MP/MC single enq/dequeue: 				11.84	12.54 (5.91)
SP/SC burst enq/dequeue (size: 8): 			1.51	1.33   (-11.92)
MP/MC burst enq/dequeue (size: 8): 			1.91	1.73   (-9.42)
SP/SC burst enq/dequeue (size: 32): 			0.62	0.42   (-32.25)
MP/MC burst enq/dequeue (size: 32): 			0.72	0.52   (-27.77)
	
### Testing empty dequeue ###	
SC empty dequeue: 					2.48	2.48 (0)
MC empty dequeue: 					2.48	2.48 (0)
	
### Testing using a single lcore ###	
SP/SC bulk enq/dequeue (size: 8): 			1.52	1.33 (-12.5)
MP/MC bulk enq/dequeue (size: 8): 			1.92	1.73 (-9.89)
SP/SC bulk enq/dequeue (size: 32): 			0.62	0.42 (-32.25)
MP/MC bulk enq/dequeue (size: 32): 			0.72	0.52 (-27.77)
	
### Testing using two physical cores ###	
SP/SC bulk enq/dequeue (size: 8): 			6.30	6.57   (4.28)
MP/MC bulk enq/dequeue (size: 8): 			10.59	10.45 (-1.32)
SP/SC bulk enq/dequeue (size: 32): 			1.92	1.58   (-17.70)
MP/MC bulk enq/dequeue (size: 32): 			2.51	2.47   (-1.59)
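
For anyone reading the tables above: the figures in parentheses appear to be the change of B relative to A, in percent, so a negative value means B (the _elem_ test) reports a lower number than A. A minimal sketch of that arithmetic follows, assuming this is how the figures were derived. `pct_change` is a hypothetical helper for illustration only, not part of the ring tests.

```c
/* Hypothetical helper (not part of DPDK or the ring perf tests):
 * relative change of b against the baseline a, in percent.
 * A negative result means b is lower than a. */
static double pct_change(double a, double b)
{
	return (b - a) / a * 100.0;
}
```

For example, `pct_change(1.04, 1.06)` gives roughly 1.92 and `pct_change(0.62, 0.42)` roughly -32.26. The values listed in the tables look truncated to two decimals rather than rounded.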

From my side, I would say let us just go with patch 2/6.

Jerin/David, any opinion on your side?

