[dpdk-dev] [RFC 0/6] New sync modes for ring

David Christensen drc at linux.vnet.ibm.com
Fri Feb 28 01:17:54 CET 2020


> On Tue, Feb 25, 2020 at 7:11 PM Ananyev, Konstantin
> <konstantin.ananyev at intel.com> wrote:
> 
>> We do have a run-time check in our current enqueue()/dequeue implementation.
>> In fact we support both modes: we have generic rte_ring_enqueue(/dequeue)_bulk(/burst)
>> where sync behaviour is determined at runtime by value of prod(/cons).single.
>> Or user can call  rte_ring_(mp/sp)_enqueue_* functions directly.
>> This RFC follows exactly the same paradigm:
>> rte_ring_enqueue(/dequeue)_bulk(/burst) kept generic and it's
>> behaviour is determined at runtime, by value of prod(/cons).sync_type.
>> Or user can call enqueue/dequeue with particular sync mode directly:
>> rte_ring_(mp/sp/rts/hts)_enqueue_(bulk/burst)*.
>> The only thing that changed:
>>   Format of prod/cons now could differ depending on mode selected at _init_.
>>   So you can't create a ring for let say SP mode and then in the middle of data-path
>>   change your mind and start using MP_RTS mode.
>>   For existing modes (SP/MP, SC/MC)  format remains the same and user can still
>>   use them interchangeably, though of course that is an error prone practice.
> 
> Makes sense.
> 
> 
>>
>>>> But I agree with the problem statement that in the virtualization use
>>>> case, It may be possible to have N virtual cores runs on a physical
>>>> core.
>>>>
>>>> IMO, The best solution would be keeping the ring API same and have a
>>>> different flavor in "compile-time". Something like
>>>> liburcu did for accommodating different flavors.
>>>>
>>>> i.e urcu-qsbr.h and urcu-bp.h will identical definition of API. The
>>>> application can simply include ONE header file in a C file based on
>>>> the flavor.
>>
>> I don't think it is a flexible enough approach.
>> In one app user might need to have several rings with different sync modes.
>> Or even user might need a ring with different sync modes for enqueue/dequeue.
> 
> Ack.
> 
> 
>> Yes, hiding rte_ring implementation inside .c would help a lot
>> in terms of ABI maintenance and would make our future life easier.
>> The question is what is the price for it in terms of performance,
>> and are we ready to pay it. Not to mention that it would cause
>> changes in many other libs/apps...
>> So I think it should be a subject for a separate discussion.
>> But, agree it would be good at least to measure the performance
>> impact of such change.
>> If I'll have some spare cycles, will give it a try.
>> Meanwhile, can I ask Jerin and other guys to repeat tests from this RFC
>> on their HW? Before continuing discussion would probably be good to know
>> does the suggested patch work as expected across different platforms.
> 
> 
> I tested on an arm64 HW. The former section is without the
> patch(20.02) and later one with this patch.
> I agree with Konstantin that getting more platform tests will be good
> early so that we can focus on the approach
> to avoid back and forth latter.
> 
> 
> RTE>>ring_perf_autotest // without path
> 
> ### Testing single element enq/deq ###
> legacy APIs: SP/SC: single: 289.78
> legacy APIs: MP/MC: single: 516.20
> 
> ### Testing burst enq/deq ###
> legacy APIs: SP/SC: burst (size: 8): 312.88
> legacy APIs: SP/SC: burst (size: 32): 426.72
> legacy APIs: MP/MC: burst (size: 8): 510.95
> legacy APIs: MP/MC: burst (size: 32): 702.01
> 
> ### Testing bulk enq/deq ###
> legacy APIs: SP/SC: bulk (size: 8): 306.74
> legacy APIs: SP/SC: bulk (size: 32): 411.56
> legacy APIs: MP/MC: bulk (size: 8): 501.32
> legacy APIs: MP/MC: bulk (size: 32): 693.07
> 
> ### Testing empty bulk deq ###
> legacy APIs: SP/SC: bulk (size: 8): 7.00
> legacy APIs: MP/MC: bulk (size: 8): 7.00
> 
> ### Testing using two physical cores ###
> legacy APIs: SP/SC: bulk (size: 8): 74.36
> legacy APIs: MP/MC: bulk (size: 8): 110.18
> legacy APIs: SP/SC: bulk (size: 32): 23.04
> legacy APIs: MP/MC: bulk (size: 32): 32.29
> 
> ### Testing using all slave nodes ##
> Bulk enq/dequeue count on size 8
> Core [8] count = 293741
> Core [9] count = 293741
> Total count (size: 8): 587482
> 
> Bulk enq/dequeue count on size 32
> Core [8] count = 244909
> Core [9] count = 244909
> Total count (size: 32): 1077300
> 
> ### Testing single element enq/deq ###
> elem APIs: element size 16B: SP/SC: single: 255.37
> elem APIs: element size 16B: MP/MC: single: 456.68
> 
> ### Testing burst enq/deq ###
> elem APIs: element size 16B: SP/SC: burst (size: 8): 291.99
> elem APIs: element size 16B: SP/SC: burst (size: 32): 456.25
> elem APIs: element size 16B: MP/MC: burst (size: 8): 497.77
> elem APIs: element size 16B: MP/MC: burst (size: 32): 680.87
> 
> ### Testing bulk enq/deq ###
> elem APIs: element size 16B: SP/SC: bulk (size: 8): 284.40
> elem APIs: element size 16B: SP/SC: bulk (size: 32): 453.17
> elem APIs: element size 16B: MP/MC: bulk (size: 8): 485.77
> elem APIs: element size 16B: MP/MC: bulk (size: 32): 675.08
> 
> ### Testing empty bulk deq ###
> elem APIs: element size 16B: SP/SC: bulk (size: 8): 8.00
> elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.00
> 
> ### Testing using two physical cores ###
> elem APIs: element size 16B: SP/SC: bulk (size: 8): 74.45
> elem APIs: element size 16B: MP/MC: bulk (size: 8): 105.91
> elem APIs: element size 16B: SP/SC: bulk (size: 32): 22.92
> elem APIs: element size 16B: MP/MC: bulk (size: 32): 31.55
> 
> ### Testing using all slave nodes ###
> 
> Bulk enq/dequeue count on size 8
> Core [8] count = 308724
> Core [9] count = 308723
> Total count (size: 8): 617447
> 
> Bulk enq/dequeue count on size 32
> Core [8] count = 214269
> Core [9] count = 214269
> Total count (size: 32): 1045985
> 
> RTE>>ring_perf_autotest // with patch
> 
> ### Testing single element enq/deq ###
> legacy APIs: SP/SC: single: 289.78
> legacy APIs: MP/MC: single: 475.76
> 
> ### Testing burst enq/deq ###
> legacy APIs: SP/SC: burst (size: 8): 323.91
> legacy APIs: SP/SC: burst (size: 32): 424.60
> legacy APIs: MP/MC: burst (size: 8): 523.00
> legacy APIs: MP/MC: burst (size: 32): 717.09
> 
> ### Testing bulk enq/deq ###
> legacy APIs: SP/SC: bulk (size: 8): 317.74
> legacy APIs: SP/SC: bulk (size: 32): 413.57
> legacy APIs: MP/MC: bulk (size: 8): 512.89
> legacy APIs: MP/MC: bulk (size: 32): 712.45
> 
> ### Testing empty bulk deq ###
> legacy APIs: SP/SC: bulk (size: 8): 7.00
> legacy APIs: MP/MC: bulk (size: 8): 7.00
> 
> ### Testing using two physical cores ###
> legacy APIs: SP/SC: bulk (size: 8): 74.82
> legacy APIs: MP/MC: bulk (size: 8): 96.45
> legacy APIs: SP/SC: bulk (size: 32): 22.97
> legacy APIs: MP/MC: bulk (size: 32): 32.52
> 
> ### Testing using all slave nodes ###
> 
> Bulk enq/dequeue count on size 8
> Core [8] count = 283928
> Core [9] count = 283927
> Total count (size: 8): 567855
> 
> Bulk enq/dequeue count on size 32
> Core [8] count = 223916
> Core [9] count = 223915
> Total count (size: 32): 1015686
> 
> ### Testing single element enq/deq ###
> elem APIs: element size 16B: SP/SC: single: 267.65
> elem APIs: element size 16B: MP/MC: single: 439.06
> 
> ### Testing burst enq/deq ###
> elem APIs: element size 16B: SP/SC: burst (size: 8): 302.44
> elem APIs: element size 16B: SP/SC: burst (size: 32): 466.31
> elem APIs: element size 16B: MP/MC: burst (size: 8): 502.51
> elem APIs: element size 16B: MP/MC: burst (size: 32): 695.81
> 
> ### Testing bulk enq/deq ###
> elem APIs: element size 16B: SP/SC: bulk (size: 8): 295.15
> elem APIs: element size 16B: SP/SC: bulk (size: 32): 462.77
> elem APIs: element size 16B: MP/MC: bulk (size: 8): 496.89
> elem APIs: element size 16B: MP/MC: bulk (size: 32): 690.46
> 
> ### Testing empty bulk deq ###
> elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.50
> elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.44
> 
> ### Testing using two physical cores ###
> elem APIs: element size 16B: SP/SC: bulk (size: 8): 65.85
> elem APIs: element size 16B: MP/MC: bulk (size: 8): 103.80
> elem APIs: element size 16B: SP/SC: bulk (size: 32): 23.27
> elem APIs: element size 16B: MP/MC: bulk (size: 32): 31.17
> 
> ### Testing using all slave nodes ###
> 
> Bulk enq/dequeue count on size 8
> Core [8] count = 304223
> Core [9] count = 304221
> Total count (size: 8): 608444
> 
> Bulk enq/dequeue count on size 32
> Core [8] count = 214856
> Core [9] count = 214855
> Total count (size: 32): 1038155
> Test OK
> RTE>>quit
> 
> 

Encountered a couple of different build errors with these patches on my 
Power 9 system:

In file included from ../lib/librte_ring/rte_ring.h:534,
                  from ../drivers/mempool/ring/rte_mempool_ring.c:9:
../lib/librte_ring/rte_ring_hts_generic.h: In function 
‘__rte_ring_hts_update_tail’:
../lib/librte_ring/rte_ring_hts_generic.h:61:2: warning: implicit 
declaration of function ‘RTE_ASSERT’; did you mean ‘RTE_STR’? 
[-Wimplicit-function-declaration]
   RTE_ASSERT(n >= num);
   ^~~~~~~~~~
   RTE_STR

Fixed by adding "#include <rte_debug.h>" to rte_ring.h.

Also encountered:

In file included from ../app/test/test_ring_hts_stress.c:5:
../app/test/test_ring_stress.h: In function ‘check_updt_elem’:
../app/test/test_ring_stress.h:162:9: error: unknown type name 
‘rte_spinlock_t’
   static rte_spinlock_t dump_lock;
          ^~~~~~~~~~~~~~
../app/test/test_ring_stress.h:166:4: warning: implicit declaration of 
function ‘rte_spinlock_lock’; did you mean ‘rte_calloc_socket’? 
[-Wimplicit-function-declaration]
     rte_spinlock_lock(&dump_lock);
     ^~~~~~~~~~~~~~~~~
     rte_calloc_socket
../app/test/test_ring_stress.h:166:4: warning: nested extern declaration 
of ‘rte_spinlock_lock’ [-Wnested-externs]
../app/test/test_ring_stress.h:172:4: warning: implicit declaration of 
function ‘rte_spinlock_unlock’; did you mean ‘pthread_rwlock_unlock’? 
[-Wimplicit-function-declaration]
     rte_spinlock_unlock(&dump_lock);
     ^~~~~~~~~~~~~~~~~~~
     pthread_rwlock_unlock
../app/test/test_ring_stress.h:172:4: warning: nested extern declaration 
of ‘rte_spinlock_unlock’ [-Wnested-externs]

Fixed by adding "#include <rte_spinlock.h>" to test_ring_stress.h.

Autoperf test results
---------------------
RTE>>ring_perf_autotest // DPDK 20.02, without patch

### Testing single element enq/deq ###
legacy APIs: SP/SC: single: 42.14
legacy APIs: MP/MC: single: 56.26

### Testing burst enq/deq ###
legacy APIs: SP/SC: burst (size: 8): 43.59
legacy APIs: SP/SC: burst (size: 32): 49.87
legacy APIs: MP/MC: burst (size: 8): 58.43
legacy APIs: MP/MC: burst (size: 32): 65.68

### Testing bulk enq/deq ###
legacy APIs: SP/SC: bulk (size: 8): 43.59
legacy APIs: SP/SC: bulk (size: 32): 49.85
legacy APIs: MP/MC: bulk (size: 8): 58.43
legacy APIs: MP/MC: bulk (size: 32): 65.60

### Testing empty bulk deq ###
legacy APIs: SP/SC: bulk (size: 8): 7.16
legacy APIs: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
legacy APIs: SP/SC: bulk (size: 8): 12.46
legacy APIs: MP/MC: bulk (size: 8): 16.20
legacy APIs: SP/SC: bulk (size: 32): 3.21
legacy APIs: MP/MC: bulk (size: 32): 3.73

### Testing using two physical cores ###
legacy APIs: SP/SC: bulk (size: 8): 33.34
legacy APIs: MP/MC: bulk (size: 8): 37.99
legacy APIs: SP/SC: bulk (size: 32): 10.19
legacy APIs: MP/MC: bulk (size: 32): 11.90

### Testing using two NUMA nodes ###
legacy APIs: SP/SC: bulk (size: 8): 49.50
legacy APIs: MP/MC: bulk (size: 8): 63.65
legacy APIs: SP/SC: bulk (size: 32): 12.49
legacy APIs: MP/MC: bulk (size: 32): 23.53

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [4] count = 5604
Core [5] count = 5563
Core [6] count = 5576
Core [7] count = 5630
Core [8] count = 5643
Core [9] count = 5727
Core [10] count = 5698
Core [11] count = 5711
Core [64] count = 5259
Core [65] count = 5322
Core [66] count = 5321
Core [67] count = 5310
Core [68] count = 4350
Core [69] count = 4455
Core [70] count = 4546
Core [71] count = 4475
Total count (size: 8): 84190

Bulk enq/dequeue count on size 32
Core [4] count = 5543
Core [5] count = 5555
Core [6] count = 5596
Core [7] count = 5584
Core [8] count = 5613
Core [9] count = 5686
Core [10] count = 5689
Core [11] count = 5677
Core [64] count = 5228
Core [65] count = 5389
Core [66] count = 5406
Core [67] count = 5359
Core [68] count = 4554
Core [69] count = 4673
Core [70] count = 4675
Core [71] count = 4644
Total count (size: 32): 169061

### Testing single element enq/deq ###
elem APIs: element size 16B: SP/SC: single: 42.84
elem APIs: element size 16B: MP/MC: single: 56.77

### Testing burst enq/deq ###
elem APIs: element size 16B: SP/SC: burst (size: 8): 44.98
elem APIs: element size 16B: SP/SC: burst (size: 32): 59.16
elem APIs: element size 16B: MP/MC: burst (size: 8): 60.58
elem APIs: element size 16B: MP/MC: burst (size: 32): 74.75

### Testing bulk enq/deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 45.01
elem APIs: element size 16B: SP/SC: bulk (size: 32): 59.08
elem APIs: element size 16B: MP/MC: bulk (size: 8): 60.58
elem APIs: element size 16B: MP/MC: bulk (size: 32): 74.76

### Testing empty bulk deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.16
elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.18
elem APIs: element size 16B: MP/MC: bulk (size: 8): 15.44
elem APIs: element size 16B: SP/SC: bulk (size: 32): 3.22
elem APIs: element size 16B: MP/MC: bulk (size: 32): 3.97

### Testing using two physical cores ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 42.07
elem APIs: element size 16B: MP/MC: bulk (size: 8): 44.50
elem APIs: element size 16B: SP/SC: bulk (size: 32): 10.73
elem APIs: element size 16B: MP/MC: bulk (size: 32): 11.73

### Testing using two NUMA nodes ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 49.55
elem APIs: element size 16B: MP/MC: bulk (size: 8): 93.10
elem APIs: element size 16B: SP/SC: bulk (size: 32): 12.33
elem APIs: element size 16B: MP/MC: bulk (size: 32): 27.10

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [4] count = 5489
Core [5] count = 5559
Core [6] count = 5566
Core [7] count = 5577
Core [8] count = 5645
Core [9] count = 5699
Core [10] count = 5695
Core [11] count = 5733
Core [64] count = 5202
Core [65] count = 5284
Core [66] count = 5319
Core [67] count = 5349
Core [68] count = 4331
Core [69] count = 4465
Core [70] count = 4484
Core [71] count = 4439
Total count (size: 8): 83836

Bulk enq/dequeue count on size 32
Core [4] count = 5567
Core [5] count = 5492
Core [6] count = 5517
Core [7] count = 5515
Core [8] count = 5593
Core [9] count = 5650
Core [10] count = 5694
Core [11] count = 5665
Core [64] count = 5236
Core [65] count = 5319
Core [66] count = 5333
Core [67] count = 5304
Core [68] count = 4608
Core [69] count = 4669
Core [70] count = 4690
Core [71] count = 4654
Total count (size: 32): 168342
Test OK
---------------------
RTE>>ring_perf_autotest // DPDK 20.02, without patch

### Testing single element enq/deq ###
legacy APIs: SP/SC: single: 42.18
legacy APIs: MP/MC: single: 56.26

### Testing burst enq/deq ###
legacy APIs: SP/SC: burst (size: 8): 43.60
legacy APIs: SP/SC: burst (size: 32): 49.86
legacy APIs: MP/MC: burst (size: 8): 58.43
legacy APIs: MP/MC: burst (size: 32): 65.67

### Testing bulk enq/deq ###
legacy APIs: SP/SC: bulk (size: 8): 43.59
legacy APIs: SP/SC: bulk (size: 32): 49.86
legacy APIs: MP/MC: bulk (size: 8): 58.43
legacy APIs: MP/MC: bulk (size: 32): 65.63

### Testing empty bulk deq ###
legacy APIs: SP/SC: bulk (size: 8): 7.16
legacy APIs: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
legacy APIs: SP/SC: bulk (size: 8): 12.07
legacy APIs: MP/MC: bulk (size: 8): 16.24
legacy APIs: SP/SC: bulk (size: 32): 3.20
legacy APIs: MP/MC: bulk (size: 32): 3.72

### Testing using two physical cores ###
legacy APIs: SP/SC: bulk (size: 8): 33.41
legacy APIs: MP/MC: bulk (size: 8): 38.01
legacy APIs: SP/SC: bulk (size: 32): 10.23
legacy APIs: MP/MC: bulk (size: 32): 11.90

### Testing using two NUMA nodes ###
legacy APIs: SP/SC: bulk (size: 8): 49.27
legacy APIs: MP/MC: bulk (size: 8): 64.80
legacy APIs: SP/SC: bulk (size: 32): 12.45
legacy APIs: MP/MC: bulk (size: 32): 23.11

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [4] count = 5637
Core [5] count = 5599
Core [6] count = 5623
Core [7] count = 5627
Core [8] count = 5723
Core [9] count = 5758
Core [10] count = 5714
Core [11] count = 5724
Core [64] count = 5310
Core [65] count = 5438
Core [66] count = 5448
Core [67] count = 5374
Core [68] count = 4441
Core [69] count = 4550
Core [70] count = 4550
Core [71] count = 4558
Total count (size: 8): 85074

Bulk enq/dequeue count on size 32
Core [4] count = 5608
Core [5] count = 5623
Core [6] count = 5590
Core [7] count = 5658
Core [8] count = 5680
Core [9] count = 5738
Core [10] count = 5692
Core [11] count = 5712
Core [64] count = 5273
Core [65] count = 5363
Core [66] count = 5341
Core [67] count = 5349
Core [68] count = 4591
Core [69] count = 4673
Core [70] count = 4698
Core [71] count = 4687
Total count (size: 32): 170350

### Testing single element enq/deq ###
elem APIs: element size 16B: SP/SC: single: 42.82
elem APIs: element size 16B: MP/MC: single: 56.79

### Testing burst enq/deq ###
elem APIs: element size 16B: SP/SC: burst (size: 8): 44.99
elem APIs: element size 16B: SP/SC: burst (size: 32): 59.00
elem APIs: element size 16B: MP/MC: burst (size: 8): 60.59
elem APIs: element size 16B: MP/MC: burst (size: 32): 74.78

### Testing bulk enq/deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 44.97
elem APIs: element size 16B: SP/SC: bulk (size: 32): 58.91
elem APIs: element size 16B: MP/MC: bulk (size: 8): 60.60
elem APIs: element size 16B: MP/MC: bulk (size: 32): 74.61

### Testing empty bulk deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.16
elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.18
elem APIs: element size 16B: MP/MC: bulk (size: 8): 15.41
elem APIs: element size 16B: SP/SC: bulk (size: 32): 3.19
elem APIs: element size 16B: MP/MC: bulk (size: 32): 4.06

### Testing using two physical cores ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 42.08
elem APIs: element size 16B: MP/MC: bulk (size: 8): 44.52
elem APIs: element size 16B: SP/SC: bulk (size: 32): 10.73
elem APIs: element size 16B: MP/MC: bulk (size: 32): 12.39

### Testing using two NUMA nodes ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 49.65
elem APIs: element size 16B: MP/MC: bulk (size: 8): 93.27
elem APIs: element size 16B: SP/SC: bulk (size: 32): 12.38
elem APIs: element size 16B: MP/MC: bulk (size: 32): 27.19

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [4] count = 5629
Core [5] count = 5585
Core [6] count = 5676
Core [7] count = 5604
Core [8] count = 5639
Core [9] count = 5731
Core [10] count = 5694
Core [11] count = 5707
Core [64] count = 5254
Core [65] count = 5331
Core [66] count = 5340
Core [67] count = 5355
Core [68] count = 4339
Core [69] count = 4481
Core [70] count = 4504
Core [71] count = 4507
Total count (size: 8): 84376

Bulk enq/dequeue count on size 32
Core [4] count = 5518
Core [5] count = 5493
Core [6] count = 5559
Core [7] count = 5484
Core [8] count = 5623
Core [9] count = 5669
Core [10] count = 5661
Core [11] count = 5658
Core [64] count = 5207
Core [65] count = 5305
Core [66] count = 5273
Core [67] count = 5303
Core [68] count = 4542
Core [69] count = 4682
Core [70] count = 4672
Core [71] count = 4609
Total count (size: 32): 168634
Test OK


Dave


More information about the dev mailing list