[dpdk-dev] [PATCH 3/3] ring: use element APIs to implement legacy APIs

David Christensen drc at linux.vnet.ibm.com
Tue Jul 7 22:07:29 CEST 2020



On 7/7/20 7:04 AM, Ananyev, Konstantin wrote:
> 
> Hi Feifei,
> 
>   > Hi, Konstantin, David
>>
>> I'm Feifei Wang from Arm. Sorry to make the following request:
>> Would you please do some ring performance tests of this patch in your platforms at the time you are free?
>> And I want to know whether this patch has a significant impact on other platforms except ARM.
> 
> I run few tests on SKX box and so far didn’t notice any real perf difference.
> Konstantin
>   

Full performance results for IBM POWER9 system below.  I ran the tests  
twice for each version and the results were consistent.

                                      without this patch   with this patch
Testing burst enq/deq
legacy APIs: SP/SC: burst (size: 8):          43.63              43.63
legacy APIs: SP/SC: burst (size: 32):         50.07              50.04
legacy APIs: MP/MC: burst (size: 8):          58.43              58.42
legacy APIs: MP/MC: burst (size: 32):         65.52              65.51
Testing bulk enq/deq
legacy APIs: SP/SC: bulk (size: 8):           43.61              43.61
legacy APIs: SP/SC: bulk (size: 32):          50.05              50.02
legacy APIs: MP/MC: bulk (size: 8):           58.43              58.43
legacy APIs: MP/MC: bulk (size: 32):          65.50              65.49

HW:
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  4
Core(s) per socket:  16
Socket(s):           2
NUMA node(s):        6
Model:               2.3 (pvr 004e 1203)
Model name:          POWER9, altivec supported
CPU max MHz:         3800.0000
CPU min MHz:         2300.0000
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            10240K
NUMA node0 CPU(s):   0-63
NUMA node8 CPU(s):   64-127

OS: RHEL 8.2

GCC: gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)

DPDK: 20.08.0-rc0 (a8550b773)



Unpatched
===========
sudo app/test/dpdk-test -l 68,69
EAL: Detected 128 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.0 (socket 0)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.1 (socket 0)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.0 (socket 8)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.1 (socket 8)
EAL:   using IOMMU type 7 (sPAPR)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.0 (socket 8)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.1 (socket 8)
APP: HPET is not enabled, using TSC as default timer
RTE>>ring_perf_autotest

### Testing single element enq/deq ###
legacy APIs: SP/SC: single: 42.01
legacy APIs: MP/MC: single: 56.27

### Testing burst enq/deq ###
legacy APIs: SP/SC: burst (size: 8): 43.63
legacy APIs: SP/SC: burst (size: 32): 50.07
legacy APIs: MP/MC: burst (size: 8): 58.43
legacy APIs: MP/MC: burst (size: 32): 65.52

### Testing bulk enq/deq ###
legacy APIs: SP/SC: bulk (size: 8): 43.61
legacy APIs: SP/SC: bulk (size: 32): 50.05
legacy APIs: MP/MC: bulk (size: 8): 58.43
legacy APIs: MP/MC: bulk (size: 32): 65.50

### Testing empty bulk deq ###
legacy APIs: SP/SC: bulk (size: 8): 7.16
legacy APIs: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
legacy APIs: SP/SC: bulk (size: 8): 12.44
legacy APIs: MP/MC: bulk (size: 8): 16.19
legacy APIs: SP/SC: bulk (size: 32): 3.10
legacy APIs: MP/MC: bulk (size: 32): 3.64

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [68] count = 362382
Core [69] count = 362516
Total count (size: 8): 724898

Bulk enq/dequeue count on size 32
Core [68] count = 361565
Core [69] count = 361852
Total count (size: 32): 723417

### Testing single element enq/deq ###
elem APIs: element size 16B: SP/SC: single: 42.81
elem APIs: element size 16B: MP/MC: single: 56.78

### Testing burst enq/deq ###
elem APIs: element size 16B: SP/SC: burst (size: 8): 45.04
elem APIs: element size 16B: SP/SC: burst (size: 32): 59.27
elem APIs: element size 16B: MP/MC: burst (size: 8): 60.68
elem APIs: element size 16B: MP/MC: burst (size: 32): 75.00

### Testing bulk enq/deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 45.05
elem APIs: element size 16B: SP/SC: bulk (size: 32): 59.23
elem APIs: element size 16B: MP/MC: bulk (size: 8): 60.64
elem APIs: element size 16B: MP/MC: bulk (size: 32): 75.11

### Testing empty bulk deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.16
elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.15
elem APIs: element size 16B: MP/MC: bulk (size: 8): 15.55
elem APIs: element size 16B: SP/SC: bulk (size: 32): 3.22
elem APIs: element size 16B: MP/MC: bulk (size: 32): 3.86

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [68] count = 374327
Core [69] count = 374433
Total count (size: 8): 748760

Bulk enq/dequeue count on size 32
Core [68] count = 324111
Core [69] count = 320038
Total count (size: 32): 644149
Test OK

Patched
=======
$ sudo app/test/dpdk-test -l 68,69
EAL: Detected 128 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.0 (socket 0)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.1 (socket 0)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.0 (socket 8)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.1 (socket 8)
EAL:   using IOMMU type 7 (sPAPR)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.0 (socket 8)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.1 (socket 8)
APP: HPET is not enabled, using TSC as default timer
RTE>>ring_perf_autotest

### Testing single element enq/deq ###
legacy APIs: SP/SC: single: 42.00
legacy APIs: MP/MC: single: 56.27

### Testing burst enq/deq ###
legacy APIs: SP/SC: burst (size: 8): 43.63
legacy APIs: SP/SC: burst (size: 32): 50.04
legacy APIs: MP/MC: burst (size: 8): 58.42
legacy APIs: MP/MC: burst (size: 32): 65.51

### Testing bulk enq/deq ###
legacy APIs: SP/SC: bulk (size: 8): 43.61
legacy APIs: SP/SC: bulk (size: 32): 50.02
legacy APIs: MP/MC: bulk (size: 8): 58.43
legacy APIs: MP/MC: bulk (size: 32): 65.49

### Testing empty bulk deq ###
legacy APIs: SP/SC: bulk (size: 8): 7.16
legacy APIs: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
legacy APIs: SP/SC: bulk (size: 8): 12.43
legacy APIs: MP/MC: bulk (size: 8): 16.17
legacy APIs: SP/SC: bulk (size: 32): 3.10
legacy APIs: MP/MC: bulk (size: 32): 3.65

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [68] count = 363208
Core [69] count = 363334
Total count (size: 8): 726542

Bulk enq/dequeue count on size 32
Core [68] count = 361592
Core [69] count = 361690
Total count (size: 32): 723282

### Testing single element enq/deq ###
elem APIs: element size 16B: SP/SC: single: 42.78
elem APIs: element size 16B: MP/MC: single: 56.75

### Testing burst enq/deq ###
elem APIs: element size 16B: SP/SC: burst (size: 8): 45.04
elem APIs: element size 16B: SP/SC: burst (size: 32): 59.27
elem APIs: element size 16B: MP/MC: burst (size: 8): 60.66
elem APIs: element size 16B: MP/MC: burst (size: 32): 75.03

### Testing bulk enq/deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 45.04
elem APIs: element size 16B: SP/SC: bulk (size: 32): 59.33
elem APIs: element size 16B: MP/MC: bulk (size: 8): 60.65
elem APIs: element size 16B: MP/MC: bulk (size: 32): 75.04

### Testing empty bulk deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.16
elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.14
elem APIs: element size 16B: MP/MC: bulk (size: 8): 15.56
elem APIs: element size 16B: SP/SC: bulk (size: 32): 3.22
elem APIs: element size 16B: MP/MC: bulk (size: 32): 3.86

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [68] count = 372618
Core [69] count = 372415
Total count (size: 8): 745033

Bulk enq/dequeue count on size 32
Core [68] count = 318784
Core [69] count = 316066
Total count (size: 32): 634850
Test OK


More information about the dev mailing list