[net/mlx5] Performance drop with HWS compared to SWS

Dmitry Kozlyuk dmitry.kozliuk at gmail.com
Thu Jun 13 22:14:48 CEST 2024


Hi Dariusz,

Thank you for looking into the issue, please find full details below.

Summary:

Case        SWS (Mpps)   HWS (Mpps)
---------   ----------   ----------
baseline           148            -
jump_rss            37          148
jump_miss          148          107
jump_drop          148          107

From "baseline" vs "jump_rss", the problem is not in jump.
From "jump_miss" vs "jump_drop", the problem is not only in miss.
This is a lab so I can try anything else you need for diagnostic.

Disabling flow control only fixes the number of packets received by the PHY,
not the number of packets processed by steering.

> - Could you share mlnx_perf stats for SWS case as well?

      rx_vport_unicast_packets: 151,716,299
        rx_vport_unicast_bytes: 9,709,843,136 Bps    = 77,678.74 Mbps      
                rx_packets_phy: 151,716,517
                  rx_bytes_phy: 9,709,856,896 Bps    = 77,678.85 Mbps      
               rx_64_bytes_phy: 151,716,867 Bps      = 1,213.73 Mbps       
                rx_prio0_bytes: 9,710,051,648 Bps    = 77,680.41 Mbps      
              rx_prio0_packets: 151,719,564

> - If group 1 had a flow rule with empty match and RSS action, is the performance difference the same?
>   (This would help to understand if the problem is with miss behavior or with jump between group 0 and group 1).

Case "baseline"
===============
No flow rules, just to make sure the host can poll the NIC fast enough.
Result: 148 Mpps

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
	-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

mlnx_perf -i enp33s0f0np0 -t 1

      rx_vport_unicast_packets: 151,622,123
        rx_vport_unicast_bytes: 9,703,815,872 Bps    = 77,630.52 Mbps      
                rx_packets_phy: 151,621,983
                  rx_bytes_phy: 9,703,807,872 Bps    = 77,630.46 Mbps      
               rx_64_bytes_phy: 151,621,026 Bps      = 1,212.96 Mbps       
                rx_prio0_bytes: 9,703,716,480 Bps    = 77,629.73 Mbps      
              rx_prio0_packets: 151,620,576

Attached: "neohost-cx6dx-baseline-sws.txt".

Case "jump_rss", SWS
====================
Jump to group 1, then RSS.
Result: 37 Mpps (?!)
This "37 Mpps" seems to be caused by PCIe bottleneck, which MPRQ is supposed to overcome.
Is MPRQ limited only to default RSS in SWS mode?

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
	-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

flow create 0 ingress group 0 pattern end actions jump group 1 / end
flow create 0 ingress group 1 pattern end actions rss queues 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 end / end
#
start
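
For reference, the two rules above correspond roughly to the following synchronous
rte_flow calls (an illustrative sketch only, not the exact test code; the helper name
is mine, the queue list is shortened, and error handling is omitted):

#include <rte_common.h>
#include <rte_flow.h>

/* Illustrative sketch of the same two rules through the synchronous
 * rte_flow API: match-all jump in group 0, match-all RSS in group 1.
 * The test uses queues 0-31; the list is shortened here. */
static void
install_jump_rss_rules(uint16_t port_id)
{
	struct rte_flow_error err;
	struct rte_flow_item pattern[] = {	/* empty pattern: match all */
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};

	/* Group 0: jump to group 1. */
	struct rte_flow_attr attr0 = { .group = 0, .ingress = 1 };
	struct rte_flow_action_jump jump = { .group = 1 };
	struct rte_flow_action jump_actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	rte_flow_create(port_id, &attr0, pattern, jump_actions, &err);

	/* Group 1: RSS over the Rx queues with the default hash. */
	struct rte_flow_attr attr1 = { .group = 1, .ingress = 1 };
	uint16_t queues[] = { 0, 1, 2, 3 };	/* 0..31 in the actual test */
	struct rte_flow_action_rss rss = {
		.queue_num = RTE_DIM(queues),
		.queue = queues,
	};
	struct rte_flow_action rss_actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_RSS, .conf = &rss },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	rte_flow_create(port_id, &attr1, pattern, rss_actions, &err);
}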

mlnx_perf -i enp33s0f0np0 -t 1:

      rx_vport_unicast_packets: 38,155,359
        rx_vport_unicast_bytes: 2,441,942,976 Bps    = 19,535.54 Mbps      
                tx_packets_phy: 7,586
                rx_packets_phy: 151,531,694
                  tx_bytes_phy: 485,568 Bps          = 3.88 Mbps           
                  rx_bytes_phy: 9,698,029,248 Bps    = 77,584.23 Mbps      
            tx_mac_control_phy: 7,587
             tx_pause_ctrl_phy: 7,587
               rx_discards_phy: 113,376,265
               rx_64_bytes_phy: 151,531,748 Bps      = 1,212.25 Mbps       
    rx_buffer_passed_thres_phy: 203
                rx_prio0_bytes: 9,698,066,560 Bps    = 77,584.53 Mbps      
              rx_prio0_packets: 38,155,328
             rx_prio0_discards: 113,376,963
               tx_global_pause: 7,587
      tx_global_pause_duration: 1,018,266

Attached: "neohost-cx6dx-jump_rss-sws.txt".

Case "jump_rss", HWS
====================
Result: 148 Mpps

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
	-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

port stop 0
flow configure 0 queues_number 1 queues_size 128 counters_number 16
port start 0
#
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
flow pull 0 queue 0
#
flow actions_template 0 create ingress actions_template_id 2 template rss / end mask rss / end
flow template_table 0 create ingress group 1 table_id 2 pattern_template 1 actions_template 2 rules_number 1
flow queue 0 create 0 template_table 2 pattern_template 0 actions_template 0 postpone false pattern end actions rss queues 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 end / end
flow pull 0 queue 0
#
start
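
The template-based commands above map roughly onto the asynchronous rte_flow API as in
the sketch below (illustrative only, covering just the group-0 jump table; the group-1
RSS table is built the same way; the port must be stopped before rte_flow_configure(),
and error handling plus rte_flow_pull() completion polling are omitted):

#include <stdint.h>
#include <rte_flow.h>

static void
setup_jump_table_hws(uint16_t port_id)
{
	struct rte_flow_error err;

	/* flow configure 0 queues_number 1 queues_size 128 counters_number 16 */
	const struct rte_flow_port_attr port_attr = { .nb_counters = 16 };
	const struct rte_flow_queue_attr queue_attr = { .size = 128 };
	const struct rte_flow_queue_attr *queue_attrs[] = { &queue_attr };
	rte_flow_configure(port_id, &port_attr, 1, queue_attrs, &err);

	/* Empty pattern template: match every packet. */
	const struct rte_flow_pattern_template_attr pt_attr = { .ingress = 1 };
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_pattern_template *pt =
		rte_flow_pattern_template_create(port_id, &pt_attr, pattern, &err);

	/* Actions template: jump to group 1, group fully masked (fixed). */
	const struct rte_flow_actions_template_attr at_attr = { .ingress = 1 };
	struct rte_flow_action_jump jump = { .group = 1 };
	struct rte_flow_action_jump jump_mask = { .group = UINT32_MAX };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_action masks[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump_mask },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_actions_template *at =
		rte_flow_actions_template_create(port_id, &at_attr,
						 actions, masks, &err);

	/* Template table in group 0 sized for a single rule. */
	const struct rte_flow_template_table_attr tbl_attr = {
		.flow_attr = { .group = 0, .ingress = 1 },
		.nb_flows = 1,
	};
	struct rte_flow_template_table *tbl =
		rte_flow_template_table_create(port_id, &tbl_attr,
					       &pt, 1, &at, 1, &err);

	/* Enqueue the rule on flow queue 0 (template indexes 0);
	 * the completion is fetched later with rte_flow_pull(). */
	const struct rte_flow_op_attr op_attr = { .postpone = 0 };
	rte_flow_async_create(port_id, 0, &op_attr, tbl,
			      pattern, 0, actions, 0, NULL, &err);
}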

mlnx_perf -i enp33s0f0np0 -t 1:

      rx_vport_unicast_packets: 151,514,131
        rx_vport_unicast_bytes: 9,696,904,384 Bps    = 77,575.23 Mbps      
                rx_packets_phy: 151,514,275
                  rx_bytes_phy: 9,696,913,600 Bps    = 77,575.30 Mbps      
               rx_64_bytes_phy: 151,514,122 Bps      = 1,212.11 Mbps       
                rx_prio0_bytes: 9,696,814,528 Bps    = 77,574.51 Mbps      
              rx_prio0_packets: 151,512,717

Attached: "neohost-cx6dx-jump_rss-hws.txt".

> - Would you be able to do the test with miss in empty group 1, with Ethernet Flow Control disabled?

$ ethtool -A enp33s0f0np0 rx off tx off

$ ethtool -a enp33s0f0np0
Pause parameters for enp33s0f0np0:
Autonegotiate:	off
RX:		off
TX:		off

testpmd> show port 0 flow_ctrl 

********************* Flow control infos for port 0  *********************
FC mode:
   Rx pause: off
   Tx pause: off
Autoneg: off
Pause time: 0x0
High waterline: 0x0
Low waterline: 0x0
Send XON: off
Forward MAC control frames: off
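
The same thing can also be done from within the application through the ethdev API;
a minimal sketch, assuming the PMD implements the flow-control ops:

#include <rte_ethdev.h>

/* Disable Ethernet flow control, equivalent to
 * "ethtool -A ... rx off tx off". Error handling kept minimal. */
static int
disable_flow_control(uint16_t port_id)
{
	struct rte_eth_fc_conf fc_conf;
	int ret = rte_eth_dev_flow_ctrl_get(port_id, &fc_conf);

	if (ret != 0)
		return ret;
	fc_conf.mode = RTE_ETH_FC_NONE;		/* no Rx/Tx pause frames */
	fc_conf.autoneg = 0;
	return rte_eth_dev_flow_ctrl_set(port_id, &fc_conf);
}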


Case "jump_miss", SWS
=====================
Result: 148 Mpps

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
	-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

flow create 0 ingress group 0 pattern end actions jump group 1 / end
start

mlnx_perf -i enp33s0f0np0

      rx_vport_unicast_packets: 151,526,489
        rx_vport_unicast_bytes: 9,697,695,296 Bps    = 77,581.56 Mbps      
                rx_packets_phy: 151,526,193
                  rx_bytes_phy: 9,697,676,672 Bps    = 77,581.41 Mbps      
               rx_64_bytes_phy: 151,525,423 Bps      = 1,212.20 Mbps       
                rx_prio0_bytes: 9,697,488,256 Bps    = 77,579.90 Mbps      
              rx_prio0_packets: 151,523,240

Attached: "neohost-cx6dx-jump_miss-sws.txt".


Case "jump_miss", HWS
=====================
Result: 107 Mpps
Neohost shows RX Packet Rate = 148 Mpps, but RX Steering Packets = 107 Mpps.

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
	-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

port stop 0
flow configure 0 queues_number 1 queues_size 128 counters_number 16
port start 0
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
flow pull 0 queue 0

mlnx_perf -i enp33s0f0np0

       rx_steer_missed_packets: 109,463,466
      rx_vport_unicast_packets: 109,463,450
        rx_vport_unicast_bytes: 7,005,660,800 Bps    = 56,045.28 Mbps      
                rx_packets_phy: 151,518,062
                  rx_bytes_phy: 9,697,155,840 Bps    = 77,577.24 Mbps      
               rx_64_bytes_phy: 151,516,201 Bps      = 1,212.12 Mbps       
                rx_prio0_bytes: 9,697,137,280 Bps    = 77,577.9 Mbps       
              rx_prio0_packets: 151,517,782
          rx_prio0_buf_discard: 42,055,156

Attached: "neohost-cx6dx-jump_miss-hws.txt".

Case "jump_drop", SWS
=====================
Result: 148 Mpps
Match all in group 0, jump to group 1; match all in group 1, drop.

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
	-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

flow create 0 ingress group 0 pattern end actions jump group 1 / end
flow create 0 ingress group 1 pattern end actions drop / end

mlnx_perf -i enp33s0f0np0

      rx_vport_unicast_packets: 151,705,269
        rx_vport_unicast_bytes: 9,709,137,216 Bps    = 77,673.9 Mbps       
                rx_packets_phy: 151,701,498
                  rx_bytes_phy: 9,708,896,128 Bps    = 77,671.16 Mbps      
               rx_64_bytes_phy: 151,693,532 Bps      = 1,213.54 Mbps       
                rx_prio0_bytes: 9,707,005,888 Bps    = 77,656.4 Mbps       
              rx_prio0_packets: 151,671,959

Attached: "neohost-cx6dx-jump_drop-sws.txt".


Case "jump_drop", HWS
=====================
Result: 107 Mpps
Match all in group 0, jump to group 1; match all in group 1, drop.
I have also run this test with a counter attached to the dropping table,
and it confirmed that indeed only 107 Mpps hit the rule (see the sketch after the commands below).

/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
	-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32

port stop 0
flow configure 0 queues_number 1 queues_size 128 counters_number 16
port start 0
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
flow pull 0 queue 0
#
flow actions_template 0 create ingress actions_template_id 2 template drop / end mask drop / end
flow template_table 0 create ingress group 1 table_id 2 pattern_template 1 actions_template 2 rules_number 1
flow queue 0 create 0 template_table 2 pattern_template 0 actions_template 0 postpone false pattern end actions drop / end
flow pull 0 queue 0
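
For the counter variant mentioned above, the group-1 actions template carried
"count / drop" instead of plain "drop". Reading the counter back could look roughly
like the hypothetical helper below (the rule handle comes from rte_flow_async_create();
depending on the DPDK version an asynchronous query may be required instead of
rte_flow_query(); error handling omitted):

#include <inttypes.h>
#include <stdio.h>
#include <rte_flow.h>

/* Print how many packets actually reached the dropping rule. */
static void
print_drop_rule_hits(uint16_t port_id, struct rte_flow *drop_rule)
{
	struct rte_flow_action count_action = {
		.type = RTE_FLOW_ACTION_TYPE_COUNT,
	};
	struct rte_flow_query_count counters = { .reset = 0 };
	struct rte_flow_error err;

	if (rte_flow_query(port_id, drop_rule, &count_action,
			   &counters, &err) == 0)
		printf("drop rule hits: %" PRIu64 " packets\n",
		       counters.hits);
}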

mlnx_perf -i enp33s0f0np0

      rx_vport_unicast_packets: 109,500,637
        rx_vport_unicast_bytes: 7,008,040,768 Bps    = 56,064.32 Mbps      
                rx_packets_phy: 151,568,915
                  rx_bytes_phy: 9,700,410,560 Bps    = 77,603.28 Mbps      
               rx_64_bytes_phy: 151,569,146 Bps      = 1,212.55 Mbps       
                rx_prio0_bytes: 9,699,889,216 Bps    = 77,599.11 Mbps      
              rx_prio0_packets: 151,560,756
          rx_prio0_buf_discard: 42,065,705

Attached: "neohost-cx6dx-jump_drop-hws.txt".
-------------- next part --------------
Attachments (scrubbed by the list archive):
  neohost-cx6dx-baseline-sws.txt:  <http://mails.dpdk.org/archives/users/attachments/20240613/1b133442/attachment-0007.txt>
  neohost-cx6dx-jump_drop-hws.txt: <http://mails.dpdk.org/archives/users/attachments/20240613/1b133442/attachment-0008.txt>
  neohost-cx6dx-jump_drop-sws.txt: <http://mails.dpdk.org/archives/users/attachments/20240613/1b133442/attachment-0009.txt>
  neohost-cx6dx-jump_miss-hws.txt: <http://mails.dpdk.org/archives/users/attachments/20240613/1b133442/attachment-0010.txt>
  neohost-cx6dx-jump_miss-sws.txt: <http://mails.dpdk.org/archives/users/attachments/20240613/1b133442/attachment-0011.txt>
  neohost-cx6dx-jump_rss-hws.txt:  <http://mails.dpdk.org/archives/users/attachments/20240613/1b133442/attachment-0012.txt>
  neohost-cx6dx-jump_rss-sws.txt:  <http://mails.dpdk.org/archives/users/attachments/20240613/1b133442/attachment-0013.txt>

