[net/mlx5] Performance drop with HWS compared to SWS
Dmitry Kozlyuk
dmitry.kozliuk at gmail.com
Thu Jun 13 22:14:48 CEST 2024
Hi Dariusz,
Thank you for looking into the issue; please find the full details below.
Summary:
Case        SWS (Mpps)   HWS (Mpps)
---------   ----------   ----------
baseline    148          -
jump_rss    37           148
jump_miss   148          107
jump_drop   148          107
From "baseline" vs "jump_rss", the problem is not in jump.
From "jump_miss" vs "jump_drop", the problem is not only in miss.
This is a lab so I can try anything else you need for diagnostic.
Disabling flow control only restores the number of packets received by the PHY,
not the number of packets processed by steering.
> - Could you share mlnx_perf stats for SWS case as well?
rx_vport_unicast_packets: 151,716,299
rx_vport_unicast_bytes: 9,709,843,136 Bps = 77,678.74 Mbps
rx_packets_phy: 151,716,517
rx_bytes_phy: 9,709,856,896 Bps = 77,678.85 Mbps
rx_64_bytes_phy: 151,716,867 Bps = 1,213.73 Mbps
rx_prio0_bytes: 9,710,051,648 Bps = 77,680.41 Mbps
rx_prio0_packets: 151,719,564
> - If group 1 had a flow rule with empty match and RSS action, is the performance difference the same?
> (This would help to understand if the problem is with miss behavior or with jump between group 0 and group 1).
Case "baseline"
===============
No flow rules, just to make sure the host can poll the NIC fast enough.
Result: 148 Mpps
/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
mlnx_perf -i enp33s0f0np0 -t 1
rx_vport_unicast_packets: 151,622,123
rx_vport_unicast_bytes: 9,703,815,872 Bps = 77,630.52 Mbps
rx_packets_phy: 151,621,983
rx_bytes_phy: 9,703,807,872 Bps = 77,630.46 Mbps
rx_64_bytes_phy: 151,621,026 Bps = 1,212.96 Mbps
rx_prio0_bytes: 9,703,716,480 Bps = 77,629.73 Mbps
rx_prio0_packets: 151,620,576
Attached: "neohost-cx6dx-baseline-sws.txt".
Case "jump_rss", SWS
====================
Jump to group 1, then RSS.
Result: 37 Mpps (?!)
These 37 Mpps appear to be caused by a PCIe bottleneck, which MPRQ is supposed to overcome.
Is MPRQ only used with the default RSS in SWS mode?
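One thing I can check here, if useful: whether MPRQ is actually engaged on the queues fed by the RSS rule. testpmd can print the per-queue Rx burst mode (a diagnostic sketch; it assumes the mlx5 PMD reports MPRQ in the burst-mode string, which I have not re-verified):
testpmd> show rxq info 0 0
If the "Burst mode" line does not mention Multi-Packet RQ while the jump+RSS rules are installed, that would support the PCIe-bottleneck theory above.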
/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
flow create 0 ingress group 0 pattern end actions jump group 1 / end
flow create 0 ingress group 1 pattern end actions rss queues 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 end / end
#
start
mlnx_perf -i enp33s0f0np0 -t 1:
rx_vport_unicast_packets: 38,155,359
rx_vport_unicast_bytes: 2,441,942,976 Bps = 19,535.54 Mbps
tx_packets_phy: 7,586
rx_packets_phy: 151,531,694
tx_bytes_phy: 485,568 Bps = 3.88 Mbps
rx_bytes_phy: 9,698,029,248 Bps = 77,584.23 Mbps
tx_mac_control_phy: 7,587
tx_pause_ctrl_phy: 7,587
rx_discards_phy: 113,376,265
rx_64_bytes_phy: 151,531,748 Bps = 1,212.25 Mbps
rx_buffer_passed_thres_phy: 203
rx_prio0_bytes: 9,698,066,560 Bps = 77,584.53 Mbps
rx_prio0_packets: 38,155,328
rx_prio0_discards: 113,376,963
tx_global_pause: 7,587
tx_global_pause_duration: 1,018,266
Attached: "neohost-cx6dx-jump_rss-sws.txt".
Case "jump_rss", HWS
====================
Result: 148 Mpps
/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
port stop 0
flow configure 0 queues_number 1 queues_size 128 counters_number 16
port start 0
#
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
flow pull 0 queue 0
#
flow actions_template 0 create ingress actions_template_id 2 template rss / end mask rss / end
flow template_table 0 create ingress group 1 table_id 2 pattern_template 1 actions_template 2 rules_number 1
flow queue 0 create 0 template_table 2 pattern_template 0 actions_template 0 postpone false pattern end actions rss queues 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 end / end
flow pull 0 queue 0
#
start
mlnx_perf -i enp33s0f0np0 -t 1:
rx_vport_unicast_packets: 151,514,131
rx_vport_unicast_bytes: 9,696,904,384 Bps = 77,575.23 Mbps
rx_packets_phy: 151,514,275
rx_bytes_phy: 9,696,913,600 Bps = 77,575.30 Mbps
rx_64_bytes_phy: 151,514,122 Bps = 1,212.11 Mbps
rx_prio0_bytes: 9,696,814,528 Bps = 77,574.51 Mbps
rx_prio0_packets: 151,512,717
Attached: "neohost-cx6dx-jump_rss-hws.txt".
> - Would you be able to do the test with miss in empty group 1, with Ethernet Flow Control disabled?
$ ethtool -A enp33s0f0np0 rx off tx off
$ ethtool -a enp33s0f0np0
Pause parameters for enp33s0f0np0:
Autonegotiate: off
RX: off
TX: off
testpmd> show port 0 flow_ctrl
********************* Flow control infos for port 0 *********************
FC mode:
Rx pause: off
Tx pause: off
Autoneg: off
Pause time: 0x0
High waterline: 0x0
Low waterline: 0x0
Send XON: off
Forward MAC control frames: off
Case "jump_miss", SWS
=====================
Result: 148 Mpps
/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
flow create 0 ingress group 0 pattern end actions jump group 1 / end
start
mlnx_perf -i enp33s0f0np0
rx_vport_unicast_packets: 151,526,489
rx_vport_unicast_bytes: 9,697,695,296 Bps = 77,581.56 Mbps
rx_packets_phy: 151,526,193
rx_bytes_phy: 9,697,676,672 Bps = 77,581.41 Mbps
rx_64_bytes_phy: 151,525,423 Bps = 1,212.20 Mbps
rx_prio0_bytes: 9,697,488,256 Bps = 77,579.90 Mbps
rx_prio0_packets: 151,523,240
Attached: "neohost-cx6dx-jump_miss-sws.txt".
Case "jump_miss", HWS
=====================
Result: 107 Mpps
Neohost shows RX Packet Rate = 148 Mpps, but RX Steering Packets = 107 Mpps.
/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
port stop 0
flow configure 0 queues_number 1 queues_size 128 counters_number 16
port start 0
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
flow pull 0 queue 0
mlnx_perf -i enp33s0f0np0
rx_steer_missed_packets: 109,463,466
rx_vport_unicast_packets: 109,463,450
rx_vport_unicast_bytes: 7,005,660,800 Bps = 56,045.28 Mbps
rx_packets_phy: 151,518,062
rx_bytes_phy: 9,697,155,840 Bps = 77,577.24 Mbps
rx_64_bytes_phy: 151,516,201 Bps = 1,212.12 Mbps
rx_prio0_bytes: 9,697,137,280 Bps = 77,577.9 Mbps
rx_prio0_packets: 151,517,782
rx_prio0_buf_discard: 42,055,156
Attached: "neohost-cx6dx-jump_miss-hws.txt".
Case "jump_drop", SWS
=====================
Result: 148 Mpps
Match all in group 0, jump to group 1; match all in group 1, drop.
/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
flow create 0 ingress group 0 pattern end actions jump group 1 / end
flow create 0 ingress group 1 pattern end actions drop / end
mlnx_perf -i enp33s0f0np0
rx_vport_unicast_packets: 151,705,269
rx_vport_unicast_bytes: 9,709,137,216 Bps = 77,673.9 Mbps
rx_packets_phy: 151,701,498
rx_bytes_phy: 9,708,896,128 Bps = 77,671.16 Mbps
rx_64_bytes_phy: 151,693,532 Bps = 1,213.54 Mbps
rx_prio0_bytes: 9,707,005,888 Bps = 77,656.4 Mbps
rx_prio0_packets: 151,671,959
Attached: "neohost-cx6dx-jump_drop-sws.txt".
Case "jump_drop", HWS
=====================
Result: 107 Mpps
Match all in group 0, jump to group 1; match all in group 1, drop.
I have also run this test with a counter attached to the dropping table,
and it confirmed that indeed only 107 Mpps hit the rule (a rough sketch of the extra commands is at the end of this case).
/root/build/app/dpdk-testpmd -l 0-31,64-95 -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
-i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
port stop 0
flow configure 0 queues_number 1 queues_size 128 counters_number 16
port start 0
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
flow pull 0 queue 0
#
flow actions_template 0 create ingress actions_template_id 2 template drop / end mask drop / end
flow template_table 0 create ingress group 1 table_id 2 pattern_template 1 actions_template 2 rules_number 1
flow queue 0 create 0 template_table 2 pattern_template 0 actions_template 0 postpone false pattern end actions drop / end
flow pull 0 queue 0
mlnx_perf -i enp33s0f0np0
rx_vport_unicast_packets: 109,500,637
rx_vport_unicast_bytes: 7,008,040,768 Bps = 56,064.32 Mbps
rx_packets_phy: 151,568,915
rx_bytes_phy: 9,700,410,560 Bps = 77,603.28 Mbps
rx_64_bytes_phy: 151,569,146 Bps = 1,212.55 Mbps
rx_prio0_bytes: 9,699,889,216 Bps = 77,599.11 Mbps
rx_prio0_packets: 151,560,756
rx_prio0_buf_discard: 42,065,705
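For reference, the counter variant of this test was set up roughly as follows (a sketch, not a verbatim log; the template/table IDs are illustrative, and only the group-1 table differs from the listing above):
flow actions_template 0 create ingress actions_template_id 3 template count / drop / end mask count / drop / end
flow template_table 0 create ingress group 1 table_id 3 pattern_template 1 actions_template 3 rules_number 1
flow queue 0 create 0 template_table 3 pattern_template 0 actions_template 0 postpone false pattern end actions count / drop / end
flow pull 0 queue 0
The counter was then read back with "flow query 0 <rule_id> count", using the rule index reported by the queue create command (this assumes the count query is supported for template-created rules).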
Attached: "neohost-cx6dx-jump_drop-hws.txt".
Attachments:
  neohost-cx6dx-baseline-sws.txt: <http://mails.dpdk.org/archives/users/attachments/20240613/1b133442/attachment-0007.txt>
  neohost-cx6dx-jump_drop-hws.txt: <http://mails.dpdk.org/archives/users/attachments/20240613/1b133442/attachment-0008.txt>
  neohost-cx6dx-jump_drop-sws.txt: <http://mails.dpdk.org/archives/users/attachments/20240613/1b133442/attachment-0009.txt>
  neohost-cx6dx-jump_miss-hws.txt: <http://mails.dpdk.org/archives/users/attachments/20240613/1b133442/attachment-0010.txt>
  neohost-cx6dx-jump_miss-sws.txt: <http://mails.dpdk.org/archives/users/attachments/20240613/1b133442/attachment-0011.txt>
  neohost-cx6dx-jump_rss-hws.txt: <http://mails.dpdk.org/archives/users/attachments/20240613/1b133442/attachment-0012.txt>
  neohost-cx6dx-jump_rss-sws.txt: <http://mails.dpdk.org/archives/users/attachments/20240613/1b133442/attachment-0013.txt>