[net/mlx5] Performance drop with HWS compared to SWS

Dariusz Sosnowski dsosnowski at nvidia.com
Thu Jun 13 17:06:53 CEST 2024


Hi,

> -----Original Message-----
> From: Dmitry Kozlyuk <dmitry.kozliuk at gmail.com>
> Sent: Thursday, June 13, 2024 11:02
> To: users at dpdk.org
> Subject: [net/mlx5] Performance drop with HWS compared to SWS
> 
> Hello,
> 
> We're observing an abrupt performance drop from 148 to 107 Mpps @ 64B
> packets apparently caused by any rule that jumps out of ingress group 0 when
> using HWS (async API) instead of SWS (sync API).
> Is it some known issue or temporary limitation?

This is not expected behavior; performance should be the same in both cases.
Thank you for reporting this and for the Neohost dumps.

I have a few questions:

- Could you share the mlnx_perf stats for the SWS case as well?
- If group 1 contained a flow rule with an empty match and an RSS action, would the performance difference stay the same?
  (This would help to determine whether the problem is in the miss behavior or in the jump between group 0 and group 1;
  a sketch of such a rule is included right after this list.)
- Would you be able to repeat the test with the miss in an empty group 1, but with Ethernet Flow Control disabled?
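
For the second question, something like the following should work on top of your existing HWS setup. This is a rough sketch only: pattern/actions template ID 2 and table_id 2 are picked arbitrarily, and the RSS action is left at its defaults.

  flow pattern_template 0 create pattern_template_id 2 ingress template end
  flow actions_template 0 create ingress actions_template_id 2 template rss / end mask rss / end
  flow template_table 0 create ingress group 1 table_id 2 pattern_template 2 actions_template 2 rules_number 1
  flow queue 0 create 0 template_table 2 pattern_template 0 actions_template 0 postpone false pattern end actions rss / end
  flow pull 0 queue 0

For the flow control question, disabling it on the port with ethtool should be enough, e.g. ethtool -A enp33s0f0np0 rx off tx off.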

> NIC: ConnectX-6 Dx EN adapter card; 100GbE; Dual-port QSFP56; PCIe 4.0/3.0 x16
> FW: 22.40.1000
> OFED: MLNX_OFED_LINUX-24.01-0.3.3.1
> DPDK: v24.03-23-g76cef1af8b
> TG is custom, traffic is Ethernet / VLAN / IPv4 / TCP SYN @ 148 Mpps.
> 
> The examples below only perform the jump and then miss all packets in group 1,
> but the same behavior is observed when dropping all packets in group 1.
> 
> Software steering:
> 
> /root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=1 -- -i --rxq=1 --txq=1
> 
> flow create 0 ingress group 0 pattern end actions jump group 1 / end
> 
> Neohost (from OFED 5.7):
> 
> ||==========================================================================||
> |||                               Packet Rate                               ||
> ||--------------------------------------------------------------------------||
> ||| RX Packet Rate                      || 148,813,590   [Packets/Seconds]  ||
> ||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
> ||==========================================================================||
> |||                                 eSwitch                                 ||
> ||--------------------------------------------------------------------------||
> ||| RX Hops Per Packet                  || 3.075         [Hops/Packet]      ||
> ||| RX Optimal Hops Per Packet Per Pipe || 1.5375        [Hops/Packet]      ||
> ||| RX Optimal Packet Rate Bottleneck   || 279.6695      [MPPS]             ||
> ||| RX Packet Rate Bottleneck           || 262.2723      [MPPS]             ||
> 
> (Full Neohost output is attached.)
> 
> Hardware steering:
> 
> /root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=2 -- -i --rxq=1 --txq=1
> 
> port stop 0
> flow configure 0 queues_number 1 queues_size 128 counters_number 16
> port start 0
> flow pattern_template 0 create pattern_template_id 1 ingress template end
> flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
> flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
> flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
> flow pull 0 queue 0
> 
> Neohost:
> 
> ||==========================================================================||
> |||                               Packet Rate                               ||
> ||--------------------------------------------------------------------------||
> ||| RX Packet Rate                      || 107,498,115   [Packets/Seconds]  ||
> ||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
> ||==========================================================================||
> |||                                 eSwitch                                 ||
> ||--------------------------------------------------------------------------||
> ||| RX Hops Per Packet                  || 4.5503        [Hops/Packet]      ||
> ||| RX Optimal Hops Per Packet Per Pipe || 2.2751        [Hops/Packet]      ||
> ||| RX Optimal Packet Rate Bottleneck   || 188.9994      [MPPS]             ||
> ||| RX Packet Rate Bottleneck           || 182.5796      [MPPS]             ||
> 
> AFAIU, performance is not constrained by the complexity of the rules.
> 
> mlnx_perf -i enp33s0f0np0 -t 1:
> 
>        rx_steer_missed_packets: 108,743,272
>       rx_vport_unicast_packets: 108,743,424
>         rx_vport_unicast_bytes: 6,959,579,136 Bps    = 55,676.63 Mbps
>                 tx_packets_phy: 7,537
>                 rx_packets_phy: 150,538,251
>                   tx_bytes_phy: 482,368 Bps          = 3.85 Mbps
>                   rx_bytes_phy: 9,634,448,128 Bps    = 77,075.58 Mbps
>             tx_mac_control_phy: 7,536
>              tx_pause_ctrl_phy: 7,536
>                rx_discards_phy: 41,794,740
>                rx_64_bytes_phy: 150,538,352 Bps      = 1,204.30 Mbps
>     rx_buffer_passed_thres_phy: 202
>                 rx_prio0_bytes: 9,634,520,256 Bps    = 77,076.16 Mbps
>               rx_prio0_packets: 108,744,322
>              rx_prio0_discards: 41,795,050
>                tx_global_pause: 7,537
>       tx_global_pause_duration: 1,011,592
> 
> "rx_discards_phy" is described as follows [1]:
> 
>     The number of received packets dropped due to lack of buffers on a
>     physical port. If this counter is increasing, it implies that the adapter
>     is congested and cannot absorb the traffic coming from the network.
> 
> However, the adapter certainly *is* able to process 148 Mpps, since it does so
> with SWS and it can deliver this much to SW (with MPRQ).
> 
> [1]: https://www.kernel.org/doc/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
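
Regarding the MPRQ point: just to confirm we are comparing the same configurations, I assume MPRQ was enabled through the mlx5 mprq_en devarg on top of the SWS command line, roughly like this (a sketch only, please correct me if your setup differed):

  /root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=1,mprq_en=1 -- -i --rxq=1 --txq=1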

Best regards,
Dariusz Sosnowski

