[net/mlx5] Performance drop with HWS compared to SWS

Dmitry Kozlyuk dmitry.kozliuk at gmail.com
Thu Jun 13 11:01:45 CEST 2024


Hello,

We're observing an abrupt performance drop from 148 to 107 Mpps @ 64B packets
apparently caused by any rule that jumps out of ingress group 0
when using HWS (async API) instead of SWS (sync API).
Is this a known issue or a temporary limitation?

NIC: ConnectX-6 Dx EN adapter card; 100GbE; Dual-port QSFP56; PCIe 4.0/3.0 x16;
FW: 22.40.1000
OFED: MLNX_OFED_LINUX-24.01-0.3.3.1
DPDK: v24.03-23-g76cef1af8b
TG is custom, traffic is Ethernet / VLAN / IPv4 / TCP SYN @ 148 Mpps.

The examples below only perform the jump and let all packets miss in group 1,
but the same drop is observed when all packets are dropped in group 1.

Software steering:

/root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=1 -- -i --rxq=1 --txq=1

flow create 0 ingress group 0 pattern end actions jump group 1 / end
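
For context, the same rule through the synchronous rte_flow C API boils down
to a single rte_flow_create() call; a minimal sketch (port ID, error handling
and the helper name are illustrative only):

    #include <rte_flow.h>

    /* Sketch: jump all ingress traffic from group 0 to group 1 (sync API). */
    static struct rte_flow *
    create_jump_rule_sws(uint16_t port_id)
    {
            struct rte_flow_error error;
            struct rte_flow_attr attr = {
                    .group = 0,
                    .ingress = 1,
            };
            struct rte_flow_item pattern[] = {
                    { .type = RTE_FLOW_ITEM_TYPE_END },
            };
            struct rte_flow_action_jump jump = { .group = 1 };
            struct rte_flow_action actions[] = {
                    { .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
                    { .type = RTE_FLOW_ACTION_TYPE_END },
            };

            return rte_flow_create(port_id, &attr, pattern, actions, &error);
    }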

Neohost (from OFED 5.7):

||===========================================================================
|||                               Packet Rate                               ||
||---------------------------------------------------------------------------
||| RX Packet Rate                      || 148,813,590   [Packets/Seconds]  ||
||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
||===========================================================================
|||                                 eSwitch                                 ||
||---------------------------------------------------------------------------
||| RX Hops Per Packet                  || 3.075         [Hops/Packet]      ||
||| RX Optimal Hops Per Packet Per Pipe || 1.5375        [Hops/Packet]      ||
||| RX Optimal Packet Rate Bottleneck   || 279.6695      [MPPS]             ||
||| RX Packet Rate Bottleneck           || 262.2723      [MPPS]             ||

(Full Neohost output is attached.)

Hardware steering:

/root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=2 -- -i --rxq=1 --txq=1

port stop 0
flow configure 0 queues_number 1 queues_size 128 counters_number 16
port start 0
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
flow pull 0 queue 0
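
For comparison, the testpmd sequence above maps roughly onto the following
async-API calls (a sketch only: error checks are omitted, the helper name is
made up, and the port is assumed to be stopped around rte_flow_configure()
as in the commands above):

    #include <rte_flow.h>

    /* Sketch: same jump rule via the async (template) API. */
    static struct rte_flow *
    create_jump_rule_hws(uint16_t port_id)
    {
            struct rte_flow_error error;

            /* "flow configure 0 queues_number 1 queues_size 128 counters_number 16",
             * done while the port is stopped. */
            struct rte_flow_port_attr port_attr = { .nb_counters = 16 };
            struct rte_flow_queue_attr queue_attr = { .size = 128 };
            const struct rte_flow_queue_attr *queue_attrs[] = { &queue_attr };
            rte_flow_configure(port_id, &port_attr, 1, queue_attrs, &error);

            /* Pattern template: match everything on ingress. */
            struct rte_flow_pattern_template_attr pt_attr = { .ingress = 1 };
            struct rte_flow_item pattern[] = {
                    { .type = RTE_FLOW_ITEM_TYPE_END },
            };
            struct rte_flow_pattern_template *pt =
                    rte_flow_pattern_template_create(port_id, &pt_attr,
                                                     pattern, &error);

            /* Actions template: jump to group 1, group fully masked. */
            struct rte_flow_actions_template_attr at_attr = { .ingress = 1 };
            struct rte_flow_action_jump jump = { .group = 1 };
            struct rte_flow_action_jump jump_mask = { .group = UINT32_MAX };
            struct rte_flow_action actions[] = {
                    { .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
                    { .type = RTE_FLOW_ACTION_TYPE_END },
            };
            struct rte_flow_action masks[] = {
                    { .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump_mask },
                    { .type = RTE_FLOW_ACTION_TYPE_END },
            };
            struct rte_flow_actions_template *at =
                    rte_flow_actions_template_create(port_id, &at_attr,
                                                     actions, masks, &error);

            /* Table in ingress group 0 with room for a single rule. */
            struct rte_flow_template_table_attr tbl_attr = {
                    .flow_attr = { .group = 0, .ingress = 1 },
                    .nb_flows = 1,
            };
            struct rte_flow_template_table *tbl =
                    rte_flow_template_table_create(port_id, &tbl_attr,
                                                   &pt, 1, &at, 1, &error);

            /* Enqueue the rule on queue 0 and pull its completion. */
            struct rte_flow_op_attr op_attr = { .postpone = 0 };
            struct rte_flow *flow =
                    rte_flow_async_create(port_id, 0, &op_attr, tbl,
                                          pattern, 0, actions, 0,
                                          NULL, &error);
            struct rte_flow_op_result result;
            rte_flow_pull(port_id, 0, &result, 1, &error);
            return flow;
    }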

Neohost:

||===========================================================================
|||                               Packet Rate                               ||
||---------------------------------------------------------------------------
||| RX Packet Rate                      || 107,498,115   [Packets/Seconds]  ||
||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
||===========================================================================
|||                                 eSwitch                                 ||
||---------------------------------------------------------------------------
||| RX Hops Per Packet                  || 4.5503        [Hops/Packet]      ||
||| RX Optimal Hops Per Packet Per Pipe || 2.2751        [Hops/Packet]      ||
||| RX Optimal Packet Rate Bottleneck   || 188.9994      [MPPS]             ||
||| RX Packet Rate Bottleneck           || 182.5796      [MPPS]             ||

AFAIU, performance is not constrained by the complexity of the rules.

mlnx_perf -i enp33s0f0np0 -t 1:

       rx_steer_missed_packets: 108,743,272
      rx_vport_unicast_packets: 108,743,424
        rx_vport_unicast_bytes: 6,959,579,136 Bps    = 55,676.63 Mbps      
                tx_packets_phy: 7,537
                rx_packets_phy: 150,538,251
                  tx_bytes_phy: 482,368 Bps          = 3.85 Mbps           
                  rx_bytes_phy: 9,634,448,128 Bps    = 77,075.58 Mbps      
            tx_mac_control_phy: 7,536
             tx_pause_ctrl_phy: 7,536
               rx_discards_phy: 41,794,740
               rx_64_bytes_phy: 150,538,352 Bps      = 1,204.30 Mbps       
    rx_buffer_passed_thres_phy: 202
                rx_prio0_bytes: 9,634,520,256 Bps    = 77,076.16 Mbps      
              rx_prio0_packets: 108,744,322
             rx_prio0_discards: 41,795,050
               tx_global_pause: 7,537
      tx_global_pause_duration: 1,011,592

"rx_discards_phy" is described as follows [1]:

    The number of received packets dropped due to lack of buffers on a
    physical port. If this counter is increasing, it implies that the adapter
    is congested and cannot absorb the traffic coming from the network.

However, the adapter certainly *is* able to process 148 Mpps:
it does so with SWS, and it can deliver that rate to software (with MPRQ).

[1]: https://www.kernel.org/doc/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
-------------- next part --------------
Name: neohost-cx6dx-jump-hws.txt
URL: <http://mails.dpdk.org/archives/users/attachments/20240613/405b5d1c/attachment-0002.txt>
-------------- next part --------------
Name: neohost-cx6dx-jump-sws.txt
URL: <http://mails.dpdk.org/archives/users/attachments/20240613/405b5d1c/attachment-0003.txt>

