mlx5: Packet drops without RX queue overflow?

Sangjin Han Sangjin.Han at spacex.com
Tue Aug 23 22:54:08 CEST 2022


Hi All,

I am seeing RX packet drops on a ConnectX-5 100G NIC. Even at a low input rate, e.g., 2 Mpps and 20 Gbps, about 0.1% of packets are dropped. The setup is as follows:

* DPDK 21.08
* one mlx5 VF with SR-IOV (VLAN)
* a DPDK application running on 16 cores
* 16 RX queues with 2,048 descriptors each
* tried both the scalar and vector versions of the RX burst functions

I can see that packets are being lost by reading the rx_discards_phy counter with ethtool on the PF. I found no corresponding counter for the VF.
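
For reference, here is a minimal sketch of how the drop rate can be derived from two snapshots of that counter. It assumes the usual "name: value" layout printed by `ethtool -S` on the PF, and it assumes the number of offered packets is known independently (e.g., from the traffic generator). The helper names parse_counter() and drop_fraction() are made up for illustration and are not part of ethtool or DPDK.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up "name: value" in text captured from `ethtool -S <pf>`.
 * Returns 0 and stores the value in *out on success, -1 if the counter
 * is not present. Only whole-name matches are accepted, so searching for
 * "rx_discards" does not match "rx_discards_phy".
 */
static int parse_counter(const char *text, const char *name, uint64_t *out)
{
    size_t len = strlen(name);
    const char *p = text;

    assert(len > 0);
    while ((p = strstr(p, name)) != NULL) {
        int at_start = (p == text) || p[-1] == ' ' ||
                       p[-1] == '\t' || p[-1] == '\n';
        if (at_start && p[len] == ':') {
            unsigned long long v;
            if (sscanf(p + len + 1, "%llu", &v) == 1) {
                *out = (uint64_t)v;
                return 0;
            }
            return -1;
        }
        p += len;
    }
    return -1;
}

/*
 * Fraction of offered packets that were discarded between two snapshots.
 * "offered" is an assumption: it must come from the sender side, since
 * the post does not say which RX counter would give the total.
 */
static double drop_fraction(uint64_t discards_before,
                            uint64_t discards_after,
                            uint64_t offered)
{
    if (offered == 0)
        return 0.0;
    return (double)(discards_after - discards_before) / (double)offered;
}
```

Taking the snapshots a fixed interval apart (say, one second) and feeding the two rx_discards_phy values into drop_fraction() gives a figure that can be compared directly against the roughly 0.1% mentioned above.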

I checked the usual suspect, a CPU too slow to drain the RX queues and therefore causing overflow, but that does not seem to be the case. I sampled the queue occupancy with rte_eth_rx_queue_count() before every rte_eth_rx_burst(); it mostly stayed below 10% and never exceeded 20% while packets were being dropped. Since packets are lost without RX queue overflow, the behavior resembles watermark-based flow control or Random Early Drop.
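
The occupancy check described above can be sketched roughly as follows. This is only an illustration of the bookkeeping, not the actual application code: struct occ_stats and the occ_*() helpers are hypothetical names, and the DPDK calls appear only in a comment so that the snippet compiles on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OCC_BUCKETS 10  /* histogram buckets, each 10% of the ring */

/* Per-queue occupancy statistics (hypothetical helper type). */
struct occ_stats {
    uint32_t ring_size;           /* descriptors in the RX ring, e.g. 2048 */
    uint32_t max_used;            /* highest occupancy seen so far */
    uint64_t samples;             /* number of valid samples recorded */
    uint64_t hist[OCC_BUCKETS];   /* occupancy histogram */
};

static void occ_init(struct occ_stats *s, uint32_t ring_size)
{
    assert(ring_size > 0);
    memset(s, 0, sizeof(*s));
    s->ring_size = ring_size;
}

/*
 * Record one sample. "used" is meant to be the return value of
 * rte_eth_rx_queue_count(), which is negative on error; such samples
 * are skipped.
 */
static void occ_record(struct occ_stats *s, int used)
{
    uint32_t u, bucket;

    if (used < 0)
        return;
    u = (uint32_t)used;
    if (u > s->ring_size)
        u = s->ring_size;
    if (u > s->max_used)
        s->max_used = u;
    bucket = (uint32_t)((uint64_t)u * OCC_BUCKETS / s->ring_size);
    if (bucket >= OCC_BUCKETS)
        bucket = OCC_BUCKETS - 1;   /* a completely full ring */
    s->hist[bucket]++;
    s->samples++;
}

/* Peak occupancy as a percentage of the ring size. */
static double occ_max_pct(const struct occ_stats *s)
{
    return 100.0 * (double)s->max_used / (double)s->ring_size;
}

/*
 * Intended use in the polling loop (DPDK calls, not compiled here):
 *
 *   occ_record(&st[q], rte_eth_rx_queue_count(port_id, q));
 *   nb = rte_eth_rx_burst(port_id, q, pkts, BURST_SIZE);
 */
```

With 2,048 descriptors per queue, a peak of 20% corresponds to roughly 410 used descriptors, so occ_max_pct() staying at or below 20 on every queue is what "never exceeded 20%" means here.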

Does anyone have suggestions for what I could try? I have already tried various tuning parameters via ethtool, the latest NIC firmware, etc., and the problem persists.

Thanks,
Sangjin


