[dpdk-users] Mlx4/5 : Packets lost between phy and good counters

Shahaf Shuler shahafs at mellanox.com
Sun Jul 22 07:14:05 CEST 2018


Hi Tom,

Wednesday, July 18, 2018 6:41 PM, tom.barbette at uliege.be:
> Cc: katsikas at kth.se
> Subject: [dpdk-users] Mlx4/5 : Packets lost between phy and good counters
> 
> Hi all,
> 
> During a simple forwarding experiment using mlx4 (but we observed the
> same with mlx5) 100G NICs, we have a sender reporting more TX throughput
> than the receiver is receiving, but the receiver does not report any
> packet loss... They are connected by a simple QSFP28 direct-attach cable. So
> where did the packets disappear?
> 
> The only thing we could find is that rx_good_packets in xstats is lower than
> rx_packets_phy. rx_packets_phy is in line with what the sender is reporting,
> so I guess some of the "phy" packets are not "good". But no error counter (missed,
> mbuf_alloc, ...) is giving us a clue as to why those packets are not "good".
> 
> We tried with real traces and with crafted UDP packets of various sizes; same
> problem.
> 
> Any idea?

Yes, what you are experiencing is packet drops due to backpressure from the device.

The rx_good_packets counter is the good packets (i.e. without errors) received by the port (which can be either a PF or a VF).
The rx_packets_phy counter is the packets received by the physical port (the aggregation of the PF and all of its VFs).
A gap between the two means packets were lost or, as you said, received with errors. In your dump that gap is 36243270 - 31986429 = 4256841 packets. We are indeed missing one counter here, rx_discards_phy, which counts the number of received packets dropped due to lack of buffers on the physical port. Exposing it through xstats is a work in progress.
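If you want to watch this gap from inside the application, here is a minimal sketch (my example, not tested on your setup; it assumes the xstats names "rx_packets_phy" and "rx_good_packets" exactly as they appear in your dump, and uses the rte_eth_xstats_get_by_id() API):

    #include <inttypes.h>
    #include <stdio.h>
    #include <rte_ethdev.h>

    /* Report how many packets reached the physical port but were never
     * counted as "good" by the PF/VF. */
    static void
    print_rx_phy_gap(uint16_t port_id)
    {
            uint64_t ids[2], vals[2];

            if (rte_eth_xstats_get_id_by_name(port_id, "rx_packets_phy", &ids[0]) != 0 ||
                rte_eth_xstats_get_id_by_name(port_id, "rx_good_packets", &ids[1]) != 0) {
                    printf("port %u: counter not found\n", port_id);
                    return;
            }
            if (rte_eth_xstats_get_by_id(port_id, ids, vals, 2) != 2)
                    return;
            printf("port %u: phy=%" PRIu64 " good=%" PRIu64 " gap=%" PRIu64 "\n",
                   port_id, vals[0], vals[1], vals[0] - vals[1]);
    }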

There is another way to query this counter (and many others) on Mellanox devices, using the Linux ethtool: "ethtool -S <ifname>" (Mellanox devices keep their kernel module bound).
The DPDK statistics are a shadow of the ethtool ones. You can read more about those counters in the community doc [1].
In the ethtool statistics, look for the discard counter (e.g. "ethtool -S <ifname> | grep -i discard") and validate whether it is increasing.

Assuming it does, we need to understand why you have such backpressure.
Things to check:
1. Is the PCI slot for your mlx5 device indeed x16? (the LnkSta line in "lspci -vv" output shows the negotiated link width)
2. Are you using scatter mode with a large max_rx_pkt_len? (see the sketch below)
3. Have you followed the mlx5 performance tuning guide [2]?
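On check #2: when max_rx_pkt_len is larger than the mbuf data size, the PMD enables scattered RX and each packet costs several PCIe transactions. A minimal configuration sketch, assuming the DPDK 18.x rte_eth_conf layout (flag and field names may differ in your release):

    #include <rte_ethdev.h>
    #include <rte_ether.h>

    /* Keep every frame within a single mbuf so the PMD does not need
     * scattered RX: standard 1518-byte frames, no jumbo, no scatter. */
    static const struct rte_eth_conf port_conf = {
            .rxmode = {
                    .max_rx_pkt_len = ETHER_MAX_LEN,
                    /* .offloads deliberately left without
                     * DEV_RX_OFFLOAD_SCATTER / DEV_RX_OFFLOAD_JUMBO_FRAME. */
            },
    };

Pass it to rte_eth_dev_configure() as usual.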


> 
> Below are the detailed stats of the receiver (it is actually a forwarder, but that is not
> important in this context):
> 
> stats.count:
> 31986429
> stats.missed:
> 0
> stats.error:
> 0
> fd0.xstats:
> rx_good_packets[0] = 31986429
> tx_good_packets[1] = 31986429
> rx_good_bytes[2] = 47979639204
> tx_good_bytes[3] = 47851693488
> rx_missed_errors[4] = 0
> rx_errors[5] = 0
> tx_errors[6] = 0
> rx_mbuf_allocation_errors[7] = 0
> rx_q0packets[8] = 4000025
> rx_q0bytes[9] = 6000036068
> rx_q0errors[10] = 0
> rx_q1packets[11] = 4002151
> rx_q1bytes[12] = 6003226500
> rx_q1errors[13] = 0
> rx_q2packets[14] = 3996758
> rx_q2bytes[15] = 5995137000
> rx_q2errors[16] = 0
> rx_q3packets[17] = 3993614
> rx_q3bytes[18] = 5990421000
> rx_q3errors[19] = 0
> rx_q4packets[20] = 3995758
> rx_q4bytes[21] = 5993637000
> rx_q4errors[22] = 0
> rx_q5packets[23] = 3992126
> rx_q5bytes[24] = 5988189000
> rx_q5errors[25] = 0
> rx_q6packets[26] = 4007488
> rx_q6bytes[27] = 6011230568
> rx_q6errors[28] = 0
> rx_q7packets[29] = 3998509
> rx_q7bytes[30] = 5997762068
> rx_q7errors[31] = 0
> tx_q0packets[32] = 4000025
> tx_q0bytes[33] = 5984035968
> tx_q1packets[34] = 4002151
> tx_q1bytes[35] = 5987217896
> tx_q2packets[36] = 3996758
> tx_q2bytes[37] = 5979149968
> tx_q3packets[38] = 3993614
> tx_q3bytes[39] = 5974446544
> tx_q4packets[40] = 3995758
> tx_q4bytes[41] = 5977653968
> tx_q5packets[42] = 3992126
> tx_q5bytes[43] = 5972220496
> tx_q6packets[44] = 4007488
> tx_q6bytes[45] = 5995200616
> tx_q7packets[46] = 3998509
> tx_q7bytes[47] = 5981768032
> rx_port_unicast_bytes[48] = 47851693488
> rx_port_multicast_bytes[49] = 0
> rx_port_broadcast_bytes[50] = 0
> rx_port_unicast_packets[51] = 31986429
> rx_port_multicast_packets[52] = 0
> rx_port_broadcast_packets[53] = 0
> tx_port_unicast_bytes[54] = 47851693488
> tx_port_multicast_bytes[55] = 0
> tx_port_broadcast_bytes[56] = 0
> tx_port_unicast_packets[57] = 31986429
> tx_port_multicast_packets[58] = 0
> tx_port_broadcast_packets[59] = 0
> rx_wqe_err[60] = 0
> rx_crc_errors_phy[61] = 0
> rx_in_range_len_errors_phy[62] = 0
> rx_symbol_err_phy[63] = 0
> tx_errors_phy[64] = 0
> rx_out_of_buffer[65] = 0
> tx_packets_phy[66] = 31986429
> rx_packets_phy[67] = 36243270
> tx_bytes_phy[68] = 47979639204
> rx_bytes_phy[69] = 54364900704
> 
> 
> Thanks,
> Tom

[1] https://community.mellanox.com/docs/DOC-2532
[2] https://doc.dpdk.org/guides/nics/mlx5.html
