[dpdk-users] Mlx4/5 : Packets lost between phy and good counters

tom.barbette at uliege.be tom.barbette at uliege.be
Mon Jul 23 13:14:02 CEST 2018


Hi Shahaf,

Thank you for the help!

I had not noticed that ethtool exposes more statistics; it would indeed be great to have them in DPDK. As you suggested, rx_discards_phy is increasing, so the packets are being dropped there.
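
For anyone wanting to watch this counter, here is a minimal sketch (not part of DPDK or of any Mellanox tool) that parses the text printed by "ethtool -S <ifname>" and reports how much selected counters grew between two snapshots. The function names and the sample values are made up for illustration; only the counter names rx_packets_phy and rx_discards_phy come from this thread.

```python
import re
import subprocess


def parse_ethtool_stats(text):
    """Parse `ethtool -S <ifname>` output into a {counter_name: int} dict."""
    stats = {}
    for line in text.splitlines():
        # Counter lines look like "     rx_discards_phy: 12345".
        match = re.match(r"^\s*([^:]+?):\s*(-?\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def read_ethtool_stats(ifname):
    """Take a snapshot from a live interface (needs ethtool installed)."""
    out = subprocess.run(["ethtool", "-S", ifname], check=True,
                         capture_output=True, text=True).stdout
    return parse_ethtool_stats(out)


def counter_deltas(before, after, names):
    """Return how much each named counter grew between two snapshots."""
    return {n: after.get(n, 0) - before.get(n, 0) for n in names}


# Illustrative snapshots with made-up values, NOT measurements from this thread.
SAMPLE_BEFORE = """NIC statistics:
     rx_packets_phy: 1000
     rx_discards_phy: 10
"""
SAMPLE_AFTER = """NIC statistics:
     rx_packets_phy: 3000
     rx_discards_phy: 250
"""

if __name__ == "__main__":
    deltas = counter_deltas(parse_ethtool_stats(SAMPLE_BEFORE),
                            parse_ethtool_stats(SAMPLE_AFTER),
                            ["rx_packets_phy", "rx_discards_phy"])
    print(deltas)
```

On a real machine, two calls to read_ethtool_stats() a few seconds apart would replace the sample strings; a growing rx_discards_phy delta is the symptom described above.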

However, this is not due to a lack of buffers (if you meant queues/ring buffers as opposed to some Mellanox internals), as the CPU is starving for work on every queue. We also ensured the CPU was not the problem by 1) using more CPU cores and 2) deliberately introducing extra instructions and cache misses on the CPU, which did not lead to any performance loss.

Regarding the three things to check:

1) Both cards on both machines are in PCIe Gen 3 x16 slots, and both lspci and the mlx5 driver report the link as such.
2) Enabling or disabling scatter mode in ethtool does not change performance, but I don't think we are using it anyway (we do nothing special in DPDK for this? Packets are always one segment).
3) We followed the performance guide(s), among other things, with the exception of CQE_COMPRESSION, as we didn't find any "mst" reference.
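
As an illustration of check 1), here is a small sketch that extracts the negotiated link speed and width from "lspci -vv" text. It assumes the usual "LnkSta: Speed ..., Width x..." line format; the helper names and the sample lines are hypothetical, not output captured from our machines.

```python
import re


def parse_link_status(lspci_text):
    """Return (speed_in_GT_per_s, lane_width) from an lspci -vv LnkSta line,
    or None if no such line is found."""
    match = re.search(
        r"LnkSta:\s*Speed\s+([\d.]+)GT/s[^,]*,\s*Width\s+x(\d+)", lspci_text)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


def is_gen3_x16(lspci_text):
    """True when the negotiated link is 8 GT/s (PCIe Gen 3) with 16 lanes."""
    return parse_link_status(lspci_text) == (8.0, 16)


# Hypothetical sample lines, written by hand for this example.
SAMPLE_OK = "\t\tLnkSta:\tSpeed 8GT/s, Width x16, TrErr- Train- SlotClk+"
SAMPLE_DOWNGRADED = "\t\tLnkSta:\tSpeed 8GT/s (ok), Width x8 (downgraded)"

if __name__ == "__main__":
    print(is_gen3_x16(SAMPLE_OK))
    print(is_gen3_x16(SAMPLE_DOWNGRADED))
```

The point of reading LnkSta rather than LnkCap is that LnkSta reflects what was actually negotiated, which can be narrower than what the card supports.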

We noticed that when using only one direction of a port, that is, one machine only doing TX and the other only doing RX (discarding packets, but still rewriting them), we do send/receive 100G (the numbers discussed before correspond to a ~80G "bouncing" throughput cap).

This holds with both ConnectX-4 and ConnectX-5, and with different (Intel) machines with different motherboards. The mlx5 may perform slightly better (bouncing 84G), but the cap is still there, and the difference may be due to other parameters.

Interestingly, we found that this cap depends on the card and not on the port. If we use the two ports of one PCIe card, forwarding from A to B and from B to A at full speed, the throughput goes down to ~40G per port (so 80G total forwarding throughput). If we use two different PCI Express cards, it is back to ~80G per side, so ~160G total forwarding rate. This also supports the conclusion that our problem is not CPU-bound, since we get better performance with more PCIe cards.
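
To put the per-card figures next to the slot's theoretical limit, here is a back-of-the-envelope sketch. It uses the standard PCIe Gen 3 parameters (8 GT/s per lane, 128b/130b encoding) and the throughput numbers reported above; the function names are made up. The raw figure ignores TLP headers, descriptor fetches and other protocol overhead, so it is only an upper bound and does not by itself show where the bottleneck is.

```python
def pcie_raw_gbps(lanes, gt_per_s=8.0, encoding=128 / 130):
    """Raw PCIe bandwidth per direction in Gbit/s, before protocol overhead.
    Defaults are the PCIe Gen 3 values: 8 GT/s per lane, 128b/130b encoding."""
    return lanes * gt_per_s * encoding


def forwarding_load_gbps(rate_per_port_gbps, ports_on_card):
    """Traffic crossing one card's PCIe link in EACH direction when every
    port on the card forwards (receives and re-sends) at the given rate."""
    return rate_per_port_gbps * ports_on_card


if __name__ == "__main__":
    print("Gen 3 x16 raw, per direction: %.2f Gbit/s" % pcie_raw_gbps(16))
    # Figures reported in this mail:
    print("one port bouncing ~80G   :", forwarding_load_gbps(80, 1), "Gbit/s each way")
    print("two ports at ~40G each   :", forwarding_load_gbps(40, 2), "Gbit/s each way")
```

Both forwarding setups on a single card end up at about 80 Gbit/s in each direction over the PCIe link, whereas the RX-only or TX-only test at 100G loads the link in one direction only.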

Thanks,


Tom

----- Original Message -----
> From: "Shahaf Shuler" <shahafs at mellanox.com>
> To: "tom barbette" <tom.barbette at uliege.be>, users at dpdk.org
> Cc: katsikas at kth.se, "Erez Ferber" <erezf at mellanox.com>
> Sent: Sunday, 22 July 2018 07:14:05
> Subject: RE: Mlx4/5 : Packets lost between phy and good counters

> Hi Tom,
> 
> Wednesday, July 18, 2018 6:41 PM, tom.barbette at uliege.be:
>> Cc: katsikas at kth.se
>> Subject: [dpdk-users] Mlx4/5 : Packets lost between phy and good counters
>> 
>> Hi all,
>> 
>> During a simple forwarding experiment using mlx4 (but we observed the
>> same with mlx5) 100G NICs, we have a sender reporting more TX throughput
>> than what the receiver is receiving, but the receiver does not report any
>> packet loss... They are connected by a simple QSFP28 direct attach cable. So
>> where did the packets disappear?
>> 
>> The only thing we could find is that rx_good_packets in xstats is lower than
>> rx_packets_phy. rx_packets_phy is in line with what the sender is reporting,
>> so I guess some of the "phy" packets are not "good". But no error counter (missed,
>> mbuf_alloc, ...) gives us a clue as to why those packets are not "good".
>> 
>> We tried with real traces and crafted UDP packets of various sizes, same
>> problem.
>> 
>> Any ideas?
> 
> Yes, what you are experiencing is a packet drop due to backpressure from the
> device.
> 
> The rx_good_packets are the good packets (w/o errors) received by the port (can
> be either PF or VF).
> The rx_packets_phy are the packets received by the physical port (this is the
> aggregation of the PF and all of the VFs).
> A gap between those means some packets have been lost or, as you said, received w/
> errors. We are indeed missing one counter here, rx_discards_phy, which counts
> the number of received packets dropped due to lack of buffers on a physical
> port. This work is in progress.
> 
> There is another way to query this counter (and many others) for Mellanox
> devices by using linux ethtool: "ethtool -S <ifname>" (Mellanox devices keep
> their kernel module).
> The statistics in DPDK are a shadow of the ethtool ones. You can read more about
> those counters in the community doc[1].
> With the ethtool statistics, look for the discard counter and check whether it is
> increasing.
> 
> Assuming it does, we need to understand why you have such backpressure.
> Things to check:
> 1. Is the PCI slot for your mlx5 device indeed x16?
> 2. Are you using scatter mode w/ a large max_rx_pkt_len?
> 3. Have you followed the mlx5 performance tuning guide[2]?
> 
> 
>> 
>> Below are the detailed stats of the receiver (which is a forwarder, but that is
>> not important in this context):
>> 
>> stats.count:
>> 31986429
>> stats.missed:
>> 0
>> stats.error:
>> 0
>> fd0.xstats:
>> rx_good_packets[0] = 31986429
>> tx_good_packets[1] = 31986429
>> rx_good_bytes[2] = 47979639204
>> tx_good_bytes[3] = 47851693488
>> rx_missed_errors[4] = 0
>> rx_errors[5] = 0
>> tx_errors[6] = 0
>> rx_mbuf_allocation_errors[7] = 0
>> rx_q0packets[8] = 4000025
>> rx_q0bytes[9] = 6000036068
>> rx_q0errors[10] = 0
>> rx_q1packets[11] = 4002151
>> rx_q1bytes[12] = 6003226500
>> rx_q1errors[13] = 0
>> rx_q2packets[14] = 3996758
>> rx_q2bytes[15] = 5995137000
>> rx_q2errors[16] = 0
>> rx_q3packets[17] = 3993614
>> rx_q3bytes[18] = 5990421000
>> rx_q3errors[19] = 0
>> rx_q4packets[20] = 3995758
>> rx_q4bytes[21] = 5993637000
>> rx_q4errors[22] = 0
>> rx_q5packets[23] = 3992126
>> rx_q5bytes[24] = 5988189000
>> rx_q5errors[25] = 0
>> rx_q6packets[26] = 4007488
>> rx_q6bytes[27] = 6011230568
>> rx_q6errors[28] = 0
>> rx_q7packets[29] = 3998509
>> rx_q7bytes[30] = 5997762068
>> rx_q7errors[31] = 0
>> tx_q0packets[32] = 4000025
>> tx_q0bytes[33] = 5984035968
>> tx_q1packets[34] = 4002151
>> tx_q1bytes[35] = 5987217896
>> tx_q2packets[36] = 3996758
>> tx_q2bytes[37] = 5979149968
>> tx_q3packets[38] = 3993614
>> tx_q3bytes[39] = 5974446544
>> tx_q4packets[40] = 3995758
>> tx_q4bytes[41] = 5977653968
>> tx_q5packets[42] = 3992126
>> tx_q5bytes[43] = 5972220496
>> tx_q6packets[44] = 4007488
>> tx_q6bytes[45] = 5995200616
>> tx_q7packets[46] = 3998509
>> tx_q7bytes[47] = 5981768032
>> rx_port_unicast_bytes[48] = 47851693488
>> rx_port_multicast_bytes[49] = 0
>> rx_port_broadcast_bytes[50] = 0
>> rx_port_unicast_packets[51] = 31986429
>> rx_port_multicast_packets[52] = 0
>> rx_port_broadcast_packets[53] = 0
>> tx_port_unicast_bytes[54] = 47851693488
>> tx_port_multicast_bytes[55] = 0
>> tx_port_broadcast_bytes[56] = 0
>> tx_port_unicast_packets[57] = 31986429
>> tx_port_multicast_packets[58] = 0
>> tx_port_broadcast_packets[59] = 0
>> rx_wqe_err[60] = 0
>> rx_crc_errors_phy[61] = 0
>> rx_in_range_len_errors_phy[62] = 0
>> rx_symbol_err_phy[63] = 0
>> tx_errors_phy[64] = 0
>> rx_out_of_buffer[65] = 0
>> tx_packets_phy[66] = 31986429
>> rx_packets_phy[67] = 36243270
>> tx_bytes_phy[68] = 47979639204
>> rx_bytes_phy[69] = 54364900704
>> 
>> 
>> Thanks,
>> Tom
> 
> [1] https://community.mellanox.com/docs/DOC-2532
> [2] https://doc.dpdk.org/guides/nics/mlx5.html
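
The gap discussed in this thread can be quantified directly from the xstats quoted above. The following is a small sketch using only the four counter values from that dump; the helper name is made up.

```python
# Counter values copied from the receiver's xstats quoted in this thread.
XSTATS = {
    "rx_good_packets": 31986429,
    "rx_packets_phy": 36243270,
    "rx_good_bytes": 47979639204,
    "rx_bytes_phy": 54364900704,
}


def phy_gap(stats):
    """Return (missing_packets, missing_bytes, missing_fraction): what the
    physical port counted but the 'good' counters never saw."""
    missing_packets = stats["rx_packets_phy"] - stats["rx_good_packets"]
    missing_bytes = stats["rx_bytes_phy"] - stats["rx_good_bytes"]
    return (missing_packets, missing_bytes,
            missing_packets / stats["rx_packets_phy"])


if __name__ == "__main__":
    packets, nbytes, fraction = phy_gap(XSTATS)
    print("missing packets: %d (%.2f%% of rx_packets_phy)"
          % (packets, 100 * fraction))
    print("missing bytes:   %d" % nbytes)
```

With these values, 4256841 packets (roughly 11.7% of what reached the physical port) never show up in rx_good_packets, while every error counter in the dump stays at 0.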

