[dpdk-users] Mlx4/5 : Packets lost between phy and good counters

Shahaf Shuler shahafs at mellanox.com
Tue Jul 24 09:33:29 CEST 2018


Monday, July 23, 2018 2:14 PM, tom.barbette at uliege.be:
> Subject: Re: Mlx4/5 : Packets lost between phy and good counters
> 
> Hi Shahaf,
> 
> Thank you for the help !
> 
> I had not noticed that ethtool showed more stats; indeed, it would be great to
> have them in DPDK. As you suggested, rx_discards_phy is increasing, so packets
> are dropped there.
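
As a side note, the xstats that the PMD already exports can be read directly from DPDK without going through ethtool. Below is a minimal sketch (assuming port 0 is already configured and started; the helper name is only for illustration) that dumps every extended counter by name. rx_discards_phy itself will only appear there once the counter is added to the PMD.

/* Minimal sketch: dump all extended statistics of an already started
 * DPDK port (port 0 assumed). The mlx5 "phy" counters such as
 * rx_packets_phy show up here next to rx_good_packets. */
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <rte_ethdev.h>

static void dump_xstats(uint16_t port_id)
{
        /* First call with NULL returns the number of available xstats. */
        int n = rte_eth_xstats_get_names(port_id, NULL, 0);
        if (n <= 0)
                return;

        struct rte_eth_xstat_name *names = calloc(n, sizeof(*names));
        struct rte_eth_xstat *values = calloc(n, sizeof(*values));

        if (names != NULL && values != NULL &&
            rte_eth_xstats_get_names(port_id, names, n) == n &&
            rte_eth_xstats_get(port_id, values, n) == n) {
                for (int i = 0; i < n; i++)
                        printf("%s = %" PRIu64 "\n",
                               names[values[i].id].name, values[i].value);
        }

        free(names);
        free(values);
}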
> 
> However, it is not due to a lack of buffers (if you meant queues/ring buffers as
> opposed to some Mellanox internals), as the CPU is starving for work on all
> queues. We also ensured the CPU was not the problem by 1) using more
> CPU cores, and 2) deliberately introducing extra instructions and cache misses
> on the CPU, neither of which led to any performance loss.

I didn't say the backpressure comes from the CPU; it is probably triggered by the NIC for some reason (the PCI and scatter checks I requested from you were simple sanity checks for possible causes).

> 
> 1) Both cards on both machines are in a PCIe Gen 3 x16 slot, and both lspci
> and the mlx5 driver acknowledge it as such.
> 2) Disabling/enabling scatter mode in ethtool does not change performance,

Scatter is not configured through ethtool, but through the DPDK APIs.
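For illustration, here is a minimal sketch (assuming port 0 and a single Rx/Tx queue) of configuring a port so that Rx scatter stays disabled; enabling it would mean setting DEV_RX_OFFLOAD_SCATTER in the offload flags and a larger max_rx_pkt_len.

/* Minimal sketch: Rx scatter in DPDK is requested via the port
 * configuration, not ethtool. With a standard max_rx_pkt_len and
 * DEV_RX_OFFLOAD_SCATTER left out of the offload flags, received
 * packets stay in a single segment. */
#include <rte_ethdev.h>

static int configure_port_without_scatter(uint16_t port_id)
{
        struct rte_eth_conf conf = {
                .rxmode = {
                        .max_rx_pkt_len = ETHER_MAX_LEN, /* 1518, no jumbo frames */
                        .offloads = 0,          /* DEV_RX_OFFLOAD_SCATTER not set */
                },
        };

        return rte_eth_dev_configure(port_id, 1 /* Rx queues */,
                                     1 /* Tx queues */, &conf);
}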

> but I don't think we're using it anyway (we do nothing special in DPDK for this;
> packets are always one segment).
> 3) We followed the performance guide(s) among other things, with the
> exception of CQE_COMPRESSION, as we didn't find any reference to "mst".
> 
> We noticed that when using only one side of a port, that is, one machine only
> doing TX and the other doing RX (discarding packets, but still rewriting
> them), we do send/receive 100G (the numbers discussed before led to a
> ~80G "bouncing" throughput cap).
> 
> This is still true with ConnectX-4 or ConnectX-5, and with different (Intel)
> machines with different motherboards. The mlx5 may perform slightly better
> (bouncing 84G), but there is still this cap, and it may be due to other
> parameters.
> 
> Interestingly, we found that this cap depends on the card and not on the port:
> if we use the two ports of the same PCIe card, forwarding from A to B and
> B to A at full speed, the throughput goes down to ~40G per port (so ~80G total
> forwarding throughput), but if we use two different PCI Express cards, it is
> back to ~80G per side, so ~160G total forwarding rate (which also suggests our
> problem is not CPU-bound, since with more PCIe cards we get better
> performance).

It looks like the bottleneck is on the PCI bus, and the CQE_COMPRESSION configuration can be the reason for that (it is a feature that saves PCI bandwidth and is critical to reach 100G with small frames).
As this looks like a NIC/system configuration issue, I suggest opening a ticket with Mellanox Support so they can look at your system and advise.
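For reference, the CQE_COMPRESSION setting from the tuning guide is a firmware configuration changed with the Mellanox mst/mlxconfig tools, while on the DPDK side the mlx5 PMD documents its own rxq_cqe_comp_en device argument (on by default). A minimal sketch of passing that devarg when whitelisting the port at EAL initialization, with a placeholder PCI address:

/* Minimal sketch: pass the mlx5 rxq_cqe_comp_en device argument while
 * whitelisting the NIC at EAL initialization. 0000:03:00.0 is a
 * placeholder for the actual PCI address of the port. */
#include <stdlib.h>
#include <rte_eal.h>

int main(void)
{
        char *eal_argv[] = {
                "app",
                "-w", "0000:03:00.0,rxq_cqe_comp_en=1",
                NULL,
        };

        if (rte_eal_init(3, eal_argv) < 0)
                return EXIT_FAILURE;

        /* regular port configuration continues here */
        return EXIT_SUCCESS;
}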

> 
> Thanks,
> 
> 
> Tom
> 
> ----- Mail original -----
> > De: "Shahaf Shuler" <shahafs at mellanox.com>
> > À: "tom barbette" <tom.barbette at uliege.be>, users at dpdk.org
> > Cc: katsikas at kth.se, "Erez Ferber" <erezf at mellanox.com>
> > Envoyé: Dimanche 22 Juillet 2018 07:14:05
> > Objet: RE: Mlx4/5 : Packets lost between phy and good counters
> 
> > Hi Tom,
> >
> > Wednesday, July 18, 2018 6:41 PM, tom.barbette at uliege.be:
> >> Cc: katsikas at kth.se
> >> Subject: [dpdk-users] Mlx4/5 : Packets lost between phy and good
> >> counters
> >>
> >> Hi all,
> >>
> >> During a simple forwarding experiment using mlx4 100G NICs (but we
> >> observed the same with mlx5), we have a sender reporting more TX
> >> throughput than what the receiver is receiving, but the receiver does
> >> not report any packet loss... They are connected by a simple QSFP28
> >> direct attach cable. So where did the packets disappear?
> >>
> >> The only thing we could find is that rx_good_packets in xstats is
> >> lower than rx_packets_phy. rx_packets_phy is in line with what the
> >> sender is reporting, so I guess some of the "phy" packets are not "good".
> >> But no error counter (missed, mbuf_alloc, ...) gives us a clue as to why
> >> those packets are not "good".
> >>
> >> We tried with real traces and with crafted UDP packets of various sizes;
> >> same problem.
> >>
> >> Any idea?
> >
> > Yes, what you are experiencing is a packet drop due to backpressure
> > from the device.
> >
> > The rx_good_packets are the good packets (w/o errors) received by the
> > port (can be either PF or VF).
> > The rx_packets_phy are the packets received by the physical port (this
> > is the aggregation of the PF and all of the VFs).
> > A gap between those means some packets have been lost or, as you said,
> > received w/ errors. We are indeed missing one counter here, which is
> > rx_discards_phy: the number of received packets dropped due to lack of
> > buffers on the physical port. This work is in progress.
> >
> > There is another way to query this counter (and many others) for
> > Mellanox devices, by using Linux ethtool: "ethtool -S <ifname>"
> > (Mellanox devices keep their kernel module).
> > The statistics in DPDK are a shadow of the ethtool ones. You can read
> > more about those counters in the community doc [1].
> > With the ethtool statistics, look for the discard counter and validate
> > whether it is increasing.
> >
> > Assuming it does, we need to understand why you have such
> > backpressure.
> > Things to check:
> > 1. Is the PCI slot for your mlx5 device indeed x16?
> > 2. Are you using scatter mode with a large max_rx_pkt_len?
> > 3. Have you followed the mlx5 performance tuning guide [2]?
> >
> >
> >>
> >> Below are the detailed stats of the receiver (which is a forwarder, but
> >> that is not important in this context):
> >>
> >> stats.count:
> >> 31986429
> >> stats.missed:
> >> 0
> >> stats.error:
> >> 0
> >> fd0.xstats:
> >> rx_good_packets[0] = 31986429
> >> tx_good_packets[1] = 31986429
> >> rx_good_bytes[2] = 47979639204
> >> tx_good_bytes[3] = 47851693488
> >> rx_missed_errors[4] = 0
> >> rx_errors[5] = 0
> >> tx_errors[6] = 0
> >> rx_mbuf_allocation_errors[7] = 0
> >> rx_q0packets[8] = 4000025
> >> rx_q0bytes[9] = 6000036068
> >> rx_q0errors[10] = 0
> >> rx_q1packets[11] = 4002151
> >> rx_q1bytes[12] = 6003226500
> >> rx_q1errors[13] = 0
> >> rx_q2packets[14] = 3996758
> >> rx_q2bytes[15] = 5995137000
> >> rx_q2errors[16] = 0
> >> rx_q3packets[17] = 3993614
> >> rx_q3bytes[18] = 5990421000
> >> rx_q3errors[19] = 0
> >> rx_q4packets[20] = 3995758
> >> rx_q4bytes[21] = 5993637000
> >> rx_q4errors[22] = 0
> >> rx_q5packets[23] = 3992126
> >> rx_q5bytes[24] = 5988189000
> >> rx_q5errors[25] = 0
> >> rx_q6packets[26] = 4007488
> >> rx_q6bytes[27] = 6011230568
> >> rx_q6errors[28] = 0
> >> rx_q7packets[29] = 3998509
> >> rx_q7bytes[30] = 5997762068
> >> rx_q7errors[31] = 0
> >> tx_q0packets[32] = 4000025
> >> tx_q0bytes[33] = 5984035968
> >> tx_q1packets[34] = 4002151
> >> tx_q1bytes[35] = 5987217896
> >> tx_q2packets[36] = 3996758
> >> tx_q2bytes[37] = 5979149968
> >> tx_q3packets[38] = 3993614
> >> tx_q3bytes[39] = 5974446544
> >> tx_q4packets[40] = 3995758
> >> tx_q4bytes[41] = 5977653968
> >> tx_q5packets[42] = 3992126
> >> tx_q5bytes[43] = 5972220496
> >> tx_q6packets[44] = 4007488
> >> tx_q6bytes[45] = 5995200616
> >> tx_q7packets[46] = 3998509
> >> tx_q7bytes[47] = 5981768032
> >> rx_port_unicast_bytes[48] = 47851693488
> >> rx_port_multicast_bytes[49] = 0
> >> rx_port_broadcast_bytes[50] = 0
> >> rx_port_unicast_packets[51] = 31986429
> >> rx_port_multicast_packets[52] = 0
> >> rx_port_broadcast_packets[53] = 0
> >> tx_port_unicast_bytes[54] = 47851693488
> >> tx_port_multicast_bytes[55] = 0
> >> tx_port_broadcast_bytes[56] = 0
> >> tx_port_unicast_packets[57] = 31986429
> >> tx_port_multicast_packets[58] = 0
> >> tx_port_broadcast_packets[59] = 0
> >> rx_wqe_err[60] = 0
> >> rx_crc_errors_phy[61] = 0
> >> rx_in_range_len_errors_phy[62] = 0
> >> rx_symbol_err_phy[63] = 0
> >> tx_errors_phy[64] = 0
> >> rx_out_of_buffer[65] = 0
> >> tx_packets_phy[66] = 31986429
> >> rx_packets_phy[67] = 36243270
> >> tx_bytes_phy[68] = 47979639204
> >> rx_bytes_phy[69] = 54364900704
> >>
> >>
> >> Thanks,
> >> Tom
> >
> > [1] https://community.mellanox.com/docs/DOC-2532
> > [2] https://doc.dpdk.org/guides/nics/mlx5.html

