[PATCH] net/mlx5: enable PCI related counters

Honnappa Nagarahalli Honnappa.Nagarahalli at arm.com
Wed Feb 14 02:50:43 CET 2024



> On Feb 13, 2024, at 7:12 AM, Slava Ovsiienko <viacheslavo at nvidia.com> wrote:
> 
> Hi,
> 
> Regarding "dev_out_of_buffer": it is a global counter that relates to the whole device port,
> including queues not managed by the DPDK application. Mellanox/Nvidia NICs operate
> in "bifurcated mode", so there might be queues managed by the kernel or another DPDK
> application. Not sure it makes a lot of sense, but I have no strong objections.
These are still helpful for debugging in a lab environment, but it would be good to document them.
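
For lab debugging, what usually matters is how much a counter such as dev_out_of_buffer grew over an interval rather than its absolute value. A minimal sketch of that calculation follows. The struct and function names are hypothetical (not DPDK APIs), and it assumes the counter is a cumulative 64-bit value and that the caller supplies timestamps from a monotonic clock:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not part of DPDK: one sample of a counter. */
struct counter_sample {
	uint64_t value;   /* raw counter value (assumed cumulative) */
	uint64_t time_ns; /* sample time in nanoseconds, monotonic clock */
};

/*
 * Increase between two samples. Unsigned subtraction stays correct
 * if the counter wrapped at most once between the samples.
 */
static uint64_t
counter_delta(const struct counter_sample *prev,
	      const struct counter_sample *cur)
{
	assert(prev != NULL && cur != NULL);
	return cur->value - prev->value;
}

/*
 * Average increase per second over the interval.
 * Returns 0.0 when the timestamps do not advance.
 */
static double
counter_rate_per_sec(const struct counter_sample *prev,
		     const struct counter_sample *cur)
{
	assert(prev != NULL && cur != NULL);
	if (cur->time_ns <= prev->time_ns)
		return 0.0;
	return (double)counter_delta(prev, cur) * 1e9 /
	       (double)(cur->time_ns - prev->time_ns);
}
```

Since the counter is global, a non-zero delta only says that some queue on the port ran short of buffers during the interval, not necessarily one owned by this application.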

> 
> The PCI related counters are also global ones and reflect statistics impacted by
> the PCI activity of the whole physical device, including all the network ports located
> on the same NIC board (and, sometimes, by internal activity in BlueField).
> 
> As I said, no objections from my side:
> 
> Acked-by: Viacheslav Ovsiienko <viacheslavo at nvidia.com>
> 
> With best regards,
> Slava
> 
>> -----Original Message-----
>> From: Wathsala Vithanage <wathsala.vithanage at arm.com>
>> Sent: Friday, February 9, 2024 10:42 PM
>> To: NBU-Contact-Thomas Monjalon (EXTERNAL) <thomas at monjalon.net>;
>> Dariusz Sosnowski <dsosnowski at nvidia.com>; Slava Ovsiienko
>> <viacheslavo at nvidia.com>; Ori Kam <orika at nvidia.com>; Suanming Mou
>> <suanmingm at nvidia.com>; Matan Azrad <matan at nvidia.com>
>> Cc: dev at dpdk.org; nd at arm.com; Wathsala Vithanage
>> <wathsala.vithanage at arm.com>; Honnappa Nagarahalli
>> <honnappa.nagarahalli at arm.com>
>> Subject: [PATCH] net/mlx5: enable PCI related counters
>> 
>> Versions of Mellanox NICs starting from CX5 have device counters related to PCI.
>> These counters are helpful in debugging IO bottlenecks. For instance, the
>> outbound_pci_stalled_rd and outbound_pci_stalled_wr counters can help with
>> identifying NIC stalls due to insufficient PCI credits, which otherwise would have
>> required a PCI analyzer or a sophisticated PCI root port with a PMU.
>> Currently none of these are available in the MLX5 PMD even though ethtool is
>> capable of reading some of them.
>> Since the PMD uses the same ioctl as ethtool (SIOCETHTOOL) and reads the
>> counters via the kernel driver, support is easy to add.
>> There is one more PCI related counter and a device counter that aren't
>> implemented in the Linux driver at the moment. These two are named
>> outbound_pci_buffer_overflow and dev_out_of_buffer respectively. As per
>> Nvidia's documentation, these two counters report the number of packets
>> dropped due to PCI buffer overflow and the number of times the device-owned
>> queue did not have enough buffers allocated.
>> 
>> Signed-off-by: Wathsala Vithanage <wathsala.vithanage at arm.com>
>> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
>> ---
>> .mailmap                                |  1 +
>> drivers/net/mlx5/linux/mlx5_ethdev_os.c | 33 +++++++++++++++++++++++++
>> 2 files changed, 34 insertions(+)
>> 
>> diff --git a/.mailmap b/.mailmap
>> index aa569ff456..f57415f7a1 100644
>> --- a/.mailmap
>> +++ b/.mailmap
>> @@ -1510,6 +1510,7 @@ Walter Heymans <walter.heymans at corigine.com>
>> Wang Sheng-Hui <shhuiw at gmail.com>
>> Wangyu (Eric) <seven.wangyu at huawei.com>
>> Waterman Cao <waterman.cao at intel.com>
>> +Wathsala Vithanage <wathsala.vithanage at arm.com>
>> Weichun Chen <weichunx.chen at intel.com>
>> Wei Dai <wei.dai at intel.com>
>> Weifeng Li <liweifeng96 at 126.com>
>> diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
>> index dd5a0c546d..8f1567f6a7 100644
>> --- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
>> +++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
>> @@ -1574,6 +1574,39 @@ static const struct mlx5_counter_ctrl mlx5_counters_init[] = {
>> .dpdk_name = "tx_vport_bytes",
>> .ctr_name = "vport_tx_bytes",
>> },
>> + /* Device counters */
>> + {
>> + .dpdk_name = "rx_pci_signal_integrity",
>> + .ctr_name = "rx_pci_signal_integrity",
>> + },
>> + {
>> + .dpdk_name = "tx_pci_signal_integrity",
>> + .ctr_name = "tx_pci_signal_integrity",
>> + },
>> + {
>> + .dpdk_name = "outbound_pci_buffer_overflow",
>> + .ctr_name = "outbound_pci_buffer_overflow",
>> + },
>> + {
>> + .dpdk_name = "outbound_pci_stalled_rd",
>> + .ctr_name = "outbound_pci_stalled_rd",
>> + },
>> + {
>> + .dpdk_name = "outbound_pci_stalled_wr",
>> + .ctr_name = "outbound_pci_stalled_wr",
>> + },
>> + {
>> + .dpdk_name = "outbound_pci_stalled_rd_events",
>> + .ctr_name = "outbound_pci_stalled_rd_events",
>> + },
>> + {
>> + .dpdk_name = "outbound_pci_stalled_wr_events",
>> + .ctr_name = "outbound_pci_stalled_wr_events",
>> + },
>> + {
>> + .dpdk_name = "dev_out_of_buffer",
>> + .ctr_name = "dev_out_of_buffer",
>> + },
>> };
>> 
>> static const unsigned int xstats_n = RTE_DIM(mlx5_counters_init);
>> --
>> 2.25.1
> 
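
As a starting point for documenting these, the scope of each new counter could be recorded next to its name. The sketch below is only an illustration: the enum, struct, and function names are hypothetical and not part of the patch or of DPDK, and the scopes are taken from Slava's description above (dev_out_of_buffer covers the whole device port, and the PCI counters cover the whole physical device).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scope labels, derived from the review comments. */
enum counter_scope {
	SCOPE_DEVICE_PORT,     /* whole port, incl. queues not owned by this app */
	SCOPE_PHYSICAL_DEVICE, /* whole NIC board, all ports */
};

/* Hypothetical documentation record for one counter. */
struct counter_doc {
	const char *name;
	enum counter_scope scope;
};

/* Names match the .dpdk_name entries added by the patch. */
static const struct counter_doc device_counters[] = {
	{ "rx_pci_signal_integrity",        SCOPE_PHYSICAL_DEVICE },
	{ "tx_pci_signal_integrity",        SCOPE_PHYSICAL_DEVICE },
	{ "outbound_pci_buffer_overflow",   SCOPE_PHYSICAL_DEVICE },
	{ "outbound_pci_stalled_rd",        SCOPE_PHYSICAL_DEVICE },
	{ "outbound_pci_stalled_wr",        SCOPE_PHYSICAL_DEVICE },
	{ "outbound_pci_stalled_rd_events", SCOPE_PHYSICAL_DEVICE },
	{ "outbound_pci_stalled_wr_events", SCOPE_PHYSICAL_DEVICE },
	{ "dev_out_of_buffer",              SCOPE_DEVICE_PORT },
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Linear lookup by name; returns NULL for counters not listed above. */
static const struct counter_doc *
find_counter_doc(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < ARRAY_LEN(device_counters); i++) {
		if (strcmp(device_counters[i].name, name) == 0)
			return &device_counters[i];
	}
	return NULL;
}
```

A tool that prints xstats could use such a lookup to flag values that are not specific to the port or application being inspected.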


