[PATCH] net/mlx5: enable PCI related counters

Slava Ovsiienko viacheslavo at nvidia.com
Tue Feb 13 14:12:48 CET 2024


Hi,

Regarding "dev_out_of_buffer" - it is a global counter that relates to the whole device port,
including queues not managed by the DPDK application. Mellanox/Nvidia NICs operate
in "bifurcated mode", so there might be queues managed by the kernel or by another DPDK
application. I am not sure exposing it makes a lot of sense, but I have no strong objections.

The PCI related counters are also global and reflect statistics affected by the
PCI activity of the whole physical device, including all the network ports located
on the same NIC board (and, sometimes, internal activity in BlueField).

As I said, no objections from my side:

Acked-by: Viacheslav Ovsiienko <viacheslavo at nvidia.com>

With best regards,
Slava

> -----Original Message-----
> From: Wathsala Vithanage <wathsala.vithanage at arm.com>
> Sent: Friday, February 9, 2024 10:42 PM
> To: NBU-Contact-Thomas Monjalon (EXTERNAL) <thomas at monjalon.net>;
> Dariusz Sosnowski <dsosnowski at nvidia.com>; Slava Ovsiienko
> <viacheslavo at nvidia.com>; Ori Kam <orika at nvidia.com>; Suanming Mou
> <suanmingm at nvidia.com>; Matan Azrad <matan at nvidia.com>
> Cc: dev at dpdk.org; nd at arm.com; Wathsala Vithanage
> <wathsala.vithanage at arm.com>; Honnappa Nagarahalli
> <honnappa.nagarahalli at arm.com>
> Subject: [PATCH] net/mlx5: enable PCI related counters
> 
> Versions of Mellanox NICs starting from CX5 have device counters related to PCI.
> These counters are helpful in debugging IO bottlenecks. For instance, the
> outbound_pci_stalled_rd and outbound_pci_stalled_wr counters can help with
> identifying NIC stalls due to insufficient PCI credits, which otherwise would have
> required a PCI analyzer or a sophisticated PCI root port with a PMU.
> Currently none of these are available in the MLX5 PMD even though ethtool is
> capable of reading some of them.
> Since the PMD uses the same ioctl as ethtool (SIOCETHTOOL) and reads via the
> kernel driver, it is possible to add support with ease.
> There is one more PCI related counter and a device counter that aren't
> implemented in the Linux driver at the moment. These two are named
> outbound_pci_buffer_overflow and dev_out_of_buffer respectively. As per
> Nvidia's documentation, these two counters report the number of packets
> dropped due to PCI buffer overflow and the number of times a device-owned
> queue did not have enough buffers allocated.
> 
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage at arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
> ---
>  .mailmap                                |  1 +
>  drivers/net/mlx5/linux/mlx5_ethdev_os.c | 33 +++++++++++++++++++++++++
>  2 files changed, 34 insertions(+)
> 
> diff --git a/.mailmap b/.mailmap
> index aa569ff456..f57415f7a1 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -1510,6 +1510,7 @@ Walter Heymans <walter.heymans at corigine.com>
>  Wang Sheng-Hui <shhuiw at gmail.com>
>  Wangyu (Eric) <seven.wangyu at huawei.com>
>  Waterman Cao <waterman.cao at intel.com>
> +Wathsala Vithanage <wathsala.vithanage at arm.com>
>  Weichun Chen <weichunx.chen at intel.com>
>  Wei Dai <wei.dai at intel.com>
>  Weifeng Li <liweifeng96 at 126.com>
> diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
> index dd5a0c546d..8f1567f6a7 100644
> --- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
> +++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
> @@ -1574,6 +1574,39 @@ static const struct mlx5_counter_ctrl mlx5_counters_init[] = {
>  		.dpdk_name = "tx_vport_bytes",
>  		.ctr_name = "vport_tx_bytes",
>  	},
> +	/* Device counters */
> +	{
> +		.dpdk_name = "rx_pci_signal_integrity",
> +		.ctr_name = "rx_pci_signal_integrity",
> +	},
> +	{
> +		.dpdk_name = "tx_pci_signal_integrity",
> +		.ctr_name = "tx_pci_signal_integrity",
> +	},
> +	{
> +		.dpdk_name = "outbound_pci_buffer_overflow",
> +		.ctr_name = "outbound_pci_buffer_overflow",
> +	},
> +	{
> +		.dpdk_name = "outbound_pci_stalled_rd",
> +		.ctr_name = "outbound_pci_stalled_rd",
> +	},
> +	{
> +		.dpdk_name = "outbound_pci_stalled_wr",
> +		.ctr_name = "outbound_pci_stalled_wr",
> +	},
> +	{
> +		.dpdk_name = "outbound_pci_stalled_rd_events",
> +		.ctr_name = "outbound_pci_stalled_rd_events",
> +	},
> +	{
> +		.dpdk_name = "outbound_pci_stalled_wr_events",
> +		.ctr_name = "outbound_pci_stalled_wr_events",
> +	},
> +	{
> +		.dpdk_name = "dev_out_of_buffer",
> +		.ctr_name = "dev_out_of_buffer",
> +	},
>  };
> 
>  static const unsigned int xstats_n = RTE_DIM(mlx5_counters_init);
> --
> 2.25.1
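
For readers following the table in the patch, here is a minimal sketch of how name-based entries like these could be matched against the counter strings a kernel driver reports. It is not the PMD's actual code: `struct counter_ctrl`, `counters_init`, `N_COUNTERS` and `map_counters()` are hypothetical names, only the two fields visible in the patch are modelled, and the kernel name list is assumed to be a plain array rather than the result of the SIOCETHTOOL ioctl. An entry whose name the kernel does not report (as the commit message says may be the case for dev_out_of_buffer) is simply left unmapped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the PMD's counter descriptor; only the two
 * fields visible in the patch are modelled here. */
struct counter_ctrl {
	const char *dpdk_name; /* name exposed to the application */
	const char *ctr_name;  /* name expected from the kernel driver */
};

/* A subset of the entries added by the patch. */
static const struct counter_ctrl counters_init[] = {
	{
		.dpdk_name = "outbound_pci_stalled_rd",
		.ctr_name = "outbound_pci_stalled_rd",
	},
	{
		.dpdk_name = "outbound_pci_stalled_wr",
		.ctr_name = "outbound_pci_stalled_wr",
	},
	{
		.dpdk_name = "dev_out_of_buffer",
		.ctr_name = "dev_out_of_buffer",
	},
};

#define N_COUNTERS (sizeof(counters_init) / sizeof(counters_init[0]))

/*
 * For every table entry, store in index_out[] the position of the matching
 * string in kernel_names[], or -1 when the kernel does not report it.
 * Returns the number of entries that were matched.
 */
static size_t
map_counters(const char *const *kernel_names, size_t n_names, int *index_out)
{
	size_t matched = 0;

	for (size_t i = 0; i < N_COUNTERS; i++) {
		index_out[i] = -1;
		for (size_t j = 0; j < n_names; j++) {
			if (strcmp(counters_init[i].ctr_name,
				   kernel_names[j]) == 0) {
				index_out[i] = (int)j;
				matched++;
				break;
			}
		}
	}
	return matched;
}
```

A caller would pass the string list obtained from the driver together with an `int` array of `N_COUNTERS` elements, then skip any entry whose index is -1 when reading values.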


