mlx5: imissed / out_of_buffer counter always 0
Daniel Östman
daniel.ostman at ericsson.com
Mon Jun 5 16:00:23 CEST 2023
Hi Slava and Erez and thanks for your answers,
Regarding the firmware, I’ve also deployed in a different OpenShift cluster, where I see the exact same issue but with a different Mellanox NIC:
Mellanox Technologies MT2892 Family - ConnectX-6 DX 2-port 100GbE QSFP56 PCIe Adapter
driver: mlx5_core
version: 5.0-0
firmware-version: 22.36.1010 (DEL0000000027)
From what I can see the firmware is relatively new on that one?
I tried setting dv_flow_en=0 (and saw that it was propagated to config->dv_flow_en) but it didn’t seem to help.
Erez, I’m not sure what you mean by shared or non-shared mode in this case. However, it seems it could be related to the fact that the container is running in a separate network namespace: the hw_counters directory is available on the host (cluster node), but not in the pod container.
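One way to confirm the namespace theory is to run the same check on the cluster node and inside the pod. The following Python sketch is only an illustration: the helper names are made up for this example, and the device name passed in (for instance mlx5_99 from the original report) has to match whatever appears under /sys/class/infiniband in each environment.

```python
from pathlib import Path


def out_of_buffer_path(ibdev, port=1, sysfs_root="/sys/class/infiniband"):
    """Build the per-port sysfs path of the out_of_buffer counter."""
    return (Path(sysfs_root) / ibdev / "ports" / str(port)
            / "hw_counters" / "out_of_buffer")


def counter_visible(ibdev, port=1, sysfs_root="/sys/class/infiniband"):
    """Return True if the counter file exists in the current namespace."""
    return out_of_buffer_path(ibdev, port, sysfs_root).is_file()
```

Calling counter_visible("mlx5_99") on the node and again in the pod should show whether the file is hidden only inside the container.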
Best regards,
Daniel
From: Erez Ferber <erezferber at gmail.com>
Sent: Monday, 5 June 2023 12:29
To: Slava Ovsiienko <viacheslavo at nvidia.com>
Cc: Daniel Östman <daniel.ostman at ericsson.com>; users at dpdk.org; Matan Azrad <matan at nvidia.com>; maxime.coquelin at redhat.com; david.marchand at redhat.com
Subject: Re: mlx5: imissed / out_of_buffer counter always 0
Hi Daniel,
Is the container running in shared or non-shared mode?
For shared mode, I assume the kernel sysfs counters which DPDK relies on for imissed/out_of_buffer are not exposed.
Best regards,
Erez
On Fri, 2 Jun 2023 at 18:07, Slava Ovsiienko <viacheslavo at nvidia.com> wrote:
Hi, Daniel
I would recommend taking the following actions:
- Update the firmware; 16.33.xxxx looks a little outdated. Please try 16.35.1012 or later.
mlx5_glue->devx_obj_create might succeed with the newer FW.
- Try specifying the dv_flow_en=0 devarg. It forces the mlx5 PMD to use the rdma_core library for queue management,
so the kernel driver will be aware of the Rx queues being created and attach them to the kernel counter set.
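As a small illustration of the second suggestion, device arguments are appended to the PCI address in the EAL allow-list option (-a). The Python helper below just formats such an option; the function name and the PCI address in the usage note are placeholders, not values from this thread.

```python
def mlx5_allow_arg(pci_addr, **devargs):
    """Format an EAL allow-list entry with devargs appended after the address."""
    # Each devarg becomes key=value, joined to the PCI address with commas.
    parts = [pci_addr] + [f"{key}={value}" for key, value in devargs.items()]
    return ["-a", ",".join(parts)]
```

For example, mlx5_allow_arg("0000:3b:00.2", dv_flow_en=0) gives the two arguments to pass to the application's EAL command line for a VF at that (hypothetical) address.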
With best regards,
Slava
From: Daniel Östman <daniel.ostman at ericsson.com>
Sent: Friday, June 2, 2023 3:59 PM
To: users at dpdk.org
Cc: Matan Azrad <matan at nvidia.com>; Slava Ovsiienko <viacheslavo at nvidia.com>; maxime.coquelin at redhat.com; david.marchand at redhat.com
Subject: mlx5: imissed / out_of_buffer counter always 0
Hi,
I’m deploying a containerized DPDK application in an OpenShift Kubernetes environment using DPDK 21.11.3.
The application uses a Mellanox ConnectX-5 100G NIC through VFs.
The problem I have is that the ETH stats counter imissed (which seems to be mapped to “out_of_buffer” internally in the mlx5 PMD) is 0 when I don’t expect it to be, i.e. when the application doesn’t read the packets fast enough.
Using GDB I can see that the PMD tries to read the counter from /sys/class/infiniband/mlx5_99/ports/1/hw_counters/out_of_buffer, but the hw_counters directory is missing, so it just returns zero. I don’t know why the directory is missing.
Looking at mlx5_os_read_dev_stat(), I can see that there is an alternative way of reading the counter, through mlx5_devx_cmd_queue_counter_query(), but only if priv->q_counters is set.
It doesn’t get set in my case because mlx5_glue->devx_obj_create() fails (errno 22) in mlx5_devx_cmd_queue_counter_alloc().
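To make the two read paths easier to follow, here is a rough Python model of the behaviour described above. It is not the PMD's C code: the function and parameter names are invented for illustration, and the ordering (DevX queue counter first, sysfs file second, zero if neither works) is my reading of the description in this message.

```python
from pathlib import Path


def read_out_of_buffer(sysfs_file, q_counter_query=None):
    """Model the read order: DevX queue counter if allocated, else sysfs.

    q_counter_query stands in for the DevX query; it is None when the
    queue counter object could not be allocated.
    """
    if q_counter_query is not None:
        return q_counter_query()
    try:
        return int(Path(sysfs_file).read_text().strip())
    except (OSError, ValueError):
        # Missing hw_counters directory or unreadable file: report zero.
        return 0
```

In the situation reported here, q_counter_query would be None and the sysfs file would be absent, so the model returns 0 regardless of how many packets were actually dropped.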
Have I missed something?
NIC info:
Mellanox Technologies MT27800 Family [ConnectX-5] - 100Gb 2-port QSFP28 MCX516A-CCHT
driver: mlx5_core
version: 5.0-0
firmware-version: 16.33.1048 (MT_0000000417)
Please let me know if I need to provide more information.
Best regards,
Daniel