mlx5: imissed / out_of_buffer counter always 0

Maxime Coquelin maxime.coquelin at redhat.com
Wed Oct 4 15:49:28 CEST 2023


Hi Daniel, Erez & Slava,

My turn to be sorry; I missed this email when coming back from vacation.

On 8/18/23 14:04, Daniel Östman wrote:
> Hi Maxime,
> 
> Sorry for the late reply, I've been on vacation.
> Please see my answer below.
> 
> / Daniel
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin at redhat.com>
>> Sent: Thursday, 22 June 2023 17:48
>> To: Daniel Östman <daniel.ostman at ericsson.com>; Erez Ferber
>> <erezferber at gmail.com>; Slava Ovsiienko <viacheslavo at nvidia.com>
>> Cc: users at dpdk.org; Matan Azrad <matan at nvidia.com>;
>> david.marchand at redhat.com
>> Subject: Re: mlx5: imissed / out_of_buffer counter always 0
>>
>> Hi,
>>
>> On 6/21/23 22:22, Maxime Coquelin wrote:
>>> Hi Daniel, all,
>>>
>>> On 6/5/23 16:00, Daniel Östman wrote:
>>>> Hi Slava and Erez and thanks for your answers,
>>>>
>>>> Regarding the firmware, I’ve also deployed in a different OpenShift
>>>> cluster where I see the exact same issue but with a different Mellanox
>>>> NIC:
>>>>
>>>> Mellanox Technologies MT2892 Family - ConnectX-6 DX 2-port 100GbE
>>>> QSFP56 PCIe Adapter
>>>>
>>>> driver: mlx5_core
>>>>
>>>> version: 5.0-0
>>>> firmware-version: 22.36.1010 (DEL0000000027)
>>>>
>>>>   From what I can see the firmware is relatively new on that one?
>>>
>>> With below configuration:
>>> - ConnectX-6 Dx MT2892
>>> - Kernel: 6.4.0-rc6
>>> - FW version: 22.35.1012 (MT_0000000528)
>>>
>>> The out-of-buffer counter is fetched via
>>> mlx5_devx_cmd_queue_counter_query():
>>>
>>> [pid  2942] ioctl(17, RDMA_VERBS_IOCTL, 0x7ffcb15bcd10) = 0
>>> [pid  2942] write(1, "\n  ######################## NIC "..., 80) = 80
>>> [pid  2942] write(1, "  RX-packets: 630997736  RX-miss"..., 70) = 70
>>> [pid  2942] write(1, "  RX-errors: 0\n", 15) = 15
>>> [pid  2942] write(1, "  RX-nombuf:  0         \n", 25) = 25
>>> [pid  2942] write(1, "  TX-packets: 0          TX-erro"..., 60) = 60
>>> [pid  2942] write(1, "\n", 1)           = 1
>>> [pid  2942] write(1, "  Throughput (since last show)\n", 31) = 31
>>> [pid  2942] write(1, "  Rx-pps:            0"..., 106) = 106
>>> [pid  2942] write(1, "  ##############################"..., 79) = 79
>>>
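For context, this is the counter an application reads back as imissed via
the ethdev stats API; a minimal sketch, assuming (just as an example) that
the mlx5 port is port 0:

    /* Minimal sketch: read the imissed counter, which the mlx5 PMD
     * appears to back with the out_of_buffer queue counter discussed
     * in this thread. */
    #include <stdio.h>
    #include <inttypes.h>
    #include <rte_ethdev.h>

    static void print_imissed(uint16_t port_id)
    {
        struct rte_eth_stats stats;

        if (rte_eth_stats_get(port_id, &stats) == 0)
            printf("port %u imissed: %" PRIu64 "\n",
                   port_id, stats.imissed);
    }
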
>>> It looks like we may be missing some mlx5 kernel patches needed to use
>>> mlx5_devx_cmd_queue_counter_query() with RHEL?
>>>
>>> Erez, Slava, any idea on the patches that could be missing?
>>
>> The above test was on bare metal as root; I get the same "working"
>> behaviour on RHEL as root.
>>
>> We managed to reproduce Daniel's issue by running the same test within a
>> container; with debug logs enabled we get this warning:
>>
>> mlx5_common: DevX create q counter set failed errno=121 status=0x2
>> syndrome=0x8975f1
>> mlx5_net: Port 0 queue counter object cannot be created by DevX - fall-back
>> to use the kernel driver global queue counter.
>>
>> Running the container as privileged solves the issue, and so does adding
>> the SYS_RAWIO capability to the container.
>>
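In case it helps to confirm that from inside the pod: the effective
capability mask can be read from /proc/self/status. A minimal sketch, where
bit 17 is CAP_SYS_RAWIO (from linux/capability.h):

    /* Minimal sketch: report whether CAP_SYS_RAWIO (bit 17) is in the
     * effective capability set of the current process. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        FILE *f = fopen("/proc/self/status", "r");
        char line[256];

        if (f == NULL)
            return 1;
        while (fgets(line, sizeof(line), f) != NULL) {
            if (strncmp(line, "CapEff:", 7) == 0) {
                unsigned long long caps = strtoull(line + 7, NULL, 16);
                printf("CAP_SYS_RAWIO %s\n",
                       (caps >> 17) & 1 ? "present" : "missing");
                break;
            }
        }
        fclose(f);
        return 0;
    }
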
>> Erez, Slava, is it expected that SYS_RAWIO is required just to get a
>> stats counter?

Erez & Slava, would it be possible to get the stats counters via devx
without requiring SYS_RAWIO?

>>
>> Daniel, could you try adding SYS_RAWIO to your pod to confirm you face the
>> same issue?
> 
> Yes, I can confirm what you are seeing when running in a cluster with OpenShift 4.12 (RHEL 8.6), with SYS_RAWIO added or running as privileged.
> But with a privileged container I also need to run with UID 0 for it to work; is that what you are doing as well?

I don't have an OCP setup at hand right now to test it, but IIRC yes we
ran it with UID 0.

> In both these cases the counter can be successfully retrieved through the DevX interface.

Ok.

> However, when running in a cluster with OpenShift 4.10 (RHEL 8.4) I cannot get it to work with either of these two approaches.

I'm not sure this is kernel related, as I tested on both RHEL-8.4.0 and
the latest RHEL-8.4, and I can get the q counters via ioctl().

Maxime

>> Thanks in advance,
>> Maxime
>>> Regards,
>>> Maxime
>>>
>>>>
>>>> I tried setting dv_flow_en=0 (and saw that it was propagated to
>>>> config->dv_flow_en) but it didn’t seem to help.
>>>>
>>>> Erez, I’m not sure what you mean by shared or non-shared mode in this
>>>> case; however, it could be related to the fact that the container is
>>>> running in a separate network namespace, because the hw_counters
>>>> directory is available on the host (cluster node) but not in the pod
>>>> container.
>>>>
>>>> Best regards,
>>>>
>>>> Daniel
>>>>
>>>> *From:* Erez Ferber <erezferber at gmail.com>
>>>> *Sent:* Monday, 5 June 2023 12:29
>>>> *To:* Slava Ovsiienko <viacheslavo at nvidia.com>
>>>> *Cc:* Daniel Östman <daniel.ostman at ericsson.com>; users at dpdk.org;
>>>> Matan Azrad <matan at nvidia.com>; maxime.coquelin at redhat.com;
>>>> david.marchand at redhat.com
>>>> *Subject:* Re: mlx5: imissed / out_of_buffer counter always 0
>>>>
>>>> Hi Daniel,
>>>>
>>>> Is the container running in shared or non-shared mode?
>>>>
>>>> For shared mode, I assume the kernel sysfs counters which DPDK relies
>>>> on for imissed/out_of_buffer are not exposed.
>>>>
>>>> Best regards,
>>>>
>>>> Erez
>>>>
>>>> On Fri, 2 Jun 2023 at 18:07, Slava Ovsiienko <viacheslavo at nvidia.com> wrote:
>>>>
>>>>      Hi, Daniel
>>>>
>>>>      I would recommend taking the following actions:
>>>>
>>>>      - update the firmware; 16.33.xxxx looks a little outdated.
>>>>        Please try 16.35.1012 or later;
>>>>        mlx5_glue->devx_obj_create might succeed with the newer FW.
>>>>
>>>>      - try to specify the dv_flow_en=0 devarg; it forces the mlx5 PMD
>>>>        to use the rdma_core library for queue management, so the kernel
>>>>        driver will be aware of the Rx queues being created and will
>>>>        attach them to the kernel counter set
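For reference, the devarg is passed per device on the EAL command line,
e.g. dpdk-testpmd -a 0000:08:00.2,dv_flow_en=0 -- -i (the PCI address here
is just a placeholder for the VF being used).
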
>>>>
>>>>      With best regards,
>>>>      Slava
>>>>
>>>>      *From:* Daniel Östman <daniel.ostman at ericsson.com>
>>>>      *Sent:* Friday, June 2, 2023 3:59 PM
>>>>      *To:* users at dpdk.org
>>>>      *Cc:* Matan Azrad <matan at nvidia.com>; Slava Ovsiienko
>>>>      <viacheslavo at nvidia.com>; maxime.coquelin at redhat.com;
>>>>      david.marchand at redhat.com
>>>>      *Subject:* mlx5: imissed / out_of_buffer counter always 0
>>>>
>>>>      Hi,
>>>>
>>>>      I’m deploying a containerized DPDK application in an OpenShift
>>>>      Kubernetes environment using DPDK 21.11.3.
>>>>
>>>>      The application uses a Mellanox ConnectX-5 100G NIC through VFs.
>>>>
>>>>      The problem I have is that the ETH stats counter imissed (which
>>>>      seems to be mapped to “out_of_buffer” internally in the mlx5 PMD
>>>>      driver) is 0 when I don’t expect it to be, i.e. when the
>>>>      application doesn’t read the packets fast enough.
>>>>
>>>>      Using GDB I can see that it tries to access the counter through
>>>>      /sys/class/infiniband/mlx5_99/ports/1/hw_counters/out_of_buffer
>>>>      but the hw_counters directory is missing, so it will just return
>>>>      a zero value. I don’t know why it is missing.
>>>>
>>>>      When looking at mlx5_os_read_dev_stat() I can see that there is
>>>>      an alternative way of reading the counter, through
>>>>      mlx5_devx_cmd_queue_counter_query(), but under the condition that
>>>>      priv->q_counters is set.
>>>>
>>>>      It doesn’t get set in my case because mlx5_glue->devx_obj_create()
>>>>      fails (errno 22, i.e. EINVAL) in
>>>>      mlx5_devx_cmd_queue_counter_alloc().
>>>>
>>>>      Have I missed something?
>>>>
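For anyone who wants to cross-check the fallback path outside of DPDK,
below is a minimal sketch that reads the same sysfs counter the PMD falls
back to; the device name mlx5_99 is only the example taken from the path
quoted above:

    /* Minimal sketch: read the out_of_buffer counter from sysfs, i.e. the
     * fallback used when no DevX queue counter is available.
     * The device name below is only an example. */
    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    int main(void)
    {
        const char *path =
            "/sys/class/infiniband/mlx5_99/ports/1/hw_counters/out_of_buffer";
        FILE *f = fopen(path, "r");
        uint64_t val = 0;

        if (f == NULL) {
            perror("fopen"); /* e.g. hw_counters dir missing in the container */
            return 1;
        }
        if (fscanf(f, "%" SCNu64, &val) != 1) {
            fclose(f);
            return 1;
        }
        fclose(f);
        printf("out_of_buffer: %" PRIu64 "\n", val);
        return 0;
    }
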
>>>>      NIC info:
>>>>
>>>>      Mellanox Technologies MT27800 Family [ConnectX-5] - 100Gb 2-port
>>>>      QSFP28 MCX516A-CCHT
>>>>      driver: mlx5_core
>>>>      version: 5.0-0
>>>>      firmware-version: 16.33.1048 (MT_0000000417)
>>>>
>>>>      Please let me know if I need to provide more information.
>>>>
>>>>      Best regards,
>>>>
>>>>      Daniel
>>>>


