<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
span.m-2922114280274956019hgkelc
{mso-style-name:m_-2922114280274956019hgkelc;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="en-SE" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">Hi Slava and Erez, and thanks for your answers,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">Regarding the firmware, I’ve also deployed the application in a different OpenShift cluster, where I see exactly the same issue, but with a different Mellanox NIC:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="en-SE" style="mso-fareast-language:EN-US">Mellanox Technologies MT2892 Family - ConnectX-6 DX 2-port 100GbE QSFP56 PCIe Adapter<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="en-SE" style="mso-fareast-language:EN-US">driver: mlx5_core<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="en-SE" style="mso-fareast-language:EN-US">version: 5.0-0<br>
firmware-version: 22.36.1010 (DEL0000000027)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">From what I can see, the firmware on that one is relatively recent?<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">I tried setting dv_flow_en=0 (and confirmed that it was propagated to config->dv_flow_en), but it didn’t seem to help.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">Erez, I’m not sure what you mean by shared or non-shared mode in this case. However, it could be related to the container running in a separate network namespace,
because the hw_counters directory is available on the host (cluster node) but not inside the pod container.<o:p></o:p></span></p>
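<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">To compare the two environments side by side, a minimal check along these lines could be compiled and run once on the node and once inside the pod. It is only a sketch: dir_exists() and report_dir() are hypothetical helpers, not DPDK or mlx5 PMD functions, and the example path is the one from the original message (the mlx5_99 device name seen in the pod may be different on the host).<o:p></o:p></span></p>
<pre><code class="language-c">#include &lt;stdio.h&gt;
#include &lt;sys/stat.h&gt;

/* Hypothetical helper: returns 1 if path exists and is a directory,
 * 0 if it is missing, is not a directory, or cannot be accessed. */
int dir_exists(const char *path)
{
    struct stat st;

    if (stat(path, &amp;st) != 0)
        return 0;
    return S_ISDIR(st.st_mode) ? 1 : 0;
}

/* Hypothetical helper: prints one line per path so that the output
 * from the node and from the pod can be compared directly. */
void report_dir(const char *path)
{
    printf("%s: %s\n", path, dir_exists(path) ? "present" : "missing");
}
</code></pre>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">For example, report_dir("/sys/class/infiniband/mlx5_99/ports/1/hw_counters") would be expected to print "present" on the node and "missing" in the pod if the observation above holds.<o:p></o:p></span></p>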
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">Best regards,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:EN-US">Daniel<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="en-SE" style="mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US">From:</span></b><span lang="EN-US"> Erez Ferber &lt;erezferber@gmail.com&gt;
<br>
<b>Sent:</b> Monday, 5 June 2023 12:29<br>
<b>To:</b> Slava Ovsiienko &lt;viacheslavo@nvidia.com&gt;<br>
<b>Cc:</b> Daniel Östman &lt;daniel.ostman@ericsson.com&gt;; users@dpdk.org; Matan Azrad &lt;matan@nvidia.com&gt;; maxime.coquelin@redhat.com; david.marchand@redhat.com<br>
<b>Subject:</b> Re: mlx5: imissed / out_of_buffer counter always 0<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div id="gmail-:uv">
<p class="MsoNormal">Hi Daniel,<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Is the container running in shared or non-shared mode?<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">In shared mode, I assume the kernel sysfs counters that DPDK relies on for imissed/out_of_buffer are not exposed.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Best regards,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Erez<o:p></o:p></p>
</div>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">On Fri, 2 Jun 2023 at 18:07, Slava Ovsiienko &lt;<a href="mailto:viacheslavo@nvidia.com">viacheslavo@nvidia.com</a>&gt; wrote:<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<div>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Hi Daniel,<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;margin-bottom:12.0pt"><span lang="EN-US">I would recommend taking the following actions:<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">- Update the firmware: 16.33.xxxx looks a little outdated. Please try 16.35.1012 or later.<br>
mlx5_glue->devx_obj_create() might succeed with the newer FW.<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">- Try specifying the dv_flow_en=0 devarg. It forces the mlx5 PMD to use the rdma-core library for queue management,<br>
so the kernel driver is aware of the Rx queues being created and attaches them to the kernel counter set.<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">With best regards,<br>
Slava<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> <o:p></o:p></span></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b><span lang="EN-US">From:</span></b><span lang="EN-US"> Daniel Östman &lt;<a href="mailto:daniel.ostman@ericsson.com" target="_blank">daniel.ostman@ericsson.com</a>&gt;
<br>
<b>Sent:</b> Friday, June 2, 2023 3:59 PM<br>
<b>To:</b> <a href="mailto:users@dpdk.org" target="_blank">users@dpdk.org</a><br>
<b>Cc:</b> Matan Azrad &lt;<a href="mailto:matan@nvidia.com" target="_blank">matan@nvidia.com</a>&gt;; Slava Ovsiienko &lt;<a href="mailto:viacheslavo@nvidia.com" target="_blank">viacheslavo@nvidia.com</a>&gt;;
<a href="mailto:maxime.coquelin@redhat.com" target="_blank">maxime.coquelin@redhat.com</a>;
<a href="mailto:david.marchand@redhat.com" target="_blank">david.marchand@redhat.com</a><br>
<b>Subject:</b> mlx5: imissed / out_of_buffer counter always 0<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Hi,<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">I’m deploying a c</span><span class="m-2922114280274956019hgkelc"><span lang="EN">ontainerized
</span></span><span lang="EN-US">DPDK application in an OpenShift Kubernetes environment using DPDK 21.11.3.<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">The application uses a Mellanox ConnectX-5 100G NIC through VFs.
<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">The problem is that the ethdev stats counter imissed (which seems to be mapped internally to “out_of_buffer” in the mlx5 PMD) stays at 0 when I don’t expect
it to, i.e. when the application doesn’t read packets fast enough.<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Using GDB, I can see that it tries to read the counter from /sys/class/infiniband/mlx5_99/ports/1/hw_counters/out_of_buffer, but the hw_counters directory
is missing, so it just returns zero. I don’t know why it is missing.<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Looking at mlx5_os_read_dev_stat(), I can see that there is an alternative way of reading the counter, through mlx5_devx_cmd_queue_counter_query(), but only
if priv->q_counters is set.<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">It doesn’t get set in my case, because mlx5_glue->devx_obj_create() fails with errno 22 (EINVAL) in mlx5_devx_cmd_queue_counter_alloc().<o:p></o:p></span></p>
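<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">As described above, the sysfs path reports 0 when the file is absent, so a missing hw_counters directory looks the same as a queue that never dropped anything. The sketch below is plain C, not the mlx5 PMD’s code, and read_hw_counter() is a hypothetical helper: it reads the same kind of sysfs file but returns an error instead of 0 when the file cannot be opened or parsed, which makes the two cases distinguishable.<o:p></o:p></span></p>
<pre><code class="language-c">#include &lt;errno.h&gt;
#include &lt;inttypes.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

/*
 * Hypothetical helper: reads a decimal counter from a sysfs-style file.
 * Returns 0 on success and stores the value in *value.
 * Returns -1 on failure: errno is left as set by fopen() (ENOENT when the
 * file or its directory is missing), or set to EINVAL if parsing fails.
 */
int read_hw_counter(const char *path, uint64_t *value)
{
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;  /* e.g. ENOENT: hw_counters not visible here */

    int n = fscanf(f, "%" SCNu64, value);
    fclose(f);
    if (n != 1) {
        errno = EINVAL;  /* file exists but does not hold a number */
        return -1;
    }
    return 0;
}
</code></pre>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Inside the pod, a call such as read_hw_counter("/sys/class/infiniband/mlx5_99/ports/1/hw_counters/out_of_buffer", &amp;v) returning -1 with errno set to ENOENT would be consistent with the missing directory, whereas a genuine zero would return 0 with v == 0.<o:p></o:p></span></p>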
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Have I missed something?<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">NIC info:<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Mellanox Technologies MT27800 Family [ConnectX-5] - 100Gb 2-port QSFP28 MCX516A-CCHT<br>
driver: mlx5_core<br>
version: 5.0-0<br>
firmware-version: 16.33.1048 (MT_0000000417)<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Please let me know if I need to provide more information.<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> <o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Best regards,<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US">Daniel<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><span lang="EN-US"> <o:p></o:p></span></p>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</body>
</html>