<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Helvetica;
panose-1:2 11 6 4 2 2 2 2 2 4;}
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0cm;
font-size:10.0pt;
font-family:"Courier New";}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0cm;
margin-right:0cm;
margin-bottom:0cm;
margin-left:36.0pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:Consolas;}
span.EmailStyle21
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
mso-ligatures:none;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:2.0cm 42.5pt 2.0cm 3.0cm;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:942418429;
mso-list-type:hybrid;
mso-list-template-ids:-288820670 1315614256 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}
@list l0:level1
{mso-level-start-at:0;
mso-level-number-format:bullet;
mso-level-text:-;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Calibri",sans-serif;
mso-fareast-font-family:Calibri;}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Wingdings;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Symbol;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Wingdings;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Symbol;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Wingdings;}
ol
{margin-bottom:0cm;}
ul
{margin-bottom:0cm;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">Hi, Samar<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<ul style="margin-top:0cm" type="disc">
<li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1">Did you start the queues in the secondary process?<o:p></o:p></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1">Only one process may manage a given queue at any moment; sending data on the same queue from more than one process is not allowed.<o:p></o:p></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l0 level1 lfo1">From your description, it looks like no completions (CQEs) are seen. I would recommend checking the UAR/doorbell mapping in the secondary process.<o:p></o:p></li></ul>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">With best regards,<o:p></o:p></p>
<p class="MsoNormal">Slava<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b>From:</b> Samar Yadav &lt;samar.yadav@broadcom.com&gt; <br>
<b>Sent:</b> Friday, April 5, 2024 8:02 PM<br>
<b>To:</b> dev@dpdk.org; users@dpdk.org<br>
<b>Cc:</b> Matan Azrad &lt;matan@nvidia.com&gt;; Slava Ovsiienko &lt;viacheslavo@nvidia.com&gt;; Ori Kam &lt;orika@nvidia.com&gt;; Suanming Mou &lt;suanmingm@nvidia.com&gt;; Mukul Sinha &lt;mukul.sinha@broadcom.com&gt;; Tathagat Priyadarshi &lt;tathagat.priyadarshi@broadcom.com&gt;; Srinivasa
Srikanth Srikanth Podila &lt;srinivasa-srikanth.podila@broadcom.com&gt;; Vipin PR &lt;vipin.pr@broadcom.com&gt;<br>
<b>Subject:</b> DPDK Secondary process not able to xmit packets with MLX5 VF<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">Hi all, <o:p></o:p></p>
<div>
<pre><span style="color:black">We are using two Mellanox VFs with DPDK v22.11 and see an issue when the DPDK secondary process (rte_proc_secondary) tries to transmit packets. Note that the DPDK primary process (rte_proc_primary) transmits packets successfully. The issue seems to be in check_cqe, which always returns MLX5_CQE_STATUS_HW_OWN.<o:p></o:p></span></pre>
<pre><i><span style="color:black"><br>admin@10-50-54-244:~$ lspci | grep "Mellanox"<br>00:07.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]<br>00:08.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]</span></i><span style="color:black"><o:p></o:p></span></pre>
<pre><span style="color:black"><o:p> </o:p></span></pre>
<pre><span style="color:black">In our application:<o:p></o:p></span></pre>
<pre><span style="color:black">proc0 -> is DPDK rte_proc_primary which initializes the necessary shared memory data structures.<o:p></o:p></span></pre>
<pre><span style="color:black">proc1 -> is DPDK rte_proc_secondary which attaches to pre-initialized shared memory.<o:p></o:p></span></pre>
<pre><span style="color:black"><o:p> </o:p></span></pre>
<pre><span style="color:black">proc0(rte_proc_primary) uses port0(</span><i><span style="font-family:"Arial",sans-serif;color:black">00:07.0</span></i><span style="color:black">) to xmit packets out - works fine as expected.<o:p></o:p></span></pre>
<pre><span style="color:black">But proc1(rte_proc_secondary) uses port1(</span><i><span style="font-family:"Arial",sans-serif;color:black">00:08.0)</span></i><span style="color:black"> to xmit packets out - doesn't work as the packet is not seen on the wire.<o:p></o:p></span></pre>
<pre><span style="color:black"><o:p> </o:p></span></pre>
<pre><span style="color:black">Code snippets for the gdb output below:<o:p></o:p></span></pre>
<pre style="text-wrap: wrap"><span style="color:black">mlx5_tx.c<o:p></o:p></span></pre>
<pre style="text-wrap: wrap"><span style="color:black">180 */<br>181 void<br>182 mlx5_tx_handle_completion(struct mlx5_txq_data *__rte_restrict txq,<br>183 unsigned int olx __rte_unused)<br>184 {<br>185 unsigned int count = MLX5_TX_COMP_MAX_CQE;<br>186 volatile struct mlx5_cqe *last_cqe = NULL;<br>187 bool ring_doorbell = false;<br>188 int ret;<br>189 <br>190 do { <br>191 volatile struct mlx5_cqe *cqe;<br>192 <br>193 cqe = &txq->cqes[txq->cq_ci & txq->cqe_m];<br>194 ret = check_cqe(cqe, txq->cqe_s, txq->cq_ci);<br>195 if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) { <br>196 if (likely(ret != MLX5_CQE_STATUS_ERR)) { <br>197 /* No new CQEs in completion queue. */<br>198 MLX5_ASSERT(ret == MLX5_CQE_STATUS_HW_OWN);<br>199 break;<br>200 }<o:p></o:p></span></pre>
<pre style="text-wrap: wrap"><span style="color:black"><o:p> </o:p></span></pre>
<pre style="text-wrap: wrap"><span style="color:black">mlx5_common.h<o:p></o:p></span></pre>
<pre style="text-wrap: wrap"><span style="color:black">195 static __rte_always_inline enum mlx5_cqe_status <br>196 check_cqe(volatile struct mlx5_cqe *cqe, const uint16_t cqes_n,<br>197 const uint16_t ci)<br>198 {<br>199 const uint16_t idx = ci & cqes_n;<br>200 const uint8_t op_own = cqe->op_own;<br>201 const uint8_t op_owner = MLX5_CQE_OWNER(op_own);<br>202 const uint8_t op_code = MLX5_CQE_OPCODE(op_own);<br>203 <br>204 if (unlikely((op_owner != (!!(idx))) || (op_code == MLX5_CQE_INVALID)))<br>205 return MLX5_CQE_STATUS_HW_OWN;<br>206 rte_io_rmb();<br>207 if (unlikely(op_code == MLX5_CQE_RESP_ERR ||<br>208 op_code == MLX5_CQE_REQ_ERR))<br>209 return MLX5_CQE_STATUS_ERR;<br>210 return MLX5_CQE_STATUS_SW_OWN;<br>211 }<o:p></o:p></span></pre>
<p><b><u><span style="font-size:10.0pt;font-family:"Courier New";color:black">proc1(non-working process)</span></u></b><u><span style="font-size:10.0pt;font-family:"Courier New";color:black">:</span></u><span style="font-size:10.0pt;font-family:"Courier New";color:black">
we have noticed the </span><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:black">cq_ci remains 0 and doesn't increase.
</span><span style="font-size:10.0pt;font-family:"Courier New";color:black"><o:p></o:p></span></p>
</div>
<blockquote style="margin-left:30.0pt;margin-right:0cm">
<div>
<p><span style="font-size:10.0pt;font-family:"Courier New";color:black">Thread 1 "se_dp" hit Breakpoint 1, mlx5_tx_handle_completion (txq=0x6000496c72c0, olx=127)<br>
at ../../../../../../service_engine/dpdk-2211/drivers/net/mlx5/mlx5_tx.c:184<br>
184 in ../../../../../../service_engine/dpdk-2211/drivers/net/mlx5/mlx5_tx.c<br>
(gdb) n<br>
185 in ../../../../../../service_engine/dpdk-2211/drivers/net/mlx5/mlx5_tx.c<br>
(gdb) n<br>
186 in ../../../../../../service_engine/dpdk-2211/drivers/net/mlx5/mlx5_tx.c<br>
(gdb) n<br>
187 in ../../../../../../service_engine/dpdk-2211/drivers/net/mlx5/mlx5_tx.c<br>
(gdb) n<br>
193 in ../../../../../../service_engine/dpdk-2211/drivers/net/mlx5/mlx5_tx.c<br>
(gdb) n<br>
194 in ../../../../../../service_engine/dpdk-2211/drivers/net/mlx5/mlx5_tx.c<br>
(gdb) n<br>
195 in ../../../../../../service_engine/dpdk-2211/drivers/net/mlx5/mlx5_tx.c<br>
(gdb) info locals<br>
cqe = 0x60004962b000<br>
count = 2<br>
last_cqe = 0x0<br>
ring_doorbell = false<br>
ret = -2<br>
(gdb) p *txq<br>
$1 = {elts_head = 35, elts_tail = 0, elts_comp = 32, elts_s = 1024, elts_m = 1023, wqe_ci = 35,<br>
wqe_pi = 0, wqe_s = 4096, wqe_m = 4095, wqe_comp = 32, wqe_thres = 512, cq_ci = 0, cq_pi = 1,<br>
cqe_s = 64, cqe_m = 63, elts_n = 10, cqe_n = 6, wqe_n = 12, tso_en = 1, tunnel_en = 0, swp_en = 0,<br>
vlan_en = 0, db_nc = 0, db_heu = 0, rt_timestamp = 0, wait_on_time = 0, fast_free = 0,<br>
inlen_send = 18, inlen_empw = 0, inlen_mode = 18, qp_num_8s = 340992, offloads = 32815, mr_ctrl = {<br>
dev_gen_ptr = 0x60004c2d62b4, cur_gen = 0, mru = 0, head = 0, cache = {{start = 0, end = 0,<br>
lkey = 0}, {start = 0, end = 0, lkey = 0}, {start = 0, end = 0, lkey = 0}, {start = 0,<br>
end = 0, lkey = 0}, {start = 0, end = 0, lkey = 0}, {start = 0, end = 0, lkey = 0}, {<br>
start = 0, end = 0, lkey = 0}, {start = 0, end = 0, lkey = 0}}, cache_bh = {len = 1,<br>
size = 256, table = 0x6000496c5d40}}, wqes = 0x60004c255000, wqes_end = 0x60004c295000,<br>
fcqs = 0x60004c295dc0, cqes = 0x60004962b000, qp_db = 0x60004c295004, cq_db = 0x60004962c000,<br>
port_id = 1, idx = 0, rt_timemask = 0, ts_mask = 0, ts_offset = -1, sh = 0x60004b865880, stats = {<br>
opackets = 35, obytes = 2228, oerrors = 0}, stats_reset = {opackets = 0, obytes = 0, oerrors = 0},<br>
uar_data = {db = 0x0}, elts = 0x6000496c7448}<o:p></o:p></span></p>
</div>
<div>
<p><span style="font-size:10.0pt;font-family:"Courier New";color:black"><o:p> </o:p></span></p>
</div>
</blockquote>
<div>
<pre><span style="color:black">and check_cqe always returns MLX5_CQE_STATUS_HW_OWN<o:p></o:p></span></pre>
</div>
<blockquote style="margin-left:30.0pt;margin-right:0cm">
<pre><span style="color:black">(gdb)<br>194 in ../../../../../../service_engine/dpdk-2211/drivers/net/mlx5/mlx5_tx.c<br>(gdb) s<br>check_cqe (ci=0, cqes_n=64, cqe=0x60004962b000) at ../../../../../../service_engine/dpdk-2211/drivers/common/mlx5/mlx5_common.h:199<br>199 ../../../../../../service_engine/dpdk-2211/drivers/common/mlx5/mlx5_common.h: No such file or directory.<br>(gdb) n<br>200 in ../../../../../../service_engine/dpdk-2211/drivers/common/mlx5/mlx5_common.h<br>(gdb)<br>201 in ../../../../../../service_engine/dpdk-2211/drivers/common/mlx5/mlx5_common.h<br>(gdb)<br>202 in ../../../../../../service_engine/dpdk-2211/drivers/common/mlx5/mlx5_common.h<br>(gdb)<br>204 in ../../../../../../service_engine/dpdk-2211/drivers/common/mlx5/mlx5_common.h<br>(gdb) n<br>205 in ../../../../../../service_engine/dpdk-2211/drivers/common/mlx5/mlx5_common.h<br>(gdb) info locals<br>idx = 0<br>op_own = 241 '\361'<br>op_owner = 1 '\001'<br>op_code = 15 '\017'<o:p></o:p></span></pre>
</blockquote>
<div>
<pre><span style="color:black">Because <i>check_cqe</i> returns <i>MLX5_CQE_STATUS_HW_OWN</i>, we break at line 199 in <i>mlx5_tx_handle_completion</i>, and <i>ring_doorbell</i> remains <i>false</i> forever.<o:p></o:p></span></pre>
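<p class="MsoNormal">To make the values in the gdb session above easier to follow, here is a minimal standalone sketch of the same decision logic as the quoted check_cqe(), without the read barrier. The constant and enum values are written from memory of the mlx5 PMD headers and should be treated as assumptions; the names classify_cqe and replay_gdb_values are hypothetical and not part of DPDK.<o:p></o:p></p>

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, mirrored from the mlx5 PMD headers as recalled;
 * verify against mlx5_prm.h and mlx5_common.h before relying on them. */
#define CQE_OWNER_MASK 0x1
#define CQE_INVALID    0xf
#define CQE_REQ_ERR    0xd
#define CQE_RESP_ERR   0xe

enum cqe_status {
	CQE_STATUS_SW_OWN = -1,
	CQE_STATUS_HW_OWN = -2, /* matches ret = -2 in the gdb output */
	CQE_STATUS_ERR = -3,
};

/* Hypothetical helper: same branches as the quoted check_cqe(). */
static enum cqe_status
classify_cqe(uint8_t op_own, uint16_t cqes_n, uint16_t ci)
{
	const uint16_t idx = ci & cqes_n;
	const uint8_t owner = op_own & CQE_OWNER_MASK;
	const uint8_t opcode = (op_own & 0xf0) >> 4;

	/* Owner bit mismatch or invalid opcode: hardware has not written it. */
	if (owner != !!idx || opcode == CQE_INVALID)
		return CQE_STATUS_HW_OWN;
	if (opcode == CQE_RESP_ERR || opcode == CQE_REQ_ERR)
		return CQE_STATUS_ERR;
	return CQE_STATUS_SW_OWN;
}

/* Replays the gdb values: op_own = 241 (0xf1), cqes_n = 64, ci = 0. */
static void
replay_gdb_values(void)
{
	assert(classify_cqe(241, 64, 0) == CQE_STATUS_HW_OWN);
}
```

<p class="MsoNormal">With op_own = 0xf1 the owner bit is 1 while the expected owner for ci = 0 is 0, and the opcode is 0xf (invalid), so both conditions of the first branch hold. Under the assumptions above, this is consistent with a CQE that the hardware has not written yet.<o:p></o:p></p>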
<div>
<pre><span style="color:black">Below are the logs from mlx5_txq_devx_obj_new, which is called by proc0 (rte_proc_primary) for port 1:<o:p></o:p></span></pre>
</div>
<blockquote style="margin-left:30.0pt;margin-right:0cm">
<div>
<p style="margin:0cm;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;font-kerning:auto;font-feature-settings:normal;font-stretch:normal">
<span style="font-size:10.0pt;font-family:"Helvetica",sans-serif;color:black">ppriv: 0x60004b8316c0 ,ppriv->uar_table: 0x60004b8316c8, txq_ctrl->uar_mmap_offset:0, ppriv->uar_table[txq_data->idx]:0x7f6b2d211800, txq_data->idx: 0, txq_data->db_nc:0<o:p></o:p></span></p>
</div>
</blockquote>
<div>
<pre><span style="color:black">And these are the logs from txq_uar_init_secondary, which is called by proc1 (rte_proc_secondary) for port 1:<o:p></o:p></span></pre>
</div>
<blockquote style="margin-left:30.0pt;margin-right:0cm">
<div>
<pre><span style="color:black">priv: 0x60004b8352c0, priv->sh: 0x60004b865880, priv->sh->pppriv: 0x60004b8316c0<o:p></o:p></span></pre>
</div>
<div>
<pre><span style="color:black">txq_ctrl:0x6000496c71c0 priv:0x60004b8352c0<o:p></o:p></span></pre>
</div>
<div>
<pre><span style="color:black">primary_ppriv->uar_table: 0x60004b8316c8 ,uar_va:7f6b2d211800 offset:800 addr:0x7f6b3fe47800<o:p></o:p></span></pre>
</div>
<div>
<pre><span style="color:black">ppriv:0x60004962a180 ppriv->uar_table[txq->idx]:0x7f6b3fe47800, txq->idx:0<o:p></o:p></span></pre>
</div>
</blockquote>
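<p class="MsoNormal">The addresses in the two logs above can be cross-checked with a little arithmetic. The sketch below assumes a 4 KiB page size and assumes that the secondary process maps the UAR page at 0x7f6b3fe47000 (inferred here as addr minus offset, not taken from the logs directly); rebase_uar is a hypothetical helper, not a DPDK function.<o:p></o:p></p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep the in-page offset of the primary's UAR
 * address and apply it to the page the secondary mapped itself. */
static uint64_t
rebase_uar(uint64_t primary_va, uint64_t secondary_page, uint64_t page_size)
{
	/* page_size must be a power of two, and the page must be aligned. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	assert((secondary_page & (page_size - 1)) == 0);
	return secondary_page + (primary_va & (page_size - 1));
}
```

<p class="MsoNormal">With the logged uar_va of 0x7f6b2d211800, the in-page offset is 0x800, and adding it to the assumed page base gives 0x7f6b3fe47800, which matches the addr and uar_table entry printed by the secondary. This only shows that the address arithmetic is self-consistent; it does not show that the mapping itself reaches the device.<o:p></o:p></p>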
<pre><span style="color:black"><o:p> </o:p></span></pre>
<pre><span style="color:black">In the working case, all the counters increment as expected.<o:p></o:p></span></pre>
<pre><b><u><span style="color:black">proc0(rte_proc_primary - working case)</span></u></b><span style="color:black">: </span><span style="font-family:"Arial",sans-serif;color:black">cq_ci, cq_pi and other counters are as expected.</span><span style="color:black"><o:p></o:p></span></pre>
</div>
<blockquote style="margin-left:30.0pt;margin-right:0cm">
<div>
<pre><span style="color:black">Thread 1 "se_dp" hit Breakpoint 1, mlx5_tx_handle_completion (txq=0x60004b898940, olx=127) at ../../../../../../service_engine/dpdk-2211/drivers/net/mlx5/mlx5_tx.c:184<br>184 in ../../../../../../service_engine/dpdk-2211/drivers/net/mlx5/mlx5_tx.c<br>(gdb) n<br>185 in ../../../../../../service_engine/dpdk-2211/drivers/net/mlx5/mlx5_tx.c<br>(gdb) p *txq<br>$2 = {elts_head = 960, elts_tail = 931, elts_comp = 931, elts_s = 1024, elts_m = 1023, wqe_ci = 960, wqe_pi = 930, wqe_s = 4096, wqe_m = 4095, wqe_comp = 931, wqe_thres = 512, cq_ci = 28, cq_pi = 28, cqe_s = 64,<br> cqe_m = 63, elts_n = 10, cqe_n = 6, wqe_n = 12, tso_en = 1, tunnel_en = 0, swp_en = 0, vlan_en = 0, db_nc = 0, db_heu = 0, rt_timestamp = 0, wait_on_time = 0, fast_free = 0, inlen_send = 18, inlen_empw = 0,<br> inlen_mode = 18, qp_num_8s = 865280, offloads = 32815, mr_ctrl = {dev_gen_ptr = 0x600049a000f4, cur_gen = 0, mru = 0, head = 0, cache = {{start = 0, end = 0, lkey = 0}, {start = 0, end = 0, lkey = 0}, {<br> start = 0, end = 0, lkey = 0}, {start = 0, end = 0, lkey = 0}, {start = 0, end = 0, lkey = 0}, {start = 0, end = 0, lkey = 0}, {start = 0, end = 0, lkey = 0}, {start = 0, end = 0, lkey = 0}}, cache_bh = {<br> len = 1, size = 256, table = 0x60004b8973c0}}, wqes = 0x600049655000, wqes_end = 0x600049695000, fcqs = 0x600049697100, cqes = 0x600049696000, qp_db = 0x600049695004, cq_db = 0x600049697000, port_id = 0,<br> idx = 0, rt_timemask = 0, ts_mask = 0, ts_offset = -1, sh = 0x60004be00c40, stats = {opackets = 960, obytes = 73222, oerrors = 0}, stats_reset = {opackets = 0, obytes = 0, oerrors = 0}, uar_data = {db = 0x0},<br> elts = 0x60004b898ac8}<br>(gdb)<o:p></o:p></span></pre>
</div>
</blockquote>
<div>
<pre><span style="color:black"><o:p> </o:p></span></pre>
</div>
<div>
<pre><span style="color:black">A few questions:<o:p></o:p></span></pre>
</div>
<blockquote style="margin-left:30.0pt;margin-right:0cm">
<div>
<pre><span style="color:black">1. Why isn't the cq_ci counter increasing in proc1 (rte_proc_secondary)? Does it mean the mlx backend hardware is not consuming the packets?<o:p></o:p></span></pre>
</div>
</blockquote>
<blockquote style="margin-left:30.0pt;margin-right:0cm">
<div>
<pre><span style="color:black">2. Why is check_cqe stuck at MLX5_CQE_STATUS_HW_OWN in proc1 (rte_proc_secondary)?<o:p></o:p></span></pre>
<pre><span style="color:black"><o:p> </o:p></span></pre>
</div>
</blockquote>
<blockquote style="margin-left:30.0pt;margin-right:0cm">
<div>
<pre><span style="color:black">Thanks,<o:p></o:p></span></pre>
</div>
</blockquote>
<blockquote style="margin-left:30.0pt;margin-right:0cm">
<pre><span style="color:black">Samar<o:p></o:p></span></pre>
</blockquote>
</div>
</div>
</div>
</body>
</html>