<html>
    <head>
      <base href="https://bugs.dpdk.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8" class="bz_new_table">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_UNCONFIRMED "
   title="UNCONFIRMED - Segmentation fault encountered in MPRQ vectorized mode"
   href="https://bugs.dpdk.org/show_bug.cgi?id=1776">1776</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Segmentation fault encountered in MPRQ vectorized mode
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>DPDK
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>22.11
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>x86
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>UNCONFIRMED
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>critical
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>Normal
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>ethdev
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>dev@dpdk.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>canary.overflow@gmail.com
          </td>
        </tr>

        <tr>
          <th>Target Milestone</th>
          <td>---
          </td>
        </tr></table>
      <p>
        <div class="bz_comment_block">
          <pre class="bz_comment_text">I have been encountering a segmentation fault when running DPDK in MPRQ
vectorized mode. To reproduce the issue with testpmd, run it with the following
parameters:

dpdk-testpmd -l 1-5 -n 4 -a
0000:1f:00.0,rxq_comp_en=1,rxq_pkt_pad_en=1,rxqs_min_mprq=1,mprq_en=1,mprq_log_stride_num=6,mprq_log_stride_size=9,mprq_max_memcpy_len=64,rx_vec_en=1
-- -i --rxd=8192 --max-pkt-len=9000 --rxq=1 --total-num-mbufs=16384
--mbuf-size=3000 --enable-drop-en --enable-scatter

The segmentation fault goes away when I disable vectorization (rx_vec_en=0).
It also does not occur with --forward-mode=rxonly. It seems more likely to
occur when rxnombuf events (Rx mbuf allocation failures) are being counted.

The backtrace of the segmentation fault was:
#0  0x0000000001c34912 in __rte_pktmbuf_free_extbuf ()
#1  0x0000000001c36a10 in rte_pktmbuf_detach ()
#2  0x0000000001c4a9ec in rxq_copy_mprq_mbuf_v ()
#3  0x0000000001c4d63b in rxq_burst_mprq_v ()
#4  0x0000000001c4d7a7 in mlx5_rx_burst_mprq_vec ()
#5  0x000000000050be66 in rte_eth_rx_burst ()
#6  0x000000000050c53d in pkt_burst_io_forward ()
#7  0x00000000005427b4 in run_pkt_fwd_on_lcore ()
#8  0x000000000054289b in start_pkt_forward_on_core ()
#9  0x0000000000a473c9 in eal_thread_loop ()
#10 0x00007ffff60061ca in start_thread () from /lib64/libpthread.so.0
#11 0x00007ffff5c72e73 in clone () from /lib64/libc.so.6

*Note that the addresses may not be exact as I've added some log statements and
attempted fixes previously (they were commented out when I obtained this
backtrace).

Upon some investigation, I noticed that in the DPDK source file
drivers/net/mlx5/mlx5_rxtx_vec.c (function rxq_copy_mprq_mbuf_v()), the number
of consumed strides can exceed the stride count (64 in this case), which should
not happen. I suspect some CQE misalignment occurs upon encountering rxnombuf.

rxq_copy_mprq_mbuf_v(...) {
    ...
    if (rxq->consumed_strd == strd_n) {
        // replenish WQE
    }
    ...
    strd_cnt = (elts[i]->pkt_len / strd_sz) +
               ((elts[i]->pkt_len % strd_sz) ? 1 : 0);

    rxq_code = mprq_buf_to_pkt(rxq, elts[i], elts[i]->pkt_len, buf,
                               rxq->consumed_strd, strd_cnt);
    rxq->consumed_strd += strd_cnt;  // observed: rxq->consumed_strd > strd_n
    ...
}
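
As a rough illustration of the stride accounting above (a sketch under stated
assumptions, not the driver's code): with mprq_log_stride_num=6 and
mprq_log_stride_size=9, each MPRQ buffer has 2^6 = 64 strides of 2^9 = 512
bytes, so a 9000-byte packet would span 18 strides by the strd_cnt formula.
The helpers strides_for_len() and strides_fit() are hypothetical names that do
not exist in the mlx5 PMD; they only show the kind of bound check that would
flag consumed_strd + strd_cnt going past strd_n.

```c
/*
 * Hypothetical helpers for illustration only; they are not mlx5 PMD code.
 * strd_sz is the stride size in bytes, strd_n the strides per MPRQ buffer.
 */

/* Number of strides a packet of pkt_len bytes occupies (rounded up). */
static unsigned int
strides_for_len(unsigned int pkt_len, unsigned int strd_sz)
{
    return (pkt_len + strd_sz - 1) / strd_sz;
}

/* Return 1 if strd_cnt more strides fit in the current buffer, else 0. */
static int
strides_fit(unsigned int consumed_strd, unsigned int strd_cnt,
            unsigned int strd_n)
{
    if (consumed_strd > strd_n)
        return 0; /* counter is already past the end of the buffer */
    return strd_n - consumed_strd >= strd_cnt;
}
```

For example, strides_fit(47, 18, 64) returns 0: once 47 of the 64 strides are
consumed, an 18-stride packet no longer fits in the same buffer.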

In addition, there were cases in mprq_buf_to_pkt() where the address of the
newly allocated seg was exactly the same as the address of the pkt (elts[i])
passed in, which should not happen.

mprq_buf_to_pkt(...) {
    ...
    if (hdrm_overlap > 0) {
        MLX5_ASSERT(rxq->strd_scatter_en);
        struct rte_mbuf *seg = rte_pktmbuf_alloc(rxq->mp);
        if (unlikely(seg == NULL))
            return MLX5_RXQ_CODE_NOMBUF;
        SET_DATA_OFF(seg, 0);

        // added debug statement
        // saw instances where pkt == seg
        DRV_LOG(DEBUG, "pkt %p seg %p", (void *)pkt, (void *)seg);
        rte_memcpy(rte_pktmbuf_mtod(seg, void *),
                   RTE_PTR_ADD(addr, len - hdrm_overlap), hdrm_overlap);
        ...
    }
}

I have tried upgrading to DPDK 24.11, but the segmentation fault persists.
          </pre>
        </div>
      </p>


      <div itemscope itemtype="http://schema.org/EmailMessage">
        <div itemprop="action" itemscope itemtype="http://schema.org/ViewAction">
          
          <link itemprop="url" href="https://bugs.dpdk.org/show_bug.cgi?id=1776">
          <meta itemprop="name" content="View bug">
        </div>
        <meta itemprop="description" content="Bugzilla bug update notification">
      </div>
    </body>
</html>