[PATCH] net/idpf: handle Tx of mbuf segments larger than 16k
Burakov, Anatoly
anatoly.burakov at intel.com
Fri Mar 6 14:45:00 CET 2026
On 3/3/2026 4:00 PM, Bruce Richardson wrote:
> Recent rework of the Tx single-queue path in idpf aligned that path with
> that of other drivers, meaning it now supports segments of size greater
> than 16k. Rework the split-queue path to similarly support those large
> segments.
>
> Fixes: 770f4dfe0f79 ("net/idpf: support basic Tx data path")
> Cc: stable at dpdk.org
>
> Signed-off-by: Bruce Richardson <bruce.richardson at intel.com>
> ---
<snip>
> uint64_t cd_qw0 = 0, cd_qw1 = 0;
> nb_ctx = idpf_set_tso_ctx(ol_flags, tx_pkt, &tx_offload, txq,
> &cd_qw0, &cd_qw1);
>
> - /* Calculate the number of TX descriptors needed for
> - * each packet. For TSO packets, use ci_calc_pkt_desc as
> - * the mbuf data size might exceed max data size that hw allows
> - * per tx desc.
> + /* Calculate the number of TX descriptors needed for each packet.
> + * For TSO packets, use ci_calc_pkt_desc as the mbuf data size
> + * might exceed the max data size that hw allows per tx desc.
> */
> - if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
> + if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
This looks like a drive-by fix for an unrelated issue. That particular
code was introduced here:
2904020f8313 ("net/intel: add common function to calculate needed descs")
Other drivers also check TSO flags but only look at TCP_SEG, not
UDP_SEG. Should they all check for both? Perhaps this should be
reviewed and fixed across all of our PMDs that support TSO.
(to be clear, this is a general question, I'm not implying these changes
must be part of this patchset)
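To make the general question concrete, here is a minimal sketch of the
descriptor-count check using one shared mask for both segmentation flags,
so that a UDP segmentation packet gets the same per-chunk count as a TCP
one. The flag bits, the 16k - 1 per-descriptor limit and all sketch_*
names are hypothetical stand-ins, not DPDK definitions; real code would
use RTE_MBUF_F_TX_TCP_SEG, RTE_MBUF_F_TX_UDP_SEG and ci_calc_pkt_desc.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder bits for illustration only; not the real DPDK flag values. */
#define SKETCH_TX_TCP_SEG (1ULL << 0)
#define SKETCH_TX_UDP_SEG (1ULL << 1)
#define SKETCH_TX_SEG_MASK (SKETCH_TX_TCP_SEG | SKETCH_TX_UDP_SEG)

/* Assumed maximum data bytes per Tx descriptor (16k - 1). */
#define SKETCH_MAX_DATA_PER_TXD (16 * 1024 - 1)

/* Simplified stand-in for a chained mbuf segment. */
struct sketch_seg {
	uint16_t data_len;
	struct sketch_seg *next;
};

/* One descriptor per chunk of at most SKETCH_MAX_DATA_PER_TXD bytes. */
static uint16_t
sketch_calc_pkt_desc(const struct sketch_seg *seg)
{
	uint16_t count = 0;

	for (; seg != NULL; seg = seg->next)
		count += (seg->data_len + SKETCH_MAX_DATA_PER_TXD - 1) /
			 SKETCH_MAX_DATA_PER_TXD;
	return count;
}

/* Either segmentation flag selects the per-chunk count. */
static uint16_t
sketch_nb_used(uint64_t ol_flags, const struct sketch_seg *pkt, uint16_t nb_ctx)
{
	const struct sketch_seg *s;
	uint16_t nb_segs = 0;

	if (ol_flags & SKETCH_TX_SEG_MASK)
		return (uint16_t)(sketch_calc_pkt_desc(pkt) + nb_ctx);

	for (s = pkt; s != NULL; s = s->next)	/* stand-in for mbuf->nb_segs */
		nb_segs++;
	return (uint16_t)(nb_segs + nb_ctx);
}
```

With a 1000-byte segment followed by a 20000-byte one, the segmentation
path counts 1 + 2 data descriptors, while the plain path counts only the
two segments.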
> nb_used = ci_calc_pkt_desc(tx_pkt) + nb_ctx;
> else
> nb_used = tx_pkt->nb_segs + nb_ctx;
>
> + if (txq->nb_tx_free <= txq->tx_free_thresh) {
> + /* TODO: Need to refine
> + * 1. free and clean: Better to decide a clean destination instead of
> + * loop times. And don't free mbuf when RS got immediately, free when
> + * transmit or according to the clean destination.
> + * Now, just ignore the RE write back, free mbuf when get RS
> + * 2. out-of-order rewrite back haven't be supported, SW head and HW head
> + * need to be separated.
> + **/
> + nb_to_clean = 2 * txq->tx_rs_thresh;
> + while (nb_to_clean--)
> + idpf_split_tx_free(txq->complq);
> + }
> +
> + if (txq->nb_tx_free < nb_used)
> + break;
> +
> if (ol_flags & CI_TX_CKSUM_OFFLOAD_MASK)
> cmd_dtype = IDPF_TXD_FLEX_FLOW_CMD_CS_EN;
>
> @@ -959,30 +959,52 @@ idpf_dp_splitq_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> ctx_desc[0] = cd_qw0;
> ctx_desc[1] = cd_qw1;
>
> - tx_id++;
> - if (tx_id == txq->nb_tx_desc)
> + if (++tx_id == txq->nb_tx_desc)
> tx_id = 0;
> }
>
> + cmd_dtype |= IDPF_TX_DESC_DTYPE_FLEX_FLOW_SCHE;
> + struct rte_mbuf *m_seg = tx_pkt;
> do {
> - txd = &txr[tx_id];
> - txn = &sw_ring[txe->next_id];
> - txe->mbuf = tx_pkt;
> + uint64_t buf_dma_addr = rte_mbuf_data_iova(m_seg);
> + uint16_t slen = m_seg->data_len;
> +
> + txe->mbuf = m_seg;
CodeRabbit picked up on something here, and I think it's worth highlighting.
When we're splitting a segment, we assign txe->mbuf for its first descriptor...
<snip>
> + txe = &sw_ring[sw_id];
> + /* sub-descriptor slots do not own the mbuf */
> + txe->mbuf = NULL;
...then set it to NULL for the subsequent sub-descriptors...
> + }
>
> - /* Setup TX descriptor */
> - txd->buf_addr =
> - rte_cpu_to_le_64(rte_mbuf_data_iova(tx_pkt));
> - cmd_dtype |= IDPF_TX_DESC_DTYPE_FLEX_FLOW_SCHE;
> + /* Write the final (or only) descriptor for this segment */
> + txd = &txr[tx_id];
> + txd->buf_addr = rte_cpu_to_le_64(buf_dma_addr);
> txd->qw1.cmd_dtype = cmd_dtype;
> - txd->qw1.rxr_bufsize = tx_pkt->data_len;
> + txd->qw1.rxr_bufsize = slen;
> txd->qw1.compl_tag = sw_id;
...and we're supposed to write the final descriptor here, but we've
stored the mbuf pointer in the *first* descriptor, not in the *last*
one. That means that when the completion for this descriptor is
processed, the mbuf pointer in its slot will be NULL. Is that intended?
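For illustration only, here is a hypothetical sketch (not the patch's
code) of one arrangement that avoids the problem: the slot of the final
chunk, i.e. the one whose index goes into compl_tag, owns the mbuf, and
the chunks written before it get NULL. It assumes a single index shared by
the hardware and software rings, a 16k - 1 per-descriptor limit, and a
completion that reports the tagged slot; all names are made up.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DATA_PER_TXD (16 * 1024 - 1)	/* assumed limit */
#define SKETCH_RING_SIZE 8

struct sketch_mbuf { uint16_t data_len; struct sketch_mbuf *next; int freed; };
struct sketch_txd { uint64_t buf_addr; uint16_t bufsize; uint16_t compl_tag; };
struct sketch_entry { struct sketch_mbuf *mbuf; };

/* Write one segment as chunks of at most SKETCH_MAX_DATA_PER_TXD bytes.
 * Only the final chunk's slot keeps the mbuf. Returns the next free index. */
static uint16_t
sketch_write_seg(struct sketch_txd *txr, struct sketch_entry *sw_ring,
		 uint16_t id, struct sketch_mbuf *m, uint64_t dma)
{
	uint32_t left = m->data_len;

	while (left > SKETCH_MAX_DATA_PER_TXD) {
		txr[id].buf_addr = dma;
		txr[id].bufsize = SKETCH_MAX_DATA_PER_TXD;
		txr[id].compl_tag = id;
		sw_ring[id].mbuf = NULL;	/* sub-descriptor: no ownership */
		dma += SKETCH_MAX_DATA_PER_TXD;
		left -= SKETCH_MAX_DATA_PER_TXD;
		id = (id + 1) % SKETCH_RING_SIZE;
	}
	txr[id].buf_addr = dma;
	txr[id].bufsize = (uint16_t)left;
	txr[id].compl_tag = id;
	sw_ring[id].mbuf = m;			/* final chunk owns the mbuf */
	return (uint16_t)((id + 1) % SKETCH_RING_SIZE);
}

/* Completion side: release whatever the tagged slot owns, if anything. */
static void
sketch_complete(struct sketch_entry *sw_ring, uint16_t compl_tag)
{
	struct sketch_entry *e = &sw_ring[compl_tag];

	if (e->mbuf != NULL) {
		e->mbuf->freed = 1;	/* stand-in for freeing the segment */
		e->mbuf = NULL;
	}
}
```

A 40000-byte segment starting at slot 6 then occupies slots 6, 7 and 0,
and only a completion for slot 0 releases the mbuf.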
> - tx_id++;
> - if (tx_id == txq->nb_tx_desc)
> + if (++tx_id == txq->nb_tx_desc)
> tx_id = 0;
> sw_id = txe->next_id;
> - txe = txn;
> - tx_pkt = tx_pkt->next;
> - } while (tx_pkt);
> + txe = &sw_ring[sw_id];
> + m_seg = m_seg->next;
> + } while (m_seg);
>
> /* fill the last descriptor with End of Packet (EOP) bit */
> txd->qw1.cmd_dtype |= IDPF_TXD_FLEX_FLOW_CMD_EOP;
--
Thanks,
Anatoly