[PATCH v2 0/3] net/zxdh: optimize Rx/Tx path performance
Stephen Hemminger
stephen at networkplumber.org
Thu Apr 23 21:23:57 CEST 2026
On Thu, 23 Apr 2026 09:18:15 +0800
Junlong Wang <wang.junlong1 at zte.com.cn> wrote:
> v2:
> - zxdh_rxtx.c, pkt_padding(): modified the return value of pkt_padding();
> - zxdh_rxtx.c, zxdh_recv_single_pkts(): when zxdh_init_mbuf() fails, the
>   loop now frees the mbuf and continues;
> - zxdh_rxtx.c, refill_desc_unwrap(): added rte_io_wmb() before writing flags
>   in refill_que_descs();
> - zxdh_queue.h: removed the unnecessary zxdh_queue_enable_intr() helper;
> - zxdh_ethdev.c, zxdh_init_queue(): changed the hdr_mz NULL check logic;
>
> - zxdh_rxtx.c, zxdh_xmit_pkts_simple(), zxdh_recv_single_pkts(): added
>   stats.bytes accounting;
> - zxdh_rxtx.c, zxdh_init_mbuf(): removed rte_pktmbuf_dump(stdout, rxm, 40);
> - zxdh_ethdev.c, zxdh_dev_free_mbufs(): use rte_pktmbuf_free() to free mbufs;
> - split the series into separate patches: structure reorganization and
>   sw_ring removal, Rx recv optimization, Tx xmit optimization;
>
> v1:
> This patch optimizes the ZXDH PMD's receive and transmit path for better
> performance through several improvements:
>
> - Add simple TX/RX burst functions (zxdh_xmit_pkts_simple and
> zxdh_recv_single_pkts) for single-segment packet scenarios.
> - Remove RX software ring (sw_ring) to reduce memory allocation and
> copy.
> - Optimize descriptor management with prefetching and simplified
> cleanup.
> - Reorganize structure fields for better cache locality.
>
> These changes reduce CPU cycles and memory bandwidth consumption,
> resulting in improved packet processing throughput.
>
> Junlong Wang (3):
> net/zxdh: optimize queue structure to improve performance
> net/zxdh: optimize Rx recv pkts performance
> net/zxdh: optimize Tx xmit pkts performance
>
> drivers/net/zxdh/zxdh_ethdev.c | 88 ++---
> drivers/net/zxdh/zxdh_ethdev_ops.c | 24 +-
> drivers/net/zxdh/zxdh_ethdev_ops.h | 4 +
> drivers/net/zxdh/zxdh_pci.c | 2 +-
> drivers/net/zxdh/zxdh_queue.c | 31 +-
> drivers/net/zxdh/zxdh_queue.h | 125 +++----
> drivers/net/zxdh/zxdh_rxtx.c | 520 +++++++++++++++++++++--------
> drivers/net/zxdh/zxdh_rxtx.h | 27 +-
> 8 files changed, 527 insertions(+), 294 deletions(-)
>
Here is the full AI review of this patchset
```
Review of v2 series: net/zxdh queue/Rx/Tx optimization
Overall: needs a v3. Findings below, organized by patch.
Patch 1/3 (optimize queue structure)
------------------------------------
Warning: zxdh_queue_notify() in zxdh_queue.h now hardcodes the bit-31
avail-wrap into notify_data unconditionally, removing the
zxdh_pci_with_feature(hw, ZXDH_F_RING_PACKED) gate that the previous
out-of-line version in zxdh_pci.c had. The series also adds a
vq_split member to the virtqueue union, which suggests split-ring
support is planned. Once that exists, this helper will corrupt notify
data for split rings. Either reinstate the feature gate or keep the
dispatch going through VTPCI_OPS()->notify_queue().
Warning: rx_queue_intr_enable/rx_queue_intr_disable dev_ops and the
zxdh_queue_enable_intr() helper are removed. The commit log talks
about cache locality and sw_ring removal but not this. Please split
it into its own patch with a justification, or at minimum call it out
in the commit message.
Minor: fail_q_alloc now does "if (hdr_mz) rte_memzone_free(hdr_mz);".
rte_memzone_free() accepts NULL; the guard is unnecessary.
Minor: The new "if (hdr_mz == NULL)" check inside the VTNET_TQ branch
of zxdh_init_queue() is unreachable. hdr_mz was already validated
earlier in the function.
Minor: Doxygen close "**/" used in several places where "*/" is the
correct terminator.
Patch 2/3 (optimize Rx recv pkts performance)
---------------------------------------------
Error: Double-free in zxdh_recv_single_pkts(). Both error paths
inside zxdh_init_mbuf() already call rte_pktmbuf_free(rxm), but the
caller also frees rxm on return < 0:
if (unlikely(zxdh_init_mbuf(rxm, len, hw, &vq->rxq) < 0)) {
rte_pktmbuf_free(rxm); /* already freed inside */
continue;
}
Keep the free on exactly one side: drop either the caller's free or the
callee's internal frees, not both.
Warning: zxdh_set_rxtx_funcs() silently drops the
ZXDH_NET_F_MRG_RXBUF negotiation check. The previous version
returned -1 if MRG_RXBUF was not advertised; the new version selects a
burst function unconditionally. The multi-seg path
zxdh_recv_pkts_packed() reads header->type_hdr.num_buffers, which is
only meaningful when MRG_RXBUF is negotiated with the peer.
Warning: xstats "full", "norefill", "multicast_packets",
"broadcast_packets" (rx) and "norefill", "multicast_packets",
"broadcast_packets" (tx) are removed from the name tables. If these
counters were never being updated, say so in the commit log. If they
were, multicast_packets/broadcast_packets in particular are
operator-facing counters and this is a user-visible regression.
Warning: zxdh_scattered_rx() reads eth_dev->data->min_rx_buf_size,
which is populated during rx_queue_setup(). Depending on when
zxdh_set_rxtx_funcs() runs, "min_rx_buf_size - RTE_PKTMBUF_HEADROOM"
can underflow (uint16_t wraps) if min_rx_buf_size is 0 at that point.
Minor: Open-coded rte_pktmbuf_mtod(). Both the new zxdh_init_mbuf()
and the modified zxdh_recv_pkts_packed() use
(char *)rxm->buf_addr + RTE_PKTMBUF_HEADROOM
where rte_pktmbuf_mtod(rxm, struct zxdh_net_hdr_ul *) expresses the
same intent. data_off equals RTE_PKTMBUF_HEADROOM here because the
refill path aligns the hardware write target to
buf_iova + RTE_PKTMBUF_HEADROOM.
Minor: zxdh_init_mbuf() zeroes rxm->ol_flags, rxm->vlan_tci, and
rxm->vlan_tci_outer. All three are already cleared by
rte_pktmbuf_reset() on alloc from the pool.
Minor: rxq_get_vq(q) is a trivial one-line macro aliasing "q->vq"
with no functional benefit. Either drop it or apply it consistently.
Patch 3/3 (optimize Tx xmit pkts performance)
---------------------------------------------
Error: zxdh_xmit_pkts_simple() does not write txdp->id. tx_bunch()
and tx1() write addr, len, and flags but leave id untouched. The new
zxdh_xmit_fast_flush() reads "id = desc[used_idx].id" as the
chain-terminator for its inner do-while loop
("while (curr_id != id)").
Descriptors submitted by the simple path therefore carry stale ids:
either 0 at cold start from memzone init, or the self-index written
by a previous flush pass. Because the flush rewrites
desc[used_idx].id = used_idx during processing, after one full warmup
cycle every desc[i].id == i and the inner do-while happens to exit
after one iteration. But on a cold ring, or any ring whose
descriptors were left with non-self ids by a preceding append-path
burst, the inner loop keeps iterating, freeing cookies and advancing
used_idx across descriptors the backend has not marked used, until
used_idx wraps back to 0. That corrupts vq_free_cnt and
vq_used_cons_idx accounting.
Fix: set txdp->id = avail_idx + i in tx_bunch/tx1 so the invariant is
explicit rather than relying on the flush's self-rewrite side effect.
Warning: zxdh_xmit_pkts_prepare() drops the
rte_net_intel_cksum_prepare() call. If the driver still advertises
L4 checksum offload, pseudo-header checksum preparation becomes the
application's responsibility. That's a user-visible contract change
and needs justification in the commit log, or should be paired with a
matching capability change.
Warning: zxdh_xmit_enqueue_append() now sets dxp->cookie = NULL for
the head slot and stores cookies per descriptor via dep[idx].cookie.
This works with the new per-descriptor free in
zxdh_xmit_fast_flush() (rte_pktmbuf_free_seg), but any residual code
path still reading vq_descx[head_id].cookie will see NULL. Worth a
comment documenting the new invariant.
Minor: Extra initialization. In pkt_padding(),
struct zxdh_net_hdr_dl *hdr = NULL;
is immediately overwritten by the rte_pktmbuf_prepend() return. In
submit_to_backend_simple(),
struct rte_mbuf *m = NULL;
is overwritten on first use inside the loop. Drop both initializers.
Minor: "mainpart = (nb_pkts & ((uint32_t)~N_PER_LOOP_MASK));" — the
uint32_t cast is pointless. nb_pkts is uint16_t and N_PER_LOOP_MASK
is a small integer constant.
Minor: submit_to_backend_simple() uses "*(tx_pkts + i + j)" where
"tx_pkts[i + j]" reads more naturally and matches style elsewhere.
Minor: tx_bunch() is named to imply a variable batch but is hardcoded
to N_PER_LOOP iterations for single-segment packets only. A one-line
comment noting that the simple path handles single-segment only
(selected when TX_OFFLOAD_MULTI_SEGS is off) would help.
```
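For the Patch 1 notify finding, here is a minimal sketch of what the
reinstated feature gate could look like. The feature-bit value, struct
name, and helper names are stand-ins for illustration, not the driver's
actual definitions:

```c
#include <stdint.h>

/* Assumed feature bit for illustration; the real value lives in zxdh_pci.h. */
#define ZXDH_F_RING_PACKED (1ULL << 34)

struct zxdh_hw_stub {
	uint64_t guest_features;
};

static int
stub_with_feature(const struct zxdh_hw_stub *hw, uint64_t bit)
{
	return (hw->guest_features & bit) != 0;
}

/*
 * Packed rings encode the avail-wrap counter in bit 31 of the notify
 * data; split rings must not set that bit.  Gating on the negotiated
 * feature keeps one helper correct for both ring layouts.
 */
static uint32_t
stub_notify_data(const struct zxdh_hw_stub *hw, uint16_t vq_index,
		 uint16_t avail_wrap_counter)
{
	uint32_t data = vq_index;

	if (stub_with_feature(hw, ZXDH_F_RING_PACKED))
		data |= (uint32_t)(avail_wrap_counter & 1) << 31;
	return data;
}
```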
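For the Patch 2 double-free, a minimal ownership mock (a counter stands
in for rte_pktmbuf_free) shows why the free must live on exactly one
side of the call. All names here are illustrative stubs:

```c
#include <stddef.h>

static int stub_free_count;

static void
stub_mbuf_free(void *m)
{
	(void)m;
	stub_free_count++;
}

/* Mock of zxdh_init_mbuf(): on failure it frees the mbuf itself. */
static int
stub_init_mbuf(void *m, int force_fail)
{
	if (force_fail) {
		stub_mbuf_free(m);
		return -1;
	}
	return 0;
}

/*
 * Mock of the caller's loop body.  Because the callee already freed
 * the mbuf on failure, the caller only skips the packet.  Returns how
 * many times the mbuf ended up freed.
 */
static int
stub_recv_one(void *m, int force_fail)
{
	stub_free_count = 0;
	if (stub_init_mbuf(m, force_fail) < 0) {
		/* no second free here: that was the double free */
		return stub_free_count;
	}
	return stub_free_count;
}
```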
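For the min_rx_buf_size underflow warning, a small demonstration of the
uint16_t wrap plus a guarded form. The 128-byte headroom is the common
RTE_PKTMBUF_HEADROOM default, assumed here for illustration:

```c
#include <stdint.h>

#define STUB_PKTMBUF_HEADROOM 128  /* assumed default headroom */

/* Unguarded: wraps to a huge value when min_rx_buf_size is still 0. */
static uint16_t
stub_rx_buf_len_unguarded(uint16_t min_rx_buf_size)
{
	return (uint16_t)(min_rx_buf_size - STUB_PKTMBUF_HEADROOM);
}

/* Guarded: treat "not yet populated" as "no usable room". */
static uint16_t
stub_rx_buf_len_guarded(uint16_t min_rx_buf_size)
{
	if (min_rx_buf_size <= STUB_PKTMBUF_HEADROOM)
		return 0;
	return (uint16_t)(min_rx_buf_size - STUB_PKTMBUF_HEADROOM);
}
```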
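For the Patch 3 stale-id finding, a minimal ring model illustrates the
proposed fix: when the transmit path stamps each descriptor's id with
its own slot index, the flush's inner do-while terminates after one
descriptor regardless of ring history. Structure and names are
simplified stand-ins for the driver's:

```c
#include <stdint.h>

#define STUB_RING_SIZE 8  /* power of two for cheap masking */

struct stub_desc {
	uint16_t id;
};

static struct stub_desc stub_ring[STUB_RING_SIZE];

/*
 * Proposed fix in tx1()/tx_bunch(): write the slot's own index into
 * the id field, instead of leaving whatever a previous pass stored.
 */
static void
stub_tx1(uint16_t avail_idx)
{
	uint16_t slot = avail_idx & (STUB_RING_SIZE - 1);

	stub_ring[slot].id = slot;
}

/*
 * Count how many descriptors the flush's inner do-while walks before
 * curr_id catches up with desc[used_idx].id (capped at ring size so
 * the sketch itself cannot spin forever).
 */
static unsigned int
stub_flush_walk(uint16_t used_idx)
{
	uint16_t slot = used_idx & (STUB_RING_SIZE - 1);
	uint16_t id = stub_ring[slot].id;
	uint16_t curr_id;
	unsigned int steps = 0;

	do {
		curr_id = slot;
		slot = (slot + 1) & (STUB_RING_SIZE - 1);
		steps++;
	} while (curr_id != id && steps < STUB_RING_SIZE);
	return steps;
}
```

With a cold ring (all ids zero) the walk from slot 1 traverses the
whole ring before curr_id reaches 0; after stamping the slot's own
index, it stops after one descriptor.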