[PATCH v2 0/3] net/zxdh: optimize Rx/Tx path performance

Stephen Hemminger stephen at networkplumber.org
Thu Apr 23 21:23:57 CEST 2026


On Thu, 23 Apr 2026 09:18:15 +0800
Junlong Wang <wang.junlong1 at zte.com.cn> wrote:

> v2:
>   - zxdh_rxtx.c, pkt_padding(): modifyed the return value of pkt_padding();
>   - zxdh_rxtx.c, zxdh_recv_single_pkts(): modifyed When zxdh_init_mbuf() fails
>     the loop does "continue" and free mbufs;
>   - zxdh_rxtx.c, refill_desc_unwrap(): Add rte_io_wmb() before writing flags
>     in the refill_que_descs();
>   - zxdh_queue.h, zxdh_queue_enable_intr(): Remove unnecessary function of zxdh_queue_enable_intr;
>   - zxdh_ethdev.c, zxdh_init_queue(): changed the hdr_mz NULL check logic;
> 
>   - zxdh_rxtx.c, zxdh_xmit_pkts_simple()、zxdh_recv_single_pkts(): add stats.bytes count;
>   - zxdh_rxtx.c, zxdh_init_mbuf():remove  rte_pktmbuf_dump(stdout, rxm, 40);
>   - zxdh_ethdev.c, zxdh_dev_free_mbufs(): using rte_pktmbuf_free() to free mbufs;
>   - Splitting into separate patches, structure reorganization and sw_ring removal、
>     RX recv optimize、Tx xmit optimize、Tx;
> 
> v1:
>   This patch optimizes the ZXDH PMD's receive and transmit path for better
>   performance through several improvements:
> 
> - Add simple TX/RX burst functions (zxdh_xmit_pkts_simple and
>   zxdh_recv_single_pkts) for single-segment packet scenarios.
> - Remove RX software ring (sw_ring) to reduce memory allocation and
>   copy.
> - Optimize descriptor management with prefetching and simplified
>   cleanup.
> - Reorganize structure fields for better cache locality.
> 
>   These changes reduce CPU cycles and memory bandwidth consumption,
>   resulting in improved packet processing throughput.
> 
> Junlong Wang (3):
>   net/zxdh: optimize queue structure to improve performance
>   net/zxdh: optimize Rx recv pkts performance
>   net/zxdh: optimize Tx xmit pkts performance
> 
>  drivers/net/zxdh/zxdh_ethdev.c     |  88 ++---
>  drivers/net/zxdh/zxdh_ethdev_ops.c |  24 +-
>  drivers/net/zxdh/zxdh_ethdev_ops.h |   4 +
>  drivers/net/zxdh/zxdh_pci.c        |   2 +-
>  drivers/net/zxdh/zxdh_queue.c      |  31 +-
>  drivers/net/zxdh/zxdh_queue.h      | 125 +++----
>  drivers/net/zxdh/zxdh_rxtx.c       | 520 +++++++++++++++++++++--------
>  drivers/net/zxdh/zxdh_rxtx.h       |  27 +-
>  8 files changed, 527 insertions(+), 294 deletions(-)
> 

Here is the full AI review of this patchset
```
Review of v2 series: net/zxdh queue/Rx/Tx optimization

Overall: needs a v3. Findings below, organized by patch.


Patch 1/3 (optimize queue structure)
------------------------------------

Warning: zxdh_queue_notify() in zxdh_queue.h now hardcodes the bit-31
avail-wrap into notify_data unconditionally, removing the
zxdh_pci_with_feature(hw, ZXDH_F_RING_PACKED) gate that the previous
out-of-line version in zxdh_pci.c had.  The series also adds a
vq_split member to the virtqueue union, which suggests split-ring
support is planned.  Once that exists, this helper will corrupt notify
data for split rings.  Either reinstate the feature gate or keep the
dispatch going through VTPCI_OPS()->notify_queue().

Warning: rx_queue_intr_enable/rx_queue_intr_disable dev_ops and the
zxdh_queue_enable_intr() helper are removed.  The commit log talks
about cache locality and sw_ring removal but not this.  Please split
it into its own patch with a justification, or at minimum call it out
in the commit message.

Minor: fail_q_alloc now does "if (hdr_mz) rte_memzone_free(hdr_mz);".
rte_memzone_free() accepts NULL; the guard is unnecessary.

Minor: The new "if (hdr_mz == NULL)" check inside the VTNET_TQ branch
of zxdh_init_queue() is unreachable.  hdr_mz was already validated
earlier in the function.

Minor: Doxygen close "**/" used in several places where "*/" is the
correct terminator.


Patch 2/3 (optimize Rx recv pkts performance)
---------------------------------------------

Error: Double-free in zxdh_recv_single_pkts().  Both error paths
inside zxdh_init_mbuf() already call rte_pktmbuf_free(rxm), but the
caller also frees rxm on return < 0:

        if (unlikely(zxdh_init_mbuf(rxm, len, hw, &vq->rxq) < 0)) {
                rte_pktmbuf_free(rxm);   /* already freed inside */
                continue;
        }

Keep exactly one of the two: drop either the caller's free or the
frees inside zxdh_init_mbuf(), not both.

Warning: zxdh_set_rxtx_funcs() silently drops the
ZXDH_NET_F_MRG_RXBUF negotiation check.  The previous version
returned -1 if MRG_RXBUF was not advertised; the new version selects a
burst function unconditionally.  The multi-seg path
zxdh_recv_pkts_packed() reads header->type_hdr.num_buffers, which is
only meaningful when MRG_RXBUF is negotiated with the peer.

Warning: xstats "full", "norefill", "multicast_packets",
"broadcast_packets" (rx) and "norefill", "multicast_packets",
"broadcast_packets" (tx) are removed from the name tables.  If these
counters were never being updated, say so in the commit log.  If they
were, multicast_packets/broadcast_packets in particular are
operator-facing counters and this is a user-visible regression.

Warning: zxdh_scattered_rx() reads eth_dev->data->min_rx_buf_size,
which is populated during rx_queue_setup().  Depending on when
zxdh_set_rxtx_funcs() runs, "min_rx_buf_size - RTE_PKTMBUF_HEADROOM"
can underflow (uint16_t wraps) if min_rx_buf_size is 0 at that point.

Minor: Open-coded rte_pktmbuf_mtod().  Both the new zxdh_init_mbuf()
and the modified zxdh_recv_pkts_packed() use

        (char *)rxm->buf_addr + RTE_PKTMBUF_HEADROOM

where rte_pktmbuf_mtod(rxm, struct zxdh_net_hdr_ul *) expresses the
same intent.  data_off equals RTE_PKTMBUF_HEADROOM here because the
refill path aligns the hardware write target to
buf_iova + RTE_PKTMBUF_HEADROOM.

Minor: zxdh_init_mbuf() zeroes rxm->ol_flags, rxm->vlan_tci, and
rxm->vlan_tci_outer.  All three are already cleared by
rte_pktmbuf_reset() on alloc from the pool.

Minor: rxq_get_vq(q) is a trivial one-line macro aliasing "q->vq"
with no functional benefit.  Either drop it or apply it consistently.


Patch 3/3 (optimize Tx xmit pkts performance)
---------------------------------------------

Error: zxdh_xmit_pkts_simple() does not write txdp->id.  tx_bunch()
and tx1() write addr, len, and flags but leave id untouched.  The new
zxdh_xmit_fast_flush() reads "id = desc[used_idx].id" as the
chain-terminator for its inner do-while loop
("while (curr_id != id)").

Descriptors submitted by the simple path therefore carry stale ids:
either 0 at cold start from memzone init, or the self-index written
by a previous flush pass.  Because the flush rewrites
desc[used_idx].id = used_idx during processing, after one full warmup
cycle every desc[i].id == i and the inner do-while happens to exit
after one iteration.  But on a cold ring, or any ring whose
descriptors were left with non-self ids by a preceding append-path
burst, the inner loop keeps iterating, freeing cookies and advancing
used_idx across descriptors the backend has not marked used, until
used_idx wraps back to 0.  That corrupts vq_free_cnt and
vq_used_cons_idx accounting.

Fix: set txdp->id = avail_idx + i in tx_bunch/tx1 so the invariant is
explicit rather than relying on the flush's self-rewrite side effect.

Warning: zxdh_xmit_pkts_prepare() drops the
rte_net_intel_cksum_prepare() call.  If the driver still advertises
L4 checksum offload, pseudo-header checksum preparation becomes the
application's responsibility.  That's a user-visible contract change
and needs justification in the commit log, or should be paired with a
matching capability change.

Warning: zxdh_xmit_enqueue_append() now sets dxp->cookie = NULL for
the head slot and stores cookies per descriptor via dep[idx].cookie.
This works with the new per-descriptor free in
zxdh_xmit_fast_flush() (rte_pktmbuf_free_seg), but any residual code
path still reading vq_descx[head_id].cookie will see NULL.  Worth a
comment documenting the new invariant.

Minor: Extra initialization.  In pkt_padding(),

        struct zxdh_net_hdr_dl *hdr = NULL;

is immediately overwritten by the rte_pktmbuf_prepend() return.  In
submit_to_backend_simple(),

        struct rte_mbuf *m = NULL;

is overwritten on first use inside the loop.  Drop both initializers.

Minor: "mainpart = (nb_pkts & ((uint32_t)~N_PER_LOOP_MASK));" — the
uint32_t cast is pointless.  nb_pkts is uint16_t and N_PER_LOOP_MASK
is a small integer constant.

Minor: submit_to_backend_simple() uses "*(tx_pkts + i + j)" where
"tx_pkts[i + j]" reads more naturally and matches style elsewhere.

Minor: tx_bunch() is named to imply a variable batch but is hardcoded
to N_PER_LOOP iterations for single-segment packets only.  A one-line
comment noting that the simple path handles single-segment only
(selected when TX_OFFLOAD_MULTI_SEGS is off) would help.
```


More information about the dev mailing list