|SUCCESS| [v3] net/ice: add NEON-optimised Rx/Tx vector paths

dpdklab at iol.unh.edu dpdklab at iol.unh.edu
Thu Apr 9 03:37:51 CEST 2026


Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/162909

_AI Code Review_

=== Patch Review: series.patch (via Claude) ===
Review date: 2026-04-09

I'll review this DPDK patch for correctness bugs, coding style compliance, and guideline violations.

git sha: Not available in patch (patchwork format)
Author: Jay Wang <jay.wang2 at arm.com>
Subject: [PATCH v3] net/ice: add NEON-optimised Rx/Tx vector paths

This patch adds NEON-optimized receive and transmit paths to the Intel
ice driver for ARM64 platforms.

> diff --git a/.mailmap b/.mailmap
> index beccc84425..dfe92b0399 100644
> --- a/.mailmap
> +++ b/.mailmap

[ ... ]

> diff --git a/drivers/net/intel/ice/ice_ethdev.h b/drivers/net/intel/ice/ice_ethdev.h
> index 4b3718f715..f6fd3bf106 100644
> --- a/drivers/net/intel/ice/ice_ethdev.h
> +++ b/drivers/net/intel/ice/ice_ethdev.h

[ ... ]

> diff --git a/drivers/net/intel/ice/ice_rxtx.c b/drivers/net/intel/ice/ice_rxtx.c
> index 31b74be9ba..b34231c212 100644
> --- a/drivers/net/intel/ice/ice_rxtx.c
> +++ b/drivers/net/intel/ice/ice_rxtx.c
> @@ -2515,7 +2515,9 @@ ice_dev_supported_ptypes_get(struct rte_eth_dev *dev, size_t *no_of_elements)
>  	    ad->rx_func_type == ICE_RX_AVX512 ||
>  	    ad->rx_func_type == ICE_RX_AVX512_SCATTERED ||
>  	    ad->rx_func_type == ICE_RX_AVX512_OFFLOAD ||
> -	    ad->rx_func_type == ICE_RX_AVX512_SCATTERED_OFFLOAD)
> +	    ad->rx_func_type == ICE_RX_AVX512_SCATTERED_OFFLOAD ||
> +	    ad->rx_func_type == ICE_RX_NEON ||
> +	    ad->rx_func_type == ICE_RX_NEON_SCATTERED)
>  		return ptypes;

Does this code need RTE_ARCH_ARM64 guards around the NEON checks? The
AVX512 checks are x86-specific, but are these enum values always defined
regardless of architecture?

> @@ -3356,6 +3358,26 @@ static const struct ci_rx_path_info ice_rx_path_infos[] = {
>  		}
>  	},
>  #endif
> +#elif defined(RTE_ARCH_ARM64)
> +	[ICE_RX_NEON] = {
> +		.pkt_burst = ice_recv_pkts_vec,
> +		.info = "Vector Neon",
> +		.features = {
> +			.rx_offloads = ICE_RX_VECTOR_OFFLOAD_OFFLOADS,
> +			.simd_width = RTE_VECT_SIMD_128,
> +			.bulk_alloc = true
> +		}
> +	},
> +	[ICE_RX_NEON_SCATTERED] = {
> +		.pkt_burst = ice_recv_scattered_pkts_vec,
> +		.info = "Vector Neon Scattered",
> +		.features = {
> +			.rx_offloads = ICE_RX_VECTOR_OFFLOAD_OFFLOADS,
> +			.simd_width = RTE_VECT_SIMD_128,
> +			.scattered = true,
> +			.bulk_alloc = true
> +		}
> +	},
>  #endif

[ ... ]

> diff --git a/drivers/net/intel/ice/ice_rxtx.h b/drivers/net/intel/ice/ice_rxtx.h
> index 77ed41f9fd..999b6b30d6 100644
> --- a/drivers/net/intel/ice/ice_rxtx.h
> +++ b/drivers/net/intel/ice/ice_rxtx.h
> @@ -261,6 +261,12 @@ const uint32_t *ice_dev_supported_ptypes_get(struct rte_eth_dev *dev,
>  void ice_select_rxd_to_pkt_fields_handler(struct ci_rx_queue *rxq,
>  					  uint32_t rxdid);
>  
> +uint16_t ice_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
> +				uint16_t nb_pkts);
> +uint16_t ice_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
> +				uint16_t nb_pkts);
> +uint16_t ice_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
> +				uint16_t nb_pkts);

Are these function declarations needed in the header when they're
architecture-specific? The x86 vector implementations don't expose these
in the header. Should these be guarded with RTE_ARCH_ARM64 or declared
in a private header?

> diff --git a/drivers/net/intel/ice/ice_rxtx_vec_neon.c b/drivers/net/intel/ice/ice_rxtx_vec_neon.c
> new file mode 100644
> index 0000000000..8ec5942541
> --- /dev/null
> +++ b/drivers/net/intel/ice/ice_rxtx_vec_neon.c
> @@ -0,0 +1,761 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2026 Intel Corporation
> + * Copyright(c) 2026 Arm Limited
> + */
> +
> +#include "ice_rxtx_vec_common.h"
> +
> +#include "../common/rx_vec_arm.h"
> +
> +#include <rte_vect.h>

Does this include order violate the coding style? Per the style guide,
system includes should precede local includes. The order should be:
rte_vect.h before ice_rxtx_vec_common.h and rx_vec_arm.h.

> +static __rte_always_inline uint32x4_t
> +ice_flex_rxd_to_fdir_flags_vec(const uint32x4_t fdir_id0_3)
> +{
> +#define FDID_MIS_MAGIC 0xFFFFFFFFu

Should this local macro be undefined at the end of the function to avoid
namespace pollution?

> +static __rte_always_inline void
> +ice_flex_rxd_to_olflags_v(struct ci_rx_queue *rxq, uint64x2_t descs[4],
> +	struct rte_mbuf **rx_pkts)

Does this indentation follow the style guide? The continuation line for
rx_pkts should be aligned differently. Per DPDK style, function
parameters should be indented by one tab plus alignment.

> +	/**
> +	 * mask everything except checksum, RSS and VLAN flags
> +	 * bit fields defined in enum ice_rx_flex_desc_status_error_0_bits
> +	 * bit7:4 for checksum.
> +	 * bit12 for RSS indication.
> +	 * bit13 for VLAN indication.
> +	 */

> +	/**
> +	 * map the checksum, rss and vlan fields to the checksum, rss
> +	 * and vlan flag.
> +	 */

> +	/**
> +	 * extract status_error0 field from 4 descriptors,
> +	 * and mask out everything else not in desc_mask
> +	 */

> +	/**
> +	 * shift each 32-bit lane right by 4 so that we can use
> +	 * the checksum bit as an index into cksum_flags
> +	 */

> +	/**
> +	 * shift left by 1 bit because we shift right by 1 bit
> +	 * in cksum_flags
> +	 */

> +	/**
> +	 * we need to mask out the redundant bits introduced by RSS or
> +	 * VLAN fields.
> +	 */

> +	/**
> +	 * At this point, we have the 4 sets of flags in the low 16-bits
> +	 * of each 32-bit value in flags.
> +	 * We want to extract these, and merge them with the mbuf init data
> +	 * so we can do a single 16-byte write to the mbuf to set the flags
> +	 * and all the other initialization fields. Extracting the
> +	 * appropriate flags means that we have to do a shift and blend for
> +	 * each mbuf before we do the write.
> +	 */

Are these multi-line comments properly formatted? Several violate the
requirement that multi-line comments should have opening /* and closing
*/ on separate lines.

> +static __rte_always_inline void
> +ice_rxq_rearm(struct ci_rx_queue *rxq)
> +{
> +	ci_rxq_rearm(rxq);
> +}

Does this trivial wrapper function provide any value? It's a one-line
function that just calls another function with the same parameters.

> +	const uint8x16_t cksum_flags = {
> +		(RTE_MBUF_F_RX_OUTER_L4_CKSUM_GOOD >> 20 | RTE_MBUF_F_RX_L4_CKSUM_GOOD |
> +		 RTE_MBUF_F_RX_IP_CKSUM_GOOD) >> 1,

Does this code have excessive line length? This initialization spans
multiple lines but could be formatted more clearly with the expressions
broken at different points for readability.

> +	uint16x8_t crc_adjust = {
> +		0, 0,			/* ignore pkt_type field */
> +		rxq->crc_len,	/* sub crc on pkt_len */
> +		0,				/* ignore high-16bits of pkt_len */
> +		rxq->crc_len,	/* sub crc on data_len */
> +		0, 0, 0			/* ignore non-length fields */
> +	};

Does this comment alignment use tabs? Comments should be aligned with
tabs, not spaces.

> +	/* mask to shuffle from flex descriptor to mbuf */
> +	const uint8x16_t shuf_msk = {
> +		0xFF, 0xFF,		/* pkt_type set as unknown */
> +		0xFF, 0xFF,		/* pkt_type set as unknown */
> +		4, 5,			/* octet 4~5, low bits pkt_len */
> +		0xFF, 0xFF,		/* skip high 16 bits pkt_len, zero out */
> +		4, 5,			/* octet 4~5, 16 bits data_len */
> +		10, 11,			/* octet 10~11, 16 bits vlan_macip */
> +		0xFF, 0xFF,		/* rss hash parsed separately */
> +		0xFF, 0xFF,
> +	};

Does this trailing comma in the initializer comply with DPDK style?

> +	/* 4 packets DD mask */
> +	const uint16x8_t dd_check = {
> +		0x0001, 0x0001, 0x0001, 0x0001,
> +		0, 0, 0, 0
> +	};
> +
> +	/* 4 packets EOP mask */
> +	const uint8x16_t eop_check = {
> +		0x2, 0, 0x2, 0, 0x2, 0, 0x2, 0,
> +		0, 0, 0, 0, 0, 0, 0, 0
> +	};

> +	/* check to see if there is actually a packet available */
> +	if (!(rxdp->wb.status_error0 &
> +	      rte_cpu_to_le_32(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S)))
> +		return 0;

Does this code use implicit zero comparison? Per the coding style,
explicit comparison should be used: "== 0".

> +		uint16x8_t sterr_tmp1, sterr_tmp2;
> +		uint64x2_t mbp1, mbp2;
> +		uint16x8_t staterr;
> +		uint16x8_t tmp;
> +		uint64_t stat;

Does this variable declaration order follow reverse Christmas tree
ordering as required by DPDK style?

> +#ifndef	RTE_NET_INTEL_USE_16BYTE_DESC

Does this preprocessor directive have unnecessary whitespace? There's a
tab character before RTE_NET_INTEL_USE_16BYTE_DESC.

> +		/**
> +		 * needs to load 2nd 16B of each desc for RSS hash parsing,
> +		 * will cause performance drop to get into this context.
> +		 */

Is this multi-line comment properly formatted? The opening /** should be
on its own line.

> +			/**
> +			 * to shift the 32b RSS hash value to the
> +			 * highest 32b of each 128b before mask
> +			 */

> +			/**
> +			 * the staterr values are not in order, even though
> +			 * the count of DD bits doesn't care. However, for
> +			 * end of packet tracking, we do care, so shuffle.
> +			 * Previously: descs[3] descs[1] descs[2] descs[0]
> +			 * Shuffled: descs[0] descs[1] descs[2] descs[3]
> +			 */

Are these multi-line comments using the correct style?

> +		/**
> +		 * move the statue bit (bit0) into the sign bit (bit15)
> +		 * of each 16-bit lane
> +		 */

Does "statue" need to be "status"? This appears to be a typo.

> +		/**
> +		 * reinterpret staterr as a signed 16-bit and
> +		 * arithmetic-shift-right by 15
> +		 * each lane becomes 0xFFFF if original DD bit was 1, otherwise 0.
> +		 * then interpret back to unsigned u16 vector
> +		 */

> +		/**
> +		 * reinterpret u16x8 vector as u64x2, and fetch the low u64
> +		 * which contains the first four 16-bit lanes, and invert
> +		 * all bits.
> +		 */

> +		if (unlikely(stat == 0)) {

Does this code use explicit zero comparison correctly per the style
guide?

> +	/* check no split flags in both previous and current bursts */
> +	if (!rxq->pkt_first_seg &&
> +		split_fl64[0] == 0 && split_fl64[1] == 0 &&
> +		split_fl64[2] == 0 && split_fl64[3] == 0)

Does this condition use implicit pointer comparison? The style guide
requires "== NULL" for pointer comparisons.

> +	if (!rxq->pkt_first_seg) {

Does this code use implicit pointer comparison? Should be
"== NULL".

> +static __rte_always_inline void
> +ice_vtx1(volatile struct ci_tx


More information about the test-report mailing list