|SUCCESS| [v4,2/2] node: lookup with RISC-V vector extension

dpdklab at iol.unh.edu
Tue Mar 31 11:43:14 CEST 2026


Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/162784

_AI Code Review_

=== Patch Review: series.patch (via Claude) ===
Review date: 2026-03-31

I'll review this DPDK patch against the coding standards and guidelines.

```
commit c6f7a8e2c4a1b3d5e9f0a2c6d8e1f3a5b7c9d1e3
Author: Sun Yuechi <sunyuechi at iscas.ac.cn>

[PATCH v4 1/2] eal/riscv: set default SIMD bitwidth to 128

Enable vector code paths on RISC-V platforms with V extension.

> diff --git a/lib/eal/riscv/include/rte_vect.h b/lib/eal/riscv/include/rte_vect.h
> index a4357e266a..4d16082449 100644
> --- a/lib/eal/riscv/include/rte_vect.h
> +++ b/lib/eal/riscv/include/rte_vect.h
> @@ -19,7 +19,7 @@
>  extern "C" {
>  #endif
>  
> -#define RTE_VECT_DEFAULT_SIMD_BITWIDTH RTE_VECT_SIMD_DISABLED
> +#define RTE_VECT_DEFAULT_SIMD_BITWIDTH RTE_VECT_SIMD_128

This change enables vector operations unconditionally for all RISC-V
platforms. Should this be conditional on RTE_RISCV_FEATURE_V being defined,
similar to how patch 2/2 checks for the V extension at compile time? This
could cause issues on RISC-V platforms without the V extension.
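/*
 * A guarded default along these lines would keep scalar behaviour on
 * V-less builds. Sketch only: the RTE_VECT_SIMD_* values below are
 * stand-ins for DPDK's enum rte_vect_max_simd, and RTE_RISCV_FEATURE_V
 * is force-defined here so the fragment is self-contained.
 */
#define RTE_VECT_SIMD_DISABLED 64
#define RTE_VECT_SIMD_128 128
#define RTE_RISCV_FEATURE_V 1

#ifdef RTE_RISCV_FEATURE_V
#define RTE_VECT_DEFAULT_SIMD_BITWIDTH RTE_VECT_SIMD_128
#else
#define RTE_VECT_DEFAULT_SIMD_BITWIDTH RTE_VECT_SIMD_DISABLED
#endif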

[ ... ]

commit a7d9e1f2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8
Author: Sun Yuechi <sunyuechi at iscas.ac.cn>

[PATCH v4 2/2] node: lookup with RISC-V vector extension

Implement ip4_lookup_node_process_vec function for RISC-V architecture
using RISC-V Vector Extension instruction set.

> diff --git a/lib/node/ip4_lookup_rvv.h b/lib/node/ip4_lookup_rvv.h
> new file mode 100644
> index 0000000000..0d9db2e19b
> --- /dev/null
> +++ b/lib/node/ip4_lookup_rvv.h
> @@ -0,0 +1,164 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2025 Institute of Software Chinese Academy of Sciences (ISCAS).
> + */
> +
> +#ifndef __INCLUDE_IP4_LOOKUP_RVV_H__
> +#define __INCLUDE_IP4_LOOKUP_RVV_H__

The header guard naming doesn't follow DPDK convention, and identifiers
beginning with a double underscore are reserved in C. Should this be
_IP4_LOOKUP_RVV_H_ (matching the file path, per the coding style guide)
instead of __INCLUDE_IP4_LOOKUP_RVV_H__?

> +static __rte_always_inline vuint32m8_t
> +bswap32_vec(vuint32m8_t v, size_t vl)
> +{
> +	vuint32m8_t low16 = __riscv_vor_vv_u32m8(
> +		__riscv_vsll_vx_u32m8(__riscv_vand_vx_u32m8(v, 0xFF, vl), 24, vl),
> +		__riscv_vsll_vx_u32m8(__riscv_vand_vx_u32m8(v, 0xFF00, vl), 8, vl),
> +		vl);
> +
> +	vuint32m8_t high16 = __riscv_vor_vv_u32m8(
> +		__riscv_vsrl_vx_u32m8(__riscv_vand_vx_u32m8(v, 0xFF0000, vl), 8, vl),
> +		__riscv_vsrl_vx_u32m8(v, 24, vl),
> +		vl);
> +
> +	return __riscv_vor_vv_u32m8(low16, high16, vl);
> +}

The function deserves a brief comment stating that it swaps each 32-bit
lane from network to host byte order. The swap itself is correct on
little-endian targets, which covers the RISC-V platforms DPDK supports,
but the intent should be documented.
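/*
 * For reference, the per-lane operation being vectorized here is a
 * plain 32-bit byte swap; a scalar equivalent (illustrative name):
 */
#include <stdint.h>

static inline uint32_t
bswap32_scalar(uint32_t v)
{
	return ((v & 0x000000FF) << 24) |
	       ((v & 0x0000FF00) << 8) |
	       ((v & 0x00FF0000) >> 8) |
	       (v >> 24);
}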

> +static __rte_always_inline void
> +rte_lpm_lookup_vec(const struct rte_lpm *lpm, const uint32_t *ips,
> +			uint32_t *hop, size_t vl, uint32_t defv)

The indentation of the second parameter line appears inconsistent. Per
DPDK coding style, continuation lines of a function definition should
align with the opening parenthesis.

> +{
> +	/* Load IP addresses (network byte order) */
> +	vuint32m8_t v_ip = bswap32_vec(__riscv_vle32_v_u32m8(ips, vl), vl);
> +
> +	vuint32m8_t v_tbl24_byte_offset = __riscv_vsll_vx_u32m8(
> +			__riscv_vsrl_vx_u32m8(v_ip, 8, vl), 2, vl);

Does this assume the ips array is suitably aligned for vector loads?
Unit-stride RVV loads only require element (4-byte) alignment, which a
uint32_t array satisfies, but documenting that expectation on the ips
parameter would make the contract explicit.

> +	vuint32m8_t vtbl_entry = __riscv_vluxei32_v_u32m8(
> +		(const uint32_t *)lpm->tbl24, v_tbl24_byte_offset, vl);

The byte offsets here are multiples of four, so the gather stays
element-aligned; the open question is throughput, since indexed loads
can be slow on some RISC-V implementations and the tbl24 access pattern
is inherently scattered.

> +	vbool4_t mask = __riscv_vmseq_vx_u32m8_b4(
> +		__riscv_vand_vx_u32m8(vtbl_entry, RTE_LPM_VALID_EXT_ENTRY_BITMASK, vl),
> +		RTE_LPM_VALID_EXT_ENTRY_BITMASK, vl);
> +
> +	vuint32m8_t vtbl8_index = __riscv_vsll_vx_u32m8(
> +		__riscv_vadd_vv_u32m8(
> +			__riscv_vsll_vx_u32m8(
> +				__riscv_vand_vx_u32m8(vtbl_entry, 0x00FFFFFF, vl), 8, vl),
> +			__riscv_vand_vx_u32m8(v_ip, 0x000000FF, vl), vl),
> +		2, vl);
> +
> +	vtbl_entry = __riscv_vluxei32_v_u32m8_mu(
> +		mask, vtbl_entry, (const uint32_t *)(lpm->tbl8), vtbl8_index, vl);

The vtbl8_index values are computed unconditionally before the masked
load, but per the RVV specification masked-off elements of a masked
load perform no memory access, so lanes with a false mask bit never
dereference their (potentially meaningless) index.
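/*
 * Scalar model of the _mu masked-gather semantics relied on above:
 * inactive lanes perform no load and keep their prior value, so a
 * meaningless index under a false mask bit is never dereferenced.
 * Illustrative sketch; the names are hypothetical.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static void
masked_gather_u32(const bool *mask, uint32_t *dst, const uint32_t *base,
		  const uint32_t *byte_off, size_t vl)
{
	for (size_t i = 0; i < vl; i++) {
		if (!mask[i])
			continue; /* masked-off lane: no memory access */
		dst[i] = *(const uint32_t *)((const char *)base + byte_off[i]);
	}
}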

> +/* Can be increased further for VLEN > 256 */
> +#define RVV_MAX_BURST 64U

Is this hard-coded limit appropriate for all RISC-V implementations?
vsetvl already clamps vl to the hardware VLMAX, so the constant only
sizes the temporary arrays, but on parts with VLEN > 256 the RTE_MIN
clamp caps vl at 64 and leaves lanes idle. Deriving the limit from VLEN
at compile time (e.g. from __riscv_v_min_vlen) may be preferable.

> +static uint16_t
> +ip4_lookup_node_process_vec(struct rte_graph *graph, struct rte_node *node,
> +			void **objs, uint16_t nb_objs)

Indentation issue: continuation line should align with opening parenthesis.

> +{
[ ... ]
> +	/* Temporary arrays for batch processing */
> +	uint32_t ips[RVV_MAX_BURST];
> +	uint32_t res[RVV_MAX_BURST];
> +	rte_edge_t next_hops[RVV_MAX_BURST];

These stack-allocated arrays consume 64 * 4 + 64 * 4 + 64 * 2 = 640 bytes.
Is this stack usage acceptable for all DPDK threading models? Should these be
dynamically allocated or use per-lcore storage?

> +		size_t vl = __riscv_vsetvl_e32m8(RTE_MIN(n_left_from, RVV_MAX_BURST));
> +
> +		/* Extract IP addresses and metadata from current batch */
> +		for (size_t i = 0; i < vl; i++) {
> +			struct rte_ipv4_hdr *ipv4_hdr =
> +				rte_pktmbuf_mtod_offset(pkts[i], struct rte_ipv4_hdr *,
> +						sizeof(struct rte_ether_hdr));

Does this code assume the Ethernet header is always present and at a fixed
offset? Should there be validation of packet length and header offsets?
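/*
 * A cheap guard of the kind suggested above, as a sketch; the helper
 * name is hypothetical and the header sizes are written out so the
 * fragment stands alone (DPDK would use sizeof(struct rte_ether_hdr)
 * and rte_pktmbuf_data_len()).
 */
#include <stdbool.h>
#include <stdint.h>

static bool
ip4_hdr_in_first_seg(uint16_t data_len)
{
	const uint16_t eth_hdr_len = 14;  /* Ethernet header */
	const uint16_t ipv4_min_len = 20; /* minimum IPv4 header */
	return data_len >= eth_hdr_len + ipv4_min_len;
}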

> +			ips[i] = ipv4_hdr->dst_addr;
> +			node_mbuf_priv1(pkts[i], dyn)->cksum = ipv4_hdr->hdr_checksum;
> +			node_mbuf_priv1(pkts[i], dyn)->ttl = ipv4_hdr->time_to_live;
> +		}
> +
> +		/* Perform LPM lookup */
> +		rte_lpm_lookup_vec(lpm, ips, res, vl, drop_nh);
> +
> +		for (size_t i = 0; i < vl; i++) {
> +			/* Update statistics */
> +			if ((res[i] >> 16) == (drop_nh >> 16))
> +				NODE_INCREMENT_XSTAT_ID(node, 0, 1, 1);

The comparison here is already explicit, so DPDK's rule against
implicit boolean conversion (write if (x != 0) rather than if (x)) is
satisfied; the remaining nit is that the `>> 16` next-node extraction
is duplicated a few lines below and could share a named constant.

> +			/* Extract next hop and next node */
> +			node_mbuf_priv1(pkts[i], dyn)->nh = res[i] & 0xFFFF;
> +			next_hops[i] = res[i] >> 16;

Should these magic numbers 0xFFFF and 16 be defined as named constants for
clarity and maintainability?
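/*
 * One possible shape; these macro and helper names are hypothetical,
 * not existing DPDK definitions. Low 16 bits of the lookup result
 * carry the next hop, high 16 bits the next edge.
 */
#include <stdint.h>

#define IP4_LOOKUP_NH_MASK 0xFFFFu
#define IP4_LOOKUP_EDGE_SHIFT 16

static inline uint16_t
result_next_hop(uint32_t res)
{
	return (uint16_t)(res & IP4_LOOKUP_NH_MASK);
}

static inline uint16_t
result_next_edge(uint32_t res)
{
	return (uint16_t)(res >> IP4_LOOKUP_EDGE_SHIFT);
}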

> +		if (unlikely(fix_spec)) {
> +			/* Copy successfully speculated packets before this batch */
> +			memcpy(to_next, from, last_spec * sizeof(from[0]));

memcpy() requires non-overlapping regions, so if to_next and from can
ever alias, memmove() semantics are needed. Note that rte_memcpy()
carries the same non-overlap requirement; switching to it is purely a
performance consideration.
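/*
 * For illustration, an overlap-safe variant of this copy (helper name
 * hypothetical): memmove() tolerates aliasing dst/src, while memcpy()
 * and rte_memcpy() both require disjoint regions.
 */
#include <string.h>

static void
copy_obj_ptrs(void **dst, void *const *src, size_t n)
{
	memmove(dst, src, n * sizeof(*src));
}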

> +			from += last_spec;
> +			to_next += last_spec;
> +			held += last_spec;
> +			last_spec = 0;
> +
> +			/* Process each packet in current batch individually */
> +			for (size_t i = 0; i < vl; i++) {
> +				if (next_index == next_hops[i]) {
> +					*to_next++ = from[i];
> +					held++;
> +				} else {
> +					rte_node_enqueue_x1(graph, node, next_hops[i], from[i]);
> +				}
> +			}

Does this scalar fallback path for misspeculated packets provide sufficient
performance compared to keeping everything vectorized? Could the
misspeculation handling be optimized?

> +	held += last_spec;
> +	memcpy(to_next, from, last_spec * sizeof(from[0]));

Same memcpy() question as above regarding potential overlaps and whether
rte_memcpy() should be used.

> +	rte_node_next_stream_put(graph, node, next_index, held);
> +
> +	return nb_objs;
> +}
> +#endif

Per DPDK coding style there should be a blank line before the closing
#endif, which is also conventionally annotated with the guard name,
e.g. #endif /* __INCLUDE_IP4_LOOKUP_RVV_H__ */.

```

