|SUCCESS| [v4,2/2] node: lookup with RISC-V vector extension
dpdklab at iol.unh.edu
Tue Mar 31 11:43:14 CEST 2026
Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/162784
_AI Code Review_
=== Patch Review: series.patch (via Claude) ===
Review date: 2026-03-31
I'll review this DPDK patch against the coding standards and guidelines.
```
commit c6f7a8e2c4a1b3d5e9f0a2c6d8e1f3a5b7c9d1e3
Author: Sun Yuechi <sunyuechi at iscas.ac.cn>
[PATCH v4 1/2] eal/riscv: set default SIMD bitwidth to 128
Enable vector code paths on RISC-V platforms with V extension.
> diff --git a/lib/eal/riscv/include/rte_vect.h b/lib/eal/riscv/include/rte_vect.h
> index a4357e266a..4d16082449 100644
> --- a/lib/eal/riscv/include/rte_vect.h
> +++ b/lib/eal/riscv/include/rte_vect.h
> @@ -19,7 +19,7 @@
> extern "C" {
> #endif
>
> -#define RTE_VECT_DEFAULT_SIMD_BITWIDTH RTE_VECT_SIMD_DISABLED
> +#define RTE_VECT_DEFAULT_SIMD_BITWIDTH RTE_VECT_SIMD_128
This change enables vector operations unconditionally for all RISC-V
platforms. Should this be conditional on RTE_RISCV_FEATURE_V being defined,
similar to how patch 2/2 checks for the V extension at compile time? This
could cause issues on RISC-V platforms without the V extension.
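/* Sketch of one way to gate the default, assuming RTE_RISCV_FEATURE_V is
 * the same compile-time flag that patch 2/2 tests for (illustrative only,
 * not a definitive fix):
 */
#ifdef RTE_RISCV_FEATURE_V
#define RTE_VECT_DEFAULT_SIMD_BITWIDTH RTE_VECT_SIMD_128
#else
#define RTE_VECT_DEFAULT_SIMD_BITWIDTH RTE_VECT_SIMD_DISABLED
#endif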
[ ... ]
commit a7d9e1f2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8
Author: Sun Yuechi <sunyuechi at iscas.ac.cn>
[PATCH v4 2/2] node: lookup with RISC-V vector extension
Implement ip4_lookup_node_process_vec function for RISC-V architecture
using RISC-V Vector Extension instruction set.
> diff --git a/lib/node/ip4_lookup_rvv.h b/lib/node/ip4_lookup_rvv.h
> new file mode 100644
> index 0000000000..0d9db2e19b
> --- /dev/null
> +++ b/lib/node/ip4_lookup_rvv.h
> @@ -0,0 +1,164 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2025 Institute of Software Chinese Academy of Sciences (ISCAS).
> + */
> +
> +#ifndef __INCLUDE_IP4_LOOKUP_RVV_H__
> +#define __INCLUDE_IP4_LOOKUP_RVV_H__
The header guard naming doesn't follow DPDK convention. According to the
coding style guide, header guards should match the file path. Should this be
_IP4_LOOKUP_RVV_H_ instead of __INCLUDE_IP4_LOOKUP_RVV_H__?
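/* Suggested guard following the convention noted above (sketch only;
 * the final name is the maintainers' call):
 */
#ifndef _IP4_LOOKUP_RVV_H_
#define _IP4_LOOKUP_RVV_H_

/* ... header body ... */

#endif /* _IP4_LOOKUP_RVV_H_ */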
> +static __rte_always_inline vuint32m8_t
> +bswap32_vec(vuint32m8_t v, size_t vl)
> +{
> + vuint32m8_t low16 = __riscv_vor_vv_u32m8(
> + __riscv_vsll_vx_u32m8(__riscv_vand_vx_u32m8(v, 0xFF, vl), 24, vl),
> + __riscv_vsll_vx_u32m8(__riscv_vand_vx_u32m8(v, 0xFF00, vl), 8, vl),
> + vl);
> +
> + vuint32m8_t high16 = __riscv_vor_vv_u32m8(
> + __riscv_vsrl_vx_u32m8(__riscv_vand_vx_u32m8(v, 0xFF0000, vl), 8, vl),
> + __riscv_vsrl_vx_u32m8(v, 24, vl),
> + vl);
> +
> + return __riscv_vor_vv_u32m8(low16, high16, vl);
> +}
The function deserves a brief comment stating its purpose: it converts
32-bit values between network (big-endian) and host byte order. As written,
a reader has to infer that from the shift-and-mask pattern. Is little-endian
host order a safe assumption for all supported RISC-V targets?
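/* For reference, a self-contained scalar equivalent of bswap32_vec
 * (illustrative only, not part of the patch): it swaps the four bytes of
 * a 32-bit word, i.e. converts between network and host byte order on a
 * little-endian host.
 */
#include <stdint.h>

static inline uint32_t
bswap32_scalar(uint32_t v)
{
	uint32_t low16 = ((v & 0xFF) << 24) | ((v & 0xFF00) << 8);
	uint32_t high16 = ((v & 0xFF0000) >> 8) | (v >> 24);

	return low16 | high16;
}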
> +static __rte_always_inline void
> +rte_lpm_lookup_vec(const struct rte_lpm *lpm, const uint32_t *ips,
> + uint32_t *hop, size_t vl, uint32_t defv)
The indentation of the second parameter line looks off. Per DPDK coding
style, continuation lines of a function definition should align with the
opening parenthesis.
> +{
> + /* Load IP addresses (network byte order) */
> + vuint32m8_t v_ip = bswap32_vec(__riscv_vle32_v_u32m8(ips, vl), vl);
> +
> + vuint32m8_t v_tbl24_byte_offset = __riscv_vsll_vx_u32m8(
> + __riscv_vsrl_vx_u32m8(v_ip, 8, vl), 2, vl);
Does this code assume the ips array is suitably aligned for vector loads?
If so, should there be an alignment check, or at least documentation of the
alignment requirement?
> + vuint32m8_t vtbl_entry = __riscv_vluxei32_v_u32m8(
> + (const uint32_t *)lpm->tbl24, v_tbl24_byte_offset, vl);
Can this unaligned indexed load cause performance issues or faults on certain
RISC-V implementations? The tbl24 table access pattern may not be optimal for
vector operations.
> + vbool4_t mask = __riscv_vmseq_vx_u32m8_b4(
> + __riscv_vand_vx_u32m8(vtbl_entry, RTE_LPM_VALID_EXT_ENTRY_BITMASK, vl),
> + RTE_LPM_VALID_EXT_ENTRY_BITMASK, vl);
> +
> + vuint32m8_t vtbl8_index = __riscv_vsll_vx_u32m8(
> + __riscv_vadd_vv_u32m8(
> + __riscv_vsll_vx_u32m8(
> + __riscv_vand_vx_u32m8(vtbl_entry, 0x00FFFFFF, vl), 8, vl),
> + __riscv_vand_vx_u32m8(v_ip, 0x000000FF, vl), vl),
> + 2, vl);
> +
> + vtbl_entry = __riscv_vluxei32_v_u32m8_mu(
> + mask, vtbl_entry, (const uint32_t *)(lpm->tbl8), vtbl8_index, vl);
The vtbl8_index computation runs unconditionally before the masked load.
Note, though, that the RISC-V V specification guarantees masked-off
elements of a vector load perform no memory access, so inactive lanes
should not fault on out-of-range indices; a short comment documenting that
the code relies on this guarantee would help.
> +/* Can be increased further for VLEN > 256 */
> +#define RVV_MAX_BURST 64U
Is this hard-coded limit appropriate for all RISC-V implementations? Should
this be computed based on actual VLEN at runtime or compile-time?
> +static uint16_t
> +ip4_lookup_node_process_vec(struct rte_graph *graph, struct rte_node *node,
> + void **objs, uint16_t nb_objs)
Indentation issue: continuation line should align with opening parenthesis.
> +{
[ ... ]
> + /* Temporary arrays for batch processing */
> + uint32_t ips[RVV_MAX_BURST];
> + uint32_t res[RVV_MAX_BURST];
> + rte_edge_t next_hops[RVV_MAX_BURST];
These stack-allocated arrays consume 64 * 4 + 64 * 4 + 64 * 2 = 640 bytes.
Is this stack usage acceptable for all DPDK threading models? Should these be
dynamically allocated or use per-lcore storage?
> + size_t vl = __riscv_vsetvl_e32m8(RTE_MIN(n_left_from, RVV_MAX_BURST));
> +
> + /* Extract IP addresses and metadata from current batch */
> + for (size_t i = 0; i < vl; i++) {
> + struct rte_ipv4_hdr *ipv4_hdr =
> + rte_pktmbuf_mtod_offset(pkts[i], struct rte_ipv4_hdr *,
> + sizeof(struct rte_ether_hdr));
Does this code assume the Ethernet header is always present and at a fixed
offset? Should there be validation of packet length and header offsets?
> + ips[i] = ipv4_hdr->dst_addr;
> + node_mbuf_priv1(pkts[i], dyn)->cksum = ipv4_hdr->hdr_checksum;
> + node_mbuf_priv1(pkts[i], dyn)->ttl = ipv4_hdr->time_to_live;
> + }
> +
> + /* Perform LPM lookup */
> + rte_lpm_lookup_vec(lpm, ips, res, vl, drop_nh);
> +
> + for (size_t i = 0; i < vl; i++) {
> + /* Update statistics */
> + if ((res[i] >> 16) == (drop_nh >> 16))
> + NODE_INCREMENT_XSTAT_ID(node, 0, 1, 1);
The comparison here is already explicit, so the usual style concern about
implicit boolean conversion does not apply. A brief comment noting that the
upper 16 bits of res[i] carry the next-edge field (and thus identify drops)
would make the intent clearer.
> + /* Extract next hop and next node */
> + node_mbuf_priv1(pkts[i], dyn)->nh = res[i] & 0xFFFF;
> + next_hops[i] = res[i] >> 16;
Should these magic numbers 0xFFFF and 16 be defined as named constants for
clarity and maintainability?
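/* Illustrative only: hypothetical names for the packed result fields
 * (low 16 bits = next hop, high 16 bits = next edge). These macros and
 * helpers are not existing DPDK identifiers.
 */
#include <stdint.h>

#define IP4_LOOKUP_NH_MASK    0xFFFFu
#define IP4_LOOKUP_EDGE_SHIFT 16

static inline uint16_t
lookup_result_nh(uint32_t res)
{
	return (uint16_t)(res & IP4_LOOKUP_NH_MASK);
}

static inline uint16_t
lookup_result_edge(uint32_t res)
{
	return (uint16_t)(res >> IP4_LOOKUP_EDGE_SHIFT);
}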
> + if (unlikely(fix_spec)) {
> + /* Copy successfully speculated packets before this batch */
> + memcpy(to_next, from, last_spec * sizeof(from[0]));
memcpy() has undefined behavior when the regions overlap; can to_next and
from ever overlap here? If overlap is impossible, would rte_memcpy() be
preferable for performance on large bursts?
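/* If the two regions could ever overlap, memmove() is the
 * defined-behavior choice, since memcpy() is undefined for overlap.
 * Minimal self-contained illustration (not patch code): shift n elements
 * left by one slot, an inherently overlapping copy.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static inline void
shift_left_one(uint32_t *arr, size_t n)
{
	if (n > 1)
		memmove(arr, arr + 1, (n - 1) * sizeof(arr[0]));
}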
> + from += last_spec;
> + to_next += last_spec;
> + held += last_spec;
> + last_spec = 0;
> +
> + /* Process each packet in current batch individually */
> + for (size_t i = 0; i < vl; i++) {
> + if (next_index == next_hops[i]) {
> + *to_next++ = from[i];
> + held++;
> + } else {
> + rte_node_enqueue_x1(graph, node, next_hops[i], from[i]);
> + }
> + }
Does this scalar fallback path for misspeculated packets provide sufficient
performance compared to keeping everything vectorized? Could the
misspeculation handling be optimized?
> + held += last_spec;
> + memcpy(to_next, from, last_spec * sizeof(from[0]));
Same memcpy() question as above regarding potential overlaps and whether
rte_memcpy() should be used.
> + rte_node_next_stream_put(graph, node, next_index, held);
> +
> + return nb_objs;
> +}
> +#endif
Missing blank line before #endif according to DPDK coding style.
```
More information about the test-report
mailing list