[PATCH v3] node: lookup with RISC-V vector extension
Stephen Hemminger
stephen at networkplumber.org
Mon May 4 18:21:44 CEST 2026
On Sat, 28 Mar 2026 21:53:27 +0800
sunyuechi <sunyuechi at iscas.ac.cn> wrote:
> On 2/6/26 4:16 PM, Sun Yuechi wrote:
>
> > Implement ip4_lookup_node_process_vec function for RISC-V architecture
> > using RISC-V Vector Extension instruction set
> >
> > Signed-off-by: Sun Yuechi <sunyuechi at iscas.ac.cn>
> > Signed-off-by: Zijian <zijian.oerv at isrc.iscas.ac.cn>
> > ---
> > doc/guides/rel_notes/release_26_03.rst | 4 +
> > lib/eal/riscv/include/rte_vect.h | 2 +-
> > lib/node/ip4_lookup.c | 5 +-
> > lib/node/ip4_lookup_rvv.h | 167 +++++++++++++++++++++++++
> > 4 files changed, 176 insertions(+), 2 deletions(-)
> > create mode 100644 lib/node/ip4_lookup_rvv.h
>
> ping
>
There was no ack yet.
Ran it through AI for review and it had lots of feedback.
The only item worth noting is the naming of rte_lpm_lookup_vec,
which should match the other architectures.
---
This series adds RISC-V Vector Extension (RVV) support to the IPv4 LPM
lookup node. Patch 1/2 is a clean one-liner enabling the default SIMD
bitwidth on RISC-V; cross-checked against the arm/ppc/x86 conventions
in lib/eal/*/include/rte_vect.h, the change is correct and consistent
with how those architectures handle the same define. No findings on
patch 1/2.
Findings on patch 2/2 below.
[PATCH v4 2/2] node: lookup with RISC-V vector extension
========================================================
Warnings
--------
* lib/node/ip4_lookup_rvv.h:14: the static inline helper is named
rte_lpm_lookup_vec(). The rte_lpm_* prefix is reserved for the LPM
library's API namespace (see lib/lpm/rte_lpm*.h). Defining a static
inline with that prefix in a node-library private header is
misleading -- it implies a public LPM API where there is none.
For comparison, the SVE bulk lookup at lib/lpm/rte_lpm_sve.h:16 uses
__rte_lpm_lookup_vec (double underscore, internal) and lives in the
LPM library proper, exposed through rte_lpm.h's #undef/#define
rte_lpm_lookup_bulk override. The NEON and SSE node paths
(lib/node/ip4_lookup_neon.h:114, lib/node/ip4_lookup_sse.h:116) do
not define their own helpers at all -- they call the public
rte_lpm_lookupx4() from the LPM library.
Other static helpers in lib/node/ use the node_* prefix
(e.g. node_mbuf_priv1, node_mbuf_priv2 in lib/node/node_private.h).
Two suggested options, in order of preference:
1. Move the bulk lookup into lib/lpm/rte_lpm_rvv.h as
__rte_lpm_lookup_vec() with the same signature pattern as the SVE
version, and have lib/lpm/rte_lpm.h conditionally override
rte_lpm_lookup_bulk for the RVV case. The node path then becomes
a plain rte_lpm_lookup_bulk() call and the implementation is
reusable by other consumers (FIB, l3fwd, etc.).
2. Keep the helper local to the node header but rename it -- e.g.
ip4_lookup_rvv_lpm_lookup() or just lpm_lookup_vec() -- so it
does not occupy the rte_lpm_* namespace.
Info
----
* lib/node/ip4_lookup_rvv.h: unlike ip4_lookup_neon.h, the RVV path
does no prefetching of upcoming mbufs or packet headers. NEON
prefetches both the next-line of objs[] and the next four packets'
L3 headers. On RISC-V cores with hardware prefetchers this may be
a wash, but on cores without one the per-iteration vl-wide gather
over pkts[i] and the IPv4 header reads may stall. Worth measuring.
* lib/node/ip4_lookup_rvv.h: the per-mbuf metadata is written in two
passes -- cksum/ttl in the first loop, nh in the second. The NEON
path packs all three into a uint64_t and writes once via
node_mbuf_priv1(mbuf, dyn)->u = ...; (the overload struct is laid
out as { uint16_t nh; uint16_t ttl; uint32_t cksum; } in
rte_node_mbuf_dynfield.h:48). A single 64-bit store per mbuf would
halve the store traffic to the dynfield region.
* The release-notes entry is correctly placed under "New Features".
Consider mentioning the dependency on RTE_RISCV_FEATURE_V (i.e.
that this only activates when toolchain/-march reports the V
extension), so users on non-V RISC-V builds know why they don't
see a perf change.
Notes from cross-checking (no action needed)
--------------------------------------------
- The bswap32_vec() open-coded byte reversal is correct for the
little-endian RISC-V configuration DPDK targets (rte_byteorder.h
defines RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN unconditionally for
riscv).
- The byte-offset arithmetic for vluxei32 into tbl24 and tbl8 matches
the scalar lookup in lib/lpm/rte_lpm.h:295-320 (entry index *
sizeof(uint32_t) via <<2; tbl8 group_idx * 256 + ip_low). The
static_assert at rte_lpm.h:121 guarantees
sizeof(rte_lpm_tbl_entry) == 4.
- The mu (mask-undisturbed) policy on the second vluxei32 correctly
mirrors the scalar's "only follow tbl8 when VALID_EXT bit is set",
and per the V spec masked-off elements raise no exceptions, so the
unconditional pre-computation of vtbl8_index for masked-off lanes
is safe even when those lanes contain garbage offsets.
- vbool4_t is the correct mask type for SEW=32, LMUL=8 (ratio 4).
- RVV_MAX_BURST=64 with the outer `while (n_left_from > 0)` loop
correctly chunks the full nb_objs (up to RTE_GRAPH_BURST_SIZE=256)
through repeated vsetvl calls.
- The miss-counting heuristic `(res[i] >> 16) == (drop_nh >> 16)`
matches what NEON does at lib/node/ip4_lookup_neon.h:117-120; it
diverges from the scalar's "rc != 0" only when a user's LPM table
legitimately resolves to the drop next-node, which is the same
behavior already present in the existing vector paths.