[PATCH v3] node: lookup with RISC-V vector extension

Stephen Hemminger stephen at networkplumber.org
Mon May 4 18:21:44 CEST 2026


On Sat, 28 Mar 2026 21:53:27 +0800
sunyuechi <sunyuechi at iscas.ac.cn> wrote:

> On 2/6/26 4:16 PM, Sun Yuechi wrote:
> 
> > Implement ip4_lookup_node_process_vec function for RISC-V architecture
> > using RISC-V Vector Extension instruction set
> >
> > Signed-off-by: Sun Yuechi <sunyuechi at iscas.ac.cn>
> > Signed-off-by: Zijian <zijian.oerv at isrc.iscas.ac.cn>
> > ---
> >   doc/guides/rel_notes/release_26_03.rst |   4 +
> >   lib/eal/riscv/include/rte_vect.h       |   2 +-
> >   lib/node/ip4_lookup.c                  |   5 +-
> >   lib/node/ip4_lookup_rvv.h              | 167 +++++++++++++++++++++++++
> >   4 files changed, 176 insertions(+), 2 deletions(-)
> >   create mode 100644 lib/node/ip4_lookup_rvv.h  
> 
> ping
> 

There as no ack yet.
Ran it through AI for review and it had lots of feedback.
The only item worth noting is the naming of rte_lpm_lookup_vec
which should match other arch.
---


This series adds RISC-V Vector Extension (RVV) support to the IPv4 LPM
lookup node.  Patch 1/2 is a clean one-liner enabling the default SIMD
bitwidth on RISC-V; cross-checked against the arm/ppc/x86 conventions
in lib/eal/*/include/rte_vect.h, the change is correct and consistent
with how those architectures handle the same define.  No findings on
patch 1/2.

Findings on patch 2/2 below.


[PATCH v4 2/2] node: lookup with RISC-V vector extension
========================================================

Warnings
--------

* lib/node/ip4_lookup_rvv.h:14: the static inline helper is named
  rte_lpm_lookup_vec().  The rte_lpm_* prefix is reserved for the LPM
  library's API namespace (see lib/lpm/rte_lpm*.h).  Defining a static
  inline with that prefix in a node-library private header is
  misleading -- it implies a public LPM API where there is none.

  For comparison, the SVE bulk lookup at lib/lpm/rte_lpm_sve.h:16 uses
  __rte_lpm_lookup_vec (double underscore, internal) and lives in the
  LPM library proper, exposed through rte_lpm.h's #undef/#define
  rte_lpm_lookup_bulk override.  The NEON and SSE node paths
  (lib/node/ip4_lookup_neon.h:114, lib/node/ip4_lookup_sse.h:116) do
  not define their own helpers at all -- they call the public
  rte_lpm_lookupx4() from the LPM library.

  Other static helpers in lib/node/ use the node_* prefix
  (e.g. node_mbuf_priv1, node_mbuf_priv2 in lib/node/node_private.h).

  Two suggested options, in order of preference:

  1. Move the bulk lookup into lib/lpm/rte_lpm_rvv.h as
     __rte_lpm_lookup_vec() with the same signature pattern as the SVE
     version, and have lib/lpm/rte_lpm.h conditionally override
     rte_lpm_lookup_bulk for the RVV case.  The node path then becomes
     a plain rte_lpm_lookup_bulk() call and the implementation is
     reusable by other consumers (FIB, l3fwd, etc.).

  2. Keep the helper local to the node header but rename it -- e.g.
     ip4_lookup_rvv_lpm_lookup() or just lpm_lookup_vec() -- so it
     does not occupy the rte_lpm_* namespace.

Info
----

* lib/node/ip4_lookup_rvv.h: unlike ip4_lookup_neon.h, the RVV path
  does no prefetching of upcoming mbufs or packet headers.  NEON
  prefetches both the next-line of objs[] and the next four packets'
  L3 headers.  On RISC-V cores with hardware prefetchers this may be
  a wash, but on cores without one the per-iteration vl-wide gather
  over pkts[i] and the IPv4 header reads may stall.  Worth measuring.

* lib/node/ip4_lookup_rvv.h: the per-mbuf metadata is written in two
  passes -- cksum/ttl in the first loop, nh in the second.  The NEON
  path packs all three into a uint64_t and writes once via
  node_mbuf_priv1(mbuf, dyn)->u = ...; (the overload struct is laid
  out as { uint16_t nh; uint16_t ttl; uint32_t cksum; } in
  rte_node_mbuf_dynfield.h:48).  A single 64-bit store per mbuf would
  halve the store traffic to the dynfield region.

* The release-notes entry is correctly placed under "New Features".
  Consider mentioning the dependency on RTE_RISCV_FEATURE_V (i.e.
  that this only activates when toolchain/-march reports the V
  extension), so users on non-V RISC-V builds know why they don't
  see a perf change.


Notes from cross-checking (no action needed)
--------------------------------------------

- The bswap32_vec() open-coded byte reversal is correct for the
  little-endian RISC-V configuration DPDK targets (rte_byteorder.h
  defines RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN unconditionally for
  riscv).

- The byte-offset arithmetic for vluxei32 into tbl24 and tbl8 matches
  the scalar lookup in lib/lpm/rte_lpm.h:295-320 (entry index *
  sizeof(uint32_t) via <<2; tbl8 group_idx * 256 + ip_low).  The
  static_assert at rte_lpm.h:121 guarantees
  sizeof(rte_lpm_tbl_entry) == 4.

- The mu (mask-undisturbed) policy on the second vluxei32 correctly
  mirrors the scalar's "only follow tbl8 when VALID_EXT bit is set",
  and per the V spec masked-off elements raise no exceptions, so the
  unconditional pre-computation of vtbl8_index for masked-off lanes
  is safe even when those lanes contain garbage offsets.

- vbool4_t is the correct mask type for SEW=32, LMUL=8 (ratio 4).

- RVV_MAX_BURST=64 with the outer `while (n_left_from > 0)` loop
  correctly chunks the full nb_objs (up to RTE_GRAPH_BURST_SIZE=256)
  through repeated vsetvl calls.

- The miss-counting heuristic `(res[i] >> 16) == (drop_nh >> 16)`
  matches what NEON does at lib/node/ip4_lookup_neon.h:117-120; it
  diverges from the scalar's "rc != 0" only when a user's LPM table
  legitimately resolves to the drop next-node, which is the same
  behavior already present in the existing vector paths.


More information about the dev mailing list