[PATCH v3] node: lookup with RISC-V vector extension
Stephen Hemminger
stephen at networkplumber.org
Mon Mar 30 22:54:56 CEST 2026
On Fri, 6 Feb 2026 16:16:35 +0800
Sun Yuechi <sunyuechi at iscas.ac.cn> wrote:
> Implement ip4_lookup_node_process_vec function for RISC-V architecture
> using RISC-V Vector Extension instruction set
>
> Signed-off-by: Sun Yuechi <sunyuechi at iscas.ac.cn>
> Signed-off-by: Zijian <zijian.oerv at isrc.iscas.ac.cn>
> ---
Since RISC-V changes do not seem to get reviewed, I ran an AI
review and it found several things that need addressing.
Review: [PATCH v3] node: lookup with RISC-V vector extension
Errors
------
1. Macro redefinition of RTE_LPM_LOOKUP_SUCCESS and
RTE_LPM_VALID_EXT_ENTRY_BITMASK (ip4_lookup_rvv.h lines 8-9).
ip4_lookup.c already includes <rte_lpm.h> (line 14) before
including ip4_lookup_rvv.h. Both rte_lpm.h and this header
#define RTE_LPM_LOOKUP_SUCCESS and RTE_LPM_VALID_EXT_ENTRY_BITMASK,
which will produce compiler warnings for macro redefinition.
Remove both #defines from ip4_lookup_rvv.h — the values are
already available from rte_lpm.h.
(The upstream rte_lpm_rvv.h has the same issue, but that file is
included from rte_lpm.h itself, so the include ordering is
different. In the node header the duplicate definitions can only
be avoided by removing them.)
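A minimal sketch of the point: the macro values below are taken from upstream rte_lpm.h, and the lookup_succeeded() helper is hypothetical, shown only to demonstrate that the header can rely on the existing definitions (guarded with #ifndef at most) instead of redefining them:

```c
#include <stdint.h>

/* Simulating the include order in ip4_lookup.c: rte_lpm.h has
 * already provided these macros by the time ip4_lookup_rvv.h is
 * included (values as in upstream rte_lpm.h). */
#define RTE_LPM_LOOKUP_SUCCESS          0x01000000
#define RTE_LPM_VALID_EXT_ENTRY_BITMASK 0x03000000

/* The fix for ip4_lookup_rvv.h: do not redefine. At most, guard
 * with #ifndef so no include path can trigger a macro-redefined
 * warning. */
#ifndef RTE_LPM_LOOKUP_SUCCESS
#define RTE_LPM_LOOKUP_SUCCESS 0x01000000
#endif

/* Hypothetical helper: the success-flag test as LPM callers do it. */
static int lookup_succeeded(uint32_t tbl_entry)
{
	return (tbl_entry & RTE_LPM_LOOKUP_SUCCESS) != 0;
}
```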
2. RTE_VECT_DEFAULT_SIMD_BITWIDTH change is too broad
(rte_vect.h: SIMD_DISABLED -> SIMD_128).
This change affects every DPDK subsystem that calls
rte_vect_get_max_simd_bitwidth() on RISC-V, not just the node
library. It globally enables SIMD code paths across all libraries
and drivers that gate on this value. This should be a separate
patch with its own justification and testing, not bundled with a
node-library feature patch. If a RISC-V platform cannot actually
execute 128-bit vector operations at runtime, this default would
cause failures.
Warnings
--------
3. Duplicated LPM lookup logic instead of using rte_lpm_lookupx4().
The NEON and SSE implementations call rte_lpm_lookupx4() which is
already vectorized for RISC-V via rte_lpm_rvv.h upstream. The
new rte_lpm_lookup_vec() in ip4_lookup_rvv.h reimplements the
same tbl24/tbl8 lookup logic. While the wider LMUL (m8 vs m1)
enables larger batch sizes, duplicating LPM internals means any
future LPM bug fix or optimization must be applied in two places.
Consider either:
(a) Using rte_lpm_lookupx4() in a loop (as NEON/SSE do) with a
scalar tail, or
(b) Adding a variable-length bulk lookup to the LPM library
itself (e.g., extending rte_lpm_lookup_bulk to use RVV
internally) so the node code can call it without duplicating
table access logic.
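Option (a) can be sketched as below. lookup4_stub() and lookup1_stub() are hypothetical stand-ins for rte_lpm_lookupx4() and rte_lpm_lookup() (pulling in DPDK headers is out of scope here); the point is the batch-of-four loop with a scalar tail, as the NEON/SSE node code does:

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for rte_lpm_lookupx4(): maps ip -> ip + 1, or the
 * default value for a zero ip, so the batching is observable. */
static void lookup4_stub(const uint32_t ip[4], uint32_t hop[4],
			 uint32_t defv)
{
	for (int i = 0; i < 4; i++)
		hop[i] = ip[i] ? ip[i] + 1 : defv;
}

/* Stand-in for the scalar rte_lpm_lookup(). */
static uint32_t lookup1_stub(uint32_t ip, uint32_t defv)
{
	return ip ? ip + 1 : defv;
}

/* Process the burst four at a time, then handle the n % 4
 * remainder with scalar lookups. */
static void lookup_burst(const uint32_t *ips, uint32_t *hops,
			 size_t n, uint32_t defv)
{
	size_t i = 0;

	for (; i + 4 <= n; i += 4)
		lookup4_stub(&ips[i], &hops[i], defv);
	for (; i < n; i++)	/* scalar tail */
		hops[i] = lookup1_stub(ips[i], defv);
}
```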
4. No prefetching of packet data.
The NEON and SSE implementations prefetch both mbuf object lines
and packet data (Ethernet + IP headers) for upcoming batches.
This implementation has no prefetch calls at all. For large
bursts the L1 miss penalty on the rte_pktmbuf_mtod_offset access
in the IP extraction loop could be significant. Consider adding
rte_prefetch0 for the next batch's packet headers.
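The suggested pattern, sketched with __builtin_prefetch standing in for rte_prefetch0 and a plain pointer array standing in for the mbuf burst (the batch distance and the summing loop body are illustrative only):

```c
#include <stdint.h>
#include <stddef.h>

#define BATCH 8	/* illustrative prefetch distance */

/* Touch one "header" byte per packet, prefetching the packet that
 * is BATCH iterations ahead so its line is warm by the time the
 * loop reaches it. __builtin_prefetch(p, 0, 3) requests a read
 * prefetch with high temporal locality, like rte_prefetch0(). */
static uint64_t process_burst(uint8_t *const *pkts, size_t n)
{
	uint64_t acc = 0;

	for (size_t i = 0; i < n; i++) {
		if (i + BATCH < n)
			__builtin_prefetch(pkts[i + BATCH], 0, 3);
		acc += pkts[i][0];	/* header access */
	}
	return acc;
}
```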
5. Stack arrays sized for VLEN > 256 may be excessive.
RVV_MAX_BURST is 64, giving 3 * 64 * 4 = 768 bytes of stack
arrays (ips, res, next_hops). The comment says "can be increased
further for VLEN > 256", but 64 already covers VLEN=256 (m8 at
e32 yields 64 elements when VLEN=256) and exceeds what any
in-tree RISC-V platform uses today. 32 would be more conservative
and would still cover current VLEN=128 hardware, at the cost of
not filling a full m8 register group on future VLEN=256 parts.
This is minor but worth noting for stack-constrained lcore
contexts.
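The sizing arithmetic behind these numbers, as a small self-contained check (the helper names are illustrative, not from the patch):

```c
#include <stdint.h>

/* Elements per vector register group: VLMAX = VLEN / SEW * LMUL.
 * With e32 (SEW = 32) and m8 (LMUL = 8):
 *   VLEN = 128 -> 32 elements
 *   VLEN = 256 -> 64 elements */
static unsigned rvv_vlmax(unsigned vlen, unsigned sew, unsigned lmul)
{
	return vlen / sew * lmul;
}

/* Stack cost of the three per-burst arrays (ips, res, next_hops),
 * each holding max_burst 32-bit entries. */
static unsigned burst_stack_bytes(unsigned max_burst)
{
	return 3u * max_burst * (unsigned)sizeof(uint32_t);
}
```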
Info
----
6. The patch is a nice addition bringing RISC-V vector support to
the node library. The use of vsetvl for natural tail handling
(no scalar remainder loop needed) is a good RVV idiom.
7. The fix_spec logic uses bitwise OR accumulation across the batch
rather than the all-equal AND chain used by NEON. Both are
correct — the OR detects any mismatch. The NEON approach detects
exact same next-hop for all four, while the RVV approach detects
any difference from next_index. The RVV approach is actually
slightly more precise since it checks against the speculated
index rather than checking all-same.
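The two detection styles can be contrasted in scalar form (the helper names are illustrative; the real code operates on vector registers):

```c
#include <stdint.h>

/* RVV-style check: OR-accumulate XOR against the speculated
 * next_index; a nonzero accumulator means at least one next hop
 * diverges from the speculation. */
static int any_differs_from(const uint32_t *hops, int n,
			    uint32_t next_index)
{
	uint32_t acc = 0;

	for (int i = 0; i < n; i++)
		acc |= hops[i] ^ next_index;
	return acc != 0;
}

/* NEON-style check: AND-chain of pairwise equality; true only
 * when all four next hops are identical, regardless of whether
 * they match the speculated index. */
static int all_equal4(const uint32_t h[4])
{
	return h[0] == h[1] && h[1] == h[2] && h[2] == h[3];
}
```

Note the case where all four hops agree with each other but not with the speculated index: the AND-chain reports "all same" while the XOR/OR check still flags the divergence, which is the extra precision the review describes.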