[RFC PATCH dpdk 0/3] graph: deferred enqueue API for simplified node processing

Robin Jarry rjarry at redhat.com
Thu Feb 5 10:26:32 CET 2026


This series introduces a deferred enqueue API for the graph library that
simplifies node development while maintaining performance.

The current node implementations use a manual speculation pattern where
each node pre-allocates destination buffer slots, tracks which packets
diverge from the speculated edge, and handles fixups at the end. This
results in complex boilerplate code with multiple local variables
(to_next, from, held, last_spec), memcpy calls, and stream get/put
operations repeated across every node.

The new rte_node_enqueue_deferred() API handles this automatically:
- Tracks runs of consecutive packets going to the same edge
- Flushes runs in bulk when the edge changes
- Uses rte_node_next_stream_move() (pointer swap) when all packets
  go to the same destination
- Preserves last_edge across invocations for cross-batch speculation
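
The run tracking listed above can be sketched with a small standalone
model. This is only an illustration, not the code from the series: the
toy_* names, the struct layout and the counters are all hypothetical, and
no DPDK headers or functions are used. A "move" stands in for the
pointer-swap case and a "bulk flush" for a bulk copy of a partial run.
Cross-batch speculation is not modeled; last_edge is merely kept in the
node between calls.

```c
#include <stdint.h>

#define TOY_MAX_EDGES 4
#define TOY_EDGE_NONE UINT16_MAX

/* Hypothetical stand-in for a graph node; not DPDK's struct rte_node. */
struct toy_node {
	uint16_t last_edge;   /* edge of the current run, kept across batches */
	uint16_t run_start;   /* index of the first object in the pending run */
	uint16_t run_len;     /* number of objects in the pending run */
	unsigned int moves;        /* whole-batch hand-offs (pointer swap case) */
	unsigned int bulk_flushes; /* partial runs flushed with a bulk copy */
	unsigned int per_edge[TOY_MAX_EDGES]; /* objects delivered per edge */
};

/* Deliver the pending run, if any, to the edge it was collected for. */
static void
toy_flush(struct toy_node *n, uint16_t batch_size)
{
	if (n->run_len == 0)
		return;
	if (n->run_start == 0 && n->run_len == batch_size)
		n->moves++;        /* the run covers the whole batch */
	else
		n->bulk_flushes++; /* only part of the batch goes to this edge */
	n->per_edge[n->last_edge] += n->run_len;
	n->run_len = 0;
}

/* Record that object idx goes to edge; flush only when the edge changes. */
static void
toy_enqueue_deferred(struct toy_node *n, uint16_t edge, uint16_t idx,
		     uint16_t batch_size)
{
	if (n->run_len != 0 && edge != n->last_edge)
		toy_flush(n, batch_size);
	if (n->run_len == 0) {
		n->run_start = idx;
		n->last_edge = edge;
	}
	n->run_len++;
}

/* Process one batch: edges[i] is the destination edge of object i. */
static void
toy_process(struct toy_node *n, const uint16_t *edges, uint16_t nb_objs)
{
	uint16_t i;

	for (i = 0; i < nb_objs; i++)
		toy_enqueue_deferred(n, edges[i], i, nb_objs);
	toy_flush(n, nb_objs); /* flush the final pending run */
}
```

In this model, a batch where every object takes the same edge costs one
move and no copies, while a batch with edges {0, 0, 2, 2, 0} is split into
three bulk flushes. The only per-object work is the edge comparison in
toy_enqueue_deferred().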

The deferred state is stored in the node's fast-path cache line 1,
alongside xstat_off, keeping frequently accessed data together.

Performance was measured with l3fwd forwarding between the two ports of an
Intel E810-XXV 2x25G NIC (1 RX queue per port). Two graph worker threads
ran on sibling hyperthreads of the same physical core of an Intel Xeon
Silver 4316 CPU @ 2.30GHz.

Results:
- Baseline (manual speculation): 37.0 Mpps
- Deferred API:                  36.2 Mpps (-2.2%)

The slight overhead comes from the per-packet edge comparison. In
exchange, the series brings:
- 826 fewer lines of code across 13 node implementations
- Reduced instruction cache pressure from simpler code paths
- Elimination of per-node speculation boilerplate
- Easier development of new nodes

Robin Jarry (3):
  graph: optimize rte_node_enqueue_next to batch by edge
  graph: add deferred enqueue API for batch processing
  node: use deferred enqueue API in process functions

 app/graph/ip4_output_hook.c         |  35 +-------
 lib/graph/graph_populate.c          |   1 +
 lib/graph/rte_graph_worker_common.h |  90 ++++++++++++++++++-
 lib/node/interface_tx_feature.c     | 105 +++-------------------
 lib/node/ip4_local.c                |  36 +-------
 lib/node/ip4_lookup.c               |  37 +-------
 lib/node/ip4_lookup_fib.c           |  36 +-------
 lib/node/ip4_lookup_neon.h          | 100 ++-------------------
 lib/node/ip4_lookup_sse.h           | 100 ++-------------------
 lib/node/ip4_rewrite.c              | 120 +++----------------------
 lib/node/ip6_lookup.c               |  95 ++------------------
 lib/node/ip6_lookup_fib.c           |  34 +-------
 lib/node/ip6_rewrite.c              | 118 +++----------------------
 lib/node/pkt_cls.c                  | 130 +++-------------------------
 lib/node/udp4_input.c               |  42 +--------
 15 files changed, 170 insertions(+), 909 deletions(-)

-- 
2.52.0


