[EXTERNAL] [PATCH dpdk] graph: replace circular buffer with priority-based bitmap scheduling
Kiran Kumar Kokkilagadda
kirankumark at marvell.com
Wed Jun 10 12:51:48 CEST 2026
From: Robin Jarry <rjarry at redhat.com>
Date: Tuesday, 19 May 2026 at 3:43 PM
To: dev at dpdk.org <dev at dpdk.org>; Jerin Jacob <jerinj at marvell.com>; Kiran Kumar Kokkilagadda <kirankumark at marvell.com>; Nithin Kumar Dabilpuram <ndabilpuram at marvell.com>; Zhirun Yan <yanzhirun_163 at 163.com>
Cc: Vladimir Medvedkin <vladimir.medvedkin at intel.com>; Christophe Fontaine <cfontain at redhat.com>; David Marchand <david.marchand at redhat.com>; Konstantin Ananyev <konstantin.ananyev at huawei.com>; Maxime Leroy <maxime at leroys.fr>
Subject: [EXTERNAL] [PATCH dpdk] graph: replace circular buffer with priority-based bitmap scheduling
Replace the FIFO circular buffer used to track pending nodes with
a bitmap and a priority-sorted schedule table. Each node can now have
a scheduling priority (int16_t, default 0, lower value means visited
first). Source nodes are forced to INT16_MIN so they always run first.
At graph creation time, nodes are sorted by (priority, id) and assigned
a bit position (sched_idx). During the walk, a bitmap tracks which nodes
have pending objects. Scanning from the lowest bit naturally visits
nodes in priority order.
This improves batching in fan-out-then-converge topologies. When
eth_input classifies packets to both mpls_input and ipv4_input, the old
FIFO order could process ipv4_input before mpls_input, causing
ipv4_input to be visited twice (once before and once after the MPLS
label is popped). With mpls_input at a higher priority (lower value), it
runs first and its output accumulates in ipv4_input which is then
visited only once with all packets.
The bitmap set operation is idempotent (OR on an already-set bit is
a no-op) which removes the need for the idx == 0 guards that the
circular buffer required to avoid duplicate enqueue.
Suggested-by: Vladimir Medvedkin <vladimir.medvedkin at intel.com>
Signed-off-by: Robin Jarry <rjarry at redhat.com>
Cc: Christophe Fontaine <cfontain at redhat.com>
Cc: David Marchand <david.marchand at redhat.com>
Cc: Konstantin Ananyev <konstantin.ananyev at huawei.com>
Cc: Maxime Leroy <maxime at leroys.fr>
---
doc/guides/prog_guide/graph_lib.rst | 37 +-
.../prog_guide/img/graph_mem_layout.svg | 1823 +++++++----------
lib/graph/graph.c | 19 +-
lib/graph/graph_debug.c | 12 +-
lib/graph/graph_populate.c | 117 +-
lib/graph/graph_private.h | 27 +-
lib/graph/node.c | 2 +
lib/graph/rte_graph.h | 1 +
lib/graph/rte_graph_model_mcore_dispatch.h | 34 +-
lib/graph/rte_graph_model_rtc.h | 65 +-
lib/graph/rte_graph_worker.h | 2 +-
lib/graph/rte_graph_worker_common.h | 81 +-
12 files changed, 984 insertions(+), 1236 deletions(-)
diff --git a/doc/guides/prog_guide/graph_lib.rst b/doc/guides/prog_guide/graph_lib.rst
index 8409e7666e85..9c6d8679b686 100644
--- a/doc/guides/prog_guide/graph_lib.rst
+++ b/doc/guides/prog_guide/graph_lib.rst
@@ -117,13 +117,22 @@ next_node[]:
The dynamic array to store the downstream nodes connected to this node. Downstream
node should not be current node itself or a source node.
+priority:
+^^^^^^^^^
+
+The scheduling priority of the node (``int16_t``, default 0). Nodes with lower
+priority values are visited first during the graph walk. This allows control
+over the order in which pending nodes are processed, which can improve packet
+batching in topologies where multiple paths converge on the same node.
+
Source node:
^^^^^^^^^^^^
Source nodes are static nodes created using ``RTE_NODE_REGISTER`` by passing
``flags`` as ``RTE_NODE_SOURCE_F``.
-While performing the graph walk, the ``process()`` function of all the source
-nodes will be called first. So that these nodes can be used as input nodes for a graph.
+Source nodes are automatically assigned the lowest possible priority
+(``INT16_MIN``) so that their ``process()`` function is always called first
+during the graph walk. This ensures they act as input nodes for a graph.
nb_xstats:
^^^^^^^^^^
@@ -396,12 +405,26 @@ Graph object memory layout
Understanding the memory layout helps to debug the graph library and
improve the performance if needed.
-Graph object consists of a header, circular buffer to store the pending stream
-when walking over the graph, variable-length memory to store the ``rte_node`` objects,
-and variable-length memory to store the xstat reported by each ``rte_node``.
+A graph object consists of a header, a scheduling table mapping bit positions to
+node offsets, pending and source bitmaps for tracking which nodes need
+processing, variable-length memory to store the ``rte_node`` objects, and
+variable-length memory to store the xstat reported by each ``rte_node``.
-The graph_nodes_mem_create() creates and populate this memory. The functions
-such as ``rte_graph_walk()`` and ``rte_node_enqueue_*`` use this memory
+Nodes are sorted by ``(priority, node_id)`` at graph creation time and each
+node is assigned a bit position in the pending bitmap. During the graph walk,
+the bitmap is scanned from the lowest bit, so nodes with lower priority values
+are visited first. Source nodes are always assigned the lowest priority
+(``INT16_MIN``) to ensure they run before any processing node.
+
+This priority-based ordering improves batching in fan-out-then-converge
+topologies. For example, if ``eth_input`` classifies packets to both
+``mpls_input`` and ``ipv4_input``, giving ``mpls_input`` a lower priority value
+ensures it runs first. Its output accumulates in ``ipv4_input`` which is then
+visited only once with all packets, instead of being visited twice (before and
+after MPLS label popping).
+
+The ``graph_fp_mem_create()`` function creates and populates this memory. The
+functions such as ``rte_graph_walk()`` and ``rte_node_enqueue_*`` use this memory
to enable fastpath services.
diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 6911ea8abeed..6dc1402e6bd0 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -334,20 +334,6 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
return graph_mem_fixup_node_ctx(graph);
}
-static bool
-graph_src_node_avail(struct graph *graph)
-{
- struct graph_node *graph_node;
-
- STAILQ_FOREACH(graph_node, &graph->node_list, next)
- if ((graph_node->node->flags & RTE_NODE_SOURCE_F) &&
- (graph_node->node->lcore_id == RTE_MAX_LCORE ||
- graph->lcore_id == graph_node->node->lcore_id))
- return true;
-
- return false;
-}
-
RTE_EXPORT_SYMBOL(rte_graph_model_mcore_dispatch_core_bind)
int
rte_graph_model_mcore_dispatch_core_bind(rte_graph_t id, int lcore)
@@ -375,9 +361,8 @@ rte_graph_model_mcore_dispatch_core_bind(rte_graph_t id, int lcore)
graph->graph->dispatch.lcore_id = graph->lcore_id;
graph->socket = rte_lcore_to_socket_id(lcore);
- /* check the availability of source node */
- if (!graph_src_node_avail(graph))
- graph->graph->head = 0;
+ /* Rebuild source bitmap with only source nodes bound to this lcore */
+ graph_src_bitmap_rebuild(graph);
return 0;
diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c
index e3b8cccdc1f0..8e99fa1b0fb8 100644
--- a/lib/graph/graph_debug.c
+++ b/lib/graph/graph_debug.c
@@ -15,8 +15,8 @@ graph_dump(FILE *f, struct graph *g)
fprintf(f, "graph <%s>\n", g->name);
fprintf(f, " id=%" PRIu32 "\n", g->id);
- fprintf(f, " cir_start=%" PRIu32 "\n", g->cir_start);
- fprintf(f, " cir_mask=%" PRIu32 "\n", g->cir_mask);
+ fprintf(f, " sched_table_off=%" PRIu32 "\n", g->sched_table_off);
+ fprintf(f, " nb_sched_words=%" PRIu16 "\n", g->nb_sched_words);
fprintf(f, " addr=%p\n", g);
fprintf(f, " graph=%p\n", g->graph);
fprintf(f, " mem_sz=%zu\n", g->mem_sz);
@@ -63,14 +63,14 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool all)
fprintf(f, "graph <%s> @ %p\n", g->name, g);
fprintf(f, " id=%" PRIu32 "\n", g->id);
- fprintf(f, " head=%" PRId32 "\n", (int32_t)g->head);
- fprintf(f, " tail=%" PRId32 "\n", (int32_t)g->tail);
- fprintf(f, " cir_mask=0x%" PRIx32 "\n", g->cir_mask);
fprintf(f, " nb_nodes=%" PRId32 "\n", g->nb_nodes);
+ fprintf(f, " nb_sched_words=%" PRIu16 "\n", g->nb_sched_words);
fprintf(f, " socket=%d\n", g->socket);
fprintf(f, " fence=0x%" PRIx64 "\n", g->fence);
fprintf(f, " nodes_start=0x%" PRIx32 "\n", g->nodes_start);
- fprintf(f, " cir_start=%p\n", g->cir_start);
+ fprintf(f, " sched_table=%p\n", g->sched_table);
+ fprintf(f, " pending=%p\n", g->pending);
+ fprintf(f, " src_pending=%p\n", g->src_pending);
rte_graph_foreach_node(count, off, g, n) {
if (!all && n->idx == 0)
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 026daecb2122..45bc7704fede 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -3,6 +3,8 @@
*/
+#include <stdlib.h>
+
#include <rte_common.h>
#include <rte_errno.h>
#include <rte_malloc.h>
@@ -15,19 +17,27 @@ static size_t
graph_fp_mem_calc_size(struct graph *graph)
{
struct graph_node *graph_node;
- rte_node_t val;
+ uint16_t nwords;
size_t sz;
/* Graph header */
sz = sizeof(struct rte_graph);
- /* Source nodes list */
- sz += sizeof(rte_graph_off_t) * graph->src_node_count;
- /* Circular buffer for pending streams of size number of nodes */
- val = rte_align32pow2(graph->node_count * sizeof(rte_graph_off_t));
- sz = RTE_ALIGN(sz, val);
- graph->cir_start = sz;
- graph->cir_mask = rte_align32pow2(graph->node_count) - 1;
- sz += val;
+
+ /* Schedule table: node offset indexed by sched_idx */
+ sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
+ graph->sched_table_off = sz;
+ sz += sizeof(rte_graph_off_t) * graph->node_count;
+
+ /* Pending and source pending bitmaps */
+ nwords = (graph->node_count + 63) / 64;
+ graph->nb_sched_words = nwords;
+ sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
+ graph->pending_off = sz;
+ sz += sizeof(uint64_t) * nwords;
+ sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
+ graph->src_pending_off = sz;
+ sz += sizeof(uint64_t) * nwords;
+
/* Fence */
sz += sizeof(RTE_GRAPH_FENCE);
sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
@@ -54,20 +64,44 @@ graph_fp_mem_calc_size(struct graph *graph)
}
static void
-graph_header_popluate(struct graph *_graph)
+graph_header_populate(struct graph *_graph)
{
struct rte_graph *graph = _graph->graph;
- graph->tail = 0;
- graph->head = (int32_t)-_graph->src_node_count;
- graph->cir_mask = _graph->cir_mask;
graph->nb_nodes = _graph->node_count;
- graph->cir_start = RTE_PTR_ADD(graph, _graph->cir_start);
+ graph->nb_sched_words = _graph->nb_sched_words;
+ graph->sched_table = RTE_PTR_ADD(graph, _graph->sched_table_off);
+ graph->pending = RTE_PTR_ADD(graph, _graph->pending_off);
+ graph->src_pending = RTE_PTR_ADD(graph, _graph->src_pending_off);
graph->nodes_start = _graph->nodes_start;
graph->socket = _graph->socket;
graph->id = _graph->id;
memcpy(graph->name, _graph->name, RTE_GRAPH_NAMESIZE);
graph->fence = RTE_GRAPH_FENCE;
+
+ memset(graph->pending, 0, sizeof(uint64_t) * _graph->nb_sched_words);
+ memset(graph->src_pending, 0, sizeof(uint64_t) * _graph->nb_sched_words);
+}
+
+static int16_t
+graph_node_effective_priority(const struct graph_node *gn)
+{
+ if (gn->node->flags & RTE_NODE_SOURCE_F)
+ return INT16_MIN;
+ return gn->node->priority;
+}
+
+static int
+graph_node_priority_cmp(const void *a, const void *b)
+{
+ const struct graph_node *const *na = a;
+ const struct graph_node *const *nb = b;
+ int16_t pa = graph_node_effective_priority(*na);
+ int16_t pb = graph_node_effective_priority(*nb);
+
+ if (pa != pb)
+ return (int)pa - (int)pb;
+ return (int)(*na)->node->id - (int)(*nb)->node->id;
}
static void
@@ -76,15 +110,26 @@ graph_nodes_populate(struct graph *_graph)
rte_graph_off_t xstat_off = _graph->xstats_start;
rte_graph_off_t off = _graph->nodes_start;
struct rte_graph *graph = _graph->graph;
- struct graph_node *graph_node;
+ struct graph_node **sorted, *graph_node;
rte_edge_t count, nb_edges;
rte_node_t pid;
+ uint32_t n;
- STAILQ_FOREACH(graph_node, &_graph->node_list, next) {
+ /* Build a sorted array of graph_node pointers by (priority, id) */
+ sorted = calloc(_graph->node_count, sizeof(*sorted));
+ RTE_VERIFY(sorted != NULL);
+ n = 0;
+ STAILQ_FOREACH(graph_node, &_graph->node_list, next)
+ sorted[n++] = graph_node;
+ qsort(sorted, n, sizeof(*sorted), graph_node_priority_cmp);
+
+ for (n = 0; n < _graph->node_count; n++) {
+ graph_node = sorted[n];
struct rte_node *node = RTE_PTR_ADD(graph, off);
memset(node, 0, sizeof(*node));
node->fence = RTE_GRAPH_FENCE;
node->off = off;
+ node->sched_idx = n;
if (graph_pcap_is_enable()) {
node->process = graph_pcap_dispatch;
node->original_process = graph_node->node->process;
@@ -123,8 +168,14 @@ graph_nodes_populate(struct graph *_graph)
off += sizeof(struct rte_node *) * nb_edges;
off = RTE_ALIGN(off, RTE_CACHE_LINE_SIZE);
node->next = off;
+
+ /* Fill the schedule table */
+ graph->sched_table[n] = node->off;
+
__rte_node_stream_alloc(graph, node);
}
+
+ free(sorted);
}
struct rte_node *
@@ -179,12 +230,11 @@ graph_node_nexts_populate(struct graph *_graph)
}
static int
-graph_src_nodes_offset_populate(struct graph *_graph)
+graph_src_bitmap_populate(struct graph *_graph)
{
struct rte_graph *graph = _graph->graph;
struct graph_node *graph_node;
struct rte_node *node;
- int32_t head = -1;
const char *name;
STAILQ_FOREACH(graph_node, &_graph->node_list, next) {
@@ -195,7 +245,7 @@ graph_src_nodes_offset_populate(struct graph *_graph)
SET_ERR_JMP(EINVAL, fail, "%s not found", name);
__rte_node_stream_alloc(graph, node);
- graph->cir_start[head--] = node->off;
+ __rte_node_pending_set(graph->src_pending, node);
}
}
@@ -204,17 +254,42 @@ graph_src_nodes_offset_populate(struct graph *_graph)
return -rte_errno;
}
+void
+graph_src_bitmap_rebuild(struct graph *_graph)
+{
+ struct rte_graph *graph = _graph->graph;
+ struct graph_node *graph_node;
+ struct rte_node *node;
+ const char *name;
+
+ memset(graph->src_pending, 0,
+ sizeof(uint64_t) * graph->nb_sched_words);
+
+ STAILQ_FOREACH(graph_node, &_graph->node_list, next) {
+ if (!(graph_node->node->flags & RTE_NODE_SOURCE_F))
+ continue;
+ if (graph_node->node->lcore_id != RTE_MAX_LCORE &&
+ graph_node->node->lcore_id != _graph->lcore_id)
+ continue;
+ name = graph_node->node->name;
+ node = graph_node_name_to_ptr(graph, name);
+ if (node == NULL)
+ continue;
+ __rte_node_pending_set(graph->src_pending, node);
+ }
+}
+
static int
graph_fp_mem_populate(struct graph *graph)
{
int rc;
- graph_header_popluate(graph);
+ graph_header_populate(graph);
if (graph_pcap_is_enable())
graph_pcap_init(graph);
graph_nodes_populate(graph);
rc = graph_node_nexts_populate(graph);
- rc |= graph_src_nodes_offset_populate(graph);
+ rc |= graph_src_bitmap_populate(graph);
return rc;
}
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 26cdc6637192..df6f83b20261 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -49,6 +49,7 @@ struct node {
STAILQ_ENTRY(node) next; /**< Next node in the list. */
char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
uint64_t flags; /**< Node configuration flag. */
+ int16_t priority; /**< Scheduling priority. */
unsigned int lcore_id;
/**< Node runs on the Lcore ID used for mcore dispatch model. */
rte_node_process_t process; /**< Node process function. */
@@ -98,19 +99,23 @@ struct graph {
const struct rte_memzone *mz;
/**< Memzone to store graph data. */
rte_graph_off_t nodes_start;
- /**< Node memory start offset in graph reel. */
+ /**< Node memory start offset in graph memory. */
rte_graph_off_t xstats_start;
- /**< Node xstats memory start offset in graph reel. */
+ /**< Node xstats memory start offset in graph memory. */
rte_node_t src_node_count;
/**< Number of source nodes in a graph. */
struct rte_graph *graph;
/**< Pointer to graph data. */
rte_node_t node_count;
/**< Total number of nodes. */
- uint32_t cir_start;
- /**< Circular buffer start offset in graph reel. */
- uint32_t cir_mask;
- /**< Circular buffer mask for wrap around. */
+ uint32_t sched_table_off;
+ /**< Schedule table start offset in graph memory. */
+ uint32_t pending_off;
+ /**< Pending bitmap start offset in graph memory. */
+ uint32_t src_pending_off;
+ /**< Source pending bitmap start offset in graph memory. */
+ uint16_t nb_sched_words;
+ /**< Number of uint64_t words in pending bitmaps. */
rte_graph_t id;
/**< Graph identifier. */
rte_graph_t parent_id;
@@ -378,6 +383,16 @@ int graph_fp_mem_create(struct graph *graph);
*/
int graph_fp_mem_destroy(struct graph *graph);
+/**
+ * @internal
+ *
+ * Rebuild the source pending bitmap based on lcore affinity.
+ *
+ * @param graph
+ * Pointer to the internal graph object.
+ */
+void graph_src_bitmap_rebuild(struct graph *graph);
+
/* Lookup functions */
/**
* @internal
diff --git a/lib/graph/node.c b/lib/graph/node.c
index e3359fe490a5..b5599143b37b 100644
--- a/lib/graph/node.c
+++ b/lib/graph/node.c
@@ -153,6 +153,7 @@ __rte_node_register(const struct rte_node_register *reg)
if (rte_strscpy(node->name, reg->name, RTE_NODE_NAMESIZE) < 0)
goto free_xstat;
node->flags = reg->flags;
+ node->priority = reg->priority;
node->process = reg->process;
node->init = reg->init;
node->fini = reg->fini;
@@ -216,6 +217,7 @@ node_clone(struct node *node, const char *name)
/* Clone the source node */
reg->flags = node->flags;
+ reg->priority = node->priority;
reg->process = node->process;
reg->init = node->init;
reg->fini = node->fini;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 7e433f466129..6cd32ec22284 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -496,6 +496,7 @@ struct rte_node_register {
char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
uint64_t flags; /**< Node configuration flag. */
#define RTE_NODE_SOURCE_F (1ULL << 0) /**< Node type is source. */
+ int16_t priority; /**< Scheduling priority (lower = visited first, default 0). */
This will break the ABI. Please run ABI check and see/fix.
rte_node_process_t process; /**< Node process function. */
rte_node_init_t init; /**< Node init function. */
rte_node_fini_t fini; /**< Node fini function. */
diff --git a/lib/graph/rte_graph_model_mcore_dispatch.h b/lib/graph/rte_graph_model_mcore_dispatch.h
index f9ff3daa88ec..50a473564b56 100644
--- a/lib/graph/rte_graph_model_mcore_dispatch.h
+++ b/lib/graph/rte_graph_model_mcore_dispatch.h
@@ -77,9 +77,13 @@ int rte_graph_model_mcore_dispatch_node_lcore_affinity_set(const char *name,
unsigned int lcore_id);
/**
- * Perform graph walk on the circular buffer and invoke the process function
+ * Perform graph walk on the pending bitmap and invoke the process function
* of the nodes and collect the stats.
*
+ * Nodes are visited in scheduling order (lowest priority value first).
+ * Source nodes are seeded into the pending bitmap at the start of each walk.
+ * Nodes with different lcore affinity are dispatched to their target lcore.
+ *
* @param graph
* Graph pointer returned from rte_graph_lookup function.
*
@@ -88,20 +92,28 @@ int rte_graph_model_mcore_dispatch_node_lcore_affinity_set(const char *name,
static inline void
rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
{
- const rte_graph_off_t *cir_start = graph->cir_start;
- const rte_node_t mask = graph->cir_mask;
- uint32_t head = graph->head;
+ const uint16_t nwords = graph->nb_sched_words;
struct rte_node *node;
+ uint16_t word, bit;
if (graph->dispatch.wq != NULL)
__rte_graph_mcore_dispatch_sched_wq_process(graph);
- while (likely(head != graph->tail)) {
- node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+ /* Seed pending bitmap with source nodes bound to this lcore */
+ for (word = 0; word < nwords; word++)
+ graph->pending[word] |= graph->src_pending[word];
- /* skip the src nodes which not bind with current worker */
- if ((int32_t)head < 1 && node->dispatch.lcore_id != graph->dispatch.lcore_id)
- continue;
+ for (;;) {
+ /* find first word with any pending bit */
+ for (word = 0; word < nwords; word++)
+ if (graph->pending[word])
+ break;
+ if (word == nwords)
+ break; /* no more pending nodes */
+
+ bit = rte_ctz64(graph->pending[word]);
+ graph->pending[word] &= ~(1ULL << bit);
+ node = __rte_graph_pending_node(graph, word, bit);
/* Schedule the node until all task/objs are done */
if (node->dispatch.lcore_id != RTE_MAX_LCORE &&
@@ -111,11 +123,7 @@ rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
continue;
__rte_node_process(graph, node);
-
- head = likely((int32_t)head > 0) ? head & mask : head;
}
-
- graph->tail = 0;
}
#ifdef __cplusplus
diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
index 4b6236e301e3..38feb3e1ca88 100644
--- a/lib/graph/rte_graph_model_rtc.h
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -6,9 +6,12 @@
#include "rte_graph_worker_common.h"
/**
- * Perform graph walk on the circular buffer and invoke the process function
+ * Perform graph walk on the pending bitmap and invoke the process function
* of the nodes and collect the stats.
*
+ * Nodes are visited in scheduling order (lowest priority value first).
+ * Source nodes are seeded into the pending bitmap at the start of each walk.
+ *
* @param graph
* Graph pointer returned from rte_graph_lookup function.
*
@@ -17,30 +20,52 @@
static inline void
rte_graph_walk_rtc(struct rte_graph *graph)
{
- const rte_graph_off_t *cir_start = graph->cir_start;
- const rte_node_t mask = graph->cir_mask;
- uint32_t head = graph->head;
+ const uint16_t nwords = graph->nb_sched_words;
struct rte_node *node;
+ uint16_t word, bit;
/*
- * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
- * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
- * in a circular buffer fashion.
+ * Nodes are assigned a bit position (sched_idx) sorted by (priority,
+ * node_id) at graph creation time. Source nodes are forced to INT16_MIN
+ * priority so they always come first.
*
- * +-----+ <= cir_start - head [number of source nodes]
- * | |
- * | ... | <= source nodes
- * | |
- * +-----+ <= cir_start [head = 0] [tail = 0]
- * | |
- * | ... | <= pending streams
- * | |
- * +-----+ <= cir_start + mask
+ * sched_table[] maps bit positions to node offsets:
+ *
+ * pending[] sched_table[]
+ * +----------+ +------------------+
+ * | word 0 | ---> | src_node_0 | bit 0 (prio=INT16_MIN)
+ * | 1100...1 | | src_node_1 | bit 1 (prio=INT16_MIN)
+ * | | | mpls_input | bit 2 (prio=-10)
+ * | | | ipv4_input | bit 3 (prio=0)
+ * | | | ... |
+ * +----------+ +------------------+
+ * | word 1 | ---> | ip4_rewrite | bit 64 (prio=10)
+ * | ... | | ... |
+ * +----------+ +------------------+
+ *
+ * Walk: for each word, find lowest set bit (rte_ctz64), process that
+ * node, clear the bit, re-read the word (processing may have set new
+ * bits), repeat.
+ *
+ * After each node is processed, restart scanning from word 0 since
+ * processing may set bits in any word, including earlier ones.
*/
- while (likely(head != graph->tail)) {
- node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+
+ /* Seed pending bitmap with source nodes */
+ for (word = 0; word < nwords; word++)
+ graph->pending[word] |= graph->src_pending[word];
+
+ for (;;) {
+ /* find first word with any pending bit */
+ for (word = 0; word < nwords; word++)
+ if (graph->pending[word])
+ break;
+ if (word == nwords)
+ break; /* no more pending nodes */
+
+ bit = rte_ctz64(graph->pending[word]);
+ graph->pending[word] &= ~(1ULL << bit);
+ node = __rte_graph_pending_node(graph, word, bit);
__rte_node_process(graph, node);
- head = likely((int32_t)head > 0) ? head & mask : head;
}
- graph->tail = 0;
}
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index b0f952a82cc9..e513d7a655d9 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -14,7 +14,7 @@ extern "C" {
#endif
/**
- * Perform graph walk on the circular buffer and invoke the process function
+ * Perform graph walk on the pending bitmap and invoke the process function
* of the nodes and collect the stats.
*
* @param graph
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 4ab53a533e4c..e52a37ce5e84 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -49,15 +49,14 @@ SLIST_HEAD(rte_graph_rq_head, rte_graph);
*/
struct __rte_cache_aligned rte_graph {
/* Fast path area. */
- uint32_t tail; /**< Tail of circular buffer. */
- uint32_t head; /**< Head of circular buffer. */
- uint32_t cir_mask; /**< Circular buffer wrap around mask. */
rte_node_t nb_nodes; /**< Number of nodes in the graph. */
- rte_graph_off_t *cir_start; /**< Pointer to circular buffer. */
rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
+ rte_graph_off_t *sched_table; /**< Node offset indexed by sched_idx. */
+ uint64_t *pending; /**< Bitmap of pending nodes. */
+ uint64_t *src_pending; /**< Bitmap of source nodes (constant). */
+ uint16_t nb_sched_words; /**< Number of uint64_t words in pending bitmaps. */
uint8_t model; /**< graph model */
- uint8_t reserved1; /**< Reserved for future use. */
- uint16_t reserved2; /**< Reserved for future use. */
+ /* 26 bytes padding */
union {
/* Fast schedule area for mcore dispatch model */
struct {
@@ -98,6 +97,7 @@ struct __rte_cache_aligned rte_node {
rte_node_t id; /**< Node identifier. */
rte_node_t parent_id; /**< Parent Node identifier. */
rte_edge_t nb_edges; /**< Number of edges from this node. */
+ uint16_t sched_idx; /**< Bit position in pending bitmap. */
uint32_t realloc_count; /**< Number of times realloced. */
char parent[RTE_NODE_NAMESIZE]; /**< Parent node name. */
@@ -132,7 +132,7 @@ struct __rte_cache_aligned rte_node {
}; /**< Node Context. */
uint16_t size; /**< Total number of objects available. */
uint16_t idx; /**< Number of objects used. */
- rte_graph_off_t off; /**< Offset of node in the graph reel. */
+ rte_graph_off_t off; /**< Offset of node in the graph memory. */
uint64_t total_cycles; /**< Cycles spent in this node. */
uint64_t total_calls; /**< Calls done to this node. */
uint64_t total_objs; /**< Objects processed by this node. */
@@ -187,12 +187,12 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
/**
* @internal
*
- * Enqueue a given node to the tail of the graph reel.
+ * Process a node's pending objects and collect stats.
*
* @param graph
* Pointer Graph object.
* @param node
- * Pointer to node object to be enqueued.
+ * Pointer to node object to be processed.
*/
static __rte_always_inline void
__rte_node_process(struct rte_graph *graph, struct rte_node *node)
@@ -220,21 +220,42 @@ __rte_node_process(struct rte_graph *graph, struct rte_node *node)
/**
* @internal
*
- * Enqueue a given node to the tail of the graph reel.
+ * Get a pointer to a node from the scheduling table.
*
* @param graph
* Pointer Graph object.
+ * @param word
+ * Offset in the pending bitmap.
+ * @param bit
+ * Bit number.
+ *
+ * @return
+ * Pointer to the node.
+ */
+static __rte_always_inline struct rte_node *
+__rte_graph_pending_node(struct rte_graph *graph, uint16_t word, uint16_t bit)
+{
+ const uint16_t index = (word * sizeof(*graph->pending) * CHAR_BIT) + bit;
+ const rte_graph_off_t node_offset = graph->sched_table[index];
+ return RTE_PTR_ADD(graph, node_offset);
+}
+
+/**
+ * @internal
+ *
+ * Mark a node as pending in the graph scheduling bitmap.
+ *
+ * @param bitmap
+ * Either graph->pending or graph->src_pending.
* @param node
- * Pointer to node object to be enqueued.
+ * Pointer to node object to be marked pending.
*/
static __rte_always_inline void
-__rte_node_enqueue_tail_update(struct rte_graph *graph, struct rte_node *node)
+__rte_node_pending_set(uint64_t *bitmap, struct rte_node *node)
{
- uint32_t tail;
-
- tail = graph->tail;
- graph->cir_start[tail++] = node->off;
- graph->tail = tail & graph->cir_mask;
+ const uint16_t word = node->sched_idx / (sizeof(*bitmap) * CHAR_BIT);
+ const uint16_t bit = node->sched_idx % (sizeof(*bitmap) * CHAR_BIT);
+ bitmap[word] |= 1ULL << bit;
}
/**
@@ -242,8 +263,8 @@ __rte_node_enqueue_tail_update(struct rte_graph *graph, struct rte_node *node)
*
* Enqueue sequence prologue function.
*
- * Updates the node to tail of graph reel and resizes the number of objects
- * available in the stream as needed.
+ * Marks the node as pending in the scheduling bitmap and resizes the number
+ * of objects available in the stream as needed.
*
* @param graph
* Pointer to the graph object.
@@ -259,9 +280,7 @@ __rte_node_enqueue_prologue(struct rte_graph *graph, struct rte_node *node,
const uint16_t idx, const uint16_t space)
{
- /* Add to the pending stream list if the node is new */
- if (idx == 0)
- __rte_node_enqueue_tail_update(graph, node);
+ __rte_node_pending_set(graph->pending, node);
if (unlikely(node->size < (idx + space)))
__rte_node_stream_alloc_size(graph, node, node->size + space);
@@ -293,7 +312,7 @@ __rte_node_next_node_get(struct rte_node *node, rte_edge_t next)
/**
* Enqueue the objs to next node for further processing and set
- * the next node to pending state in the circular buffer.
+ * the next node to pending state in the scheduling bitmap.
*
* @param graph
* Graph pointer returned from rte_graph_lookup().
@@ -321,7 +340,7 @@ rte_node_enqueue(struct rte_graph *graph, struct rte_node *node,
/**
* Enqueue only one obj to next node for further processing and
- * set the next node to pending state in the circular buffer.
+ * set the next node to pending state in the scheduling bitmap.
*
* @param graph
* Graph pointer returned from rte_graph_lookup().
@@ -347,7 +366,7 @@ rte_node_enqueue_x1(struct rte_graph *graph, struct rte_node *node,
/**
* Enqueue only two objs to next node for further processing and
- * set the next node to pending state in the circular buffer.
+ * set the next node to pending state in the scheduling bitmap.
* Same as rte_node_enqueue_x1 but enqueue two objs.
*
* @param graph
@@ -377,7 +396,7 @@ rte_node_enqueue_x2(struct rte_graph *graph, struct rte_node *node,
/**
* Enqueue only four objs to next node for further processing and
- * set the next node to pending state in the circular buffer.
+ * set the next node to pending state in the scheduling bitmap.
* Same as rte_node_enqueue_x1 but enqueue four objs.
*
* @param graph
@@ -414,7 +433,7 @@ rte_node_enqueue_x4(struct rte_graph *graph, struct rte_node *node,
/**
* Enqueue objs to multiple next nodes for further processing and
- * set the next nodes to pending state in the circular buffer.
+ * set the next nodes to pending state in the scheduling bitmap.
* objs[i] will be enqueued to nexts[i].
*
* @param graph
@@ -472,7 +491,7 @@ rte_node_next_stream_get(struct rte_graph *graph, struct rte_node *node,
}
/**
- * Put the next stream to pending state in the circular buffer
+ * Put the next stream to pending state in the scheduling bitmap
* for further processing. Should be invoked after rte_node_next_stream_get().
*
* @param graph
@@ -495,9 +514,7 @@ rte_node_next_stream_put(struct rte_graph *graph, struct rte_node *node,
return;
node = __rte_node_next_node_get(node, next);
- if (node->idx == 0)
- __rte_node_enqueue_tail_update(graph, node);
-
+ __rte_node_pending_set(graph->pending, node);
node->idx += idx;
}
@@ -530,7 +547,7 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
src->objs = dobjs;
src->size = dsz;
dst->idx = src->idx;
- __rte_node_enqueue_tail_update(graph, dst);
+ __rte_node_pending_set(graph->pending, dst);
} else { /* Move the objects from src node to dst node */
rte_node_enqueue(graph, src, next, src->objs, src->idx);
}
--
2.54.0
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mails.dpdk.org/archives/dev/attachments/20260610/2b77ab12/attachment-0001.htm>
More information about the dev
mailing list