[PATCH v5 0/3] eal/topology: introduce topology-aware lcore grouping
Varghese, Vipin
Vipin.Varghese at amd.com
Mon Jun 15 16:34:13 CEST 2026
AMD General
Hi All,
Quick update, implementing CPU_ALLOC instead of rte_cpu_set is turing out to eat a lot of memory.
I am working on logic to use CPU_ALLOC when CPU_SIZE > rte_cpu_Set capacity or find smarter ways not to pre-alloc extra.
Note: for accommodating CPU alloc we need to do it for all valid l1, l2, l3 and l4. On 384 single socket machine it exhausts memory (heap) like anything.
> -----Original Message-----
> From: Varghese, Vipin <Vipin.Varghese at amd.com>
> Sent: Thursday, May 28, 2026 12:48 PM
> To: Stephen Hemminger <stephen at networkplumber.org>
> Cc: dev at dpdk.org; Tummala, Sivaprasad <Sivaprasad.Tummala at amd.com>;
> konstantin.ananyev at huawei.com; wathsala.vithanage at arm.com;
> bruce.richardson at intel.com; viktorin at cesnet.cz; mb at smartsharesystems.com
> Subject: RE: [PATCH v5 0/3] eal/topology: introduce topology-aware lcore grouping
>
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
>
>
> AMD General
>
> Sharing v6 with fixes and cache-id fetch shortly.
>
> > -----Original Message-----
> > From: Stephen Hemminger <stephen at networkplumber.org>
> > Sent: Wednesday, April 15, 2026 1:52 AM
> > To: Varghese, Vipin <Vipin.Varghese at amd.com>
> > Cc: dev at dpdk.org; Tummala, Sivaprasad <Sivaprasad.Tummala at amd.com>;
> > konstantin.ananyev at huawei.com; wathsala.vithanage at arm.com;
> > bruce.richardson at intel.com; viktorin at cesnet.cz;
> > mb at smartsharesystems.com
> > Subject: Re: [PATCH v5 0/3] eal/topology: introduce topology-aware
> > lcore grouping
> >
> > Caution: This message originated from an External Source. Use proper
> > caution when opening attachments, clicking links, or responding.
> >
> >
> > On Wed, 15 Apr 2026 01:08:18 +0530
> > Vipin Varghese <vipin.varghese at amd.com> wrote:
> >
> > > This series introduces a topology library that groups DPDK lcores
> > > based on CPU cache hierarchy and NUMA topology. The goal is to
> > > provide a stable and explicit API that allows applications to select
> > > lcores with better locality and cache sharing characteristics.
> > >
> > > The series includes:
> > > - EAL support for topology discovery using hwloc and domain-based lcore
> > > grouping (L1/L2/L3/L4/NUMA)
> > > - Topology-aware test cases validating API behavior and edge conditions
> > > - Programmer’s guide describing the topology library and APIs
> > >
> > > The API is marked experimental and does not change existing lcore
> > > behavior unless explicitly used by the application.
> > >
> > > Changes in v5:
> > > - Addressed review comments from v4
> > > - Fixed ARM cross-compilation issues
> > > - Cleaned up domain iteration and error handling
> > > - Updated tests to cover domain edge cases
> > > - Documentation refinements and API usage clarification
> > >
> > > Changes in v4:
> > > - Corrected domain selection semantics
> > > - Updated example usage
> > > - Fixed naming and typo issues
> > >
> > > Changes in v3:
> > > - Fixed macro naming (USE_NO_TOPOLOGY)
> > > - Minor cleanups based on early feedback
> > >
> > > Tested on:
> > > - AMD EPYC (Milan, Genoa, Siena, Turin, Turin-Dense, Sorano)
> > > - Intel Xeon (SPR-SP, GNR-SP)
> > > - ARM Ampere
> > > - NVIDIA Grace Superchip
> > >
> > > Dependencies:
> > > - hwloc-dev (tested with 2.10.0)
> > >
> > > Patch breakdown:
> > > 1/3 eal/topology: add topology grouping for lcores
> > > 2/3 app: add topology-aware test cases
> > > 3/3 doc: add topology library documentation
> > >
> > > Future Work:
> > > - integrate into examples
> > > -- hellowrld: ready
> > > -- pkt-distributor: in-progress
> > > -- l2fwd: ready
> > > -- l3fwd: to start
> > > -- eventdevpipeline: PoC ready
> > > - integrate topology test
> > > -- crypto: yet to start
> > > -- compression: yet to start
> > > -- dma: PoC ready
> > > - add new features for
> > > -- PQoS: yet to start
> > > -- Data Injection: PoC with BRDCM Thor-2 ready
> > >
> > > Tested OS: Linux only, need help with BSD and Windows
> > >
> > > Tested with and without hwloc-dev library for
> > > - Ampere, aarch64, Neoverse-N1, NUMA-2, 256 CPU threads
> > > - Grace superchip, aarch64, Neoverse-V2, NUMA-2, 144 CPU threads
> > > - Intel GNR-SP, 6767P, NUMA-2, 256 Threads
> > > - AMD EPYC Siena, 8534P, NUMA-1, 128 Threads
> > > - AMD EPYC Sorano, 8635P, NUMA-1, 168 Threads
> > >
> > > Signed-off-by: Vipin Varghese <vipin.varghese at amd.com> ``
> > >
> > > Vipin Varghese (3):
> > > eal/topology: add Topology grouping for lcores
> > > app: add topology aware test case
> > > doc: add new section topology
> > >
> > > app/test/meson.build | 1 +
> > > app/test/test_ring_perf.c | 416 +++++++++++++-
> > > app/test/test_stack_perf.c | 409 ++++++++++++++
> > > app/test/test_topology.c | 676 ++++++++++++++++++++++
> > > config/meson.build | 18 +
> > > doc/api/doxy-api-index.md | 1 +
> > > doc/guides/prog_guide/index.rst | 3 +-
> > > doc/guides/prog_guide/topology_lib.rst | 155 +++++
> > > lib/eal/common/eal_private.h | 74 +++
> > > lib/eal/common/eal_topology.c | 747 +++++++++++++++++++++++++
> > > lib/eal/common/meson.build | 1 +
> > > lib/eal/freebsd/eal.c | 10 +-
> > > lib/eal/include/meson.build | 1 +
> > > lib/eal/include/rte_topology.h | 255 +++++++++
> > > lib/eal/linux/eal.c | 7 +
> > > lib/eal/meson.build | 4 +
> > > 16 files changed, 2773 insertions(+), 5 deletions(-) create mode
> > > 100644 app/test/test_topology.c create mode 100644
> > > doc/guides/prog_guide/topology_lib.rst
> > > create mode 100644 lib/eal/common/eal_topology.c create mode
> > > 100644 lib/eal/include/rte_topology.h
> > >
> >
> > AI review reported lots of possible correctness problems:
> >
> >
> > # DPDK Patch Review - bundle-1843-topology-v5.mbox
> >
> > ## Overview
> >
> > This patch series introduces topology awareness to DPDK, allowing
> > applications to select logical cores based on CPU cache and I/O
> > topology. The series adds a new `rte_topology` library with APIs to
> > query and enumerate logical cores within topology domains (L1/L2/L3/L4/NUMA).
> >
> > ---
> >
> > ## Correctness Bugs (HIGHEST PRIORITY - report at >=50% confidence)
> >
> > ### Patch 1/3: lib/eal/common/eal_topology.c
> >
> > #### **Error: Resource leak on `eal_topology_map_layer()` failure**
> >
> > In `rte_eal_topology_init()`, if any call to
> > `eal_topology_map_layer()` fails (returns < 0), the code calls
> > `rte_eal_topology_release()` which frees previously allocated layers.
> > However, the `hwloc_topology_t` handle `topo_cnfg.topology` is **NOT**
> destroyed before returning. This leaks the hwloc topology handle.
> >
> > **Location:** `lib/eal/common/eal_topology.c:685-712`
> >
> > ```c
> > for (int i = 0; i < 5; i++) {
> > *layers[i].count = hwloc_get_nbobjs_by_depth(topo_cnfg.topology,
> > layers[i].depth);
> > if (eal_topology_map_layer(topo_cnfg.topology, layers[i].depth, layers[i].count,
> > layers[i].ptr, layers[i].total_cores, layers[i].name) < 0) {
> > rte_eal_topology_release(); /* frees layer memory */
> > return -1; /* BUG: topo_cnfg.topology NOT destroyed here */
> > }
> > }
> >
> > hwloc_topology_destroy(topo_cnfg.topology); /* only reached on
> > success */ topo_cnfg.topology = NULL; ```
> >
> > **Fix:** Destroy the topology before returning on error:
> >
> > ```c
> > if (eal_topology_map_layer(...) < 0) {
> > hwloc_topology_destroy(topo_cnfg.topology);
> > topo_cnfg.topology = NULL;
> > rte_eal_topology_release();
> > return -1;
> > }
> > ```
> >
> > ---
> >
> > #### **Error: Potential use-after-free in `eal_topology_map_layer()`
> > on partial allocation failure**
> >
> > In `eal_topology_map_layer()`, the code allocates `dm->cores` for each
> > domain. If a later allocation in the same loop iteration fails (e.g.,
> > for domain `j+1`), the function returns -1 immediately **without
> > freeing `dm->cores` already allocated for earlier domains**. The
> > caller (`rte_eal_topology_init()`) then calls
> > `rte_eal_topology_release()`, which expects `layer_ptr[j]` to be
> > non-NULL but `layer_ptr[j]->cores` may be uninitialized or garbage if
> > the allocation failed before that point. This can cause a use-after-free or double-
> free when `rte_eal_topology_release()` calls `rte_free(d->map[i]->cores)`.
> >
> > **Location:** `lib/eal/common/eal_topology.c:537-557`
> >
> > ```c
> > for (uint16_t j = 0; j < *layer_cnt; j++) {
> > hwloc_obj_t obj = hwloc_get_obj_by_depth(topology, depth, j);
> > int cpu_count = hwloc_bitmap_weight(obj->cpuset);
> > if (cpu_count == -1)
> > continue;
> >
> > struct core_domain_mapping *dm =
> > rte_zmalloc(NULL, sizeof(struct core_domain_mapping), 0);
> > if (!dm)
> > return -1; /* BUG: leaks layer_ptr array allocated by caller
> > */
> >
> > (*layer_ptr)[j] = dm;
> > CPU_ZERO(&dm->core_set);
> > dm->core_count = 0;
> >
> > dm->cores = rte_malloc(NULL, sizeof(uint16_t) * cpu_count, 0);
> > if (!dm->cores)
> > return -1; /* BUG: leaks dm (just allocated) and previous
> > entries */ } ```
> >
> > **Fix:** On allocation failure, free all previously allocated entries before returning:
> >
> > ```c
> > if (!dm) {
> > /* Free all previously allocated entries */
> > for (uint16_t k = 0; k < j; k++) {
> > if ((*layer_ptr)[k]) {
> > rte_free((*layer_ptr)[k]->cores);
> > rte_free((*layer_ptr)[k]);
> > }
> > }
> > rte_free(*layer_ptr);
> > return -1;
> > }
> >
> > /* Same cleanup for dm->cores allocation failure */ if (!dm->cores) {
> > rte_free(dm);
> > for (uint16_t k = 0; k < j; k++) {
> > if ((*layer_ptr)[k]) {
> > rte_free((*layer_ptr)[k]->cores);
> > rte_free((*layer_ptr)[k]);
> > }
> > }
> > rte_free(*layer_ptr);
> > return -1;
> > }
> > ```
> >
> > ---
> >
> > #### **Error: NULL pointer dereference in `eal_topology_map_layer()`
> > when `hwloc_get_obj_by_depth()` returns NULL**
> >
> > In the second loop of `eal_topology_map_layer()` (lines 560-620), the
> > code calls `hwloc_get_obj_by_depth(topology, depth, j)` and checks `if
> > (!obj || hwloc_bitmap_iszero(obj->cpuset))` to skip NULL objects.
> > However, in the **inner
> > loop** (lines 579-618), the code calls
> > `hwloc_get_obj_by_depth(topology, depth, k)` and assigns it to
> > `obj_core` but does **NOT** check if `obj_core` is NULL before calling
> > `hwloc_bitmap_weight(obj_core->cpuset)` on line 581. If
> > `hwloc_get_obj_by_depth()` returns NULL for domain `k`, this will dereference a
> NULL pointer.
> >
> > **Location:** `lib/eal/common/eal_topology.c:579-582`
> >
> > ```c
> > for (uint16_t k = 0; k < *layer_cnt; k++) {
> > hwloc_obj_t obj_core = hwloc_get_obj_by_depth(topology, depth, k);
> > int cpu_count_core = hwloc_bitmap_weight(obj_core->cpuset); /*
> > NULL deref if obj_core == NULL */
> > if (cpu_count_core == -1)
> > continue;
> > ```
> >
> > **Fix:** Check `obj_core` before dereferencing:
> >
> > ```c
> > hwloc_obj_t obj_core = hwloc_get_obj_by_depth(topology, depth, k); if (!obj_core)
> > continue;
> > int cpu_count_core = hwloc_bitmap_weight(obj_core->cpuset);
> > if (cpu_count_core == -1)
> > continue;
> > ```
> >
> > ---
> >
> > #### **Error: Incorrect second argument to
> > `rte_topo_get_nth_lcore_from_domain()` in `get_same_l1_domains()`
> > (test_ring_perf.c and test_stack_perf.c)**
> >
> > In both `app/test/test_ring_perf.c:290` and
> > `app/test/test_stack_perf.c:258`, the function `get_same_l1_domains()` calls:
> >
> > ```c
> > id2 = rte_topo_get_nth_lcore_from_domain(domain, 0, 0,
> > RTE_TOPO_DOMAIN_L1); ```
> >
> > The second argument (`lcore_pos`) is `0`, which is the same as for
> > `id1`. This will assign **the same lcore** to both `id1` and `id2`,
> > causing the subsequent check `if
> > (id1 == id2) return 3;` to always trigger. This is a logic error: the
> > intent is clearly to get two **different** lcores from the same domain.
> >
> > **Location:** `app/test/test_ring_perf.c:287-290` and
> > `app/test/test_stack_perf.c:255-258`
> >
> > **Fix:** Use position `1` for the second lcore:
> >
> > ```c
> > id1 = rte_topo_get_nth_lcore_from_domain(domain, 0, 0,
> > RTE_TOPO_DOMAIN_L1);
> > id2 = rte_topo_get_nth_lcore_from_domain(domain, 1, 0,
> > RTE_TOPO_DOMAIN_L1); ```
> >
> > ---
> >
> > #### **Error: Iteration condition in `test_main_lcore_in_domain()`
> > uses wrong domain type for lookup**
> >
> > In `app/test/test_topology.c:211`, the loop iterates over
> > `domain_count` for `domain_types[d]`, but the call to
> > `rte_topo_is_main_lcore_in_domain()` uses `RTE_TOPO_DOMAIN_NUMA`
> > instead of `domain_types[d]`. This means the test only checks the NUMA
> > domain regardless of which domain type `d` selects (L1/L2/L3/L4).
> >
> > **Location:** `app/test/test_topology.c:206-216`
> >
> > ```c
> > for (unsigned int d = 0; d < RTE_DIM(domain_types); d++) {
> > bool main_lcore_found = false;
> > unsigned int domain_count = rte_topo_get_domain_count(domain_types[d]);
> > for (unsigned int dmn_idx = 0; dmn_idx < domain_count; dmn_idx++) {
> > main_lcore_found =
> > rte_topo_is_main_lcore_in_domain(RTE_TOPO_DOMAIN_NUMA, /* BUG:
> should
> > be domain_types[d] */
> > dmn_idx);
> > if (main_lcore_found)
> > break;
> > }
> > ```
> >
> > **Fix:**
> >
> > ```c
> > main_lcore_found = rte_topo_is_main_lcore_in_domain(domain_types[d],
> > dmn_idx); ```
> >
> > ---
> >
> > #### **Error: Infinite loop risk in
> > `rte_topo_get_nth_lcore_from_domain()` when `ptr-
> > >core_count` is 0**
> >
> > In `lib/eal/common/eal_topology.c:296-318`, the function enters a
> > `while (1)` loop that increments `new_lcore_pos`. If `ptr->core_count`
> > is 0 (which the code checks earlier but does not return immediately),
> > the loop will wrap `new_lcore_pos` back to
> > 0 indefinitely, never breaking. While the function returns
> > `RTE_MAX_LCORE` if `ptr-
> > >core_count == 0` before the loop, the logic flow is unclear and the
> > >loop body does
> > not have a clear termination condition if the core count is 0.
> >
> > **Location:** `lib/eal/common/eal_topology.c:283-318`
> >
> > **Fix:** Add a sanity check inside the loop to prevent infinite iteration:
> >
> > ```c
> > unsigned int iterations = 0;
> > while (1) {
> > if (iterations++ > ptr->core_count * 2) /* safety limit */
> > return RTE_MAX_LCORE;
> > /* ... rest of loop ... */
> > }
> > ```
> >
> > However, the real issue is that the code already returns
> > `RTE_MAX_LCORE` if `ptr->core_count == 0` on line 287, so this is more
> > of a defensive-programming note. The function should be refactored for clarity.
> >
> > ---
> >
> > #### **Error: Missing NULL check after `get_domain_lcore_mapping()` in
> > `rte_topo_get_next_lcore()`**
> >
> > In `rte_topo_get_next_lcore()`, the code calls
> > `get_domain_lcore_mapping(flag, lcore_domain)` and checks if `ptr` is
> > NULL on line 350. However, if `ptr` is NULL, the function returns
> > `RTE_MAX_LCORE`. This is correct, but the subsequent logic on line 381
> > calls `rte_topo_is_main_lcore_in_domain(flag, lcore_domain)`, which
> > internally may call `get_domain_lcore_mapping()` again. If that call
> > also returns NULL (which it will if the domain is invalid), the
> > function `rte_topo_is_main_lcore_in_domain()` will return `false`,
> > which is safe. However, the logic is fragile and should explicitly handle the NULL
> case to avoid relying on transitive safety.
> >
> > **Location:** `lib/eal/common/eal_topology.c:381`
> >
> > **Recommendation:** The code is technically safe but could be clearer.
> > No change required, but consider restructuring for maintainability.
> >
> > ---
> >
> > ### Patch 2/3: app/test Topology Tests
> >
> > #### **Error: Macro
> `RTE_TOPO_FOREACH_WORKER_LCORE_IN_DOMAIN`
> > declares variable in macro expansion (shadowing risk)**
> >
> > In `lib/eal/include/rte_topology.h:243-248`, the macro
> > `RTE_TOPO_FOREACH_WORKER_LCORE_IN_DOMAIN` declares a local
> variable
> > `main_lcore` inside the macro expansion:
> >
> > ```c
> > #define RTE_TOPO_FOREACH_WORKER_LCORE_IN_DOMAIN(lcore,
> > domain_indx, flag) \
> > lcore = rte_topo_get_nth_lcore_from_domain(domain, 0, 0, flag); \
> > uint16_t main_lcore = rte_get_main_lcore(); \
> > for (lcore = (lcore != main_lcore) ? \
> > lcore : rte_topo_get_next_lcore(lcore, 1, 0, flag); \
> > lcore < RTE_MAX_LCORE; \
> > lcore = rte_topo_get_next_lcore(lcore, 1, 0, flag)) ```
> >
> > This can cause a compiler error or shadowing if the caller already has
> > a variable named `main_lcore` in scope. Additionally, the macro uses
> > `domain` (line 244) but the parameter is `domain_indx`, which is a typo and will
> cause a compilation error.
> >
> > **Location:** `lib/eal/include/rte_topology.h:243-248`
> >
> > **Fix:** Wrap in a `do { } while (0)` and use a uniquely-named
> > variable, or document that the macro must not be used if `main_lcore`
> > is already declared. Also fix the
> > typo:
> >
> > ```c
> > #define RTE_TOPO_FOREACH_WORKER_LCORE_IN_DOMAIN(lcore,
> > domain_indx, flag) \
> > lcore = rte_topo_get_nth_lcore_from_domain(domain_indx, 0, 0, flag); \
> > uint16_t __topo_main_lcore = rte_get_main_lcore(); \
> > for (lcore = (lcore != __topo_main_lcore) ? \
> > lcore : rte_topo_get_next_lcore(lcore, 1, 0, flag); \
> > lcore < RTE_MAX_LCORE; \
> > lcore = rte_topo_get_next_lcore(lcore, 1, 0, flag)) ```
> >
> > ---
> >
> > ### Summary of Correctness Bugs
> >
> > 1. **hwloc topology leak** on `eal_topology_map_layer()` failure 2.
> > **Resource
> > leak** in `eal_topology_map_layer()` on partial allocation failure 3.
> > **NULL pointer
> > dereference** in `eal_topology_map_layer()` inner loop 4. **Logic
> > error** in `get_same_l1_domains()` (same lcore assigned to `id1` and
> > `id2`) 5. **Wrong domain type** in `test_main_lcore_in_domain()` (uses
> > `RTE_TOPO_DOMAIN_NUMA` instead of `domain_types[d]`) 6. **Macro typo**
> > in `RTE_TOPO_FOREACH_WORKER_LCORE_IN_DOMAIN` (uses `domain`
> instead of
> > `domain_indx`) 7. **Potential infinite loop** in
> > `rte_topo_get_nth_lcore_from_domain()` if `ptr->core_count == 0`
> > (mitigated by early
More information about the dev
mailing list