[PATCH v4 0/4] Introduce Topology NUMA grouping for lcores

Stephen Hemminger stephen at networkplumber.org
Sat Jan 17 19:57:22 CET 2026
Previous message (by thread): [PATCH v11 6/6] test: fix format overflow in cryptodev test
Next message (by thread): [PATCH v4 0/4] Introduce Topology NUMA grouping for cores
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Mon, 17 Mar 2025 14:46:07 +0100
Jan Viktorin <viktorin at cesnet.cz> wrote:

> Hello Vipin and others,
> 
> please, will there be any progress or update on this series?
> 
> I successfully tested those changes on our Intel and AMD machines and
> would like to use it in production soon.
> 
> The API is a little bit unintuitive, at least for me, but I
> successfully integrated into our software.
> 
> I am missing a clear relation to the NUMA socket approach used in DPDK.
> E.g. I would like to be able to easily walk over a list of lcores from
> a specific NUMA node grouped by L3 domain. Yes, there is the
> RTE_LCORE_DOMAIN_IO, but would it always match the appropriate socket
> IDs?
> 
> Also, I do not clearly understand what is the purpose of using domain
> selector like:
> 
>   RTE_LCORE_DOMAIN_L1 | RTE_LCORE_DOMAIN_L2
> 
> or even:
> 
>   RTE_LCORE_DOMAIN_L3 | RTE_LCORE_DOMAIN_L2
> 
> the documentation does not explain this. I could not spot any kind of
> grouping that would help me in any way. Some "best practices" examples
> would be nice to have to understand the intentions better.
> 
> I found a little catch when running DPDK with more lcores than there
> are physical or SMT CPU cores. This happens when using e.g. an option
> like --lcores=(0-15)@(0-1). The results from the topology API would not
> match the lcores because hwloc is not aware of the lcores concept. This
> might be mentioned somewhere.
> 
> Anyway, I really appreciate this work and would like to see it upstream.
> Especially for AMD machines, some framework like this is a must.
> 
> Kind regards,
> Jan
> 
> On Tue, 5 Nov 2024 15:58:45 +0530
> Vipin Varghese <vipin.varghese at amd.com> wrote:
> 
> > This patch introduces improvements for NUMA topology awareness in
> > relation to DPDK logical cores. The goal is to expose API which allows
> > users to select optimal logical cores for any application. These
> > logical cores can be selected from various NUMA domains like CPU and
> > I/O.
> > 
> > Change Summary:
> >  - Introduces the concept of NUMA domain partitioning based on CPU and
> >    I/O topology.
> >  - Adds support for grouping DPDK logical cores within the same Cache
> >    and I/O domain for improved locality.
> >  - Implements topology detection and core grouping logic that
> >    distinguishes between the following NUMA configurations:
> >     * CPU topology & I/O topology (e.g., AMD SoC EPYC, Intel Xeon SPR)
> >     * CPU+I/O topology (e.g., Ampere One with SLC, Intel Xeon SPR
> > with SNC)
> >  - Enhances performance by minimizing lcore dispersion across
> > tiles|compute package with different L2/L3 cache or IO domains.
> > 
> > Reason:
> >  - Applications using DPDK libraries relies on consistent memory
> > access.
> >  - Lcores being closer to same NUMA domain as IO.
> >  - Lcores sharing same cache.
> > 
> > Latency is minimized by using lcores that share the same NUMA
> > topology. Memory access is optimized by utilizing cores within the
> > same NUMA domain or tile. Cache coherence is preserved within the
> > same shared cache domain, reducing the remote access from
> > tile|compute package via snooping (local hit in either L2 or L3
> > within same NUMA domain).
> > 
> > Library dependency: hwloc
> > 
> > Topology Flags:
> > ---------------
> >  - RTE_LCORE_DOMAIN_L1: to group cores sharing same L1 cache
> >  - RTE_LCORE_DOMAIN_SMT: same as RTE_LCORE_DOMAIN_L1
> >  - RTE_LCORE_DOMAIN_L2: group cores sharing same L2 cache
> >  - RTE_LCORE_DOMAIN_L3: group cores sharing same L3 cache
> >  - RTE_LCORE_DOMAIN_L4: group cores sharing same L4 cache
> >  - RTE_LCORE_DOMAIN_IO: group cores sharing same IO
> > 
> > < Function: Purpose >
> > ---------------------
> >  - rte_get_domain_count: get domain count based on Topology Flag
> >  - rte_lcore_count_from_domain: get valid lcores count under each
> > domain
> >  - rte_get_lcore_in_domain: valid lcore id based on index
> >  - rte_lcore_cpuset_in_domain: return valid cpuset based on index
> >  - rte_lcore_is_main_in_domain: return true|false if main lcore is
> > present
> >  - rte_get_next_lcore_from_domain: next valid lcore within domain
> >  - rte_get_next_lcore_from_next_domain: next valid lcore from next
> > domain
> > 
> > Note:
> >  1. Topology is NUMA grouping.
> >  2. Domain is various sub-groups within a specific Topology.
> > 
> > Topology example: L1, L2, L3, L4, IO
> > Domian example: IO-A, IO-B
> > 
> > < MACRO: Purpose >
> > ------------------
> >  - RTE_LCORE_FOREACH_DOMAIN: iterate lcores from all domains
> >  - RTE_LCORE_FOREACH_WORKER_DOMAIN: iterate worker lcores from all
> > domains
> >  - RTE_LCORE_FORN_NEXT_DOMAIN: iterate domain select n'th lcore
> >  - RTE_LCORE_FORN_WORKER_NEXT_DOMAIN: iterate domain for worker n'th
> > lcore.
> > 
> > Future work (after merge):
> > --------------------------
> >  - dma-perf per IO NUMA
> >  - eventdev per L3 NUMA
> >  - pipeline per SMT|L3 NUMA
> >  - distributor per L3 for Port-Queue
> >  - l2fwd-power per SMT
> >  - testpmd option for IO NUMA per port
> > 
> > Platform tested on:
> > -------------------
> >  - INTEL(R) XEON(R) PLATINUM 8562Y+ (support IO numa 1 & 2)
> >  - AMD EPYC 8534P (supports IO numa 1 & 2)
> >  - AMD EPYC 9554 (supports IO numa 1, 2, 4)
> > 
> > Logs:
> > -----
> > 1. INTEL(R) XEON(R) PLATINUM 8562Y+:
> >  - SNC=1
> >         Domain (IO): at index (0) there are 48 core, with (0) at
> > index 0
> >  - SNC=2
> >         Domain (IO): at index (0) there are 24 core, with (0) at
> > index 0 Domain (IO): at index (1) there are 24 core, with (12) at
> > index 0
> > 
> > 2. AMD EPYC 8534P:
> >  - NPS=1:
> >         Domain (IO): at index (0) there are 128 core, with (0) at
> > index 0
> >  - NPS=2:
> >         Domain (IO): at index (0) there are 64 core, with (0) at
> > index 0 Domain (IO): at index (1) there are 64 core, with (32) at
> > index 0
> > 
> > Signed-off-by: Vipin Varghese <vipin.varghese at amd.com>
> > 
> > Vipin Varghese (4):
> >   eal/lcore: add topology based functions
> >   test/lcore: enable tests for topology
> >   doc: add topology grouping details
> >   examples: update with lcore topology API
> > 
> >  app/test/test_lcores.c                        | 528 +++++++++++++
> >  config/meson.build                            |  18 +
> >  .../prog_guide/env_abstraction_layer.rst      |  22 +
> >  examples/helloworld/main.c                    | 154 +++-
> >  examples/l2fwd/main.c                         |  56 +-
> >  examples/skeleton/basicfwd.c                  |  22 +
> >  lib/eal/common/eal_common_lcore.c             | 714
> > ++++++++++++++++++ lib/eal/common/eal_private.h                  |
> > 58 ++ lib/eal/freebsd/eal.c                         |  10 +
> >  lib/eal/include/rte_lcore.h                   | 209 +++++
> >  lib/eal/linux/eal.c                           |  11 +
> >  lib/eal/meson.build                           |   4 +
> >  lib/eal/version.map                           |  11 +
> >  lib/eal/windows/eal.c                         |  12 +
> >  14 files changed, 1819 insertions(+), 10 deletions(-)
> >   

This patch series does not apply cleanly to current DPDK main branch.
Please rebase and resubmit.

AI patch review had the following insights:

# DPDK Patch Series Review: Topology-based Lcore Functions (v4)

**Series**: [PATCH v4 1-4/4] eal/lcore: add topology based functions  
**Date**: November 5, 2024  
**Author**: Vipin Varghese <vipin.varghese at amd.com>

---

## Executive Summary

This 4-patch series adds topology-aware lcore mapping to DPDK's lcore API, allowing cores to be grouped by chiplet locality (L1-L4 cache, IO). The series includes API additions, tests, documentation, and example updates.

**Overall Assessment**: The patches have several issues that need to be addressed before acceptance.

---

## Patch 1/4: eal/lcore: add topology based functions

### Commit Message Review

#### ✅ PASS: Subject Line Format
- Length: 45 characters (within 60 char limit)
- Format: `eal/lcore: add topology based functions`
- Proper prefix, lowercase, imperative mood, no trailing period

#### ⚠️ WARNING: Commit Body Issues

**Line Length Violations** (must be ≤75 characters):
- Line 147: "Using hwloc library, the dpdk available lcores can be grouped" - appears OK
- Line 148: "into various groups nameley L1, L2, L3, L4 and IO. This patch" - OK
- However, several lines approach the limit

**Typo**: "nameley" should be "namely" (line 147)

**Body Structure**: The commit message starts appropriately but could be more detailed about the problem being solved and why this change is needed.

#### ❌ ERROR: Missing Required Tags

The patch is missing:
- `Signed-off-by:` tag - This is **MANDATORY** per AGENTS.md line 116

The version history (v4 changes) is good and appropriately placed but should be separated from the main commit message by a `---` line before the diff.

### Code Review

#### License Headers

Need to check if new files have proper SPDX headers. From the visible diff, additions to existing files should maintain their existing headers.

#### Naming Conventions

**✅ PASS**: External API functions properly prefixed with `rte_`:
- `rte_get_domain_count`
- `rte_get_lcore_in_domain`
- `rte_get_next_lcore_from_domain`
- `rte_get_next_lcore_from_next_domain`
- `rte_lcore_count_from_domain`
- `rte_lcore_cpuset_in_domain`
- `rte_lcore_is_main_in_domain`

**⚠️ INFO**: Internal functions lack `rte_` prefix, which is appropriate for internal APIs:
- `get_domain_lcore_count`
- `get_domain_lcore_mapping`
- `rte_eal_topology_init`
- `rte_eal_topology_release`

However, the last two (`rte_eal_topology_*`) have `rte_` prefix despite being listed as internal - verify if these should be marked as `__rte_internal`.

#### Code Style Issues

From the visible diffs:

**Line Length**: Need to verify all code lines are ≤100 characters

**Type Usage**:
- Good: Conversion of `l3_count` & `io_count` to `uint16_t` (mentioned in v4 changes)

**Memory Management**:
- Good: Removed unnecessary NULL checks before free (mentioned in v4 changes)
- Good: Removed unnecessary malloc casting (mentioned in v4 changes)

**Comments**: Would need to see full code to verify comment formatting

#### API Design Issues

**⚠️ WARNING: Experimental APIs**
All new external APIs should be marked with `__rte_experimental` per AGENTS.md line 766. The commit message mentions "External Experimental API" but need to verify in the actual header file that each function has the `__rte_experimental` attribute on its own line.

**⚠️ WARNING: Missing Doxygen**
New public APIs must have Doxygen comments (AGENTS.md line 756). Need to verify full header file.

#### Testing Requirements

**⚠️ WARNING**: Per AGENTS.md line 698-699:
- New APIs must be used in `/app` test directory
- New device APIs require at least one driver implementation

The series includes patch 2/4 for tests, which is good. Need to verify the tests use the `TEST_ASSERT` macros and `unit_test_suite_runner` infrastructure (AGENTS.md lines 703-743).

### Documentation Requirements

**⚠️ WARNING**: Per AGENTS.md line 758-762:
- Release notes must be updated in `doc/guides/rel_notes/` for important changes
- Code and documentation must be updated atomically

The series includes patch 3/4 for documentation, which is good. However, need to verify:
- Documentation matches code behavior
- Only the **current release** notes file is updated
- Doxygen comments are present for all public APIs

---

## Patch 2/4: test/lcore: enable tests for topology

### Commit Message Review

#### Subject Line
- Format: `test/lcore: enable tests for topology`
- Length: 37 characters ✅
- Proper prefix, lowercase, imperative ✅

#### Missing Elements
- ❌ **ERROR**: Missing `Signed-off-by:` tag

### Test Code Requirements

**Need to verify**:
- Tests use `TEST_ASSERT` macros (AGENTS.md line 745-752)
- Tests use `unit_test_suite_runner` infrastructure (AGENTS.md line 703-743)
- Tests are properly registered with `REGISTER_FAST_TEST` or similar

---

## Patch 3/4: doc: add topology grouping details

### Commit Message Review

#### Subject Line
- Format: `doc: add topology grouping details`
- Length: 35 characters ✅
- Proper prefix, lowercase, imperative ✅

#### Missing Elements
- ❌ **ERROR**: Missing `Signed-off-by:` tag

### Documentation Requirements

**Need to verify**:
- Documentation matches actual code behavior
- Release notes updated for current release only
- Proper RST formatting
- No passive voice (per DPDK documentation standards)

---

## Patch 4/4: examples: update with lcore topology API

### Commit Message Review

#### Subject Line
- Format: `examples: update with lcore topology API`
- Length: 39 characters ✅
- Proper prefix ✅

**⚠️ WARNING: Component Prefix**
Per AGENTS.md line 89, `example:` should be `examples/foo:` with specific example name.
Should be something like: `examples/helloworld: add topology support`

#### Missing Elements
- ❌ **ERROR**: Missing `Signed-off-by:` tag

### Code Review for Examples

From the visible diff in helloworld:

**Line 2608-2609**: Ternary operator formatting
```c
rte_eal_remote_launch((topo_sel == USE_NO_TOPOLOGY) ?
    lcore_hello : send_lcore_hello, NULL, lcore_id);
```
This is acceptable but could be more readable. Consider:
```c
lcore_func = (topo_sel == USE_NO_TOPOLOGY) ? lcore_hello : send_lcore_hello;
rte_eal_remote_launch(lcore_func, NULL, lcore_id);
```

**Line 2631-2632**: Comment formatting
```c
+/* select lcores based on ports numa (RTE_LCORE_DOMAIN_IO). */
+static bool select_port_from_io_domain;
```
✅ Good comment style

**L2fwd example changes** (lines 2624-2729):

**Line 2641**: Missing space in comment
```c
+	       "  -t : Enable IO domain lcores mapping to Ports\n"
```
Should probably be: "IO-domain lcore mapping" or "IO domain lcore mapping"

**Line 2670**: Variable type change
```c
-	unsigned lcore_id, rx_lcore_id;
+	uint16_t lcore_id, rx_lcore_id;
```
✅ Good: Using explicit types instead of `unsigned`

**Line 2681-2686**: Array initialization
```c
+	uint16_t coreindx_io_domain[RTE_MAX_ETHPORTS] = {0};
+	uint16_t lcore_io_domain[RTE_MAX_ETHPORTS] = {RTE_MAX_LCORE};
```
**⚠️ WARNING**: The second initialization is problematic. It only sets the first element to `RTE_MAX_LCORE`, not all elements. Should use a loop or memset for initialization:
```c
for (int i = 0; i < RTE_MAX_ETHPORTS; i++)
    lcore_io_domain[i] = RTE_MAX_LCORE;
```

**Line 2683**: Variable naming issue
```c
+	uint16_t l3_domain_count = rte_get_domain_count(RTE_LCORE_DOMAIN_IO);
```
**❌ ERROR**: Variable is named `l3_domain_count` but gets IO domain count. Should be `io_domain_count`.

**Line 2705-2710**: Parentheses in while condition
```c
while ((rte_lcore_is_enabled(rx_lcore_id) == 0) ||
       (lcore_queue_conf[rx_lcore_id].n_rx_port ==
	l2fwd_rx_queue_per_lcore)) {
```
Excessive parentheses - the outer ones around each condition aren't needed.

**Skeleton example changes** (lines 2731-2777):

**Line 2750**: Typo in comment
```c
+	/* get the socekt of each port */
```
Should be "socket", not "socekt"

**Line 2760**: Function usage looks good
```c
+	core_count_from_io = rte_lcore_count_from_domain(RTE_LCORE_DOMAIN_IO, sel_io_indx);
```

---

## Summary of Critical Issues

### Errors (Must Fix):

1. **All 4 patches missing `Signed-off-by:` tags** - This is mandatory per DPDK contribution guidelines
2. **Patch 4**: Variable naming bug - `l3_domain_count` should be `io_domain_count`
3. **Patch 4**: Array initialization bug - `lcore_io_domain` array not fully initialized

### Warnings (Should Fix):

4. **Patch 1**: Typo in commit message - "nameley" → "namely"
5. **Patch 4**: Wrong component prefix - should be `examples/helloworld:` and `examples/l2fwd:` and `examples/skeleton:`
6. **Patch 4**: Typo in comment - "socekt" → "socket"
7. **Patch 4**: Unnecessary parentheses in while condition
8. All patches should verify:
   - API functions have `__rte_experimental` markers
   - Doxygen comments present for all public APIs
   - Tests use proper `TEST_ASSERT` infrastructure
   - Release notes updated

### Info (Consider):

9. **Patch 1**: Version history (v4 changes) should be after `---` separator, not in commit body
10. **Patch 4**: Ternary operator could be more readable with intermediate variable
11. Verify all code lines are ≤100 characters
12. Verify commit body lines are ≤75 characters

---

## Recommended Actions

1. **Immediate**: Add `Signed-off-by:` tags to all patches with author's real name and email
2. **Immediate**: Fix variable naming bug (`l3_domain_count` → `io_domain_count`)
3. **Immediate**: Fix array initialization bug for `lcore_io_domain`
4. Fix typos: "nameley" → "namely", "socekt" → "socket"
5. Update patch 4 subject lines to use specific example names
6. Verify all new APIs are marked `__rte_experimental`
7. Verify all new APIs have Doxygen documentation
8. Move v4 changelog to post-`---` section
9. Run DPDK validation tools:
   - `devtools/check-git-log.sh`
   - `devtools/checkpatches.sh`
   - `devtools/test-meson-builds.sh`

---

## Validation Commands

Before resubmitting, run:

```bash
# Check commit messages
devtools/check-git-log.sh

# Check patch formatting
devtools/checkpatches.sh *.patch

# Verify compilation
devtools/test-meson-builds.sh

# Check maintainers
devtools/get-maintainer.sh *.patch
```

---

## Positive Aspects

✅ Good responsiveness to v4 feedback (removed malloc casting, NULL checks, fixed types)  
✅ Comprehensive series with tests, documentation, and examples  
✅ Clear API naming with proper `rte_` prefixes  
✅ Good use of explicit types (`uint16_t`) instead of `unsigned`  
✅ Appropriate subject line formatting and component prefixes (mostly)  

The core concept and implementation approach appear sound, but the patches need the above corrections before merging.
Previous message (by thread): [PATCH v11 6/6] test: fix format overflow in cryptodev test
Next message (by thread): [PATCH v4 0/4] Introduce Topology NUMA grouping for cores
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the dev mailing list