[PATCH v2] eal: fix undetected NUMA nodes
Bruce Richardson
bruce.richardson at intel.com
Wed Mar 19 17:54:31 CET 2025
On Wed, Mar 19, 2025 at 05:31:45PM +0100, David Marchand wrote:
> On Wed, Mar 5, 2025 at 5:25 PM Bruce Richardson
> <bruce.richardson at intel.com> wrote:
> >
> > In cases where the number of cores on a given socket is greater than
> > RTE_MAX_LCORE, EAL will not be aware of every socket/NUMA node
> > on the system. Fix this limitation by having the EAL probe the NUMA node
> > for cores it isn't going to use, and record that for completeness.
> >
> > This is necessary as memory is tracked per node, and with the --lcores
> > parameter our app lcores may be on different sockets than the lcore ids
> > may imply. For example, lcore 0 is on socket zero, but if the app is run
> > with --lcores=0@64, then DPDK lcore 0 may be on socket one, so DPDK
> > needs to be aware of that socket.
> >
> > Fixes: 952b20777255 ("eal: provide API for querying valid socket ids")
> > Cc: stable at dpdk.org
> >
> > Signed-off-by: Bruce Richardson <bruce.richardson at intel.com>
>
> In principle, the fix LGTM.
>
> I have one comment.
>
> >
> > ---
> > v2: handle case where RTE_MAX_LCORE > CPU_SETSIZE (i.e. >1024)
> > ---
> > lib/eal/common/eal_common_lcore.c | 17 ++++++++++++-----
> > 1 file changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/lib/eal/common/eal_common_lcore.c b/lib/eal/common/eal_common_lcore.c
> > index 2ff9252c52..820a6534b1 100644
> > --- a/lib/eal/common/eal_common_lcore.c
> > +++ b/lib/eal/common/eal_common_lcore.c
> > @@ -144,7 +144,11 @@ rte_eal_cpu_init(void)
> > unsigned lcore_id;
> > unsigned count = 0;
> > unsigned int socket_id, prev_socket_id;
> > - int lcore_to_socket_id[RTE_MAX_LCORE];
> > +#if CPU_SETSIZE > RTE_MAX_LCORE
> > + int lcore_to_socket_id[CPU_SETSIZE] = {0};
> > +#else
> > + int lcore_to_socket_id[RTE_MAX_LCORE] = {0};
> > +#endif
>
> This initialisation was unneeded so far because, in the next loop (on
> each possible lcore), eal_cpu_socket_id() (returning 0 even for
> errors) was called regardless of eal_cpu_detected().
> Moving this call after eal_cpu_detected() would be consistent with the
> rest of this patch.
>
So keep the zero-init, and move the eal_cpu_socket_id() call that sets the
initial values in the array after the eal_cpu_detected() check, then?
>
> It is unrelated to this patch itself, but I also have some doubts about
> the socket_id value stored per lcore, as no check against
> RTE_MAX_NUMA_NODES is done afterwards.
> (This is probably never hit, since the default value for RTE_MAX_NUMA_NODES is 32.)
>
Well, it's an open question whether RTE_MAX_NUMA_NODES is the max value for a
node id, or the maximum number of ids which can be handled. I imagine most
of the code assumes both - that we have sequential NUMA nodes with values <
RTE_MAX_NUMA_NODES.
/Bruce