[dpdk-dev] [PATCH v4] build: optional NUMA and cpu counts detection

Juraj Linkeš juraj.linkes at pantheon.tech
Fri Jul 16 15:53:18 CEST 2021



> -----Original Message-----
> From: David Christensen <drc at linux.vnet.ibm.com>
> Sent: Tuesday, July 6, 2021 8:11 PM
> To: Bruce Richardson <bruce.richardson at intel.com>; Juraj Linkeš
> <juraj.linkes at pantheon.tech>
> Cc: thomas at monjalon.net; david.marchand at redhat.com;
> Honnappa.Nagarahalli at arm.com; Ruifeng.Wang at arm.com;
> ferruh.yigit at intel.com; jerinjacobk at gmail.com; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4] build: optional NUMA and cpu counts
> detection
> 
> 
> 
> On 7/6/21 2:08 AM, Bruce Richardson wrote:
> > On Tue, Jul 06, 2021 at 08:56:37AM +0000, Juraj Linkeš wrote:
> >>
> >>
> >>> -----Original Message-----
> >>> From: Bruce Richardson <bruce.richardson at intel.com>
> >>> Sent: Tuesday, June 29, 2021 1:29 PM
> >>> To: Juraj Linkeš <juraj.linkes at pantheon.tech>
> >>> Cc: thomas at monjalon.net; david.marchand at redhat.com;
> >>> Honnappa.Nagarahalli at arm.com; Ruifeng.Wang at arm.com;
> >>> ferruh.yigit at intel.com; jerinjacobk at gmail.com; dev at dpdk.org
> >>> Subject: Re: [PATCH v4] build: optional NUMA and cpu counts
> >>> detection
> >>>
> >>> On Tue, Jun 29, 2021 at 12:55:05PM +0200, Juraj Linkeš wrote:
> >>>> Add an option to automatically discover the host's numa and cpu
> >>>> counts and use those values for a non cross-build.
> >>>> Give users the option to override the per-arch default values or
> >>>> values from cross files by specifying them on the command line with
> >>>> -Dmax_lcores and -Dmax_numa_nodes.
> >>>>
> >>>> Signed-off-by: Juraj Linkeš <juraj.linkes at pantheon.tech>
> >>>> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
> >>>> ---
> >>> Two very minor suggestions inline below.
> >>>
> >>> Acked-by: Bruce Richardson <bruce.richardson at intel.com>
> >>>
> >>>>
> >>> <snip>
> >>>> +max_lcores = get_option('max_lcores')
> >>>> +if max_lcores == 'auto'
> >>>
> >>> Rather than "auto", would "detect" be a clearer name for this option value?
> >>>
> >>> <snip>
> >>>> +option('max_lcores', type: 'string', value: 'default', description:
> >>>> +       'Set maximum number of cores/threads supported by EAL. The default is different per-arch. Set to auto to detect the number of cores on the build machine.')
> >>>> +option('max_numa_nodes', type: 'string', value: 'default', description:
> >>>> +       'Set highest NUMA node supported by EAL. The default is different per-arch. Set to auto to detect the highest numa node on the build machine.')
> >>>
> >>> I'd put the explicit values of "default" and "auto"(or "detect") in
> >>> quotes "" to make clear they are literal values.
> >>>
> >>
> >> Thanks, Bruce, I'll change it. I have one extra question now that I'm looking at the patch:
> >> What does subprocess.run(['sysctl', '-n', 'vm.ndomains'], check=False) return exactly? Is it the number of NUMA nodes (looks like it) or the highest NUMA node on the system (the highest number of all NUMA nodes)? I'm asking because of how NUMA works on P9:
> >> NUMA node0 CPU(s):   0-63
> >> NUMA node8 CPU(s):   64-127
> >> NUMA node252 CPU(s):
> >> NUMA node253 CPU(s):
> >> NUMA node254 CPU(s):
> >> NUMA node255 CPU(s):
> >>
> >> Here we need not just two NUMA nodes, but at least 9 (0-8). Linux and Windows should return the highest NUMA node, not sure about FreeBSD. Or maybe we should return the highest NUMA node on which there are actual CPUs?
> >
> > I'm not sure, and I think to be really sure we'd need it tested on a
> > P9 system. The help text for the sysctl node says "Number of physical
> > memory domains available", which would imply 2 in the case above.
> > [However, we also would need to find out how BSD numbers the domains,
> > too, as it's possible an OS could just call them 0 and 1, rather than
> > 0 and 8 if it wanted to.]
> >
> > In short, we'd need to test to be sure. Is FreeBSD on P9 a supported
> > config, and if so can the P9 maintainer perhaps help out with testing?
> 
> Results of the v4 patch on an IBM AC922 P9 system with Linux:
> 

Can you get results from FreeBSD as well?

> $ python3 get-numa-count.py
> 8
> NUMA node0 CPU(s):   0-63
> NUMA node8 CPU(s):   64-127
<snip>
Is this the right number for your case, i.e. are you able to use both NUMA nodes when RTE_MAX_NUMA_NODES=8?

Or maybe this is a question for Bruce or Thomas: what do we need to set RTE_MAX_NUMA_NODES to in order to use all NUMA nodes on the system? The highest NUMA node number, or that number + 1? On Linux with 4 NUMA nodes, there will be node0-node3 under /sys/devices/system/node. Do we need to set RTE_MAX_NUMA_NODES to 3 or 4?
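
To make the two candidate answers concrete, here is a minimal sketch that parses lscpu-style "NUMA nodeN CPU(s):" lines like the P9 output quoted earlier. The helper names (highest_node, max_numa_nodes_needed) are hypothetical and not part of the patch, and the "+ 1" rests on the assumption that RTE_MAX_NUMA_NODES is an exclusive upper bound on node ids, which is exactly the open question above.

```python
import re

# Matches lines such as "NUMA node8 CPU(s):   64-127"; the CPU list may be empty.
LSCPU_LINE = re.compile(r"NUMA node(\d+) CPU\(s\):\s*(\S*)")


def highest_node(lines, require_cpus=False):
    """Return the highest NUMA node id found in lscpu-style lines.

    With require_cpus=True, nodes whose CPU list is empty are ignored.
    Returns 0 when no node line is found.
    """
    ids = []
    for line in lines:
        match = LSCPU_LINE.match(line.strip())
        if match and (match.group(2) or not require_cpus):
            ids.append(int(match.group(1)))
    return max(ids) if ids else 0


def max_numa_nodes_needed(lines, require_cpus=False):
    """ASSUMPTION: the limit must exceed the highest usable node id by one."""
    return highest_node(lines, require_cpus) + 1


if __name__ == "__main__":
    p9_sample = [
        "NUMA node0 CPU(s):   0-63",
        "NUMA node8 CPU(s):   64-127",
        "NUMA node252 CPU(s):",
        "NUMA node253 CPU(s):",
        "NUMA node254 CPU(s):",
        "NUMA node255 CPU(s):",
    ]
    print(max_numa_nodes_needed(p9_sample))                     # all nodes
    print(max_numa_nodes_needed(p9_sample, require_cpus=True))  # nodes with CPUs
```

If the limit turns out to be inclusive rather than exclusive, the "+ 1" goes away and highest_node() alone would be the value to use.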

> OVS has a problem with requiring contiguous NUMA nodes that we've submitted
> patches to fix, so we need to ensure that's not a problem in DPDK.
> 
> I've been doing things like "--socket-mem=2048,0,0,0,0,0,0,0,2048" so far to
> manage the memory-to-NUMA mapping which has been working fine, so it
> depends on how vm.ndomains will be.
> 
> Dave
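
For reference, the --socket-mem string above has one entry per node id from 0 up to the highest node in use, so non-contiguous nodes need zero padding in between. A minimal sketch of building it (socket_mem_arg is a hypothetical helper, not something in DPDK or this patch):

```python
def socket_mem_arg(mem_by_node):
    """Build a --socket-mem argument from a {node_id: megabytes} mapping.

    Node ids missing from the mapping are padded with 0 so that each value
    lands at the list position matching its node id.
    """
    if not mem_by_node:
        raise ValueError("at least one NUMA node is required")
    highest = max(mem_by_node)
    amounts = [str(mem_by_node.get(node, 0)) for node in range(highest + 1)]
    return "--socket-mem=" + ",".join(amounts)


if __name__ == "__main__":
    # P9 layout from this thread: memory on nodes 0 and 8 only.
    print(socket_mem_arg({0: 2048, 8: 2048}))
```

With nodes 0 and 8 this yields nine entries, which is why the limit question above matters for P9.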


