[dpdk-dev] [RFC PATCH v3 3/6] build: automatic NUMA and cpu counts detection

Juraj Linkeš juraj.linkes at pantheon.tech
Fri Oct 23 12:07:17 CEST 2020



> -----Original Message-----
> From: Bruce Richardson <bruce.richardson at intel.com>
> Sent: Wednesday, October 21, 2020 4:28 PM
> To: Juraj Linkeš <juraj.linkes at pantheon.tech>
> Cc: Ruifeng.Wang at arm.com; Honnappa.Nagarahalli at arm.com;
> Phil.Yang at arm.com; vcchunga at amazon.com; Dharmik.Thakkar at arm.com;
> jerinjacobk at gmail.com; hemant.agrawal at nxp.com; dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH v3 3/6] build: automatic NUMA and cpu
> counts detection
> 
> On Wed, Oct 21, 2020 at 03:13:19PM +0100, Bruce Richardson wrote:
> > On Wed, Oct 21, 2020 at 01:01:41PM +0000, Juraj Linkeš wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Bruce Richardson <bruce.richardson at intel.com>
> > > > Sent: Wednesday, October 21, 2020 2:02 PM
> > > > To: Juraj Linkeš <juraj.linkes at pantheon.tech>
> > > > Cc: Ruifeng.Wang at arm.com; Honnappa.Nagarahalli at arm.com;
> > > > Phil.Yang at arm.com; vcchunga at amazon.com;
> Dharmik.Thakkar at arm.com;
> > > > jerinjacobk at gmail.com; hemant.agrawal at nxp.com; dev at dpdk.org
> > > > Subject: Re: [RFC PATCH v3 3/6] build: automatic NUMA and cpu
> > > > counts detection
> > > >
> > > > On Wed, Oct 21, 2020 at 01:37:38PM +0200, Juraj Linkeš wrote:
> > > > > The build machine's number of cpus and numa nodes vary,
> > > > > resulting in mismatched counts of RTE_MAX_LCORE and
> > > > > RTE_MAX_NUMA_NODES for
> > > > many
> > > > > builds. Automatically discover the host's numa and cpu counts to
> > > > > remove this mismatch for native builds. Use current defaults for default
> builds.
> > > > > Force the users to specify the counts for cross build in cross
> > > > > files or on the command line.
> > > > > Give users the option to override the discovery or values from
> > > > > cross files by specifying them on the command line with
> > > > > -Dmax_lcores and -Dmax_numa_nodes.
> > > > >
> > > > > Signed-off-by: Juraj Linkeš <juraj.linkes at pantheon.tech>
> > > > > ---
> > > > >  buildtools/get_cpu_count.py  |  7 ++++++
> > > > > buildtools/get_numa_count.py | 22 +++++++++++++++++++
> > > > >  buildtools/meson.build       |  2 ++
> > > > >  config/meson.build           | 42 ++++++++++++++++++++++++++++++++++-
> -
> > > > >  meson_options.txt            |  8 +++----
> > > > >  5 files changed, 75 insertions(+), 6 deletions(-)  create mode
> > > > > 100644 buildtools/get_cpu_count.py  create mode 100644
> > > > > buildtools/get_numa_count.py
> > > > >
> > > > > diff --git a/buildtools/get_cpu_count.py
> > > > > b/buildtools/get_cpu_count.py new file mode 100644 index
> > > > > 000000000..386f85f8b
> > > > > --- /dev/null
> > > > > +++ b/buildtools/get_cpu_count.py
> > > > > @@ -0,0 +1,7 @@
> > > > > +#!/usr/bin/python3
> > > > > +# SPDX-License-Identifier: BSD-3-Clause # Copyright (c) 2020
> > > > > +PANTHEON.tech s.r.o.
> > > > > +
> > > > > +import os
> > > > > +
> > > > > +print(os.cpu_count())
> > > > > diff --git a/buildtools/get_numa_count.py
> > > > > b/buildtools/get_numa_count.py new file mode 100644 index
> > > > > 000000000..f0c49973a
> > > > > --- /dev/null
> > > > > +++ b/buildtools/get_numa_count.py
> > > > > @@ -0,0 +1,22 @@
> > > > > +#!/usr/bin/python3
> > > > > +# SPDX-License-Identifier: BSD-3-Clause # Copyright (c) 2020
> > > > > +PANTHEON.tech s.r.o.
> > > > > +
> > > > > +import ctypes
> > > > > +import glob
> > > > > +import os
> > > > > +import subprocess
> > > > > +
> > > > > +if os.name == 'posix':
> > > > > +    if os.path.isdir('/sys/devices/system/node'):
> > > > > +        print(len(glob.glob('/sys/devices/system/node/node*')))
> > > > > +    else:
> > > > > +        print(subprocess.run(['sysctl', 'vm.ndomains'],
> > > > > +capture_output=True).stdout)
> > > > > +
> > > > > +elif os.name == 'nt':
> > > > > +    libkernel32 = ctypes.windll.kernel32
> > > > > +
> > > > > +    count = ctypes.c_ulong()
> > > > > +
> > > > > +    libkernel32.GetNumaHighestNodeNumber(ctypes.pointer(count))
> > > > > +    print(count.value + 1)
> > > > > diff --git a/buildtools/meson.build b/buildtools/meson.build
> > > > > index
> > > > > 04808dabc..925e733b1 100644
> > > > > --- a/buildtools/meson.build
> > > > > +++ b/buildtools/meson.build
> > > > > @@ -17,3 +17,5 @@ else
> > > > >  endif
> > > > >  map_to_win_cmd = py3 + files('map_to_win.py')  sphinx_wrapper =
> > > > > py3 +
> > > > > files('call-sphinx-build.py')
> > > > > +get_cpu_count_cmd = py3 + files('get_cpu_count.py')
> > > > > +get_numa_count_cmd = py3 + files('get_numa_count.py')
> > > > > diff --git a/config/meson.build b/config/meson.build index
> > > > > a57c8ae9e..c4477f977 100644
> > > > > --- a/config/meson.build
> > > > > +++ b/config/meson.build
> > > > > @@ -74,7 +74,11 @@ endif
> > > > >  # still being able to support the CPU features required for DPDK.
> > > > >  # This can be bumped up by the DPDK project, but it can never
> > > > > be an # invariant like 'native'
> > > > > +max_lcores = get_option('max_lcores') max_numa_nodes =
> > > > > +get_option('max_numa_nodes')
> > > > >  if machine == 'default'
> > > > > +	max_numa_nodes = 4
> > > > > +	max_lcores = 128
> > > >
> > > > This doesn't seem right, since you are overriding the
> > > > user-specified values with hard-coded ones.
> > > >
> > >
> > > I understand we're using the default build/generic to build portalbe dpdk
> distro packages, meaning the settings for those packages should always be the
> same, no? If not, what should the default/generic build be? And when would
> someone do a default/generic build with their values? It wouldn't be a
> default/generic anymore, right?
> > >
> > > > >  	if host_machine.cpu_family().startswith('x86')
> > > > >  		# matches the old pre-meson build systems default
> > > > >  		machine = 'corei7'
> > > > > @@ -83,6 +87,22 @@ if machine == 'default'
> > > > >  	elif host_machine.cpu_family().startswith('ppc')
> > > > >  		machine = 'power8'
> > > > >  	endif
> > > > > +elif not meson.is_cross_build()
> > > > > +	# find host core count and numa node count for native builds
> > > > > +	if max_lcores == 0
> > > > > +		max_lcores =
> > > > run_command(get_cpu_count_cmd).stdout().to_int()
> > > > > +		min_lcores = 2
> > > > > +		if max_lcores < min_lcores
> > > > > +			message('Found less than @0@ cores, building
> for
> > > > @0@ cores'.format(min_lcores))
> > > > > +			max_lcores = min_lcores
> > > > > +		else
> > > > > +			message('Found @0@
> cores'.format(max_lcores))
> > > > > +		endif
> > > > > +	endif
> > > > > +	if max_numa_nodes == 0
> > > > > +		max_numa_nodes =
> > > > run_command(get_numa_count_cmd).stdout().to_int()
> > > > > +		message('Found @0@ numa
> nodes'.format(max_numa_nodes))
> > > > > +	endif
> > > > >  endif
> > > > >
> > > > >  dpdk_conf.set('RTE_MACHINE', machine) @@ -227,8 +247,10 @@
> > > > > foreach
> > > > > arg: warning_flags  endforeach
> > > > >
> > > > >  # set other values pulled from the build options
> > > > > -dpdk_conf.set('RTE_MAX_LCORE', get_option('max_lcores'))
> > > > > -dpdk_conf.set('RTE_MAX_NUMA_NODES',
> > > > > get_option('max_numa_nodes'))
> > > > > +if not meson.is_cross_build()
> > > > > +	dpdk_conf.set('RTE_MAX_LCORE', max_lcores)
> > > > > +	dpdk_conf.set('RTE_MAX_NUMA_NODES', max_numa_nodes)
> endif
> > > >
> > > > Rather than conditionally setting the value here, you should move
> > > > the checks below up above this to simplify things.
> > > >
> > >
> > > Do you mean the cross build checks? Those have to be after
> subdir(arch_subdir) so that we can override the values from cross files (as the
> commit message says).
> > >
> > > > >  dpdk_conf.set('RTE_MAX_ETHPORTS', get_option('max_ethports'))
> > > > > dpdk_conf.set('RTE_LIBEAL_USE_HPET', get_option('use_hpet'))
> > > > > dpdk_conf.set('RTE_ENABLE_TRACE_FP',
> > > > > get_option('enable_trace_fp')) @@
> > > > > -247,6 +269,22 @@ compile_time_cpuflags = []
> > > > >  subdir(arch_subdir)
> > > > >  dpdk_conf.set('RTE_COMPILE_TIME_CPUFLAGS',
> > > > > ','.join(compile_time_cpuflags))
> > > > >
> > > > > +# check that cpu and numa count is set in cross builds if
> > > > > +meson.is_cross_build()
> > > > > +    	if max_lcores > 0
> > > > > +		# specified on the cmdline
> > > > > +		dpdk_conf.set('RTE_MAX_LCORE', max_lcores)
> > > > > +	elif not dpdk_conf.has('RTE_MAX_LCORE')
> > > > > +		error('Number of cores for cross build not specified in
> @0@
> > > > subdir (e.g. in a cross-file) nor on the
> > > > cmdline'.format(arch_subdir))
> > > > > +	endif
> > > > > +	if max_numa_nodes > 0
> > > > > +		# specified on the cmdline
> > > > > +		dpdk_conf.set('RTE_MAX_NUMA_NODES',
> max_numa_nodes)
> > > > > +	elif not dpdk_conf.has('RTE_MAX_NUMA_NODES')
> > > > > +		error('Number of numa nodes for cross build not
> specified in
> > > > @0@ subdir (e.g. in a cross-file) nor on the
> > > > cmdline'.format(arch_subdir))
> > > > > +	endif
> > > > > +endif
> > > > > +
> > > > >  # set the install path for the drivers
> > > > > dpdk_conf.set_quoted('RTE_EAL_PMD_PATH', eal_pmd_path)
> > > > >
> > > > > diff --git a/meson_options.txt b/meson_options.txt index
> > > > > 9bf18ab6b..01b0c45c3 100644
> > > > > --- a/meson_options.txt
> > > > > +++ b/meson_options.txt
> > > > > @@ -26,10 +26,10 @@ option('machine', type: 'string', value: 'native',
> > > > >  	description: 'set the target machine type')
> > > > > option('max_ethports',
> > > > > type: 'integer', value: 32,
> > > > >  	description: 'maximum number of Ethernet devices')
> > > > > -option('max_lcores', type: 'integer', value: 128,
> > > > > -	description: 'maximum number of cores/threads supported by
> EAL')
> > > > > -option('max_numa_nodes', type: 'integer', value: 4,
> > > > > -	description: 'maximum number of NUMA nodes supported by
> EAL')
> > > > > +option('max_lcores', type: 'integer', value: 0,
> > > > > +	description: 'maximum number of cores/threads supported by
> EAL.
> > > > > +Value 0 means the number of cpus on the host will be used. For
> > > > > +cross build,
> > > > set to non-zero to overwrite the cross-file value.')
> > > > option('max_numa_nodes',
> > > > type: 'integer', value: 0,
> > > > > +	description: 'maximum number of NUMA nodes supported by
> EAL.
> > > > > +Value
> > > > 0
> > > > > +means the number of numa nodes on the host will be used. For
> > > > > +cross build, set to non-zero to overwrite the cross-file
> > > > > +value.')
> > > >
> > > > I don't like this change, because it very much assumes for
> > > > non-cross-compiles that people will be running DPDK on the system
> > > > they build it on. That's a very, very big assumption!
> > >
> > > I'll be using definitions from https://mesonbuild.com/Cross-
> compilation.html.
> > > I understand cross compilation to be building for a diffent host machine than
> the build machine (which is aligned with pretty much every definition I found). I
> understand this to be true not only for builds between architectures, but also
> within an architecture (e.g. x86_64 build machine building for x86_64 host
> machine).
> > > So yes, when someone does a native build, it stands to reason they want to
> use it on the build machine. If they wanted to use it elsewhere, they would cross
> compile.
> > > Another thing is the current build philosophy is to detect as much as possible
> (not having statically defined configuration, as you mentioned in the past).
> Detecting the number of cores and numa nodes fits this perfectly.
> > > And yet another thing is that the assumption seems to be already present in
> the build system - it already detects a lot things, some of which may not be
> satisfied on machines other than the build machine. I may be wrong about this.
> > >
> > > > I'm ok with having zero as a "detect" option, and having the
> > > > values overridden from cross-files, but not with detection as the
> > > > default out- of-the-box option! Lots of users may pull builds from
> > > > a CI based on VMs with just a few cores, for instance.
> > >
> > > If not having the automatic detection is a concern because of users using CI
> builds, then we (if it's from our CI) can change what we're building in CI - the
> default/generic build seems like a good fit because it's supposed to work on a
> variety of systems. Expecting that native build from random VMs would work
> anywhere doesn't seen very realistic - it's been build for that VM environment
> (because it's a native build).
> > >
> > > Here's my understanding on which the current version is based:
> > > 1. Since we want to get away from having statically defined configuration,
> numa and core count discovery is exactly what we should have in the build
> system. Since discorery is currently the default for lib/drivers, it stands to reason
> it should be default for everything else, if possible.
> > > 2. Native build should produce binaries matching the build machine as well as
> possible.
> > > 3. Default/generic build should produce binaries executable on a range of
> systems (ideally all systems of a given architecture).
> > > 4. Other builds, that is non-native builds, are cross-compilation, since we're
> building for host machine other that the build machine.
> > >
> > > What I haven't taken into account is users using CI builds. That could be
> remedied by modifying the CI a bit while being consistent with what
> native/default/generic/cross builds are (or should be). And in any case, if we're
> not interested in testing the exact CI environment (which we aren't, since we
> don't want to use 2 cores with 1 numa), we really shouldn't be doing native
> builds there.
> > >
> > > I'm interested in hearing where my thinking deviates from yours.
> > >
> >
> > There are a number of points in which we differ, I think.
> >
> > Firstly, the use of "native" and "default/generic" for the "machine"
> > parameter refers only to the instruction-set level from the compiler,
> > and should not affect any other settings, since all settings are independent.

It seems that what you're saying may be true from the x86 point of view, but is not true from arm point of view and thus not true in general.
It's not true that all setting are independent for arm build. When building with machine=native we're discovering what machine we're building on and setting some settings (other than numa and core count) are based on that. For machine=generic we also have a set of values that build the generic binaries you describe below. More on this in that section.

> > Therefore, setting "machine" to "native" does not mean that we should
> > detect cores and numa nodes, and similarly setting it to "default"
> > does not mean that we should ignore the settings for these values and
> > pick our own chosen default values.
> >

I don't think this follows (if all settings are independent, then setting machine to native or anything else implies nothing for other settings). But I think it does follow from the what you write next.

However, since machine="default/generic" does not specify a type of build, but rather just one configuration option (or a small subset for arm builds), removing the ability to overwrite the default values when machine="default/generic" is not warranted.

> > Secondly, the use of cross-compilation only applies when you are
> > compiling for a different architecture or environment (e.g. OS) to
> > what you are building on. Building on a 4-core x86 machine to run on a
> > dual-socket, 32-core x86 machine is not cross-compiling, but still
> > needs to work by default.

We (the arm community, or at least a part of it) frequently refer to aarch64 -> aarch64 cross-compilation when we're building for a specific SoC (usually small/slow) on a fast server grade aarch64 machine. This is probably be because we're conflating what most people consider cross-compilation to be (building for a different arch or OS) with what meson cross compilation is (using a cross-file). Arm SoCs vary greatly and we want to capture that variation in cross files and use the cross files even on the same architecture (or rather, we want users to be able to). From now on when talking about cross compilation I'll mean meson cross compilation (i.e. using a cross file).

> > Something like building a 32-bit binary on a
> > 64-bit OS is in most cases not done by cross-compilation either, but
> > is rather outside the scope of the discussion, except as a reference
> > point to show the scope of differences which can be accomodated as "native
> builds".
> >
> > In terms of dynamic configuration for things like cores and numa
> > nodes, the ideal end state here is not to have them detected at build
> > time on the host system, but instead to have them detected at runtime and
> sized dynamically.

This would simplify everything, from building to users not having to worry about one more thing - would this be hard/time consuming to implement?

> > In the absense of that, these values should be set to reasonable
> > defaults so that when a user compiles up a binary without settings
> > these explicitly it should run on 95%+ of systems of that type.
> >

This, coupled with the paragraph about cross-compilation imply that we shouldn't default to discovering the values and I think that's reasonable. Having the ability to specify these on the command line, using an optional discovery mechanism and specifying these in a cross file should cover our (arm) needs.

> > This is my understanding of the issues, anyway. :-)
> >
> What could possibly work is to set the defaults for these to "0" as done in your
> patch, but thereafter have the resulting defaults set per-architecture, rather
> than globally. That would allow x86 to tune things more for native-style builds,
> while in all cases allowing user to override.
> 
> /Bruce

I'll change it so that per-arch (and for arm, per processor type) defaults are the defaults that will be used (when set to 0). I'd like to give users the option to use the discovery mechanism - I think we'll have to use another value with specific meaning and we can't use positive integers so I guess the best we can do is use -1 for this.

Juraj


More information about the dev mailing list