[dpdk-dev] [PATCH] eal: add option --avail-cores to detect lcores

Ananyev, Konstantin konstantin.ananyev at intel.com
Wed Mar 9 20:33:46 CET 2016



> >>>>>>>> On 3/8/2016 4:54 PM, Panu Matilainen wrote:
> >>>>>>>>> On 03/04/2016 12:05 PM, Jianfeng Tan wrote:
> >>>>>>>>>> This patch adds option, --avail-cores, to use lcores which are
> >>>>>>>>>> available
> >>>>>>>>>> by calling pthread_getaffinity_np() to narrow down detected cores
> >>>>>>>>>> before
> >>>>>>>>>> parsing coremask (-c), corelist (-l), and coremap (--lcores).
> >>>>>>>>>>
> >>>>>>>>>> Test example:
> >>>>>>>>>> $ taskset 0xc0000 ./examples/helloworld/build/helloworld \
> >>>>>>>>>>            --avail-cores -m 1024
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Jianfeng Tan <jianfeng.tan at intel.com>
> >>>>>>>>>> Acked-by: Neil Horman <nhorman at tuxdriver.com>
> >>>>>>>>> Hmm, to me this sounds like something that should be done always so
> >>>>>>>>> there's no need for an option. Or if there's a chance it might do the
> >>>>>>>>> wrong thing in some rare circumstance then perhaps there should be a
> >>>>>>>>> disabler option instead?
> >>>>>>>> Thanks for comments.
> >>>>>>>>
> >>>>>>>> Yes, there's a use case that we cannot handle.
> >>>>>>>>
> >>>>>>>> If we make it as default, DPDK applications may fail to start, when user
> >>>>>>>> specifies a core in isolcpus and its parent process (say bash) has a
> >>>>>>>> cpuset affinity that excludes isolcpus. Originally, DPDK applications
> >>>>>>>> just blindly do pthread_setaffinity_np() and it always succeeds because
> >>>>>>>> it always has root privilege to change any cpu affinity.
> >>>>>>>>
> >>>>>>>> Now, if we do the checking in rte_eal_cpu_init(), those lcores will be
> >>>>>>>> flagged as undetected (in my older implementation), which leads to failure.
> >>>>>>>> To make it work, we would always have to add "taskset mask" (or similar)
> >>>>>>>> before DPDK application command lines.
> >>>>>>>>
> >>>>>>>> What do you think?
> >>>>>>> I still think it sounds like something that should be done by default
> >>>>>>> and maybe be overridable with some flag, rather than the other way
> >>>>>>> around. Another alternative might be detecting the cores always but if
> >>>>>>> running as root, override but with a warning.
> >>>>>> For your second solution, only root can setaffinity to isolcpus?
> >>>>>> Your first solution seems like a promising way for me.
> >>>>>>
> >>>>>>> But I dont know, just wondering. To look at it from another angle: why
> >>>>>>> would somebody use this new --avail-cores option and in what
> >>>>>>> situation, if things "just work" otherwise anyway?
> >>>>>> For DPDK applications, the most common way to initialize DPDK is like
> >>>>>> this: "$dpdk-app [options for DPDK] -- [options for app]", so users need
> >>>>>> to specify which cores to run on and how many hugepages to use. Suppose
> >>>>>> we need this dpdk-app to run in a container: users already give that
> >>>>>> information when they build the cgroup for it to run inside. This
> >>>>>> option, or this patch, makes DPDK smart enough to discover how many
> >>>>>> resources it may use. Make sense?
> >>>>> But then, all we need might be just a script that would extract this information from the system
> >>>>> and form a proper cmdline parameter for DPDK?
> >>>> Yes, a script will work. Or the application can construct (argc, argv)
> >>>> itself before calling rte_eal_init(). But as Neil Horman once suggested, a
> >>>> simple pthread_getaffinity_np() will get all of this done. So is it worth
> >>>> a patch here?
> >>> Don't know...
> >>> Personally I would prefer not to put extra logic inside EAL.
> >>> For me - there are too many different options already.
> >> Then how about making it the default in rte_eal_cpu_init()? It is already
> >> known that this will cause trouble for users of isolcpus: they would need
> >> to add "taskset [mask]" before starting a DPDK app.
> > As I said - provide a script?
> 
> Yes. But what I want to say is that this script is hard to get right if there
> are different kinds of limitations. (That rarely happens, though :-) )

My thought was to keep the DPDK code untouched, i.e. let it still blindly call
pthread_setaffinity_np() based on the input parameters, and in addition provide a script
for those who want to run in '--avail-cores' mode.
The script could do 'taskset -p $$' and then either build the -c parameter for the app,
or check the existing -c/-l/--lcores parameters and complain if a disallowed pcpu is found.
But OK, maybe it is easier and more convenient to have this logic inside EAL
than in a separate script.
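
For illustration only (this is not code from the patch): a minimal sketch of the
check described above, written in C rather than as a script. The helper name
count_unavailable_cpus() is hypothetical. It assumes Linux/glibc and uses
sched_getaffinity(0, ...) on the calling thread instead of
pthread_getaffinity_np(), which should report the same mask here and avoids
depending on libpthread at link time.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of DPDK: count how many of the requested
 * physical cpus the calling thread is NOT allowed to run on.
 * Returns -1 if the affinity mask cannot be read.
 */
static int
count_unavailable_cpus(const int *cpus, int n)
{
	cpu_set_t allowed;
	int i, bad = 0;

	assert(cpus != NULL || n == 0);

	CPU_ZERO(&allowed);
	/* pid 0 means "the calling thread" */
	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
		return -1;

	for (i = 0; i < n; i++) {
		/* range check first, so CPU_ISSET never sees a bad index */
		if (cpus[i] < 0 || cpus[i] >= CPU_SETSIZE ||
		    !CPU_ISSET(cpus[i], &allowed)) {
			fprintf(stderr, "core %d unavailable\n", cpus[i]);
			bad++;
		}
	}
	return bad;
}
```

A wrapper could call this on the cpus taken from -c/-l/--lcores and refuse to
start the application when the result is non-zero.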

> 
> > Same might be for amount of hugepage memory available to the user?
> 
> Ditto. Limitations like hugetlbfs quota, cgroup hugetlb, some are used
> by app themself (more like an artificial argument) ...
> >
> >>>   From other side looking at the patch itself:
> >>> You are updating lcore_count and lcore_config[],based on physical cpu availability,
> >>> but these days it is not always one-to-one mapping between EAL lcore and physical cpu.
> >>> Shouldn't that be taken into account?
> >> I have not seen a problem so far, because this work is done before
> >> parsing coremask (-c), corelist (-l), and coremap (--lcores). If a core
> >> is disabled here, it is as if it was not detected in rte_eal_cpu_init(). Or
> >> could you please give more hints?
> > I didn't test the changes, so probably I am missing something.
> > Let's say the user is allowed to use only cpus 0-3.
> > If he starts the app with:
> >   --avail-cores  --lcores='(1-7)@2',
> > then only lcores 1-3 would be started.
> > Again, if the user specified '2@(1-7)', it would also go undetected
> > that cpus 4-7 are not available to the user.
> > Is that so?
> 
> After reading the code:
> For case --lcores='(1-7)@2', lcores 1-7 would be started and bound to
> pcore 2.
> For case --lcores='2@(1-7)', this will fail with "core 4 unavailable".
> 
> It's because:
> a.  although 1:1 mapping is built-up and flagged as detected if pcore is
> found in sysfs. (ROLE_RTE, cpuset, detected is true)
> b. in the beginning of eal_parse_lcores(), "reset lcore config".
> (ROLE_OFF, cpuset is empty, detected is still true)
> c. pcore cpuset will be checked by convert_to_cpuset using the previous
> "detected" value.

Ok, my bad then - I misunderstood the code.
Thanks for explanation.
So if I get it right now: first, inside lib/librte_eal/common/eal_common_lcore.c,
both lcore_count and lcore_config relate to the pcpus.
Then later, in lib/librte_eal/common/eal_common_options.c,
they are overwritten with the lcore information,
except lcore_config[].detected, which seems to be kept intact.
Is that correct?
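
To check my understanding, here is a simplified model of that flow. It is not
the actual EAL code: all model_* names, the 8-entry table and the bitmask
cpuset are assumptions made only for this sketch. It follows steps a, b and c
as described above: 1:1 detection, reset at the start of --lcores parsing that
keeps 'detected', then a pcpu check against the old 'detected' value.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_LCORE 8

enum model_role { MODEL_ROLE_OFF, MODEL_ROLE_RTE };

struct model_lcore_config {
	bool detected;        /* pcpu with this index is usable */
	enum model_role role;
	unsigned int cpuset;  /* bitmask of pcpus this lcore may run on */
};

static struct model_lcore_config model_cfg[MODEL_MAX_LCORE];

/* Step a: 1:1 mapping, entries are indexed by pcpu at this point. */
static void
model_cpu_init(unsigned int avail_mask)
{
	for (unsigned int i = 0; i < MODEL_MAX_LCORE; i++) {
		bool ok = (avail_mask >> i) & 1u;

		model_cfg[i].detected = ok;
		model_cfg[i].role = ok ? MODEL_ROLE_RTE : MODEL_ROLE_OFF;
		model_cfg[i].cpuset = ok ? (1u << i) : 0;
	}
}

/* Step b: reset everything except 'detected'. */
static void
model_reset_lcores(void)
{
	for (unsigned int i = 0; i < MODEL_MAX_LCORE; i++) {
		model_cfg[i].role = MODEL_ROLE_OFF;
		model_cfg[i].cpuset = 0;
	}
}

/*
 * Step c: map lcores [lfirst, llast] onto pcpus [pfirst, plast].
 * Only the pcpus are checked against 'detected'.
 * Returns 0 on success, or -1 with *bad_cpu set to the first unusable pcpu.
 */
static int
model_map(unsigned int lfirst, unsigned int llast,
	  unsigned int pfirst, unsigned int plast, int *bad_cpu)
{
	unsigned int set = 0;

	assert(llast < MODEL_MAX_LCORE && plast < MODEL_MAX_LCORE);

	for (unsigned int p = pfirst; p <= plast; p++) {
		if (!model_cfg[p].detected) {
			*bad_cpu = (int)p;
			return -1;
		}
		set |= 1u << p;
	}
	for (unsigned int l = lfirst; l <= llast; l++) {
		model_cfg[l].role = MODEL_ROLE_RTE;
		model_cfg[l].cpuset = set;
	}
	return 0;
}
```

With cpus 0-3 available (mask 0xf), mapping lcores 1-7 onto pcpu 2 succeeds in
this model, while mapping lcore 2 onto pcpus 1-7 fails at pcpu 4, which is
consistent with the two test runs quoted below.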

> 
> I have tested it with the patch. The result matches the analysis above.
> For case --lcores='(1-7)@2': sudo taskset 0xf
> ./examples/helloworld/build/helloworld --avail-cores --lcores='(1-7)@2'
> ...
> hello from core 2
> hello from core 3
> hello from core 4
> hello from core 5
> hello from core 6
> hello from core 7
> hello from core 1
> 
> For case --lcores='2@(1-7)': sudo taskset 0xf
> ./examples/helloworld/build/helloworld --avail-cores --lcores='2@(1-7)'
> ...
> EAL: core 4 unavailable
> EAL: invalid parameter for --lcores
> ...
> 
> One thing may be worth mentioning: should "detected" be maintained in struct
> lcore_config? Maybe we need to maintain a data structure for pcores?

Yes, it might be good to split pcpu and lcores information somehow,
as it is a bit confusing right now.
But I suppose this is a subject for another patch/discussion.
Konstantin



