[dpdk-dev] [PATCH v3 09/12] service: avoid race condition for MT unsafe service

Van Haaren, Harry harry.van.haaren at intel.com
Tue Apr 21 19:43:40 CEST 2020


> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>
> Sent: Saturday, April 18, 2020 7:22 AM
> To: Van Haaren, Harry <harry.van.haaren at intel.com>; Phil Yang
> <Phil.Yang at arm.com>; thomas at monjalon.net; Ananyev, Konstantin
> <konstantin.ananyev at intel.com>; stephen at networkplumber.org;
> maxime.coquelin at redhat.com; dev at dpdk.org
> Cc: david.marchand at redhat.com; jerinj at marvell.com;
> hemant.agrawal at nxp.com; Gavin Hu <Gavin.Hu at arm.com>; Ruifeng Wang
> <Ruifeng.Wang at arm.com>; Joyce Kong <Joyce.Kong at arm.com>; nd
> <nd at arm.com>; stable at dpdk.org; nd <nd at arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli at arm.com>; nd <nd at arm.com>
> Subject: RE: [PATCH v3 09/12] service: avoid race condition for MT unsafe service
> 
> <snip>
<snip snip>
> > > > To achieve our desired goal;
> > > >  - control thread doing mapping performs the following operations
> > > >     -- write service->num_mapped_lcores++ (atomic not required, only
> > > > single-writer allowed by APIs)
> > > This has to be atomic because of the rte_service_run_iter_on_app_lcore
> > > API. Performance should be fine as this API is not called frequently. But
> > > we need to consider the implications of more than one thread updating
> > > num_mapped_cores.
> > >
> > > >     -- MFENCE (full st-ld barrier) to flush write, and force
> > > > later loads to issue after
> > > I am not exactly sure what MFENCE on x86 does. On Arm platforms, the
> > > full barrier (DMB ISH) just makes sure that memory operations are not
> > > re-ordered around it. It does not say anything about when that store
> > > is visible to other cores. It will be visible to other cores at some point in time.
> > > But, I do not think we need to be worried about flushing to memory.
> > >
> > > >     -- read the "calls_per_service" counter for each lcores, add them up.
> > > This can be trimmed down to the single core the service is mapped to
> > > currently, no need to add all the counters.
> >
> > Correct - however that requires figuring out which lcore is running the service.
> > Anyway, agree - it's an implementation detail as to exactly how we detect it.
> >
> > >
> > > >     ---- Wait :)
> > > >     -- re-read the "calls_per_service", and ensure the count has changed.
> Here, there is an assumption that the service core function is running on the
> service core. If the service core is not running, the code will be stuck in this
> polling loop.

Right - we could add a timeout here, but that just moves the problem somewhere
else (to the application), which now needs to handle error returns and possibly
retries. It's a possible solution - I'm not in favour of it at the moment, but
it needs some more thought.
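
For concreteness, a rough sketch of what the timeout variant could look like.
This is illustrative only - service_get_calls() and the timeout plumbing are
placeholder names I'm inventing here, not the actual rte_service internals:

#include <stdint.h>
#include <errno.h>
#include <rte_cycles.h>
#include <rte_pause.h>

/* hypothetical accessor for the per-lcore calls_per_service counter */
uint64_t service_get_calls(uint32_t service_id, uint32_t lcore_id);

static int
wait_for_service_progress(uint32_t service_id, uint32_t lcore_id,
		uint64_t timeout_us)
{
	/* full st-ld barrier: the caller's num_mapped_cores increment must
	 * be globally visible before the counter reads below are issued */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);

	uint64_t start = service_get_calls(service_id, lcore_id);
	uint64_t deadline = rte_get_timer_cycles() +
			(rte_get_timer_hz() * timeout_us) / 1000000;

	/* wait for the service function to be entered at least once more;
	 * bail out if the service core never actually runs */
	while (service_get_calls(service_id, lcore_id) == start) {
		if (rte_get_timer_cycles() > deadline)
			return -ETIMEDOUT; /* application must handle/retry */
		rte_pause();
	}
	return 0;
}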

> I could not come up with a good way to check if the service core is running.
> Checking the app_runstate and comp_runstate is not enough as they just
> indicate that the service is ready to run. Using the counter 'calls_per_service'
> introduces race conditions.
> 
> Only way I can think of is asking the user to follow a specific sequence of APIs to
> ensure the service core is running before calling rte_service_map_lcore_set.

Good point - I'm thinking about this, but haven't come to an obvious conclusion yet.
I'm considering other ways to detect whether the core is running, and also considering
just "hijacking" the service function pointer temporarily with a CAS, which gives
some new options for preventing threads from entering the critical section.
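
To make the hijack idea a bit more concrete, roughly this (a sketch only,
assuming a single service lcore; quiesce_service() and dummy_cb() are made-up
names, not the service core implementation):

#include <stdint.h>
#include <rte_pause.h>

typedef int32_t (*service_cb_t)(void *args);

static int dummy_entered;

static int32_t
dummy_cb(void *args)
{
	(void)args;
	/* the service lcore only reaches here after completing any
	 * in-flight invocation of the real callback */
	__atomic_store_n(&dummy_entered, 1, __ATOMIC_RELEASE);
	return 0;
}

static int
quiesce_service(service_cb_t *cb_ptr, service_cb_t real_cb)
{
	service_cb_t expected = real_cb;

	__atomic_store_n(&dummy_entered, 0, __ATOMIC_RELAXED);

	/* CAS so we only hijack if the real callback is still installed */
	if (!__atomic_compare_exchange_n(cb_ptr, &expected, dummy_cb,
			0, __ATOMIC_ACQ_REL, __ATOMIC_RELAXED))
		return -1;

	/* once dummy_cb has run, the real callback cannot be executing */
	while (__atomic_load_n(&dummy_entered, __ATOMIC_ACQUIRE) == 0)
		rte_pause();

	/* ... safe to update the mapping here; then restore ... */
	__atomic_store_n(cb_ptr, real_cb, __ATOMIC_RELEASE);
	return 0;
}

Note this has the same liveness caveat as the counter polling - if the service
core never runs, the wait never completes.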

As above, I don't have a good solution yet.
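
For reference, the API ordering you suggest would look something like this from
the application side (a sketch using the existing public API; note it still only
guarantees the lcore was *asked* to run, not that the loop has been entered):

#include <errno.h>
#include <rte_service.h>

static int
remap_service(uint32_t service_id, uint32_t old_lcore, uint32_t new_lcore)
{
	int ret;

	/* ensure the target lcore is added and started as a service core
	 * before touching the mapping */
	ret = rte_service_lcore_add(new_lcore);
	if (ret < 0 && ret != -EALREADY)
		return ret;
	ret = rte_service_lcore_start(new_lcore);
	if (ret < 0 && ret != -EALREADY)
		return ret;

	ret = rte_service_map_lcore_set(service_id, new_lcore, 1);
	if (ret < 0)
		return ret;
	return rte_service_map_lcore_set(service_id, old_lcore, 0);
}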

<snip irrelevant to above discussion stuff>

