[dpdk-dev] [PATCH] service: fix race in service on app lcore function

Van Haaren, Harry harry.van.haaren at intel.com
Wed Nov 1 18:59:51 CET 2017

Previous message: [dpdk-dev] [PATCH] service: fix race in service on app lcore function
Next message: [dpdk-dev] [PATCH] service: fix race in service on app lcore function
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> From: Richardson, Bruce
> Sent: Wednesday, November 1, 2017 5:09 PM
> To: Van Haaren, Harry <harry.van.haaren at intel.com>
> Cc: dev at dpdk.org; pbhagavatula at caviumnetworks.com; thomas at monjalon.net
> Subject: Re: [dpdk-dev] [PATCH] service: fix race in service on app lcore
> function
> 
> On Tue, Oct 31, 2017 at 11:49:02AM +0000, Harry van Haaren wrote:
> > This commit fixes a possible race condition if an application
> > uses the service-cores infrastructure and the function to run
> > a service on an application lcore at the same time.
> >
> > The fix is to change the num_mapped_cores variable to be an
> > atomic variable. This causes concurrent accesses by multiple
> > threads to a service using rte_service_run_iter_on_app_lcore()
> > to detect if another core is currently mapped to the service,
> > and refuses to run if it is not multi-thread safe.
> >
> > No performance impact is expected as the mappings for the
> > service-cores changes at control-path frequency, hence the
> > change from an ordinary write to an atomic write will not
> > have any significant impact.
> >
> > Two unit tests were added to verify the behaviour of the
> > function to run a service on an application core, testing both
> > a multi-thread safe service, and a multi-thread unsafe service.
> >
> > The doxygen API documentation for the function has been updated
> > to reflect the current and correct behaviour.
> >
> > Fixes: e9139a32f6e8 ("service: add function to run on app lcore")
> >
> > Signed-off-by: Harry van Haaren <harry.van.haaren at intel.com>
> >
> > ---
> >
> <snip>
> @@ -381,8 +381,28 @@ service_run(uint32_t i, struct core_state *cs, uint64_t
> service_mask)
> >  int32_t rte_service_run_iter_on_app_lcore(uint32_t id)
> >  {
> >  	/* run service on calling core, using all-ones as the service mask */
> > +	if (!service_valid(id))
> > +		return -EINVAL;
> > +
> >  	struct core_state *cs = &lcore_states[rte_lcore_id()];
> > -	return service_run(id, cs, UINT64_MAX);
> > +	struct rte_service_spec_impl *s = &rte_services[id];
> > +
> > +	/* Atomically add this core to the mapped cores first, then examine if
> > +	 * we can run the service. This avoids a race condition between
> > +	 * checking the value, and atomically adding to the mapped count.
> > +	 */
> > +	rte_atomic32_inc(&s->num_mapped_cores);
> > +
> > +	if (service_mt_safe(s) == 0 &&
> > +			rte_atomic32_read(&s->num_mapped_cores) > 1) {
> > +		rte_atomic32_dec(&s->num_mapped_cores);
> > +		return -EBUSY;
> > +	}
> > +
> > +	int ret = service_run(id, cs, UINT64_MAX);
> > +	rte_atomic32_dec(&s->num_mapped_cores);
> > +
> > +	return ret;
> >  }
> 
> Do we really need to do an atomic inc and dec in this function? If we
> check that there are no service cores mapped, would that not be enough
> for safety? If an app core is calling a service, the control plane is
> unlikely to decide to start spawning off a service core for that
> service simultaneously. Manipulating the count is safer, yes, but unlike
> the other changes in this patch, this one will affect performance, so I
> think we can go without. Similarly, for multiple data plane threads
> calling the same service simultaneously: everything else dataplane is
> done without locks so I think this should be too.


I think the inc-dec is required with the current code; what if *two* application
cores are calling the same service? If we don't have the atomic inc-dec of the
num-mapped-cores, both app cores could simultaneously run a multi-thread unsafe service.

The other option is to demand that the *app* must serialize access to a particular service
using this function - but in my eyes that limits the usefulness of this function, because
it pushes the problem up a level instead of solving it.

I guess an application "in the know" of how its cores are acting would prefer to not
have the atomics at the service-cores library level. As a compromise, the use of these
atomics could be a parameter to the rte_service_run_iter_on_app_lcore() function.

That would allow the application to choose if it wants to avail of the functionality
in the service cores library, or if it takes the responsibility to serialize access
to multi-thread unsafe functions itself.


I'll spin up a v2 with a parameter to the function.


Thanks for review, -Harry

Previous message: [dpdk-dev] [PATCH] service: fix race in service on app lcore function
Next message: [dpdk-dev] [PATCH] service: fix race in service on app lcore function
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the dev mailing list