[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

Wodkowski, PawelX pawelx.wodkowski at intel.com
Fri Sep 26 16:01:05 CEST 2014


> > > Maybe I don't see something obvious? :)
> 
> I think you're missing the fact that your patch doesn't do what you assert above
> either :)

Issue is not in setting alarms but canceling it. If you look closer to my patch you
see that it address this issue (look at added *do { lock(); ....; unlock(); } while( )* 
part).

> 
> First, lets address rte_alarm_set.  There is no notion of "re-arming" in this
> alarm implementation, because theres no ability to refer to a specific alarm
> from the callers perspective.  When you call rte_eal_alarm_set you get a new
> alarm every time.  So I don't really see a race there.  It might not be exactly
> the behavior you want, but its not a race, becuase you're not modifying an
> alarm
> in the middle of execution, you're just creating a new alarm, which is safe.

OK, it is safe, but this is not the case.

> 
> There is a race in what you describe above, insofar as its possible that you
> might call rte_eal_alarm_cancel and return without having canceled all the
> matching alarms.  I don't see any clear documentation on what the behavior is
> supposed to be, but if you want to ensure that all matching alarms are cancelled
> or complete on return from rte_eal_alarm_cancel, thats perfectly fine (in linux
> API parlance, thats usually denoted as a cancel_sync operation).

Again, look at the patch. I changed documentation to inform about this behavior.

> 
> For that race condition, you're correct, my patch doesn't address it, I see that
> now.  Though your patch doesn't either.  If you call rte_eal_alarm_cancel from
> within a callback function, then, by definition, you can't wait on the
> completion of the active alarm, because thats a deadlock.  Its a necessecary
> evil, I grant you, but it means that you can't be guaranteed the cancelled and
> complete (cancel_sync) behavior that you want, at least not with the current
> api.  If you want that behavior, you need to do one of two things:

This patch does not break any API. It only removes undefined behavior.

> 
> 1) Modify the api to allow callers to individually reference timer instances, so
> that when cancelling, we can return an appropriate return code to indicate to
> the caller that this alarm is in-progress.  That way you can guarantee the
> caller that the specific alarm that you cancelled is either complete and cancelled
> or currently executing.  Add an api to expicitly wait on a referenced alarm as
> well.  This allows developers to know that, when executing an alarm callback, an
> -ECURRENTLYEXECUTING return code is ok, because they are in the currently
> executing context.

This would brake API for sure.



More information about the dev mailing list