[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
Wodkowski, PawelX
pawelx.wodkowski at intel.com
Fri Sep 26 16:01:05 CEST 2014
> > > Maybe I don't see something obvious? :)
>
> I think you're missing the fact that your patch doesn't do what you assert above
> either :)
Issue is not in setting alarms but canceling it. If you look closer to my patch you
see that it address this issue (look at added *do { lock(); ....; unlock(); } while( )*
part).
>
> First, lets address rte_alarm_set. There is no notion of "re-arming" in this
> alarm implementation, because theres no ability to refer to a specific alarm
> from the callers perspective. When you call rte_eal_alarm_set you get a new
> alarm every time. So I don't really see a race there. It might not be exactly
> the behavior you want, but its not a race, becuase you're not modifying an
> alarm
> in the middle of execution, you're just creating a new alarm, which is safe.
OK, it is safe, but this is not the case.
>
> There is a race in what you describe above, insofar as its possible that you
> might call rte_eal_alarm_cancel and return without having canceled all the
> matching alarms. I don't see any clear documentation on what the behavior is
> supposed to be, but if you want to ensure that all matching alarms are cancelled
> or complete on return from rte_eal_alarm_cancel, thats perfectly fine (in linux
> API parlance, thats usually denoted as a cancel_sync operation).
Again, look at the patch. I changed documentation to inform about this behavior.
>
> For that race condition, you're correct, my patch doesn't address it, I see that
> now. Though your patch doesn't either. If you call rte_eal_alarm_cancel from
> within a callback function, then, by definition, you can't wait on the
> completion of the active alarm, because thats a deadlock. Its a necessecary
> evil, I grant you, but it means that you can't be guaranteed the cancelled and
> complete (cancel_sync) behavior that you want, at least not with the current
> api. If you want that behavior, you need to do one of two things:
This patch does not break any API. It only removes undefined behavior.
>
> 1) Modify the api to allow callers to individually reference timer instances, so
> that when cancelling, we can return an appropriate return code to indicate to
> the caller that this alarm is in-progress. That way you can guarantee the
> caller that the specific alarm that you cancelled is either complete and cancelled
> or currently executing. Add an api to expicitly wait on a referenced alarm as
> well. This allows developers to know that, when executing an alarm callback, an
> -ECURRENTLYEXECUTING return code is ok, because they are in the currently
> executing context.
This would brake API for sure.
More information about the dev
mailing list