[dpdk-dev] [PATCH v2 0/3] timer: fix rte_timer_manage and improve unit tests
thomas.monjalon at 6wind.com
Mon Aug 3 00:06:38 CEST 2015
2015-07-27 18:46, rsanford2 at gmail.com:
> From: Robert Sanford <rsanford at akamai.com>
> This patchset fixes a bug in timer stress test 2, adds a new stress test
> to expose a race condition bug in API rte_timer_manage(), and then fixes
> the rte_timer_manage() bug.
> Description of rte_timer_manage() race condition bug: Through code
> inspection, we notice a potential problem in rte_timer_manage() that
> leads to corruption of per-lcore pending-lists (implemented as
> skip-lists). The race condition occurs when rte_timer_manage() expires
> multiple timers on lcore A, while lcore B simultaneously invokes
> rte_timer_reset() for one of the expiring timers (other than the first one).
> Lcore A splits its pending-list, creating a local list of expired timers
> linked through their sl_next pointers, and sets the first expired
> timer to the RUNNING state, all during one list-lock round trip.
> Lcore A then unlocks the list-lock to run the first callback, and that
> is when A and B can have different interpretations of the subsequent
> expired timers' true state. Lcore B sees an expired timer still in the
> PENDING state, atomically changes the timer to the CONFIG state, locks
> lcore A's list-lock, and reinserts the timer into A's pending-list.
> The two lcores try to use the same next-pointers to maintain both lists!
> v2 changes:
> Move patch descriptions to their respective patches.
> Correct checkpatch warnings.
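
For readers following the race description above, here is a rough, single-threaded sketch of the interleaving. It is not the DPDK code: it assumes a plain singly linked list in place of the skip-list, replays the two lcores' steps in a fixed order instead of using real threads and locks, and every name in it (toy_timer, toy_insert, toy_split, toy_demo_race) is hypothetical. Only the sl_next field name and the PENDING/RUNNING/CONFIG states are taken from the description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the timer states named above. */
enum toy_state { TOY_PENDING, TOY_RUNNING, TOY_CONFIG };

struct toy_timer {
    unsigned expire;            /* expiry time, arbitrary units */
    enum toy_state state;
    struct toy_timer *sl_next;  /* the one link shared by both lists */
};

/* Insert tim into the list at *head, kept sorted by expiry. */
static void toy_insert(struct toy_timer **head, struct toy_timer *tim)
{
    struct toy_timer **pp = head;

    while (*pp != NULL && (*pp)->expire <= tim->expire)
        pp = &(*pp)->sl_next;
    tim->sl_next = *pp;
    *pp = tim;
}

/* Cut all timers with expire <= now off the front of *pending and
 * return them as a separate run list, still linked through sl_next. */
static struct toy_timer *toy_split(struct toy_timer **pending, unsigned now)
{
    struct toy_timer *run = *pending, *last = NULL, *t = *pending;

    while (t != NULL && t->expire <= now) {
        last = t;
        t = t->sl_next;
    }
    if (last == NULL)
        return NULL;            /* nothing has expired */
    last->sl_next = NULL;
    *pending = t;
    return run;
}

/* Replay the interleaving. Returns how many timers lcore A visits on
 * its run list that have not actually expired; *visited_t3 reports
 * whether the genuinely expired third timer was ever reached. */
static int toy_demo_race(int *visited_t3)
{
    struct toy_timer t[4] = {
        { 10, TOY_PENDING, NULL }, { 20, TOY_PENDING, NULL },
        { 30, TOY_PENDING, NULL }, { 40, TOY_PENDING, NULL },
    };
    struct toy_timer *pending = NULL, *run, *tim;
    const unsigned now = 30;
    int wrong = 0, i;

    for (i = 0; i < 4; i++)
        toy_insert(&pending, &t[i]);

    /* Lcore A, under the list lock: split, mark the first timer RUNNING. */
    run = toy_split(&pending, now);
    assert(run == &t[0] && pending == &t[3]);
    run->state = TOY_RUNNING;

    /* Lcore A drops the lock to run the first callback. Lcore B sees
     * t[1] still PENDING, moves it to CONFIG and reinserts it into A's
     * pending-list, overwriting the sl_next that A's run list uses. */
    t[1].state = TOY_CONFIG;
    t[1].expire = 35;
    toy_insert(&pending, &t[1]);
    t[1].state = TOY_PENDING;

    /* Lcore A resumes walking what it believes is its run list. */
    *visited_t3 = 0;
    for (tim = run; tim != NULL; tim = tim->sl_next) {
        if (tim == &t[2])
            *visited_t3 = 1;
        if (tim->expire > now)
            wrong++;
    }
    return wrong;
}
```

Calling toy_demo_race() from a small main() shows the symptom in this toy model: lcore A's walk wanders from its run list into the pending-list, so it visits two timers that have not expired and never reaches the third timer, which has. How the corruption shows up in the real skip-list may differ.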