[dpdk-dev] [PATCH 0/3] *** timer library enhancements ***

Wiles, Keith keith.wiles at intel.com
Wed Aug 23 18:50:14 CEST 2017


> On Aug 23, 2017, at 11:19 AM, Carrillo, Erik G <erik.g.carrillo at intel.com> wrote:
> 
> 
> 
>> -----Original Message-----
>> From: Wiles, Keith
>> Sent: Wednesday, August 23, 2017 10:02 AM
>> To: Carrillo, Erik G <erik.g.carrillo at intel.com>
>> Cc: rsanford at akamai.com; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH 0/3] *** timer library enhancements ***
>> 
>> 
>>> On Aug 23, 2017, at 9:47 AM, Gabriel Carrillo <erik.g.carrillo at intel.com>
>> wrote:
>>> 
>>> In the current implementation of the DPDK timer library, timers can be
>>> created and set to be handled by a target lcore by adding it to a
>>> skiplist that corresponds to that lcore.  However, if an application
>>> enables multiple lcores, and each of these lcores repeatedly attempts
>>> to install timers on the same target lcore, overall application
>>> throughput will be reduced as all lcores contend to acquire the lock
>>> guarding the single skiplist of pending timers.
>>> 
>>> This patchset addresses this scenario by adding an array of skiplists
>>> to each lcore's priv_timer struct, such that when lcore i installs a
>>> timer on lcore k, the timer will be added to the ith skiplist for
>>> lcore k.  If lcore j installs a timer on lcore k simultaneously,
>>> lcores i and j can both proceed since they will be acquiring different
>>> locks for different lists.
>>> 
>>> When lcore k processes its pending timers, it will traverse each
>>> skiplist in its array and acquire a skiplist's lock while a run list
>>> is broken out; meanwhile, all other lists can continue to be modified.
>>> Then, all run lists for lcore k are collected and traversed together
>>> so timers are executed in their global order.
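>>> 
>>> Roughly, the shape of the change looks like the following (a simplified
>>> sketch only; the field and function names here are illustrative, not
>>> the actual patch):
>>> 
>>>   #include <rte_lcore.h>
>>>   #include <rte_spinlock.h>
>>>   #include <rte_timer.h>
>>> 
>>>   /* each lcore keeps one pending skiplist per possible installer, so
>>>    * installers contend only on "their" list for the target lcore */
>>>   struct priv_timer {
>>>       struct rte_timer pending_lists[RTE_MAX_LCORE]; /* list i: timers from lcore i */
>>>       rte_spinlock_t   list_locks[RTE_MAX_LCORE];    /* one lock per installer list */
>>>       /* ... other per-lcore state ... */
>>>   };
>>> 
>>>   /* installer side: lcore i adding a timer to lcore k */
>>>   static void
>>>   install_timer(struct priv_timer priv[], unsigned i, unsigned k,
>>>                 struct rte_timer *tim)
>>>   {
>>>       rte_spinlock_lock(&priv[k].list_locks[i]);
>>>       /* ... skiplist insert of tim into priv[k].pending_lists[i] ... */
>>>       (void)tim;
>>>       rte_spinlock_unlock(&priv[k].list_locks[i]);
>>>   }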
>> 
>> What is the performance and/or latency added to the timeout now?
>> 
>> I worry about the case when just about all of the cores are enabled, which
>> could be as high as 128 or more now.
> 
> There is a case in the timer_perf_autotest that runs rte_timer_manage with zero timers that can give a sense of the added latency.  When run with one lcore, it completes in around 25 cycles.  When run with 43 lcores (the highest I have access to at the moment), rte_timer_manage completes in around 155 cycles.  So it looks like each added lcore adds around 3 cycles of overhead for checking empty lists in my testing.

Does this mean we have only 25 cycles on the current design or is the 25 cycles for the new design?

If it is for the new design, then what is the old design's cost compared to the new cost?

I also think we need a call to a timer callback in the measurement, just to make sure we have at least one timer in the list and we account for any shortcuts in the code for the no-timers-active case.
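
Something along those lines could be measured like this (a rough sketch
only, assuming EAL and the timer subsystem are already initialized; the
callback and function names are mine):

  #include <rte_cycles.h>
  #include <rte_lcore.h>
  #include <rte_timer.h>

  static void
  dummy_cb(struct rte_timer *tim, void *arg)
  {
      (void)tim;
      (void)arg;
      /* empty body: we only want the timer-management overhead */
  }

  /* cycles for one rte_timer_manage() call with one armed timer */
  static uint64_t
  measure_manage_cycles(void)
  {
      struct rte_timer tim;
      uint64_t start, end;

      rte_timer_init(&tim);
      /* periodic timer on this lcore with a 1-tick period, so
       * rte_timer_manage() always has a callback to run */
      rte_timer_reset(&tim, 1, PERIODICAL, rte_lcore_id(), dummy_cb, NULL);

      start = rte_rdtsc();
      rte_timer_manage();
      end = rte_rdtsc();

      rte_timer_stop(&tim);
      return end - start;
  }

Running that in a loop and averaging would smooth out the TSC noise.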

> 
>> 
>> One option is to have lcore j, which wants to install a timer on lcore k, pass
>> a message via a ring to lcore k to add that timer. We could even add that logic
>> into setting a timer on a different lcore than the caller in the current API. The
>> ring would be multi-producer and single-consumer; we still have the lock.
>> What am I missing here?
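>> 
>> Something like the following is what I have in mind (a rough sketch
>> only; the names are illustrative and the real change would be folded
>> into rte_timer_reset() and rte_timer_manage()):
>> 
>>   #include <rte_lcore.h>
>>   #include <rte_ring.h>
>>   #include <rte_timer.h>
>> 
>>   /* one request ring per target lcore, created at init time, e.g.:
>>    * timer_req_ring[k] = rte_ring_create(name, 1024, rte_socket_id(),
>>    *                                     RING_F_SC_DEQ);
>>    * (multi-producer enqueue, single-consumer dequeue) */
>>   static struct rte_ring *timer_req_ring[RTE_MAX_LCORE];
>> 
>>   /* caller on lcore j: hand the timer over to lcore k */
>>   static int
>>   request_timer_on(unsigned k, struct rte_timer *tim)
>>   {
>>       return rte_ring_enqueue(timer_req_ring[k], tim);
>>   }
>> 
>>   /* in lcore k's rte_timer_manage(): drain the requests, then insert
>>    * the timers into the local skiplist with no cross-lcore locking */
>>   static void
>>   drain_timer_requests(unsigned k)
>>   {
>>       void *msg;
>> 
>>       while (rte_ring_dequeue(timer_req_ring[k], &msg) == 0) {
>>           struct rte_timer *tim = msg;
>>           /* ... local skiplist insert of tim ... */
>>           (void)tim;
>>       }
>>   }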
>> 
> 
> I did try this approach: initially I had a multi-producer single-consumer ring that would hold requests to add or delete a timer from lcore k's skiplist, but it didn't really give an appreciable increase in my test application throughput.  In profiling this solution, the hotspot had moved from acquiring the skiplist's spinlock to the rte_atomic32_cmpset that the multi-producer ring code uses to manipulate the head pointer.
> 
> Then, I tried multiple single-producer single-consumer rings per target lcore.  This removed the ring hotspot, but the performance didn't increase as much as with the proposed solution. These solutions also add overhead to rte_timer_manage, as it would have to process the rings and then process the skiplists.
> 
> One other thing to note is that a solution that uses such messages changes the use models for the timer.  One interesting example is:
> - lcore i enqueues a message to install a timer on lcore k
> - lcore k runs rte_timer_manage, processes its messages and adds the timer to its list
> - lcore i then enqueues a message to stop the same timer, now owned by lcore k
> - lcore k does not run rte_timer_manage again
> - lcore i wants to free the timer, but it might not be safe

This case seems like a mistake to me, as lcore k should continue to call rte_timer_manage() to process any new timers from other lcores, not stop just because the list becomes empty and lcore k adds no timers to its own list.

> 
> Even though lcore i has successfully enqueued the request to stop the timer (and delete it from lcore k's pending list), it hasn't actually been deleted from the list yet, so freeing it could corrupt the list.  This case exists in the existing timer stress tests.
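> 
> In code form, the hazard is roughly the following (send_stop_request()
> is a hypothetical message-based call, shown only to illustrate the
> race):
> 
>   #include <rte_malloc.h>
>   #include <rte_timer.h>
> 
>   /* hypothetical: enqueue a "stop this timer" request to lcore k */
>   int send_stop_request(unsigned k, struct rte_timer *tim);
> 
>   static void
>   stop_and_free(unsigned k, struct rte_timer *tim)
>   {
>       send_stop_request(k, tim); /* queued, not yet processed by lcore k */
>       rte_free(tim);             /* unsafe: tim may still be linked in
>                                   * lcore k's pending list until k runs
>                                   * rte_timer_manage() again */
>   }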
> 
> Another interesting scenario is:
> - lcore i resets a timer to install it on lcore k
> - lcore j resets the same timer to install it on lcore k
> - then, lcore k runs timer_manage

This one also seems like a mistake: more than one lcore setting the same timer seems like a problem and should not be done. An lcore should own a timer, and no other lcore should be able to change that timer. If multiple lcores need a timer, then they should not share the same timer structure.

> 
> Lcore j's message obviates lcore i's message, and it would be wasted work for lcore k to process it, so we should mark it to be skipped over.  Handling all the edge cases was more complex than the proposed solution.

Hmmm, to me it seems simple here as long as the lcores follow the same rules; sharing a timer structure is very risky and avoidable, IMO.

Once you have lcores adding timers to another lcore's list, then all accesses to that skiplist must be serialized or you get unpredictable results. This should also fix most of the edge cases you are talking about.

Also, it seems to me that the case of an lcore adding timers to another lcore's timer list is a specific use case and could be handled by a different set of APIs for that specific use case. Then we do not need to change the current design, and all of the overhead is placed on the new APIs/design. IMO we are turning the current timer design into a global timer design when it really is a per-lcore design today, and I believe that is a mistake.
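
For example, such an API set could look something like this (purely hypothetical names and signatures, just to show the separation; the existing per-lcore path would stay exactly as it is today):

  #include <rte_timer.h>

  /* cross-lcore variants that pay for the extra synchronization */
  int rte_timer_remote_reset(struct rte_timer *tim, uint64_t ticks,
                             enum rte_timer_type type, unsigned target_lcore,
                             rte_timer_cb_t fct, void *arg);
  int rte_timer_remote_stop(struct rte_timer *tim);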

> 
>>> 
>>> Gabriel Carrillo (3):
>>> timer: add per-installer pending lists for each lcore
>>> timer: handle timers installed from non-EAL threads
>>> doc: update timer lib docs
>>> 
>>> doc/guides/prog_guide/timer_lib.rst |  19 ++-
>>> lib/librte_timer/rte_timer.c        | 329 +++++++++++++++++++++++-------------
>>> lib/librte_timer/rte_timer.h        |   9 +-
>>> 3 files changed, 231 insertions(+), 126 deletions(-)
>>> 
>>> --
>>> 2.6.4
>>> 
>> 
>> Regards,
>> Keith

Regards,
Keith


