[dpdk-dev] [PATCH 0/3] *** timer library enhancements ***

Wiles, Keith keith.wiles at intel.com
Wed Aug 23 23:04:47 CEST 2017


> On Aug 23, 2017, at 2:28 PM, Carrillo, Erik G <erik.g.carrillo at intel.com> wrote:
> 
>> 
>> -----Original Message-----
>> From: Wiles, Keith
>> Sent: Wednesday, August 23, 2017 11:50 AM
>> To: Carrillo, Erik G <erik.g.carrillo at intel.com>
>> Cc: rsanford at akamai.com; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH 0/3] *** timer library enhancements ***
>> 
>> 
>>> On Aug 23, 2017, at 11:19 AM, Carrillo, Erik G <erik.g.carrillo at intel.com> wrote:
>>> 
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Wiles, Keith
>>>> Sent: Wednesday, August 23, 2017 10:02 AM
>>>> To: Carrillo, Erik G <erik.g.carrillo at intel.com>
>>>> Cc: rsanford at akamai.com; dev at dpdk.org
>>>> Subject: Re: [dpdk-dev] [PATCH 0/3] *** timer library enhancements
>>>> ***
>>>> 
>>>> 
>>>>> On Aug 23, 2017, at 9:47 AM, Gabriel Carrillo <erik.g.carrillo at intel.com> wrote:
>>>>> 
>>>>> In the current implementation of the DPDK timer library, timers can
>>>>> be created and set to be handled by a target lcore by adding it to a
>>>>> skiplist that corresponds to that lcore.  However, if an application
>>>>> enables multiple lcores, and each of these lcores repeatedly
>>>>> attempts to install timers on the same target lcore, overall
>>>>> application throughput will be reduced as all lcores contend to
>>>>> acquire the lock guarding the single skiplist of pending timers.
>>>>> 
>>>>> This patchset addresses this scenario by adding an array of
>>>>> skiplists to each lcore's priv_timer struct, such that when lcore i
>>>>> installs a timer on lcore k, the timer will be added to the ith
>>>>> skiplist for lcore k.  If lcore j installs a timer on lcore k
>>>>> simultaneously, lcores i and j can both proceed since they will be
>>>>> acquiring different locks for different lists.
>>>>> 
>>>>> When lcore k processes its pending timers, it will traverse each
>>>>> skiplist in its array and acquire a skiplist's lock while a run list
>>>>> is broken out; meanwhile, all other lists can continue to be modified.
>>>>> Then, all run lists for lcore k are collected and traversed together
>>>>> so timers are executed in their global order.
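
(To make sure I am reading the design right, the data layout would be roughly
the following. This is only my sketch of the idea; the struct and field names
here are illustrative and are not the actual patch code.)

    /*
     * Illustrative sketch only: one pending skiplist per installing lcore,
     * each guarded by its own lock, inside the target lcore's private state.
     */
    #include <rte_timer.h>
    #include <rte_spinlock.h>
    #include <rte_lcore.h>

    struct skiplist {
            rte_spinlock_t lock;     /* guards only this installer's list */
            struct rte_timer head;   /* dummy head of the pending skiplist */
    };

    struct priv_timer_sketch {
            /* one pending list per installing lcore instead of a single list */
            struct skiplist pending[RTE_MAX_LCORE];
            /* ... per-lcore stats, flags, etc. ... */
    };

    static struct priv_timer_sketch priv_timer[RTE_MAX_LCORE];

    /* lcore i installing a timer on lcore k only touches pending[i] of lcore k */
    static void
    timer_add_sketch(struct rte_timer *tim, unsigned installer, unsigned tim_lcore)
    {
            struct skiplist *list = &priv_timer[tim_lcore].pending[installer];

            rte_spinlock_lock(&list->lock);
            /* ... skiplist insertion of 'tim' would go here ... */
            rte_spinlock_unlock(&list->lock);
    }

That way two installers only contend when they pick the same (installer,
target) pair, which matches the description above.
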
>>>> 
>>>> What is the performance and/or latency added to the timeout now?
>>>> 
>>>> I worry about the case when just about all of the cores are enabled,
>>>> which could be as high as 128 or more now.
>>> 
>>> There is a case in the timer_perf_autotest that runs rte_timer_manage
>>> with zero timers that can give a sense of the added latency.  When run with
>>> one lcore, it completes in around 25 cycles.  When run with 43 lcores (the
>>> highest I have access to at the moment), rte_timer_manage completes in
>>> around 155 cycles.  So it looks like each added lcore adds around 3 cycles of
>>> overhead for checking empty lists in my testing.
>> 
>> Does this mean we have only 25 cycles on the current design or is the 25
>> cycles for the new design?
>> 
> 
> Both - when run with one lcore, the new design becomes equivalent to the original one.  I tested the current design to confirm.

Good thanks

> 
>> If that is for the new design, then what is the old design's cost compared to
>> the new cost?
>> 
>> I also think we need the call to a timer function in the calculation, just to
>> make sure we have at least one timer in the list and we account for any
>> shortcuts in the code when no timers are active.
>> 
> 
> Looking at the numbers for non-empty lists in timer_perf_autotest, the overhead appears to fall away. Here are some representative runs:
> 
> 43 lcores enabled, installing 1M timers on an lcore and processing them with the current design:
> 
> <...snipped...>
> Appending 1000000 timers
> Time for 1000000 timers: 424066294 (193ms), Time per timer: 424 (0us)
> Time for 1000000 callbacks: 73124504 (33ms), Time per callback: 73 (0us)
> Resetting 1000000 timers
> Time for 1000000 timers: 1406756396 (641ms), Time per timer: 1406 (1us)
> <...snipped...>
> 
> 43 lcores enabled, installing 1M timers on an lcore and processing them with the proposed design:
> 
> <...snipped...>
> Appending 1000000 timers
> Time for 1000000 timers: 382912762 (174ms), Time per timer: 382 (0us)
> Time for 1000000 callbacks: 79194418 (36ms), Time per callback: 79 (0us)
> Resetting 1000000 timers
> Time for 1000000 timers: 1427189116 (650ms), Time per timer: 1427 (1us)
> <...snipped...>

It looks OK then. The main concern I had was the timers in Pktgen and someone telling me the jitter, latency, or performance had gotten worse. I guess I will just have to wait and see.

> 
> The above are not averages, so the numbers don't really indicate which is faster, but they show that the overhead of the proposed design should not be appreciable.
> 
>>> 
>>>> 
>>>> One option is to have the lcore j that wants to install a timer on
>>>> lcore k pass a message via a ring to lcore k to add that timer. We
>>>> could even add that logic into setting a timer on a different lcore
>>>> than the caller in the current API. The ring would be multi-producer and
>>>> single-consumer, so we still have the lock.
>>>> What am I missing here?
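
Something like this rough, untested sketch is what I had in mind above; all of
the helper and variable names here are made up for illustration, only the
rte_ring/rte_timer calls are the existing API:

    #include <stdint.h>
    #include <rte_ring.h>
    #include <rte_timer.h>
    #include <rte_lcore.h>

    struct timer_msg {
            struct rte_timer *tim;
            uint64_t ticks;
            rte_timer_cb_t cb;
            void *arg;
    };

    /* one multi-producer/single-consumer ring per target lcore, created at
     * init time with rte_ring_create(..., RING_F_SC_DEQ) */
    static struct rte_ring *timer_req[RTE_MAX_LCORE];

    /* any lcore j asks lcore k to arm a timer */
    static int
    request_timer_on(unsigned k, struct timer_msg *msg)
    {
            return rte_ring_mp_enqueue(timer_req[k], msg);
    }

    /* lcore k drains requests at the top of its rte_timer_manage() pass,
     * so only lcore k ever touches its own skiplist */
    static void
    drain_timer_requests(void)
    {
            unsigned k = rte_lcore_id();
            void *obj;

            while (rte_ring_sc_dequeue(timer_req[k], &obj) == 0) {
                    struct timer_msg *m = obj;

                    /* arm locally on lcore k */
                    rte_timer_reset(m->tim, m->ticks, SINGLE, k, m->cb, m->arg);
            }
    }
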
>>>> 
>>> 
>>> I did try this approach: initially I had a multi-producer, single-consumer ring
>>> that would hold requests to add or delete a timer from lcore k's skiplist, but it
>>> didn't really give an appreciable increase in my test application throughput.
>>> In profiling this solution, the hotspot had moved from acquiring the skiplist's
>>> spinlock to the rte_atomic32_cmpset that the multi-producer ring code
>>> uses to manipulate the head pointer.
>>> 
>>> Then, I tried multiple single-producer single-consumer rings per target
>>> lcore.  This removed the ring hotspot, but the performance didn't increase as
>>> much as with the proposed solution. These solutions also add overhead to
>>> rte_timer_manage, as it would have to process the rings and then process
>>> the skiplists.
>>> 
>>> One other thing to note is that a solution that uses such messages changes
>>> the use models for the timer.  One interesting example is:
>>> - lcore i enqueues a message to install a timer on lcore k
>>> - lcore k runs rte_timer_manage, processes its messages, and adds the timer to its list
>>> - lcore i then enqueues a message to stop the same timer, now owned by lcore k
>>> - lcore k does not run rte_timer_manage again
>>> - lcore i wants to free the timer, but it might not be safe
>> 
>> This case seems like a mistake to me, as lcore k should continue to call
>> rte_timer_manage() to process any new timers from other lcores, not just
>> in the case where the list becomes empty and lcore k does not add timers to
>> its own list.
>> 
>>> 
>>> Even though lcore i has successfully enqueued the request to stop the
>>> timer (and delete it from lcore k's pending list), it hasn't actually been
>>> deleted from the list yet, so freeing it could corrupt the list.  This case exists
>>> in the existing timer stress tests.
>>> 
>>> Another interesting scenario is:
>>> - lcore i resets a timer to install it on lcore k
>>> - lcore j resets the same timer to install it on lcore k
>>> - then, lcore k runs rte_timer_manage
>> 
>> This one also seems like a mistake; more than one lcore setting the same
>> timer seems like a problem and should not be done. An lcore should own a
>> timer, and no other lcore should be able to change that timer. If multiple
>> lcores need a timer, then they should not share the same timer structure.
>> 
> 
> Both of the above cases exist in the timer library stress tests, so a solution would presumably need to address them or it would be less flexible.  The original design passed these tests, as does the proposed one.

I get this twitch when one lcore is adding timers to another lcore, as I come from a real-time OS background, but I guess if no one else cares or finds a problem I will have to live with it. Having a test for something does not make it a good test or a reasonable reason to continue a design issue. We can make any test work, but whether it is right is the real question, and we will just have to wait and see, I guess.

> 
>>> 
>>> Lcore j's message obviates lcore i's message, and it would be wasted work
>>> for lcore k to process it, so we should mark it to be skipped over.  Handling all
>>> the edge cases was more complex than the proposed solution.
>> 
>> Hmmm, to me it seems simple here as long as the lcores follow the same
>> rules; sharing a timer structure is very risky and avoidable IMO.
>> 
>> Once you have lcores adding timers to another lcore, all accesses to that
>> skiplist must be serialized or you get unpredictable results. This should also
>> fix most of the edge cases you are talking about.
>> 
>> Also it seems to me that the case of an lcore adding timers to another lcore's
>> timer list is a specific use case and could be handled by a different set of APIs
>> for that specific use case. Then we do not need to change the current design,
>> and all of the overhead is placed on the new APIs/design. IMO we are
>> turning the current timer design into a global timer design, as it really is a
>> per-lcore design today, and I believe that is a mistake.
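
For example, something along these lines is the shape of what I mean; these
names are purely hypothetical and nothing like this exists in the library
today:

    /*
     * Hypothetical API surface for the cross-lcore use case only, so the
     * existing per-lcore fast path stays untouched. Names are made up.
     */
    #include <rte_timer.h>

    /* arm 'tim' so that it will be serviced by 'tim_lcore' */
    int rte_timer_remote_reset(struct rte_timer *tim, uint64_t ticks,
                               enum rte_timer_type type, unsigned tim_lcore,
                               rte_timer_cb_t cb, void *arg);

    /* drain any remote requests for this lcore, then run the usual local
     * expiry pass */
    void rte_timer_remote_manage(void);
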
>> 
> 
> Well, the original API explicitly supports installing a timer to be executed on a different lcore, and there are no API changes in the patchset.  Also, the proposed design keeps the per-lcore design intact; it only takes what used to be one large skiplist that held timers for all installing lcores and separates it into N skiplists that correspond 1:1 with the installing lcores.  When an lcore processes timers on its lists, it will still only be managing timers it owns, and no others.
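
For anyone following along, the existing cross-lcore install being referred to
looks roughly like this; a minimal sketch where only the helper and callback
names are mine, the rte_timer_* calls are the current public API:

    #include <rte_timer.h>
    #include <rte_cycles.h>
    #include <rte_lcore.h>

    static void
    expired_cb(struct rte_timer *tim, void *arg)
    {
            /* runs on whichever lcore calls rte_timer_manage(), i.e. lcore k */
            (void)tim;
            (void)arg;
    }

    static void
    arm_on_lcore_k(struct rte_timer *tim, unsigned k)
    {
            /* the caller may be any lcore i; the timer is then owned by lcore k */
            rte_timer_init(tim);
            rte_timer_reset(tim, rte_get_timer_hz(), SINGLE, k, expired_cb, NULL);
    }

    /* lcore k periodically calls rte_timer_manage() to fire expired timers */
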


Having an API to explicitly support some feature is not a reason to keep something, but I think you have reduced my twitching some :-) so I will let it go.

Thanks for the information.

>  
> 
>>> 
>>>>> 
>>>>> Gabriel Carrillo (3):
>>>>> timer: add per-installer pending lists for each lcore
>>>>> timer: handle timers installed from non-EAL threads
>>>>> doc: update timer lib docs
>>>>> 
>>>>> doc/guides/prog_guide/timer_lib.rst |  19 ++-
>>>>> lib/librte_timer/rte_timer.c        | 329 +++++++++++++++++++++++-------------
>>>>> lib/librte_timer/rte_timer.h        |   9 +-
>>>>> 3 files changed, 231 insertions(+), 126 deletions(-)
>>>>> 
>>>>> --
>>>>> 2.6.4
>>>>> 
>>>> 
>>>> Regards,
>>>> Keith
>> 
>> Regards,
>> Keith

Regards,
Keith


