[dpdk-dev] [PATCH v2 3/7] eal: add power monitor for multiple events

Burakov, Anatoly anatoly.burakov at intel.com
Mon Jun 28 14:43:48 CEST 2021


On 28-Jun-21 1:37 PM, Ananyev, Konstantin wrote:
> 
>> Use RTM and WAITPKG instructions to perform a wait-for-writes similar to
>> what UMWAIT does, but without the limitation of having to listen for
>> just one event. This works because the optimized power state used by the
>> TPAUSE instruction will cause a wake up on RTM transaction abort, so if
>> we add the addresses we're interested in to the read-set, any write to
>> those addresses will wake us up.
>>
>> Signed-off-by: Konstantin Ananyev <konstantin.ananyev at intel.com>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
>> ---
>>
>> Notes:
>>      v2:
>>      - Adapt to callback mechanism
>>
>>   doc/guides/rel_notes/release_21_08.rst        |  2 +
>>   lib/eal/arm/rte_power_intrinsics.c            | 11 +++
>>   lib/eal/include/generic/rte_cpuflags.h        |  2 +
>>   .../include/generic/rte_power_intrinsics.h    | 35 ++++++++++
>>   lib/eal/ppc/rte_power_intrinsics.c            | 11 +++
>>   lib/eal/version.map                           |  3 +
>>   lib/eal/x86/rte_cpuflags.c                    |  2 +
>>   lib/eal/x86/rte_power_intrinsics.c            | 69 +++++++++++++++++++
>>   8 files changed, 135 insertions(+)
>>
> ...
> 
>> diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
>> index 3c5c9ce7ad..3fc6f62ef5 100644
>> --- a/lib/eal/x86/rte_power_intrinsics.c
>> +++ b/lib/eal/x86/rte_power_intrinsics.c
>> @@ -4,6 +4,7 @@
>>
>>   #include <rte_common.h>
>>   #include <rte_lcore.h>
>> +#include <rte_rtm.h>
>>   #include <rte_spinlock.h>
>>
>>   #include "rte_power_intrinsics.h"
>> @@ -28,6 +29,7 @@ __umwait_wakeup(volatile void *addr)
>>   }
>>
>>   static bool wait_supported;
>> +static bool wait_multi_supported;
>>
>>   static inline uint64_t
>>   __get_umwait_val(const volatile void *p, const uint8_t sz)
>> @@ -164,6 +166,8 @@ RTE_INIT(rte_power_intrinsics_init) {
>>
>>        if (i.power_monitor && i.power_pause)
>>                wait_supported = 1;
>> +     if (i.power_monitor_multi)
>> +             wait_multi_supported = 1;
>>   }
>>
>>   int
>> @@ -202,6 +206,9 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
>>         * In this case, since we've already woken up, the "wakeup" was
>>         * unneeded, and since T1 is still waiting on T2 releasing the lock, the
>>         * wakeup address is still valid so it's perfectly safe to write it.
>> +      *
>> +      * For multi-monitor case, the act of locking will in itself trigger the
>> +      * wakeup, so no additional writes necessary.
>>         */
>>        rte_spinlock_lock(&s->lock);
>>        if (s->monitor_addr != NULL)
>> @@ -210,3 +217,65 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
>>
>>        return 0;
>>   }
>> +
>> +int
>> +rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
>> +             const uint32_t num, const uint64_t tsc_timestamp)
>> +{
>> +     const unsigned int lcore_id = rte_lcore_id();
>> +     struct power_wait_status *s = &wait_status[lcore_id];
>> +     uint32_t i, rc;
>> +
>> +     /* check if supported */
>> +     if (!wait_multi_supported)
>> +             return -ENOTSUP;
>> +
>> +     if (pmc == NULL || num == 0)
>> +             return -EINVAL;
>> +
>> +     /* we are already inside transaction region, return */
>> +     if (rte_xtest() != 0)
>> +             return 0;
>> +
>> +     /* start new transaction region */
>> +     rc = rte_xbegin();
>> +
>> +     /* transaction abort, possible write to one of wait addresses */
>> +     if (rc != RTE_XBEGIN_STARTED)
>> +             return 0;
>> +
>> +     /*
>> +      * the mere act of reading the lock status here adds the lock to
>> +      * the read set. This means that when we trigger a wakeup from another
>> +      * thread, even if we don't have a defined wakeup address and thus don't
>> +      * actually cause any writes, the act of locking our lock will itself
>> +      * trigger the wakeup and abort the transaction.
>> +      */
>> +     rte_spinlock_is_locked(&s->lock);
>> +
>> +     /*
>> +      * add all addresses to wait on into transaction read-set and check if
>> +      * any of wakeup conditions are already met.
>> +      */
>> +     for (i = 0; i < num; i++) {
>> +             const struct rte_power_monitor_cond *c = &pmc[i];
>> +
>> +             if (pmc->fn == NULL)
> 
> Should be c->fn, I believe.

Yep, will fix.

> 
>> +                     continue;
> 
> Actually that way, if c->fn == NULL, we'll never add  our c->addr to monitored addresses.
> Is that what we really want?
> My thought was, that if callback is not set, we'll just go to power-save state without extra checking, no?
> Something like that:
> 
> const struct rte_power_monitor_cond *c = &pmc[i];
> const uint64_t val = __get_umwait_val(c->addr, c->size);
> 
> if (c->fn && c->fn(val, c->opaque) != 0)
>     break;

This is consistent with previous behavior of rte_power_monitor where if 
mask wasn't set we entered power save mode without any checks. If we do 
a break, that means the check condition has failed somewhere and we have 
to abort the sleep. Continue keeps the sleep.

> 
> Same thought for rte_power_monitor().
> 
>> +             const uint64_t val = __get_umwait_val(pmc->addr, pmc->size);
> 
> Same thing: s/pmc->/c->/

Yep, you're right.

> 
>> +
>> +             /* abort if callback indicates that we need to stop */
>> +             if (c->fn(val, c->opaque) != 0)
>> +                     break;
>> +     }
>> +
>> +     /* none of the conditions were met, sleep until timeout */
>> +     if (i == num)
>> +             rte_power_pause(tsc_timestamp);
>> +
>> +     /* end transaction region */
>> +     rte_xend();
>> +
>> +     return 0;
>> +}
>> --
>> 2.25.1
> 


-- 
Thanks,
Anatoly


More information about the dev mailing list