[dpdk-dev] [PATCH v2 3/7] eal: add power monitor for multiple events
Burakov, Anatoly
anatoly.burakov at intel.com
Mon Jun 28 14:43:48 CEST 2021
On 28-Jun-21 1:37 PM, Ananyev, Konstantin wrote:
>
>> Use RTM and WAITPKG instructions to perform a wait-for-writes similar to
>> what UMWAIT does, but without the limitation of having to listen for
>> just one event. This works because the optimized power state used by the
>> TPAUSE instruction will cause a wake up on RTM transaction abort, so if
>> we add the addresses we're interested in to the read-set, any write to
>> those addresses will wake us up.
>>
>> Signed-off-by: Konstantin Ananyev <konstantin.ananyev at intel.com>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
>> ---
>>
>> Notes:
>> v2:
>> - Adapt to callback mechanism
>>
>> doc/guides/rel_notes/release_21_08.rst | 2 +
>> lib/eal/arm/rte_power_intrinsics.c | 11 +++
>> lib/eal/include/generic/rte_cpuflags.h | 2 +
>> .../include/generic/rte_power_intrinsics.h | 35 ++++++++++
>> lib/eal/ppc/rte_power_intrinsics.c | 11 +++
>> lib/eal/version.map | 3 +
>> lib/eal/x86/rte_cpuflags.c | 2 +
>> lib/eal/x86/rte_power_intrinsics.c | 69 +++++++++++++++++++
>> 8 files changed, 135 insertions(+)
>>
> ...
>
>> diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
>> index 3c5c9ce7ad..3fc6f62ef5 100644
>> --- a/lib/eal/x86/rte_power_intrinsics.c
>> +++ b/lib/eal/x86/rte_power_intrinsics.c
>> @@ -4,6 +4,7 @@
>>
>> #include <rte_common.h>
>> #include <rte_lcore.h>
>> +#include <rte_rtm.h>
>> #include <rte_spinlock.h>
>>
>> #include "rte_power_intrinsics.h"
>> @@ -28,6 +29,7 @@ __umwait_wakeup(volatile void *addr)
>> }
>>
>> static bool wait_supported;
>> +static bool wait_multi_supported;
>>
>> static inline uint64_t
>> __get_umwait_val(const volatile void *p, const uint8_t sz)
>> @@ -164,6 +166,8 @@ RTE_INIT(rte_power_intrinsics_init) {
>>
>> if (i.power_monitor && i.power_pause)
>> wait_supported = 1;
>> + if (i.power_monitor_multi)
>> + wait_multi_supported = 1;
>> }
>>
>> int
>> @@ -202,6 +206,9 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
>> * In this case, since we've already woken up, the "wakeup" was
>> * unneeded, and since T1 is still waiting on T2 releasing the lock, the
>> * wakeup address is still valid so it's perfectly safe to write it.
>> + *
>> + * For multi-monitor case, the act of locking will in itself trigger the
>> + * wakeup, so no additional writes necessary.
>> */
>> rte_spinlock_lock(&s->lock);
>> if (s->monitor_addr != NULL)
>> @@ -210,3 +217,65 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
>>
>> return 0;
>> }
>> +
>> +int
>> +rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
>> + const uint32_t num, const uint64_t tsc_timestamp)
>> +{
>> + const unsigned int lcore_id = rte_lcore_id();
>> + struct power_wait_status *s = &wait_status[lcore_id];
>> + uint32_t i, rc;
>> +
>> + /* check if supported */
>> + if (!wait_multi_supported)
>> + return -ENOTSUP;
>> +
>> + if (pmc == NULL || num == 0)
>> + return -EINVAL;
>> +
>> + /* we are already inside transaction region, return */
>> + if (rte_xtest() != 0)
>> + return 0;
>> +
>> + /* start new transaction region */
>> + rc = rte_xbegin();
>> +
>> + /* transaction abort, possible write to one of wait addresses */
>> + if (rc != RTE_XBEGIN_STARTED)
>> + return 0;
>> +
>> + /*
>> + * the mere act of reading the lock status here adds the lock to
>> + * the read set. This means that when we trigger a wakeup from another
>> + * thread, even if we don't have a defined wakeup address and thus don't
>> + * actually cause any writes, the act of locking our lock will itself
>> + * trigger the wakeup and abort the transaction.
>> + */
>> + rte_spinlock_is_locked(&s->lock);
>> +
>> + /*
>> + * add all addresses to wait on into transaction read-set and check if
>> + * any of wakeup conditions are already met.
>> + */
>> + for (i = 0; i < num; i++) {
>> + const struct rte_power_monitor_cond *c = &pmc[i];
>> +
>> + if (pmc->fn == NULL)
>
> Should be c->fn, I believe.
Yep, will fix.
>
>> + continue;
>
> Actually that way, if c->fn == NULL, we'll never add our c->addr to monitored addresses.
> Is that what we really want?
> My thought was, that if callback is not set, we'll just go to power-save state without extra checking, no?
> Something like that:
>
> const struct rte_power_monitor_cond *c = &pmc[i];
> const uint64_t val = __get_umwait_val(c->addr, c->size);
>
> if (c->fn && c->fn(val, c->opaque) != 0)
> break;
This is consistent with previous behavior of rte_power_monitor where if
mask wasn't set we entered power save mode without any checks. If we do
a break, that means the check condition has failed somewhere and we have
to abort the sleep. Continue keeps the sleep.
>
> Same thought for rte_power_monitor().
>
>> + const uint64_t val = __get_umwait_val(pmc->addr, pmc->size);
>
> Same thing: s/pmc->/c->/
Yep, you're right.
>
>> +
>> + /* abort if callback indicates that we need to stop */
>> + if (c->fn(val, c->opaque) != 0)
>> + break;
>> + }
>> +
>> + /* none of the conditions were met, sleep until timeout */
>> + if (i == num)
>> + rte_power_pause(tsc_timestamp);
>> +
>> + /* end transaction region */
>> + rte_xend();
>> +
>> + return 0;
>> +}
>> --
>> 2.25.1
>
--
Thanks,
Anatoly
More information about the dev
mailing list