[dpdk-dev] [PATCH v2 3/7] eal: add power monitor for multiple events
Burakov, Anatoly
anatoly.burakov at intel.com
Mon Jun 28 15:29:20 CEST 2021
On 28-Jun-21 1:58 PM, Ananyev, Konstantin wrote:
>
>> On 28-Jun-21 1:37 PM, Ananyev, Konstantin wrote:
>>>
>>>> Use RTM and WAITPKG instructions to perform a wait-for-writes similar to
>>>> what UMWAIT does, but without the limitation of having to listen for
>>>> just one event. This works because the optimized power state used by the
>>>> TPAUSE instruction will cause a wake up on RTM transaction abort, so if
>>>> we add the addresses we're interested in to the read-set, any write to
>>>> those addresses will wake us up.
>>>>
>>>> Signed-off-by: Konstantin Ananyev <konstantin.ananyev at intel.com>
>>>> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
>>>> ---
>>>>
>>>> Notes:
>>>> v2:
>>>> - Adapt to callback mechanism
>>>>
>>>> doc/guides/rel_notes/release_21_08.rst | 2 +
>>>> lib/eal/arm/rte_power_intrinsics.c | 11 +++
>>>> lib/eal/include/generic/rte_cpuflags.h | 2 +
>>>> .../include/generic/rte_power_intrinsics.h | 35 ++++++++++
>>>> lib/eal/ppc/rte_power_intrinsics.c | 11 +++
>>>> lib/eal/version.map | 3 +
>>>> lib/eal/x86/rte_cpuflags.c | 2 +
>>>> lib/eal/x86/rte_power_intrinsics.c | 69 +++++++++++++++++++
>>>> 8 files changed, 135 insertions(+)
>>>>
>>> ...
>>>
>>>> diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
>>>> index 3c5c9ce7ad..3fc6f62ef5 100644
>>>> --- a/lib/eal/x86/rte_power_intrinsics.c
>>>> +++ b/lib/eal/x86/rte_power_intrinsics.c
>>>> @@ -4,6 +4,7 @@
>>>>
>>>> #include <rte_common.h>
>>>> #include <rte_lcore.h>
>>>> +#include <rte_rtm.h>
>>>> #include <rte_spinlock.h>
>>>>
>>>> #include "rte_power_intrinsics.h"
>>>> @@ -28,6 +29,7 @@ __umwait_wakeup(volatile void *addr)
>>>> }
>>>>
>>>> static bool wait_supported;
>>>> +static bool wait_multi_supported;
>>>>
>>>> static inline uint64_t
>>>> __get_umwait_val(const volatile void *p, const uint8_t sz)
>>>> @@ -164,6 +166,8 @@ RTE_INIT(rte_power_intrinsics_init) {
>>>>
>>>> if (i.power_monitor && i.power_pause)
>>>> wait_supported = 1;
>>>> + if (i.power_monitor_multi)
>>>> + wait_multi_supported = 1;
>>>> }
>>>>
>>>> int
>>>> @@ -202,6 +206,9 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
>>>> * In this case, since we've already woken up, the "wakeup" was
>>>> * unneeded, and since T1 is still waiting on T2 releasing the lock, the
>>>> * wakeup address is still valid so it's perfectly safe to write it.
>>>> + *
>>>> + * For multi-monitor case, the act of locking will in itself trigger the
>>>> + * wakeup, so no additional writes necessary.
>>>> */
>>>> rte_spinlock_lock(&s->lock);
>>>> if (s->monitor_addr != NULL)
>>>> @@ -210,3 +217,65 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
>>>>
>>>> return 0;
>>>> }
>>>> +
>>>> +int
>>>> +rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
>>>> + const uint32_t num, const uint64_t tsc_timestamp)
>>>> +{
>>>> + const unsigned int lcore_id = rte_lcore_id();
>>>> + struct power_wait_status *s = &wait_status[lcore_id];
>>>> + uint32_t i, rc;
>>>> +
>>>> + /* check if supported */
>>>> + if (!wait_multi_supported)
>>>> + return -ENOTSUP;
>>>> +
>>>> + if (pmc == NULL || num == 0)
>>>> + return -EINVAL;
>>>> +
>>>> + /* we are already inside transaction region, return */
>>>> + if (rte_xtest() != 0)
>>>> + return 0;
>>>> +
>>>> + /* start new transaction region */
>>>> + rc = rte_xbegin();
>>>> +
>>>> + /* transaction abort, possible write to one of wait addresses */
>>>> + if (rc != RTE_XBEGIN_STARTED)
>>>> + return 0;
>>>> +
>>>> + /*
>>>> + * the mere act of reading the lock status here adds the lock to
>>>> + * the read set. This means that when we trigger a wakeup from another
>>>> + * thread, even if we don't have a defined wakeup address and thus don't
>>>> + * actually cause any writes, the act of locking our lock will itself
>>>> + * trigger the wakeup and abort the transaction.
>>>> + */
>>>> + rte_spinlock_is_locked(&s->lock);
>>>> +
>>>> + /*
>>>> + * add all addresses to wait on into transaction read-set and check if
>>>> + * any of wakeup conditions are already met.
>>>> + */
>>>> + for (i = 0; i < num; i++) {
>>>> + const struct rte_power_monitor_cond *c = &pmc[i];
>>>> +
>>>> + if (pmc->fn == NULL)
>>>
>>> Should be c->fn, I believe.
>>
>> Yep, will fix.
>>
>>>
>>>> + continue;
>>>
>>> Actually that way, if c->fn == NULL, we'll never add our c->addr to monitored addresses.
>>> Is that what we really want?
>>> My thought was, that if callback is not set, we'll just go to power-save state without extra checking, no?
>>> Something like that:
>>>
>>> const struct rte_power_monitor_cond *c = &pmc[i];
>>> const uint64_t val = __get_umwait_val(c->addr, c->size);
>>>
>>> if (c->fn && c->fn(val, c->opaque) != 0)
>>> break;
>>
>> This is consistent with previous behavior of rte_power_monitor where if
>> mask wasn't set we entered power save mode without any checks. If we do
>> a break, that means the check condition has failed somewhere and we have
>> to abort the sleep. Continue keeps the sleep.
>
> Ok, so what is current intention?
> If pmc->fn == NULL what does it mean:
> 1) pmc->addr shouldn't be monitored at all?
> 2) pmc->addr should be monitored unconditionally
> 3) pmc->fn should never be NULL and monitor should return an error
> 3) something else?
>
> For me 1) looks really strange, if user doesn't want to sleep on that address,
> he can just not add this addr to pmc[].
Ah, i see what you mean now. While we can skip the comparison, we still
want to read the address. So, i would guess 2) should be true.
>
> 2) is probably ok... but is that really needed?
> User can just provide NOP as a callback and it would be the same.
>
> 3) seems like a most sane to be.
In that case, we have to do the same in the singular power monitor
function - fn set to NULL should return an error. Will fix in v4 (v3
went out before i noticed your comments :/ ).
--
Thanks,
Anatoly
More information about the dev
mailing list