[dpdk-dev] [PATCH v4 02/10] eal: add power management intrinsics

Burakov, Anatoly anatoly.burakov at intel.com
Mon Oct 12 12:36:46 CEST 2020


On 12-Oct-20 11:35 AM, Burakov, Anatoly wrote:
> On 10-Oct-20 2:19 PM, Ananyev, Konstantin wrote:
>>
>>
>>>>>>>>> Add two new power management intrinsics, and provide an
>>>>>>>>> implementation in eal/x86 based on UMONITOR/UMWAIT instructions.
>>>>>>>>> The instructions are implemented as raw byte opcodes because there
>>>>>>>>> is not yet widespread compiler support for these instructions.
>>>>>>>>>
>>>>>>>>> The power management instructions provide an architecture-specific
>>>>>>>>> function to either wait until a specified TSC timestamp is reached,
>>>>>>>>> or optionally wait until either a TSC timestamp is reached or a
>>>>>>>>> memory location is written to. The monitor function also provides
>>>>>>>>> an optional comparison, to avoid sleeping when the expected write
>>>>>>>>> has already happened, and no more writes are expected.
>>>>>>>>
>>>>>>>> I think what this API is missing is a function to wake up a sleeping
>>>>>>>> core. If the user can/should use some system call to achieve that,
>>>>>>>> then at least it has to be clearly documented; even better, a wrapper
>>>>>>>> should be provided.
>>>>>>>
>>>>>>> I don't think it's possible to do that without severely
>>>>>>> overcomplicating the intrinsic and its usage, because AFAIK the only
>>>>>>> way to wake up a sleeping core would be to send some kind of
>>>>>>> interrupt to the core, or trigger a write to the cache-line in
>>>>>>> question.
>>>>>>>
>>>>>>
>>>>>> Yes, I think we either need a syscall that would do an IPI for us
>>>>>> (off the top of my head, membarrier() does that; there might be some
>>>>>> other syscalls too), or something hand-made. For hand-made, I wonder
>>>>>> whether something like this would be safe and sufficient:
>>>>>> uint64_t val = atomic_load(addr);
>>>>>> CAS(addr, val, &val);
>>>>>> ?
>>>>>> Anyway, one way or another, I think the ability to wake up a core we
>>>>>> put to sleep has to be an essential part of this feature.
>>>>>> As I understand it, the Linux kernel will limit the maximum sleep time
>>>>>> for these instructions:
>>>>>> https://lwn.net/Articles/790920/
>>>>>> But relying just on that seems too vague to me:
>>>>>> - the user can adjust that value
>>>>>> - it wouldn't apply to older kernels and non-Linux cases
>>>>>> Konstantin
>>>>>>
>>>>>
>>>>> This implies knowing the value the core is sleeping on.
>>>>
>>>> You don't need the value to wait for, you just need an address.
>>>> And you can make the wakeup function accept an address as a parameter,
>>>> same as monitor() does.
>>>
>>> Sorry, I meant the address. We don't know the address we're sleeping on.
>>>
>>>>
>>>>> That's not
>>>>> always the case - with this particular PMD power management scheme, we
>>>>> get the address from the PMD and it stays inside the callback.
>>>>
>>>> That's fine - you can store the address inside your callback metadata
>>>> and do the wakeup as part of the _disable_ function.
>>>>
>>>
>>> The address may be different, and by the time we access the address it
>>> may have become stale, so I don't see how that would help unless you're
>>> suggesting we have some kind of synchronization mechanism there.
>>
>> Yes, we'll need something to sync here for sure.
>> Sorry, I should have said it straight away, to avoid further
>> misunderstanding. Let's say, associate a spinlock with monitor(), by
>> analogy with pthread_cond_wait().
>> Konstantin
>>
> 
> The idea was to provide an intrinsic-like function - as in, a raw
> instruction call, without anything extra. We even added the masks/values
> etc. only because there's no race-less way to combine UMONITOR/UMWAIT
> without them.
> 
> Perhaps we can provide a synchronize-able wrapper around it, to avoid
> adding overhead for callers that use the function but don't need the sync
> mechanism?
> 
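
For reference, my reading of the "hand-made" wakeup suggested above would be
something along these lines (a rough sketch only, using GCC atomic builtins;
the function name is made up, and it assumes we actually know the monitored
address at wakeup time):

#include <stdint.h>

/*
 * Hypothetical wakeup for a core sleeping in UMWAIT on *addr.
 * The CAS writes back the value that is already there, so application data
 * is unchanged, but the write to the monitored cache line should bring the
 * sleeping core out of UMWAIT.
 */
static inline void
power_monitor_wakeup(volatile uint64_t *addr)
{
        uint64_t expected = __atomic_load_n(addr, __ATOMIC_RELAXED);

        /*
         * If the CAS succeeds, the monitored line sees our write; if it
         * fails, another writer already touched the line, which wakes the
         * core anyway.
         */
        __atomic_compare_exchange_n(addr, &expected, expected, 0,
                        __ATOMIC_ACQ_REL, __ATOMIC_RELAXED);
}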

Also, how would having a spinlock help to synchronize? Are you 
suggesting we do UMWAIT on a spinlock address, or something to that effect?
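
If what you have in mind is roughly the below, i.e. the monitor and wakeup
sides sharing a lock that protects the currently monitored address, then I
can see how it synchronizes, but it is no longer a bare intrinsic. This is a
hypothetical sketch only: the names are made up, umonitor()/umwait() stand in
for the raw instruction wrappers, and power_monitor_wakeup() is the CAS write
from the sketch above.

#include <stdint.h>
#include <rte_spinlock.h>

struct monitor_sync {
        rte_spinlock_t lock;      /* assumed initialized with rte_spinlock_init() */
        volatile uint64_t *addr;  /* currently monitored address, or NULL */
};

static void
monitor_sleep(struct monitor_sync *s, volatile uint64_t *addr, uint64_t tsc)
{
        /* publish the address under the lock, then arm the monitor */
        rte_spinlock_lock(&s->lock);
        s->addr = addr;
        umonitor(addr);                 /* stand-in for raw UMONITOR */
        rte_spinlock_unlock(&s->lock);

        /* any write that lands after umonitor() aborts the wait early */
        umwait(tsc);                    /* stand-in for raw UMWAIT */

        rte_spinlock_lock(&s->lock);
        s->addr = NULL;
        rte_spinlock_unlock(&s->lock);
}

static void
monitor_wakeup(struct monitor_sync *s)
{
        rte_spinlock_lock(&s->lock);
        if (s->addr != NULL)
                power_monitor_wakeup(s->addr);
        rte_spinlock_unlock(&s->lock);
}

That would avoid the stale-address problem, but it adds a lock and extra
state to every call, which is the overhead I was referring to above.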

-- 
Thanks,
Anatoly

