[dpdk-dev] [PATCH v4 02/10] eal: add power management intrinsics

Ananyev, Konstantin konstantin.ananyev at intel.com
Sat Oct 10 15:19:25 CEST 2020



> >>>>>> Add two new power management intrinsics, and provide an implementation
> >>>>>> in eal/x86 based on UMONITOR/UMWAIT instructions. The instructions
> >>>>>> are implemented as raw byte opcodes because there is not yet widespread
> >>>>>> compiler support for these instructions.
> >>>>>>
> >>>>>> The power management instructions provide an architecture-specific
> >>>>>> function to either wait until a specified TSC timestamp is reached, or
> >>>>>> optionally wait until either a TSC timestamp is reached or a memory
> >>>>>> location is written to. The monitor function also provides an optional
> >>>>>> comparison, to avoid sleeping when the expected write has already
> >>>>>> happened, and no more writes are expected.
> >>>>>
> >>>>> I think what this API is missing is a function to wake up a sleeping core.
> >>>>> If the user can/should use some system call to achieve that, then at least
> >>>>> it has to be clearly documented; even better, some wrapper should be provided.
> >>>>
> >>>> I don't think it's possible to do that without severely overcomplicating
> >>>> the intrinsic and its usage, because AFAIK the only way to wake up a
> >>>> sleeping core would be to send some kind of interrupt to the core, or
> >>>> trigger a write to the cache-line in question.
> >>>>
> >>>
> >>> Yes, I think we either need a syscall that would do an IPI for us
> >>> (off the top of my head, membarrier() does that; there might be other syscalls too),
> >>> or something hand-made. For the hand-made option, I wonder whether something like this
> >>> would be safe and sufficient:
> >>> uint64_t val = atomic_load(addr);
> >>> atomic_compare_exchange_strong(addr, &val, val);
> >>> ?
> >>> Anyway, one way or another, I think the ability to wake up a core we put to sleep
> >>> has to be an essential part of this feature.
> >>> As I understand it, the Linux kernel will limit the maximum sleep time for these instructions:
> >>> https://lwn.net/Articles/790920/
> >>> But relying on that alone seems too vague to me:
> >>> - the user can adjust that value
> >>> - it wouldn't apply to older kernels or non-Linux cases
> >>> Konstantin
> >>>
> >>
> >> This implies knowing the value the core is sleeping on.
> >
> > You don't need the value to wait for; you just need an address.
> > And you can make the wakeup function accept the address as a parameter,
> > same as monitor() does.
> 
> Sorry, I meant the address. We don't know the address we're sleeping on.
> 
> >
> >> That's not
> >> always the case - with this particular PMD power management scheme, we
> >> get the address from the PMD and it stays inside the callback.
> >
> > That's fine: you can store the address inside your callback metadata
> > and do the wakeup as part of the _disable_ function.
> >
> 
> The address may be different, and by the time we access the address it
> may become stale, so I don't see how that would help unless you're
> suggesting some kind of synchronization mechanism there.

Yes, we'll definitely need some synchronization here.
Sorry, I should have said that straight away, to avoid further misunderstanding.
Let's say we associate a spinlock with monitor(), by analogy with pthread_cond_wait().
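A minimal sketch of how that could look, assuming the monitored address is kept in per-queue callback metadata: the sleeper arms the monitor and publishes the address while holding the lock, so a wakeup issued after the unlock still hits an armed monitor, much as pthread_cond_wait() releases the mutex atomically with starting the wait. All names below (sleep_ctx, sleep_on(), wake_and_disable(), poke(), run_demo()) are hypothetical, not DPDK API. Since UMONITOR/UMWAIT are not portable, the wait is emulated by polling, and the hw_triggered flag stands in for the hardware noticing the same-value CAS store; on real hardware that store alone would be expected to end UMWAIT (assuming the lock does not share a cache line with the monitored address).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-queue metadata; not DPDK API. */
struct sleep_ctx {
    pthread_spinlock_t lock;     /* protects 'monitored' and 'stop' */
    _Atomic uint64_t *monitored; /* address the core sleeps on, or NULL */
    bool stop;                   /* set by the disable path */
    atomic_bool hw_triggered;    /* EMULATION of a triggered monitor */
};

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Store to the monitored cache line without changing its value. */
static void poke(_Atomic uint64_t *addr) {
    uint64_t val = atomic_load(addr);
    atomic_compare_exchange_strong(addr, &val, val);
}

/* Sleeper: arm the (emulated) monitor while holding the lock. */
static void sleep_on(struct sleep_ctx *c, _Atomic uint64_t *addr, double timeout) {
    pthread_spin_lock(&c->lock);
    if (c->stop) {                         /* disable already requested */
        pthread_spin_unlock(&c->lock);
        return;
    }
    uint64_t expected = atomic_load(addr);
    atomic_store(&c->hw_triggered, false); /* emulated "UMONITOR addr" */
    c->monitored = addr;
    pthread_spin_unlock(&c->lock);

    /* Emulated "UMWAIT": wait for a wakeup, a value change or the deadline. */
    double deadline = now_sec() + timeout;
    while (!atomic_load(&c->hw_triggered) && atomic_load(addr) == expected &&
           now_sec() < deadline)
        ;

    pthread_spin_lock(&c->lock);
    c->monitored = NULL;
    pthread_spin_unlock(&c->lock);
}

/* Waker, e.g. called from the _disable_ path. */
static void wake_and_disable(struct sleep_ctx *c) {
    pthread_spin_lock(&c->lock);
    c->stop = true;
    if (c->monitored != NULL) {
        poke(c->monitored);
        atomic_store(&c->hw_triggered, true); /* EMULATION of the HW wakeup */
    }
    pthread_spin_unlock(&c->lock);
}

struct demo_arg { struct sleep_ctx *ctx; _Atomic uint64_t *addr; };

static void *sleeper(void *p) {
    struct demo_arg *a = p;
    sleep_on(a->ctx, a->addr, 5.0);
    return NULL;
}

/* Puts a thread to sleep on a word, wakes it, returns elapsed seconds. */
static double run_demo(uint64_t *final_value) {
    _Atomic uint64_t word;
    atomic_init(&word, 42);
    struct sleep_ctx ctx;
    ctx.monitored = NULL;
    ctx.stop = false;
    atomic_init(&ctx.hw_triggered, false);
    int rc = pthread_spin_init(&ctx.lock, PTHREAD_PROCESS_PRIVATE);
    assert(rc == 0);

    struct demo_arg arg = { &ctx, &word };
    pthread_t t;
    double start = now_sec();
    rc = pthread_create(&t, NULL, sleeper, &arg);
    assert(rc == 0);
    (void)rc;
    struct timespec delay = { 0, 50L * 1000 * 1000 }; /* let the sleeper arm */
    nanosleep(&delay, NULL);
    wake_and_disable(&ctx);
    pthread_join(t, NULL);
    double elapsed = now_sec() - start;

    pthread_spin_destroy(&ctx.lock);
    *final_value = atomic_load(&word);
    return elapsed;
}
```

run_demo() should return well before the 5-second timeout, with the monitored word still 42, because the wakeup writes the line without changing its value. If wake_and_disable() runs before the sleeper arms, the stop flag makes sleep_on() return without sleeping, so the wakeup cannot be lost either way.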
Konstantin

