[dpdk-dev] [PATCH v1 5/7] power: support callbacks for multiple Rx queues

Burakov, Anatoly anatoly.burakov at intel.com
Wed Jun 23 11:56:49 CEST 2021


On 23-Jun-21 10:49 AM, Ananyev, Konstantin wrote:
> 
>>
>> On 22-Jun-21 10:41 AM, Ananyev, Konstantin wrote:
>>>
>>>> Currently, there is a hard limitation on the PMD power management
>>>> support that only allows it to support a single queue per lcore. This is
>>>> not ideal as most DPDK use cases will poll multiple queues per core.
>>>>
>>>> The PMD power management mechanism relies on ethdev Rx callbacks, so it
>>>> is very difficult to implement such support because callbacks are
>>>> effectively stateless and have no visibility into what the other ethdev
>>>> devices are doing.  This places limitations on what we can do within the
>>>> framework of Rx callbacks, but the basics of this implementation are as
>>>> follows:
>>>>
>>>> - Replace per-queue structures with per-lcore ones, so that any device
>>>>     polled from the same lcore can share data
>>>> - Any queue that is going to be polled from a specific lcore has to be
>>>>     added to the list of cores to poll, so that the callback is aware of
>>>>     other queues being polled by the same lcore
>>>> - Both the empty poll counter and the actual power saving mechanism is
>>>>     shared between all queues polled on a particular lcore, and is only
>>>>     activated when a special designated "power saving" queue is polled. To
>>>>     put it another way, we have no idea which queue the user will poll in
>>>>     what order, so we rely on them telling us that queue X is the last one
>>>>     in the polling loop, so any power management should happen there.
>>>> - A new API is added to mark a specific Rx queue as "power saving".
>>>
>>> Honestly, I don't understand the logic behind that new function.
>>> I understand that depending on HW we ca monitor either one or multiple queues.
>>> That's ok, but why we now need to mark one queue as a 'very special' one?
>>
>> Because we don't know which of the queues we are supposed to sleep on.
>>
>> Imagine a situation where you have 3 queues. What usually happens is you
>> poll them in a loop, so q0, q1, q2, q0, q1, q2... etc. We only want to
>> enter power-optimized state on polling q2, because otherwise we're
>> risking going into power optimized state while q1 or q2 have traffic.
> 
> That's why before going to sleep we need to make sure that for *all* queues
> we have at least EMPTYPOLL_MAX empty polls.
> Then the order of queue checking wouldn't matter.
> With your example it should be:
> if (q1.empty_polls >  EMPTYPOLL_MAX && q2. empty_polls >  EMPTYPOLL_MAX &&
>       q3.empy_pools >  EMPTYPOLL_MAX)
>          goto_sleep;
> 
> Don't take me wrong, I am not suggesting to make *precisely* that checks
> in the actual code (it could be time consuming if number of checks is big),
> but the logic needs to remain.
> 

The empty poll counter is *per core*, not *per queue*. All the shared 
data is per core. We only increment empty poll counter on last queue, 
but we drop it to 0 on any queue that has received traffic. That way, we 
can avoid checking/incrementing empty poll counters for multiple queues. 
In other words, this is effectively achieving what you're suggesting, 
but without per-queue checks.

Of course, i could make it per-queue like before, but then we just end 
up doing way more checks on every callback and basically need to have 
the same logic anyway, so why bother?

>>
>> Worst case scenario, we enter sleep after polling q0, then traffic
>> arrives at q2, we wake up, and then attempt to go to sleep on q1 instead
>> of skipping it. Essentially, we will be attempting to sleep at every
>> queue, instead of once in a loop. This *might* be OK for multi-monitor
>> because we'll be aborting sleep due to sleep condition check failure,
>> but for modes like rte_pause()/rte_power_pause()-based sleep, we will be
>> entering sleep unconditionally, and will be risking to sleep at q1 while
>> there's traffic at q2.
>>
>> So, we need this mechanism to be activated once every *loop*, not per queue.
>>
>>> Why can't rte_power_ethdev_pmgmt_queue_enable() just:
>>> Check is number of monitored queues exceed HW/SW capabilities,
>>> and if so then just return a failure.
>>> Otherwise add queue to the list and treat them all equally, i.e:
>>> go to power save mode when number of sequential empty polls on
>>> all monitored queues will exceed EMPTYPOLL_MAX threshold?
>>>
>>>>     Failing to call this API will result in no power management, however
>>>>     when having only one queue per core it is obvious which queue is the
>>>>     "power saving" one, so things will still work without this new API for
>>>>     use cases that were previously working without it.
>>>> - The limitation on UMWAIT-based polling is not removed because UMWAIT
>>>>     is incapable of monitoring more than one address.
>>>>
>>>> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
>>>> ---
>>>>    lib/power/rte_power_pmd_mgmt.c | 335 ++++++++++++++++++++++++++-------
>>>>    lib/power/rte_power_pmd_mgmt.h |  34 ++++
>>>>    lib/power/version.map          |   3 +
>>>>    3 files changed, 306 insertions(+), 66 deletions(-)
>>>>
>>>> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
>>>> index 0707c60a4f..60dd21a19c 100644
>>>> --- a/lib/power/rte_power_pmd_mgmt.c
>>>> +++ b/lib/power/rte_power_pmd_mgmt.c
>>>> @@ -33,7 +33,19 @@ enum pmd_mgmt_state {
>>>>         PMD_MGMT_ENABLED
>>>>    };
>>>>
>>>> -struct pmd_queue_cfg {
>>>> +struct queue {
>>>> +     uint16_t portid;
>>>> +     uint16_t qid;
>>>> +};
>>>
>>> Just a thought: if that would help somehow, it can be changed to:
>>> union queue {
>>>           uint32_t raw;
>>>           struct { uint16_t portid, qid;
>>>           };
>>> };
>>>
>>> That way in queue find/cmp functions below you can operate with single raw 32-bt values.
>>> Probably not that important, as all these functions are on slow path, but might look nicer.
>>
>> Sure, that can work. We actually do comparisons with power save queue on
>> fast path, so maybe that'll help.
>>
>>>
>>>> +struct pmd_core_cfg {
>>>> +     struct queue queues[RTE_MAX_ETHPORTS];
>>>
>>> If we'll have ability to monitor multiple queues per lcore, would it be always enough?
>>>   From other side, it is updated on control path only.
>>> Wouldn't normal list with malloc(/rte_malloc) would be more suitable here?
>>
>> You're right, it should be dynamically allocated.
>>
>>>
>>>> +     /**< Which port-queue pairs are associated with this lcore? */
>>>> +     struct queue power_save_queue;
>>>> +     /**< When polling multiple queues, all but this one will be ignored */
>>>> +     bool power_save_queue_set;
>>>> +     /**< When polling multiple queues, power save queue must be set */
>>>> +     size_t n_queues;
>>>> +     /**< How many queues are in the list? */
>>>>         volatile enum pmd_mgmt_state pwr_mgmt_state;
>>>>         /**< State of power management for this queue */
>>>>         enum rte_power_pmd_mgmt_type cb_mode;
>>>> @@ -43,8 +55,97 @@ struct pmd_queue_cfg {
>>>>         uint64_t empty_poll_stats;
>>>>         /**< Number of empty polls */
>>>>    } __rte_cache_aligned;
>>>> +static struct pmd_core_cfg lcore_cfg[RTE_MAX_LCORE];
>>>>
>>>> -static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
>>>> +static inline bool
>>>> +queue_equal(const struct queue *l, const struct queue *r)
>>>> +{
>>>> +     return l->portid == r->portid && l->qid == r->qid;
>>>> +}
>>>> +
>>>> +static inline void
>>>> +queue_copy(struct queue *dst, const struct queue *src)
>>>> +{
>>>> +     dst->portid = src->portid;
>>>> +     dst->qid = src->qid;
>>>> +}
>>>> +
>>>> +static inline bool
>>>> +queue_is_power_save(const struct pmd_core_cfg *cfg, const struct queue *q) {
>>>
>>> Here and in other places - any reason why standard DPDK coding style is not used?
>>
>> Just accidental :)
>>
>>>
>>>> +     const struct queue *pwrsave = &cfg->power_save_queue;
>>>> +
>>>> +     /* if there's only single queue, no need to check anything */
>>>> +     if (cfg->n_queues == 1)
>>>> +             return true;
>>>> +     return cfg->power_save_queue_set && queue_equal(q, pwrsave);
>>>> +}
>>>> +
>>
>>
>> --
>> Thanks,
>> Anatoly


-- 
Thanks,
Anatoly


More information about the dev mailing list