[dpdk-dev] [PATCH v6 05/10] power: add PMD power management API and callback

Burakov, Anatoly anatoly.burakov at intel.com
Thu Oct 15 12:31:54 CEST 2020


On 14-Oct-20 7:41 PM, Ananyev, Konstantin wrote:
>> From: Liang Ma <liang.j.ma at intel.com>
>>
>> Add a simple on/off switch that will enable saving power when no
>> packets are arriving. It is based on counting the number of empty
>> polls and, when the number reaches a certain threshold, entering an
>> architecture-defined optimized power state that will either wait
>> until a TSC timestamp expires, or when packets arrive.
>>
>> This API mandates a core-to-single-queue mapping (that is, multiple
>> queued per device are supported, but they have to be polled on different
>> cores).
>>
>> This design is using PMD RX callbacks.
>>
>> 1. UMWAIT/UMONITOR:
>>
>>     When a certain threshold of empty polls is reached, the core will go
>>     into a power optimized sleep while waiting on an address of next RX
>>     descriptor to be written to.
>>
>> 2. Pause instruction
>>
>>     Instead of move the core into deeper C state, this method uses the
>>     pause instruction to avoid busy polling.
>>
>> 3. Frequency scaling
>>     Reuse existing DPDK power library to scale up/down core frequency
>>     depending on traffic volume.
>>
>> Signed-off-by: Liang Ma <liang.j.ma at intel.com>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
>> ---
>>
>> Notes:
>>      v6:
>>      - Added wakeup mechanism for UMWAIT
>>      - Removed memory allocation (everything is now allocated statically)
>>      - Fixed various typos and comments
>>      - Check for invalid queue ID
>>      - Moved release notes to this patch
>>
>>      v5:
>>      - Make error checking more robust
>>        - Prevent initializing scaling if ACPI or PSTATE env wasn't set
>>        - Prevent initializing UMWAIT path if PMD doesn't support get_wake_addr
>>      - Add some debug logging
>>      - Replace x86-specific code path to generic path using the intrinsic check
>>


<snip>

> 
> 
> I think you need to check state here, and _disable() have to set state with lock grabbed.
> Otherwise this lock wouldn't protect you from race conditions.
> As an example:
> 
> CP at T0:
> rte_spinlock_lock(&queue_cfg->umwait_lock);
> if (queue_cfg->wait_addr != NULL) //wait_addr == NULL, fallthrough
> rte_spinlock_unlock(&queue_cfg->umwait_lock);
> 
> DP at T1:
> rte_spinlock_lock(&queue_cfg->umwait_lock);
> queue_cfg->wait_addr = target_addr;
> monitor_sync(...);  // DP was put to sleep
> 
> CP at T2:
> queue_cfg->cur_cb = NULL;
> queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
> ret = 0;
> 
> rte_power_pmd_mgmt_queue_disable() finished with success,
> but DP core wasn't wokenup.
> 
> To be more specific:
> clb_umwait(...) {
> ...
> lock(&qcfg->lck);
> if (qcfg->state == ENABLED)  {
> qcfg->wake_addr = addr;
> monitor_sync(addr, ...,&qcfg->lck);
> }
> unlock(&qcfg->lck);
> ...
> }
> 
> _disable(...) {
> ...
> lock(&qcfg->lck);
> qcfg->state = DISABLED;
> if (qcfg->wake_addr != NULL)
> monitor_wakeup(qcfg->wake_addr);
> unlock(&qcfg->lock);
> ...
> }

True, didn't think of that. Will fix.

>> +
>> +if (!i.power_monitor) {
>> +RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
>> +ret = -ENOTSUP;
>> +goto end;
>> +}
>> +
>> +/* check if the device supports the necessary PMD API */
>> +if (rte_eth_get_wake_addr(port_id, queue_id,
>> +&dummy_addr, &dummy_expected,
>> +&dummy_mask, &dummy_sz) == -ENOTSUP) {
>> +RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_rxq_ring_addr_get\n");
>> +ret = -ENOTSUP;
>> +goto end;
>> +}
>> +/* initialize UMWAIT spinlock */
>> +rte_spinlock_init(&queue_cfg->umwait_lock);
> 
> I think don't need to do that.
> It supposed to be in valid state (otherwise you are probably in trouble anyway).

This is mostly for initialization, for when we first run the callback. I 
suppose we could do it in an RTE_INIT() function or just leave it be 
since the spinlocks are part of a statically allocated structure and 
will default to 0 anyway (although it wouldn't be proper usage of the 
API as that would be relying on implementation detail).

> 
>> +
>> +/* initialize data before enabling the callback */
>> +queue_cfg->empty_poll_stats = 0;
>> +queue_cfg->cb_mode = mode;
>> +queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
>> +
>> +queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
>> +clb_umwait, NULL);
> 
> Would be a bit cleaner/nicer to move add_rx_callback out of switch() {}
> As you have to do it always anyway.
> Same thought for disable() and remove_rx_callback().

The functions are different for each, so we can't move them out of 
switch (unless you're suggesting to unify the callback to handle all 
three modes?).

-- 
Thanks,
Anatoly


More information about the dev mailing list