[dpdk-dev] [PATCH v16 07/11] power: add PMD power management API and callback

Burakov, Anatoly anatoly.burakov at intel.com
Wed Jan 13 18:29:04 CET 2021


On 13-Jan-21 12:58 PM, Ananyev, Konstantin wrote:
> 
> 
>> -----Original Message-----
>> From: Burakov, Anatoly <anatoly.burakov at intel.com>
>> Sent: Tuesday, January 12, 2021 5:37 PM
>> To: dev at dpdk.org
>> Cc: Ma, Liang J <liang.j.ma at intel.com>; Hunt, David <david.hunt at intel.com>; Ray Kinsella <mdr at ashroe.eu>; Neil Horman
>> <nhorman at tuxdriver.com>; thomas at monjalon.net; Ananyev, Konstantin <konstantin.ananyev at intel.com>; McDaniel, Timothy
>> <timothy.mcdaniel at intel.com>; Richardson, Bruce <bruce.richardson at intel.com>; Macnamara, Chris <chris.macnamara at intel.com>
>> Subject: [PATCH v16 07/11] power: add PMD power management API and callback
>>
>> From: Liang Ma <liang.j.ma at intel.com>
>>
>> Add a simple on/off switch that will enable saving power when no
>> packets are arriving. It is based on counting the number of empty
>> polls and, when the number reaches a certain threshold, entering an
>> architecture-defined optimized power state that will either wait
>> until a TSC timestamp expires, or when packets arrive.
>>
>> This API mandates a core-to-single-queue mapping (that is, multiple
>> queued per device are supported, but they have to be polled on different
>> cores).
>>
>> This design is using PMD RX callbacks.
>>
>> 1. UMWAIT/UMONITOR:
>>
>>     When a certain threshold of empty polls is reached, the core will go
>>     into a power optimized sleep while waiting on an address of next RX
>>     descriptor to be written to.
>>
>> 2. TPAUSE/Pause instruction
>>
>>     This method uses the pause (or TPAUSE, if available) instruction to
>>     avoid busy polling.
>>
>> 3. Frequency scaling
>>     Reuse existing DPDK power library to scale up/down core frequency
>>     depending on traffic volume.
>>
>> Signed-off-by: Liang Ma <liang.j.ma at intel.com>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
>> ---
>>
>> Notes:
>>      v15:
>>      - Fix check in UMWAIT callback
>>
>>      v13:
>>      - Rework the synchronization mechanism to not require locking
>>      - Add more parameter checking
>>      - Rework n_rx_queues access to not go through internal PMD structures and use
>>        public API instead
>>
>>      v13:
>>      - Rework the synchronization mechanism to not require locking
>>      - Add more parameter checking
>>      - Rework n_rx_queues access to not go through internal PMD structures and use
>>        public API instead
>>
>>   doc/guides/prog_guide/power_man.rst    |  44 +++
>>   doc/guides/rel_notes/release_21_02.rst |  10 +
>>   lib/librte_power/meson.build           |   5 +-
>>   lib/librte_power/rte_power_pmd_mgmt.c  | 359 +++++++++++++++++++++++++
>>   lib/librte_power/rte_power_pmd_mgmt.h  |  90 +++++++
>>   lib/librte_power/version.map           |   5 +
>>   6 files changed, 511 insertions(+), 2 deletions(-)
>>   create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c
>>   create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h
>>
> 
> ...
> 
>> +
>> +static uint16_t
>> +clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
>> +uint16_t nb_rx, uint16_t max_pkts __rte_unused,
>> +void *addr __rte_unused)
>> +{
>> +
>> +struct pmd_queue_cfg *q_conf;
>> +
>> +q_conf = &port_cfg[port_id][qidx];
>> +
>> +if (unlikely(nb_rx == 0)) {
>> +q_conf->empty_poll_stats++;
>> +if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
>> +struct rte_power_monitor_cond pmc;
>> +uint16_t ret;
>> +
>> +/*
>> + * we might get a cancellation request while being
>> + * inside the callback, in which case the wakeup
>> + * wouldn't work because it would've arrived too early.
>> + *
>> + * to get around this, we notify the other thread that
>> + * we're sleeping, so that it can spin until we're done.
>> + * unsolicited wakeups are perfectly safe.
>> + */
>> +q_conf->umwait_in_progress = true;
> 
> This write and subsequent read can be reordered by the cpu.
> I think you need rte_atomic_thread_fence(__ATOMIC_SEQ_CST) here and
> in disable() code-path below.
> 
>> +
>> +/* check if we need to cancel sleep */
>> +if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
>> +/* use monitoring condition to sleep */
>> +ret = rte_eth_get_monitor_addr(port_id, qidx,
>> +&pmc);
>> +if (ret == 0)
>> +rte_power_monitor(&pmc, -1ULL);
>> +}
>> +q_conf->umwait_in_progress = false;
>> +}
>> +} else
>> +q_conf->empty_poll_stats = 0;
>> +
>> +return nb_rx;
>> +}
>> +
> 
> ...
> 
>> +
>> +int
>> +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id,
>> +uint16_t port_id, uint16_t queue_id)
>> +{
>> +struct pmd_queue_cfg *queue_cfg;
>> +
>> +RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>> +
>> +if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
>> +return -EINVAL;
>> +
>> +/* no need to check queue id as wrong queue id would not be enabled */
>> +queue_cfg = &port_cfg[port_id][queue_id];
>> +
>> +if (queue_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
>> +return -EINVAL;
>> +
>> +/* let the callback know we're shutting down */
>> +queue_cfg->pwr_mgmt_state = PMD_MGMT_BUSY;
> 
> Same as above - write to pwr_mgmt_state and read from umwait_in_progress
> could be reordered by cpu.
> Need to insert rte_atomic_thread_fence(__ATOMIC_SEQ_CST) between them.
> 
> BTW, out of curiosity - why do you need this intermediate
> state (PMD_MGMT_BUSY) at all?
> Why not directly:
> queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
> ?
> 

Thanks for suggestions, i'll add those.

The goal for the "intermediate" step is to prevent Rx callback from 
sleeping in the first place. We can't "wake up" earlier than it goes to 
sleep, but we may get a request to disable power management while we're 
at the beginning of the callback and haven't yet entered the 
rte_power_monitor code.

In this case, setting it to "BUSY" will prevent the callback from ever 
sleeping in the first place (see rte_power_pmd_mgmt:108 check), and will 
unset the "umwait in progress" if there was any.

So, we have three situations to handle:

1) wake up during umwait
2) "wake up" during callback after we've set the "umwait in progress" 
flag but before actual umwait happens - we don't wait to exit before 
we're sure there's nothing sleeping there
3) "wake up" during callback before we set the "umwait in progress" flag

1) is handled by the rte_power_monitor_wakeup() call, so that's taken 
care of. 2) is handled by the other thread waiting on "umwait in 
progress" becoming false. 3) is handled by having this BUSY check in the 
umwait thread.

Hope that made sense!

-- 
Thanks,
Anatoly


More information about the dev mailing list