[dpdk-dev] [PATCH v16 07/11] power: add PMD power management API and callback
Ananyev, Konstantin
konstantin.ananyev at intel.com
Wed Jan 13 13:58:56 CET 2021
> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov at intel.com>
> Sent: Tuesday, January 12, 2021 5:37 PM
> To: dev at dpdk.org
> Cc: Ma, Liang J <liang.j.ma at intel.com>; Hunt, David <david.hunt at intel.com>; Ray Kinsella <mdr at ashroe.eu>; Neil Horman
> <nhorman at tuxdriver.com>; thomas at monjalon.net; Ananyev, Konstantin <konstantin.ananyev at intel.com>; McDaniel, Timothy
> <timothy.mcdaniel at intel.com>; Richardson, Bruce <bruce.richardson at intel.com>; Macnamara, Chris <chris.macnamara at intel.com>
> Subject: [PATCH v16 07/11] power: add PMD power management API and callback
>
> From: Liang Ma <liang.j.ma at intel.com>
>
> Add a simple on/off switch that will enable saving power when no
> packets are arriving. It is based on counting the number of empty
> polls and, when the number reaches a certain threshold, entering an
> architecture-defined optimized power state that will last either until
> a TSC timestamp expires or until packets arrive.
>
> This API mandates a core-to-single-queue mapping (that is, multiple
> queues per device are supported, but they have to be polled on different
> cores).
>
> This design uses PMD RX callbacks.
>
> 1. UMWAIT/UMONITOR:
>
> When a certain threshold of empty polls is reached, the core will go
> into a power-optimized sleep while waiting for the address of the next
> RX descriptor to be written to.
>
> 2. TPAUSE/Pause instruction
>
> This method uses the pause (or TPAUSE, if available) instruction to
> avoid busy polling.
>
> 3. Frequency scaling
>
> Reuse the existing DPDK power library to scale core frequency up/down
> depending on traffic volume.
>
> Signed-off-by: Liang Ma <liang.j.ma at intel.com>
> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
> ---
>
> Notes:
> v15:
> - Fix check in UMWAIT callback
>
> v13:
> - Rework the synchronization mechanism to not require locking
> - Add more parameter checking
> - Rework n_rx_queues access to not go through internal PMD structures and use
> public API instead
>
> doc/guides/prog_guide/power_man.rst | 44 +++
> doc/guides/rel_notes/release_21_02.rst | 10 +
> lib/librte_power/meson.build | 5 +-
> lib/librte_power/rte_power_pmd_mgmt.c | 359 +++++++++++++++++++++++++
> lib/librte_power/rte_power_pmd_mgmt.h | 90 +++++++
> lib/librte_power/version.map | 5 +
> 6 files changed, 511 insertions(+), 2 deletions(-)
> create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c
> create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h
>
...
> +
> +static uint16_t
> +clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
> + uint16_t nb_rx, uint16_t max_pkts __rte_unused,
> + void *addr __rte_unused)
> +{
> +
> + struct pmd_queue_cfg *q_conf;
> +
> + q_conf = &port_cfg[port_id][qidx];
> +
> + if (unlikely(nb_rx == 0)) {
> + q_conf->empty_poll_stats++;
> + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
> + struct rte_power_monitor_cond pmc;
> + uint16_t ret;
> +
> + /*
> + * we might get a cancellation request while being
> + * inside the callback, in which case the wakeup
> + * wouldn't work because it would've arrived too early.
> + *
> + * to get around this, we notify the other thread that
> + * we're sleeping, so that it can spin until we're done.
> + * unsolicited wakeups are perfectly safe.
> + */
> + q_conf->umwait_in_progress = true;
This write and the subsequent read can be reordered by the CPU.
I think you need rte_atomic_thread_fence(__ATOMIC_SEQ_CST) here and
in the disable() code-path below.
> +
> + /* check if we need to cancel sleep */
> + if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
> + /* use monitoring condition to sleep */
> + ret = rte_eth_get_monitor_addr(port_id, qidx,
> + &pmc);
> + if (ret == 0)
> + rte_power_monitor(&pmc, -1ULL);
> + }
> + q_conf->umwait_in_progress = false;
> + }
> + } else
> + q_conf->empty_poll_stats = 0;
> +
> + return nb_rx;
> +}
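As an aside, the decision logic of clb_umwait() above can be modelled in plain C with no DPDK dependencies. This is only a sketch: the threshold value 512 and the `slept` flag are stand-ins of mine (the real callback keeps the threshold in EMPTYPOLL_MAX and issues an actual UMWAIT via rte_power_monitor()):

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the patch's EMPTYPOLL_MAX; the actual value is
 * internal to the library, 512 here is an assumption. */
#define EMPTYPOLL_MAX 512

struct queue_state {
	uint64_t empty_poll_stats;
	bool slept;	/* stands in for the UMWAIT/TPAUSE sleep */
};

/* Decision logic of the RX callback, stripped of DPDK specifics:
 * count consecutive empty polls, "sleep" once the count exceeds the
 * threshold, and reset the counter on any received traffic. */
static void on_poll(struct queue_state *q, uint16_t nb_rx)
{
	if (nb_rx == 0) {
		q->empty_poll_stats++;
		if (q->empty_poll_stats > EMPTYPOLL_MAX)
			q->slept = true;
	} else {
		q->empty_poll_stats = 0;
		q->slept = false;
	}
}
```

Note that the counter only resets on traffic, so a single stray packet is enough to postpone the sleep by another full threshold's worth of empty polls.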
> +
...
> +
> +int
> +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id,
> + uint16_t port_id, uint16_t queue_id)
> +{
> + struct pmd_queue_cfg *queue_cfg;
> +
> + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +
> + if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
> + return -EINVAL;
> +
> + /* no need to check queue id as wrong queue id would not be enabled */
> + queue_cfg = &port_cfg[port_id][queue_id];
> +
> + if (queue_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
> + return -EINVAL;
> +
> + /* let the callback know we're shutting down */
> + queue_cfg->pwr_mgmt_state = PMD_MGMT_BUSY;
Same as above - the write to pwr_mgmt_state and the read from
umwait_in_progress could be reordered by the CPU.
Need to insert rte_atomic_thread_fence(__ATOMIC_SEQ_CST) between them.
BTW, out of curiosity - why do you need this intermediate
state (PMD_MGMT_BUSY) at all?
Why not directly:
queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
?
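To illustrate why the fence matters: the two paths form the classic store-buffering (Dekker) pattern - each thread stores to its own flag and then loads the other's, and without ordering both loads may see stale values, losing the wakeup. A self-contained C11 model follows (all names are mine, not the patch's; rte_atomic_thread_fence(__ATOMIC_SEQ_CST) corresponds to atomic_thread_fence(memory_order_seq_cst)):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* "poller" models clb_umwait(), "control" models
 * rte_power_pmd_mgmt_queue_disable(). */
static atomic_bool in_progress;		/* models umwait_in_progress */
static atomic_bool mgmt_busy;		/* models state == PMD_MGMT_BUSY */
static atomic_bool poller_saw_busy;
static atomic_bool control_saw_sleep;

static void *poller(void *arg)
{
	(void)arg;
	atomic_store_explicit(&in_progress, true, memory_order_relaxed);
	/* the suggested fence, between the store and the load */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&mgmt_busy, memory_order_relaxed))
		atomic_store_explicit(&poller_saw_busy, true,
				memory_order_relaxed);
	return NULL;
}

static void *control(void *arg)
{
	(void)arg;
	atomic_store_explicit(&mgmt_busy, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&in_progress, memory_order_relaxed))
		atomic_store_explicit(&control_saw_sleep, true,
				memory_order_relaxed);
	return NULL;
}

/* Returns true if a "lost wakeup" happened, i.e. neither thread
 * observed the other's store. With both SEQ_CST fences in place this
 * outcome is forbidden by the C11 memory model. */
static bool lost_wakeup_once(void)
{
	pthread_t a, b;

	atomic_store(&in_progress, false);
	atomic_store(&mgmt_busy, false);
	atomic_store(&poller_saw_busy, false);
	atomic_store(&control_saw_sleep, false);
	pthread_create(&a, NULL, poller, NULL);
	pthread_create(&b, NULL, control, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return !atomic_load(&poller_saw_busy) &&
			!atomic_load(&control_saw_sleep);
}
```

With the fences removed (or demoted to relaxed), the both-miss outcome becomes legal and is readily observable on x86 under contention, which is exactly the lost-wakeup hazard being flagged here.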
> +
> + switch (queue_cfg->cb_mode) {
> + case RTE_POWER_MGMT_TYPE_MONITOR:
> + {
> + bool exit = false;
> + do {
> + /*
> + * we may request cancellation while the other thread
> + * has just entered the callback but hasn't started
> + * sleeping yet, so keep waking it up until we know it's
> + * done sleeping.
> + */
> + if (queue_cfg->umwait_in_progress)
> + rte_power_monitor_wakeup(lcore_id);
> + else
> + exit = true;
> + } while (!exit);
> + }
> + /* fall-through */
> + case RTE_POWER_MGMT_TYPE_PAUSE:
> + rte_eth_remove_rx_callback(port_id, queue_id,
> + queue_cfg->cur_cb);
> + break;
> + case RTE_POWER_MGMT_TYPE_SCALE:
> + rte_power_freq_max(lcore_id);
> + rte_eth_remove_rx_callback(port_id, queue_id,
> + queue_cfg->cur_cb);
> + rte_power_exit(lcore_id);
> + break;
> + }
> + /*
> + * we don't free the RX callback here because it is unsafe to do so
> + * unless we know for a fact that all data plane threads have stopped.
> + */
> + queue_cfg->cur_cb = NULL;
> + queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
> +
> + return 0;
> +}
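For context, an application pairs these two calls per polling lcore/queue. A minimal non-compilable sketch (requires DPDK; the function name, port/queue values, and the choice of MONITOR mode are assumptions of mine, not part of the patch):

```c
/* Sketch only - assumes DPDK headers and an initialized port. */
#include <rte_lcore.h>
#include <rte_power_pmd_mgmt.h>

static int
setup_queue_power_mgmt(uint16_t port_id, uint16_t queue_id)
{
	unsigned int lcore_id = rte_lcore_id();
	int ret;

	/* enable UMWAIT-based management for this lcore's queue */
	ret = rte_power_pmd_mgmt_queue_enable(lcore_id, port_id, queue_id,
			RTE_POWER_MGMT_TYPE_MONITOR);
	if (ret < 0)
		return ret;

	/* ... RX polling loop runs; the installed callback sleeps on
	 * sustained empty polls ... */

	/* tear the callback down before stopping the port */
	return rte_power_pmd_mgmt_queue_disable(lcore_id, port_id, queue_id);
}
```

Because the API mandates one queue per core, the same lcore_id must not be passed for two different queues.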
> diff --git a/lib/librte_power/rte_power_pmd_mgmt.h b/lib/librte_power/rte_power_pmd_mgmt.h
> new file mode 100644
> index 0000000000..0bfbc6ba69
> --- /dev/null
> +++ b/lib/librte_power/rte_power_pmd_mgmt.h
> @@ -0,0 +1,90 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2020 Intel Corporation
> + */
> +
> +#ifndef _RTE_POWER_PMD_MGMT_H
> +#define _RTE_POWER_PMD_MGMT_H
> +
> +/**
> + * @file
> + * RTE PMD Power Management
> + */
> +#include <stdint.h>
> +#include <stdbool.h>
> +
> +#include <rte_common.h>
> +#include <rte_byteorder.h>
> +#include <rte_log.h>
> +#include <rte_power.h>
> +#include <rte_atomic.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * PMD Power Management Type
> + */
> +enum rte_power_pmd_mgmt_type {
> + /** Use power-optimized monitoring to wait for incoming traffic */
> + RTE_POWER_MGMT_TYPE_MONITOR = 1,
> + /** Use power-optimized sleep to avoid busy polling */
> + RTE_POWER_MGMT_TYPE_PAUSE,
> + /** Use frequency scaling when traffic is low */
> + RTE_POWER_MGMT_TYPE_SCALE,
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Enable power management on a specified RX queue and lcore.
> + *
> + * @note This function is not thread-safe.
> + *
> + * @param lcore_id
> + * The lcore ID.
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param queue_id
> + * The queue identifier of the Ethernet device.
> + * @param mode
> + * The power management callback function type.
> + *
> + * @return
> + * 0 on success
> + * <0 on error
> + */
> +__rte_experimental
> +int
> +rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id,
> + uint16_t port_id, uint16_t queue_id,
> + enum rte_power_pmd_mgmt_type mode);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Disable power management on a specified RX queue and lcore.
> + *
> + * @note This function is not thread-safe.
> + *
> + * @param lcore_id
> + * The lcore ID.
> + * @param port_id
> + * The port identifier of the Ethernet device.
> + * @param queue_id
> + * The queue identifier of the Ethernet device.
> + * @return
> + * 0 on success
> + * <0 on error
> + */
> +__rte_experimental
> +int
> +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id,
> + uint16_t port_id, uint16_t queue_id);
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> diff --git a/lib/librte_power/version.map b/lib/librte_power/version.map
> index 69ca9af616..61996b4d11 100644
> --- a/lib/librte_power/version.map
> +++ b/lib/librte_power/version.map
> @@ -34,4 +34,9 @@ EXPERIMENTAL {
> rte_power_guest_channel_receive_msg;
> rte_power_poll_stat_fetch;
> rte_power_poll_stat_update;
> +
> + # added in 21.02
> + rte_power_pmd_mgmt_queue_enable;
> + rte_power_pmd_mgmt_queue_disable;
> +
> };
> --
> 2.25.1