[dpdk-dev] [PATCH v7 05/10] power: add PMD power management API and callback

Ananyev, Konstantin konstantin.ananyev at intel.com
Thu Oct 15 18:52:58 CEST 2020



> Add a simple on/off switch that will enable saving power when no
> packets are arriving. It is based on counting the number of empty
> polls and, when the number reaches a certain threshold, entering an
> architecture-defined optimized power state that waits until either a
> TSC timestamp expires or packets arrive.
> 
> This API mandates a core-to-single-queue mapping (that is, multiple
> queues per device are supported, but they have to be polled on different
> cores).
> 
> This design uses PMD RX callbacks.
> 
> 1. UMWAIT/UMONITOR
> 
>    When a certain threshold of empty polls is reached, the core will go
>    into a power-optimized sleep while waiting for the address of the next
>    RX descriptor to be written to.
> 
> 2. Pause instruction
> 
>    Instead of moving the core into a deeper C-state, this method uses the
>    pause instruction to avoid busy polling.
> 
> 3. Frequency scaling
> 
>    Reuse the existing DPDK power library to scale the core frequency up or
>    down depending on traffic volume.
> 
> Signed-off-by: Liang Ma <liang.j.ma at intel.com>
> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
> Acked-by: David Hunt <david.hunt at intel.com>
> ---
> 
> Notes:
>     v7:
>     - Fixed race condition (Konstantin)
>     - Slight rework of the structure of monitor code
>     - Added missing inline for wakeup
> 
>     v6:
>     - Added wakeup mechanism for UMWAIT
>     - Removed memory allocation (everything is now allocated statically)
>     - Fixed various typos and comments
>     - Check for invalid queue ID
>     - Moved release notes to this patch
> 
>     v5:
>     - Make error checking more robust
>       - Prevent initializing scaling if ACPI or PSTATE env wasn't set
>       - Prevent initializing UMWAIT path if PMD doesn't support get_wake_addr
>     - Add some debug logging
>     - Replace x86-specific code path to generic path using the intrinsic check
> 
>  doc/guides/rel_notes/release_20_11.rst |  11 +
>  lib/librte_power/meson.build           |   5 +-
>  lib/librte_power/rte_power_pmd_mgmt.c  | 320 +++++++++++++++++++++++++
>  lib/librte_power/rte_power_pmd_mgmt.h  |  92 +++++++
>  lib/librte_power/rte_power_version.map |   4 +
>  5 files changed, 430 insertions(+), 2 deletions(-)
>  create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c
>  create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h
> 
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index 4c6a615ce9..a814c67d54 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -204,6 +204,17 @@ New Features
> 
>     * Added support to update subport rate dynamically.
> 
> +* **Add PMD power management mechanism**
> +
> +  Three new Ethernet PMD power management mechanisms are added through the
> +  existing RX callback infrastructure.
> +
> +  * Add power saving scheme based on UMWAIT instruction (x86 only)
> +  * Add power saving scheme based on ``rte_pause()``
> +  * Add power saving scheme based on frequency scaling through the power library
> +  * Add new EXPERIMENTAL API ``rte_power_pmd_mgmt_queue_enable()``
> +  * Add new EXPERIMENTAL API ``rte_power_pmd_mgmt_queue_disable()``
> +
> 
>  Removed Items
>  -------------

....

> +
> +int
> +rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id,
> +		uint16_t port_id, uint16_t queue_id,
> +		enum rte_power_pmd_mgmt_type mode)
> +{
> +	struct rte_eth_dev *dev;
> +	struct pmd_queue_cfg *queue_cfg;
> +	int ret;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +	dev = &rte_eth_devices[port_id];
> +
> +	/* check if queue id is valid */
> +	if (queue_id >= dev->data->nb_rx_queues ||
> +			queue_id >= RTE_MAX_QUEUES_PER_PORT) {
> +		return -EINVAL;
> +	}
> +
> +	queue_cfg = &port_cfg[port_id][queue_id];
> +
> +	if (queue_cfg->pwr_mgmt_state == PMD_MGMT_ENABLED) {
> +		ret = -EINVAL;
> +		goto end;
> +	}
> +
> +	switch (mode) {
> +	case RTE_POWER_MGMT_TYPE_WAIT:
> +	{
> +		/* check if rte_power_monitor is supported */
> +		uint64_t dummy_expected, dummy_mask;
> +		struct rte_cpu_intrinsics i;
> +		volatile void *dummy_addr;
> +		uint8_t dummy_sz;
> +
> +		rte_cpu_get_intrinsics_support(&i);
> +
> +		if (!i.power_monitor) {
> +			RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
> +			ret = -ENOTSUP;
> +			goto end;
> +		}
> +
> +		/* check if the device supports the necessary PMD API */
> +		if (rte_eth_get_wake_addr(port_id, queue_id,
> +				&dummy_addr, &dummy_expected,
> +				&dummy_mask, &dummy_sz) == -ENOTSUP) {
> +			RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_get_wake_addr\n");
> +			ret = -ENOTSUP;
> +			goto end;
> +		}
> +		/* initialize UMWAIT spinlock */
> +		rte_spinlock_init(&queue_cfg->umwait_lock);

Still looks excessive and possibly error-prone to me.
Apart from that:
Acked-by: Konstantin Ananyev <konstantin.ananyev at intel.com>

> +
> +		/* initialize data before enabling the callback */
> +		queue_cfg->empty_poll_stats = 0;
> +		queue_cfg->cb_mode = mode;
> +		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
> +
> +		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
> +				clb_umwait, NULL);
> +		break;
> +	}
> +	case RTE_POWER_MGMT_TYPE_SCALE:
> +	{
> +		enum power_management_env env;
> +		/* only PSTATE and ACPI modes are supported */
> +		if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) &&
> +				!rte_power_check_env_supported(
> +					PM_ENV_PSTATE_CPUFREQ)) {
> +			RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n");
> +			ret = -ENOTSUP;
> +			goto end;
> +		}
> +		/* ensure we could initialize the power library */
> +		if (rte_power_init(lcore_id)) {
> +			ret = -EINVAL;
> +			goto end;
> +		}
> +		/* ensure we initialized the correct env */
> +		env = rte_power_get_env();
> +		if (env != PM_ENV_ACPI_CPUFREQ &&
> +				env != PM_ENV_PSTATE_CPUFREQ) {
> +			RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n");
> +			ret = -ENOTSUP;
> +			goto end;
> +		}
> +		/* initialize data before enabling the callback */
> +		queue_cfg->empty_poll_stats = 0;
> +		queue_cfg->cb_mode = mode;
> +		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
> +
> +		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id,
> +				queue_id, clb_scale_freq, NULL);
> +		break;
> +	}
> +	case RTE_POWER_MGMT_TYPE_PAUSE:
> +		/* initialize data before enabling the callback */
> +		queue_cfg->empty_poll_stats = 0;
> +		queue_cfg->cb_mode = mode;
> +		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
> +
> +		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
> +				clb_pause, NULL);
> +		break;
> +	}
> +	ret = 0;
> +
> +end:
> +	return ret;
> +}
> +
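
Two details of the quoted function are worth illustrating: the switch has no default case, so an unrecognized mode falls through to `ret = 0`, and the return value of `rte_eth_add_rx_callback()` is stored without a check after the queue has already been marked enabled. The sketch below shows one possible shape that shares the common initialization, rejects unknown modes, and rolls the state back if registration fails. It is self-contained and does not use DPDK: every type and function in it (`queue_enable()`, `register_cb_t`, `reg_ok()`, `reg_fail()`, the `MGMT_*` names) is a hypothetical stand-in, and the choice of `-ENOMEM` for a failed registration is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the patch's enums and per-queue config. */
enum mgmt_type { MGMT_WAIT = 1, MGMT_PAUSE, MGMT_SCALE };
enum mgmt_state { MGMT_DISABLED = 0, MGMT_ENABLED };

struct queue_cfg {
	enum mgmt_state state;
	enum mgmt_type mode;
	uint64_t empty_polls;
	const void *cb; /* stand-in for the RX callback handle */
};

/* Hypothetical registration hook; returns NULL on failure. */
typedef const void *(*register_cb_t)(enum mgmt_type mode);

static const int dummy_cb = 0;

/* Stub that always succeeds. */
const void *reg_ok(enum mgmt_type mode) { (void)mode; return &dummy_cb; }
/* Stub that always fails. */
const void *reg_fail(enum mgmt_type mode) { (void)mode; return NULL; }

int
queue_enable(struct queue_cfg *cfg, enum mgmt_type mode, register_cb_t reg)
{
	assert(cfg != NULL && reg != NULL);

	if (cfg->state == MGMT_ENABLED)
		return -EINVAL;

	switch (mode) {
	case MGMT_WAIT:
	case MGMT_PAUSE:
	case MGMT_SCALE:
		break; /* mode-specific capability checks would go here */
	default:
		return -EINVAL; /* unknown mode is an error, not success */
	}

	/* initialize data before enabling the callback */
	cfg->empty_polls = 0;
	cfg->mode = mode;
	cfg->state = MGMT_ENABLED;

	cfg->cb = reg(mode);
	if (cfg->cb == NULL) {
		cfg->state = MGMT_DISABLED; /* roll back on failure */
		return -ENOMEM;             /* assumed error code */
	}
	return 0;
}
```

The state is still set before the callback is registered, matching the ordering in the quoted code; the only additions are the default case and the rollback.
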

