[dpdk-dev] [PATCH v6 5/7] power: support callbacks for multiple Rx queues

Ananyev, Konstantin konstantin.ananyev at intel.com
Tue Jul 6 20:50:32 CEST 2021


> Currently, there is a hard limitation in the PMD power management support
> that allows it to manage only a single queue per lcore. This is not ideal,
> as most DPDK use cases will poll multiple queues per core.
> 
> The PMD power management mechanism relies on ethdev Rx callbacks, so it
> is very difficult to implement such support because callbacks are
> effectively stateless and have no visibility into what the other ethdev
> devices are doing. This places limitations on what we can do within the
> framework of Rx callbacks, but the basics of this implementation are as
> follows:
> 
> - Replace per-queue structures with per-lcore ones, so that any device
>   polled from the same lcore can share data
> - Any queue that is going to be polled from a specific lcore has to be
>   added to the list of queues to poll, so that the callback is aware of
>   other queues being polled by the same lcore
> - Both the empty poll counter and the actual power saving mechanism are
>   shared between all queues polled on a particular lcore, and are only
>   activated once all queues in the list have been polled and determined
>   to have no traffic.
> - The limitation on UMWAIT-based polling is not removed because UMWAIT
>   is incapable of monitoring more than one address.
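
For reference, the per-lcore layout described above ends up looking roughly
like the following (struct and field names are taken from the code further
down in the patch and the TAILQ linkage from the v2 notes, so treat this as an
approximate sketch rather than the exact definitions):

#include <stdint.h>
#include <sys/queue.h>

/* per-queue state, one entry per (port, queue) polled by an lcore */
struct queue_list_entry {
	TAILQ_ENTRY(queue_list_entry) next; /* linkage in the per-lcore list */
	uint16_t portid;                    /* port this queue belongs to */
	uint16_t qid;                       /* Rx queue index */
	uint64_t n_empty_polls;             /* consecutive empty polls seen */
	uint64_t n_sleeps;                  /* sleep iterations taken part in */
};

/* per-lcore state shared by all queues polled from that lcore */
struct pmd_core_cfg {
	TAILQ_HEAD(queue_list_head, queue_list_entry) head;
	uint64_t n_queues;                  /* number of queues in the list */
	uint64_t n_queues_ready_to_sleep;   /* queues past the empty-poll threshold */
	uint64_t sleep_target;              /* sleep iterations the lcore performed */
};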
> 
> Also, while we're at it, update and improve the docs.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
> ---
> 
> Notes:
>     v6:
>     - Track each individual queue sleep status (Konstantin)
>     - Fix segfault (Dave)
> 
>     v5:
>     - Remove the "power save queue" API and replace it with the mechanism
>       suggested by Konstantin
> 
>     v3:
>     - Move the list of supported NICs to NIC feature table
> 
>     v2:
>     - Use a TAILQ for queues instead of a static array
>     - Address feedback from Konstantin
>     - Add additional checks for stopped queues
> 
>  doc/guides/nics/features.rst           |  10 +
>  doc/guides/prog_guide/power_man.rst    |  65 ++--
>  doc/guides/rel_notes/release_21_08.rst |   3 +
>  lib/power/rte_power_pmd_mgmt.c         | 452 +++++++++++++++++++------
>  4 files changed, 394 insertions(+), 136 deletions(-)
> 
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index 403c2b03a3..a96e12d155 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -912,6 +912,16 @@ Supports to get Rx/Tx packet burst mode information.
>  * **[implements] eth_dev_ops**: ``rx_burst_mode_get``, ``tx_burst_mode_get``.
>  * **[related] API**: ``rte_eth_rx_burst_mode_get()``, ``rte_eth_tx_burst_mode_get()``.
> 
> +.. _nic_features_get_monitor_addr:
> +
> +PMD power management using monitor addresses
> +--------------------------------------------
> +
> +Supports getting a monitoring condition to use together with Ethernet PMD power
> +management (see :doc:`../prog_guide/power_man` for more details).
> +
> +* **[implements] eth_dev_ops**: ``get_monitor_addr``
> +
>  .. _nic_features_other:
> 
>  Other dev ops not represented by a Feature
> diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
> index c70ae128ac..ec04a72108 100644
> --- a/doc/guides/prog_guide/power_man.rst
> +++ b/doc/guides/prog_guide/power_man.rst
> @@ -198,34 +198,41 @@ Ethernet PMD Power Management API
>  Abstract
>  ~~~~~~~~
> 
> -Existing power management mechanisms require developers
> -to change application design or change code to make use of it.
> -The PMD power management API provides a convenient alternative
> -by utilizing Ethernet PMD RX callbacks,
> -and triggering power saving whenever empty poll count reaches a certain number.
> -
> -Monitor
> -   This power saving scheme will put the CPU into optimized power state
> -   and use the ``rte_power_monitor()`` function
> -   to monitor the Ethernet PMD RX descriptor address,
> -   and wake the CPU up whenever there's new traffic.
> -
> -Pause
> -   This power saving scheme will avoid busy polling
> -   by either entering power-optimized sleep state
> -   with ``rte_power_pause()`` function,
> -   or, if it's not available, use ``rte_pause()``.
> -
> -Frequency scaling
> -   This power saving scheme will use ``librte_power`` library
> -   functionality to scale the core frequency up/down
> -   depending on traffic volume.
> -
> -.. note::
> -
> -   Currently, this power management API is limited to mandatory mapping
> -   of 1 queue to 1 core (multiple queues are supported,
> -   but they must be polled from different cores).
> +Existing power management mechanisms require developers to change application
> +design or change code to make use of them. The PMD power management API
> +provides a convenient alternative by utilizing Ethernet PMD RX callbacks and
> +triggering power saving whenever the empty poll count reaches a certain
> +threshold.
> +
> +* Monitor
> +   This power saving scheme will put the CPU into an optimized power state and
> +   monitor the Ethernet PMD RX descriptor address, waking the CPU up whenever
> +   there's new traffic. Support for this scheme may not be available on all
> +   platforms, and further limitations may apply (see below).
> +
> +* Pause
> +   This power saving scheme will avoid busy polling by either entering a
> +   power-optimized sleep state with the ``rte_power_pause()`` function, or, if
> +   it's not supported by the underlying platform, using ``rte_pause()``.
> +
> +* Frequency scaling
> +   This power saving scheme will use ``librte_power`` library functionality to
> +   scale the core frequency up/down depending on traffic volume.
> +
> +The "monitor" mode is only supported in the following configurations and scenarios:
> +
> +* If the ``rte_cpu_get_intrinsics_support()`` function indicates that
> +  ``rte_power_monitor()`` is supported by the platform, then monitoring will be
> +  limited to a mapping of 1 core to 1 queue (thus, each Rx queue will have to
> +  be monitored from a different lcore).
> +
> +* If the ``rte_cpu_get_intrinsics_support()`` function indicates that the
> +  ``rte_power_monitor()`` function is not supported, then monitor mode will not
> +  be supported.
> +
> +* Not all Ethernet drivers support monitoring, even if the underlying
> +  platform may support the necessary CPU instructions. Please refer to
> +  :doc:`../nics/overview` for more information.
> +
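
From an application's point of view, the above boils down to calling the
per-queue enable API once for every Rx queue an lcore is going to poll, while
the Rx queues are stopped (the patch adds extra checks for that). A minimal
sketch, with made-up port/queue numbers and PAUSE mode chosen because MONITOR
keeps the 1 queue per lcore limitation:

#include <stdint.h>
#include <rte_power_pmd_mgmt.h>

static int
enable_pmgmt_for_lcore(unsigned int lcore_id, uint16_t port_id, uint16_t nb_rxq)
{
	uint16_t qid;

	/* enable power management for every Rx queue this lcore will poll */
	for (qid = 0; qid < nb_rxq; qid++) {
		int ret = rte_power_ethdev_pmgmt_queue_enable(lcore_id,
				port_id, qid, RTE_POWER_MGMT_TYPE_PAUSE);
		if (ret < 0)
			return ret;
	}
	return 0;
}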
.... 
> +static inline void
> +queue_reset(struct pmd_core_cfg *cfg, struct queue_list_entry *qcfg)
> +{
> +	const bool is_ready_to_sleep = qcfg->n_empty_polls > EMPTYPOLL_MAX;
> +
> +	/* reset empty poll counter for this queue */
> +	qcfg->n_empty_polls = 0;
> +	/* reset the queue sleep counter as well */
> +	qcfg->n_sleeps = 0;
> +	/* remove the queue from list of cores ready to sleep */
> +	if (is_ready_to_sleep)
> +		cfg->n_queues_ready_to_sleep--;
> +	/*
> +	 * no need to change the lcore sleep target counter because this lcore
> +	 * will reach n_sleeps anyway, and the other cores are already counted,
> +	 * so there's no need to do anything else.
> +	 */
> +}
> +
> +static inline bool
> +queue_can_sleep(struct pmd_core_cfg *cfg, struct queue_list_entry *qcfg)
> +{
> +	/* this function is called - that means we have an empty poll */
> +	qcfg->n_empty_polls++;
> +
> +	/* if we haven't reached threshold for empty polls, we can't sleep */
> +	if (qcfg->n_empty_polls <= EMPTYPOLL_MAX)
> +		return false;
> +
> +	/*
> +	 * we've reached a point where we are able to sleep, but we still need
> +	 * to check if this queue has already been marked for sleeping.
> +	 */
> +	if (qcfg->n_sleeps == cfg->sleep_target)
> +		return true;
> +
> +	/* mark this queue as ready for sleep */
> +	qcfg->n_sleeps = cfg->sleep_target;
> +	cfg->n_queues_ready_to_sleep++;

So, assuming there is no incoming traffic, should it be:
1) poll_all_queues(times=EMPTYPOLL_MAX); sleep; poll_all_queues(times=1); sleep; poll_all_queues(times=1); sleep; ...
OR
2) poll_all_queues(times=EMPTYPOLL_MAX); sleep; poll_all_queues(times=EMPTYPOLL_MAX); sleep; poll_all_queues(times=EMPTYPOLL_MAX); sleep; ...
?

My initial thought was 2), but perhaps the intention is 1)?

> +
> +	return true;
> +}
> +
> +static inline bool
> +lcore_can_sleep(struct pmd_core_cfg *cfg)
> +{
> +	/* are all queues ready to sleep? */
> +	if (cfg->n_queues_ready_to_sleep != cfg->n_queues)
> +		return false;
> +
> +	/* we've reached an iteration where we can sleep, reset sleep counter */
> +	cfg->n_queues_ready_to_sleep = 0;
> +	cfg->sleep_target++;
> +
> +	return true;
> +}
> +
>  static uint16_t
>  clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
> -		uint16_t nb_rx, uint16_t max_pkts __rte_unused,
> -		void *addr __rte_unused)
> +		uint16_t nb_rx, uint16_t max_pkts __rte_unused, void *arg)
>  {
> +	struct queue_list_entry *queue_conf = arg;
> 
> -	struct pmd_queue_cfg *q_conf;
> -
> -	q_conf = &port_cfg[port_id][qidx];
> -
> +	/* this callback handles only one queue, so omit the multi-queue logic */
>  	if (unlikely(nb_rx == 0)) {
> -		q_conf->empty_poll_stats++;
> -		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
> +		queue_conf->n_empty_polls++;
> +		if (unlikely(queue_conf->n_empty_polls > EMPTYPOLL_MAX)) {
>  			struct rte_power_monitor_cond pmc;
> -			uint16_t ret;
> +			int ret;
> 
>  			/* use monitoring condition to sleep */
>  			ret = rte_eth_get_monitor_addr(port_id, qidx,
> @@ -97,60 +231,77 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
>  				rte_power_monitor(&pmc, UINT64_MAX);
>  		}
>  	} else
> -		q_conf->empty_poll_stats = 0;
> +		queue_conf->n_empty_polls = 0;
> 
>  	return nb_rx;
>  }
> 
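For context on the question above: the multi-queue callbacks that actually use
queue_can_sleep()/lcore_can_sleep()/queue_reset() are not in the quoted hunks,
but the intended wiring is presumably something like the sketch below
(get_lcore_cfg() and do_power_saving() are hypothetical placeholders standing
in for the real lookup and for the pause/scale action):

static uint16_t
clb_multiqueue(uint16_t port_id __rte_unused, uint16_t qidx __rte_unused,
		struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
		uint16_t max_pkts __rte_unused, void *arg)
{
	struct queue_list_entry *queue_conf = arg;
	/* hypothetical: fetch this lcore's shared config */
	struct pmd_core_cfg *lcore_conf = get_lcore_cfg();

	if (unlikely(nb_rx == 0)) {
		/* empty poll on this queue; sleep only once *all* queues
		 * polled by this lcore are past the empty-poll threshold */
		if (queue_can_sleep(lcore_conf, queue_conf) &&
				lcore_can_sleep(lcore_conf))
			do_power_saving(); /* hypothetical: pause/scale */
	} else {
		/* traffic arrived, reset this queue's sleep state */
		queue_reset(lcore_conf, queue_conf);
	}

	return nb_rx;
}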

