[dpdk-dev] [PATCH v5 5/7] power: support callbacks for multiple Rx queues

Burakov, Anatoly anatoly.burakov at intel.com
Mon Jul 5 12:23:44 CEST 2021


On 30-Jun-21 12:04 PM, Ananyev, Konstantin wrote:
> 
> 
> 
>> Currently, there is a hard limitation on the PMD power management
>> support that only allows it to support a single queue per lcore. This is
>> not ideal as most DPDK use cases will poll multiple queues per core.
>>
>> The PMD power management mechanism relies on ethdev Rx callbacks, so it
>> is very difficult to implement such support because callbacks are
>> effectively stateless and have no visibility into what the other ethdev
>> devices are doing. This places limitations on what we can do within the
>> framework of Rx callbacks, but the basics of this implementation are as
>> follows:
>>
>> - Replace per-queue structures with per-lcore ones, so that any device
>>    polled from the same lcore can share data
>> - Any queue that is going to be polled from a specific lcore has to be
>>    added to the list of queues to poll, so that the callback is aware of
>>    other queues being polled by the same lcore
>> - Both the empty poll counter and the actual power saving mechanism are
>>    shared between all queues polled on a particular lcore, and are only
>>    activated when all queues in the list have been polled and determined
>>    to have no traffic.
>> - The limitation on UMWAIT-based polling is not removed because UMWAIT
>>    is incapable of monitoring more than one address.
>>
>> Also, while we're at it, update and improve the docs.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
>> ---
>>
>> Notes:
>>      v5:
>>      - Remove the "power save queue" API and replace it with a mechanism suggested by
>>        Konstantin
>>
>>      v3:
>>      - Move the list of supported NICs to NIC feature table
>>
>>      v2:
>>      - Use a TAILQ for queues instead of a static array
>>      - Address feedback from Konstantin
>>      - Add additional checks for stopped queues
>>
>>   doc/guides/nics/features.rst           |  10 +
>>   doc/guides/prog_guide/power_man.rst    |  65 ++--
>>   doc/guides/rel_notes/release_21_08.rst |   3 +
>>   lib/power/rte_power_pmd_mgmt.c         | 431 ++++++++++++++++++-------
>>   4 files changed, 373 insertions(+), 136 deletions(-)
>>
>> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
>> index 403c2b03a3..a96e12d155 100644
>> --- a/doc/guides/nics/features.rst
>> +++ b/doc/guides/nics/features.rst
>> @@ -912,6 +912,16 @@ Supports to get Rx/Tx packet burst mode information.
>>   * **[implements] eth_dev_ops**: ``rx_burst_mode_get``, ``tx_burst_mode_get``.
>>   * **[related] API**: ``rte_eth_rx_burst_mode_get()``, ``rte_eth_tx_burst_mode_get()``.
>>
>> +.. _nic_features_get_monitor_addr:
>> +
>> +PMD power management using monitor addresses
>> +--------------------------------------------
>> +
>> +Supports getting a monitoring condition to use together with Ethernet PMD power
>> +management (see :doc:`../prog_guide/power_man` for more details).
>> +
>> +* **[implements] eth_dev_ops**: ``get_monitor_addr``
>> +
>>   .. _nic_features_other:
>>
>>   Other dev ops not represented by a Feature
>> diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
>> index c70ae128ac..ec04a72108 100644
>> --- a/doc/guides/prog_guide/power_man.rst
>> +++ b/doc/guides/prog_guide/power_man.rst
>> @@ -198,34 +198,41 @@ Ethernet PMD Power Management API
>>   Abstract
>>   ~~~~~~~~
>>
>> -Existing power management mechanisms require developers
>> -to change application design or change code to make use of it.
>> -The PMD power management API provides a convenient alternative
>> -by utilizing Ethernet PMD RX callbacks,
>> -and triggering power saving whenever empty poll count reaches a certain number.
>> -
>> -Monitor
>> -   This power saving scheme will put the CPU into optimized power state
>> -   and use the ``rte_power_monitor()`` function
>> -   to monitor the Ethernet PMD RX descriptor address,
>> -   and wake the CPU up whenever there's new traffic.
>> -
>> -Pause
>> -   This power saving scheme will avoid busy polling
>> -   by either entering power-optimized sleep state
>> -   with ``rte_power_pause()`` function,
>> -   or, if it's not available, use ``rte_pause()``.
>> -
>> -Frequency scaling
>> -   This power saving scheme will use ``librte_power`` library
>> -   functionality to scale the core frequency up/down
>> -   depending on traffic volume.
>> -
>> -.. note::
>> -
>> -   Currently, this power management API is limited to mandatory mapping
>> -   of 1 queue to 1 core (multiple queues are supported,
>> -   but they must be polled from different cores).
>> +Existing power management mechanisms require developers to change application
>> +design or change code to make use of them. The PMD power management API provides
>> +a convenient alternative by utilizing Ethernet PMD RX callbacks, and triggering
>> +power saving whenever the empty poll count reaches a certain threshold.
>> +
>> +* Monitor
>> +   This power saving scheme will put the CPU into an optimized power state and
>> +   monitor the Ethernet PMD RX descriptor address, waking the CPU up whenever
>> +   there's new traffic. Support for this scheme may not be available on all
>> +   platforms, and further limitations may apply (see below).
>> +
>> +* Pause
>> +   This power saving scheme will avoid busy polling by either entering a
>> +   power-optimized sleep state with the ``rte_power_pause()`` function or, if
>> +   it's not supported by the underlying platform, using ``rte_pause()``.
>> +
>> +* Frequency scaling
>> +   This power saving scheme will use ``librte_power`` library functionality to
>> +   scale the core frequency up/down depending on traffic volume.
>> +
>> +The "monitor" mode is only supported in the following configurations and scenarios:
>> +
>> +* If the ``rte_cpu_get_intrinsics_support()`` function indicates that
>> +  ``rte_power_monitor()`` is supported by the platform, then monitoring will be
>> +  limited to a mapping of 1 core to 1 queue (thus, each Rx queue will have to
>> +  be monitored from a different lcore).
>> +
>> +* If the ``rte_cpu_get_intrinsics_support()`` function indicates that
>> +  ``rte_power_monitor()`` is not supported, then the monitor mode will not
>> +  be available.
>> +
>> +* Not all Ethernet drivers support monitoring, even if the underlying
>> +  platform supports the necessary CPU instructions. Please refer to
>> +  :doc:`../nics/overview` for more information.
>> +
>>
>>   API Overview for Ethernet PMD Power Management
>>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> @@ -242,3 +249,5 @@ References
>>
>>   *   The :doc:`../sample_app_ug/vm_power_management`
>>       chapter in the :doc:`../sample_app_ug/index` section.
>> +
>> +*   The :doc:`../nics/overview` chapter in the :doc:`../nics/index` section.
>> diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
>> index f015c509fc..3926d45ef8 100644
>> --- a/doc/guides/rel_notes/release_21_08.rst
>> +++ b/doc/guides/rel_notes/release_21_08.rst
>> @@ -57,6 +57,9 @@ New Features
>>
>>   * eal: added ``rte_power_monitor_multi`` to support waiting for multiple events.
>>
>> +* rte_power: The experimental PMD power management API now supports managing
>> +  multiple Ethernet Rx queues per lcore.
>> +
>>
>>   Removed Items
>>   -------------
>> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
>> index 9b95cf1794..fccfd236c2 100644
>> --- a/lib/power/rte_power_pmd_mgmt.c
>> +++ b/lib/power/rte_power_pmd_mgmt.c
>> @@ -33,18 +33,96 @@ enum pmd_mgmt_state {
>>        PMD_MGMT_ENABLED
>>   };
>>
>> -struct pmd_queue_cfg {
>> +union queue {
>> +     uint32_t val;
>> +     struct {
>> +             uint16_t portid;
>> +             uint16_t qid;
>> +     };
>> +};
>> +
>> +struct queue_list_entry {
>> +     TAILQ_ENTRY(queue_list_entry) next;
>> +     union queue queue;
>> +     uint64_t n_empty_polls;
>> +     const struct rte_eth_rxtx_callback *cb;
>> +};
>> +
>> +struct pmd_core_cfg {
>> +     TAILQ_HEAD(queue_list_head, queue_list_entry) head;
>> +     /**< List of queues associated with this lcore */
>> +     size_t n_queues;
>> +     /**< How many queues are in the list? */
>>        volatile enum pmd_mgmt_state pwr_mgmt_state;
>>        /**< State of power management for this queue */
>>        enum rte_power_pmd_mgmt_type cb_mode;
>>        /**< Callback mode for this queue */
>> -     const struct rte_eth_rxtx_callback *cur_cb;
>> -     /**< Callback instance */
>> -     uint64_t empty_poll_stats;
>> -     /**< Number of empty polls */
>> +     uint64_t n_queues_ready_to_sleep;
>> +     /**< Number of queues ready to enter power optimized state */
>>   } __rte_cache_aligned;
>> +static struct pmd_core_cfg lcore_cfgs[RTE_MAX_LCORE];
>>
>> -static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
>> +static inline bool
>> +queue_equal(const union queue *l, const union queue *r)
>> +{
>> +     return l->val == r->val;
>> +}
>> +
>> +static inline void
>> +queue_copy(union queue *dst, const union queue *src)
>> +{
>> +     dst->val = src->val;
>> +}
>> +
>> +static struct queue_list_entry *
>> +queue_list_find(const struct pmd_core_cfg *cfg, const union queue *q)
>> +{
>> +     struct queue_list_entry *cur;
>> +
>> +     TAILQ_FOREACH(cur, &cfg->head, next) {
>> +             if (queue_equal(&cur->queue, q))
>> +                     return cur;
>> +     }
>> +     return NULL;
>> +}
>> +
>> +static int
>> +queue_list_add(struct pmd_core_cfg *cfg, const union queue *q)
>> +{
>> +     struct queue_list_entry *qle;
>> +
>> +     /* is it already in the list? */
>> +     if (queue_list_find(cfg, q) != NULL)
>> +             return -EEXIST;
>> +
>> +     qle = malloc(sizeof(*qle));
>> +     if (qle == NULL)
>> +             return -ENOMEM;
>> +     memset(qle, 0, sizeof(*qle));
>> +
>> +     queue_copy(&qle->queue, q);
>> +     TAILQ_INSERT_TAIL(&cfg->head, qle, next);
>> +     cfg->n_queues++;
>> +     qle->n_empty_polls = 0;
>> +
>> +     return 0;
>> +}
>> +
>> +static struct queue_list_entry *
>> +queue_list_take(struct pmd_core_cfg *cfg, const union queue *q)
>> +{
>> +     struct queue_list_entry *found;
>> +
>> +     found = queue_list_find(cfg, q);
>> +     if (found == NULL)
>> +             return NULL;
>> +
>> +     TAILQ_REMOVE(&cfg->head, found, next);
>> +     cfg->n_queues--;
>> +
>> +     /* freeing is responsibility of the caller */
>> +     return found;
>> +}
>>
>>   static void
>>   calc_tsc(void)
>> @@ -74,21 +152,56 @@ calc_tsc(void)
>>        }
>>   }
>>
>> +static inline void
>> +queue_reset(struct pmd_core_cfg *cfg, struct queue_list_entry *qcfg)
>> +{
>> +     /* reset empty poll counter for this queue */
>> +     qcfg->n_empty_polls = 0;
>> +     /* reset the sleep counter too */
>> +     cfg->n_queues_ready_to_sleep = 0;
>> +}
>> +
>> +static inline bool
>> +queue_can_sleep(struct pmd_core_cfg *cfg, struct queue_list_entry *qcfg)
>> +{
>> +     /* this function is called - that means we have an empty poll */
>> +     qcfg->n_empty_polls++;
>> +
>> +     /* if we haven't reached threshold for empty polls, we can't sleep */
>> +     if (qcfg->n_empty_polls <= EMPTYPOLL_MAX)
>> +             return false;
>> +
>> +     /* we're ready to sleep */
>> +     cfg->n_queues_ready_to_sleep++;
>> +
>> +     return true;
>> +}
>> +
>> +static inline bool
>> +lcore_can_sleep(struct pmd_core_cfg *cfg)
>> +{
>> +     /* are all queues ready to sleep? */
>> +     if (cfg->n_queues_ready_to_sleep != cfg->n_queues)
>> +             return false;
>> +
>> +     /* we've reached an iteration where we can sleep, reset sleep counter */
>> +     cfg->n_queues_ready_to_sleep = 0;
>> +
>> +     return true;
>> +}
> 
> As I can see, this is a slightly modified version of what was discussed.
> I understand that it seems simpler, but I think there are some problems with it:
> - each queue can be counted more than once in lcore_cfg->n_queues_ready_to_sleep
> - each queue's n_empty_polls is not reset after sleep().
> 

The latter is intentional: we *want* to keep sleeping once we pass 
the empty poll threshold.

The former shouldn't be a big problem in the conventional case, as I 
don't think there are situations where people would poll core-pinned 
queues in different orders, but you're right, this is a potential issue 
and should be fixed. I'll add back the n_sleeps tracking in the next 
iteration.
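
Something like this, perhaps (a minimal sketch only, not the final code; 
the ready_to_sleep flag is provisional and not part of this patch):

/* new field in struct queue_list_entry */
bool ready_to_sleep;

static inline void
queue_reset(struct pmd_core_cfg *cfg, struct queue_list_entry *qcfg)
{
	/* we have traffic - reset empty poll counter for this queue */
	qcfg->n_empty_polls = 0;
	/* take the queue out of the "ready to sleep" set */
	if (qcfg->ready_to_sleep) {
		qcfg->ready_to_sleep = false;
		cfg->n_queues_ready_to_sleep--;
	}
}

static inline bool
queue_can_sleep(struct pmd_core_cfg *cfg, struct queue_list_entry *qcfg)
{
	/* this function is called - that means we have an empty poll */
	qcfg->n_empty_polls++;

	/* if we haven't reached threshold for empty polls, we can't sleep */
	if (qcfg->n_empty_polls <= EMPTYPOLL_MAX)
		return false;

	/* count each queue towards the sleep at most once */
	if (!qcfg->ready_to_sleep) {
		qcfg->ready_to_sleep = true;
		cfg->n_queues_ready_to_sleep++;
	}

	return true;
}

static inline bool
lcore_can_sleep(struct pmd_core_cfg *cfg)
{
	/* sleep only when *all* queues on this lcore are ready */
	return cfg->n_queues_ready_to_sleep == cfg->n_queues;
}

Note that lcore_can_sleep() no longer resets the counter - the flags 
carry the state across iterations, so a queue that sees traffic has to 
pass EMPTYPOLL_MAX empty polls again before the lcore can sleep.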

> To illustrate the problem, let's say we have 2 queues, and at some moment we have:
> q0.n_empty_polls == EMPTYPOLL_MAX + 1
> q1.n_empty_polls == EMPTYPOLL_MAX + 1
> cfg->n_queues_ready_to_sleep == 2
> 
> So lcore_can_sleep() returns 'true' and sets:
> cfg->n_queues_ready_to_sleep == 0
> 
> Now, after sleep():
> q0.n_empty_polls == EMPTYPOLL_MAX + 1
> q1.n_empty_polls == EMPTYPOLL_MAX + 1
> 
> So after:
> queue_can_sleep(q0);
> queue_can_sleep(q1);
>
> we will have:
> cfg->n_queues_ready_to_sleep == 2
> again, and we'll go to another sleep after just one rx_burst() attempt for each queue.
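
To be clear, the sketch above deliberately keeps that part of the 
behaviour - once both queues are past the threshold, an idle lcore will 
keep sleeping on every iteration:

q0.ready_to_sleep == true
q1.ready_to_sleep == true
cfg->n_queues_ready_to_sleep == 2

What it does fix is the double counting: queue_can_sleep() can no longer 
push n_queues_ready_to_sleep past n_queues, and traffic on either queue 
decrements the counter via queue_reset(), so that queue has to pass 
EMPTYPOLL_MAX empty polls again before the next sleep.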
> 
>> +
>>   static uint16_t
>>   clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
>> -             uint16_t nb_rx, uint16_t max_pkts __rte_unused,
>> -             void *addr __rte_unused)
>> +             uint16_t nb_rx, uint16_t max_pkts __rte_unused, void *arg)
>>   {
>> +     struct queue_list_entry *queue_conf = arg;
>>
>> -     struct pmd_queue_cfg *q_conf;
>> -
>> -     q_conf = &port_cfg[port_id][qidx];
>> -
>> +     /* this callback can't do more than one queue, omit multiqueue logic */
>>        if (unlikely(nb_rx == 0)) {
>> -             q_conf->empty_poll_stats++;
>> -             if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
>> +             queue_conf->n_empty_polls++;
>> +             if (unlikely(queue_conf->n_empty_polls > EMPTYPOLL_MAX)) {
>>                        struct rte_power_monitor_cond pmc;
>> -                     uint16_t ret;
>> +                     int ret;
>>
>>                        /* use monitoring condition to sleep */
>>                        ret = rte_eth_get_monitor_addr(port_id, qidx,
>> @@ -97,60 +210,77 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
>>                                rte_power_monitor(&pmc, UINT64_MAX);
>>                }
>>        } else
>> -             q_conf->empty_poll_stats = 0;
>> +             queue_conf->n_empty_polls = 0;
>>
>>        return nb_rx;
>>   }
>>
>>   static uint16_t
>> -clb_pause(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
>> -             uint16_t nb_rx, uint16_t max_pkts __rte_unused,
>> -             void *addr __rte_unused)
>> +clb_pause(uint16_t port_id __rte_unused, uint16_t qidx __rte_unused,
>> +             struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
>> +             uint16_t max_pkts __rte_unused, void *arg)
>>   {
>> -     struct pmd_queue_cfg *q_conf;
>> +     const unsigned int lcore = rte_lcore_id();
>> +     struct queue_list_entry *queue_conf = arg;
>> +     struct pmd_core_cfg *lcore_conf;
>> +     const bool empty = nb_rx == 0;
>>
>> -     q_conf = &port_cfg[port_id][qidx];
>> +     lcore_conf = &lcore_cfgs[lcore];
>>
>> -     if (unlikely(nb_rx == 0)) {
>> -             q_conf->empty_poll_stats++;
>> -             /* sleep for 1 microsecond */
>> -             if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
>> -                     /* use tpause if we have it */
>> -                     if (global_data.intrinsics_support.power_pause) {
>> -                             const uint64_t cur = rte_rdtsc();
>> -                             const uint64_t wait_tsc =
>> -                                             cur + global_data.tsc_per_us;
>> -                             rte_power_pause(wait_tsc);
>> -                     } else {
>> -                             uint64_t i;
>> -                             for (i = 0; i < global_data.pause_per_us; i++)
>> -                                     rte_pause();
>> -                     }
>> +     if (likely(!empty))
>> +             /* early exit */
>> +             queue_reset(lcore_conf, queue_conf);
>> +     else {
>> +             /* can this queue sleep? */
>> +             if (!queue_can_sleep(lcore_conf, queue_conf))
>> +                     return nb_rx;
>> +
>> +             /* can this lcore sleep? */
>> +             if (!lcore_can_sleep(lcore_conf))
>> +                     return nb_rx;
>> +
>> +             /* sleep for 1 microsecond, use tpause if we have it */
>> +             if (global_data.intrinsics_support.power_pause) {
>> +                     const uint64_t cur = rte_rdtsc();
>> +                     const uint64_t wait_tsc =
>> +                                     cur + global_data.tsc_per_us;
>> +                     rte_power_pause(wait_tsc);
>> +             } else {
>> +                     uint64_t i;
>> +                     for (i = 0; i < global_data.pause_per_us; i++)
>> +                             rte_pause();
>>                }
>> -     } else
>> -             q_conf->empty_poll_stats = 0;
>> +     }
>>
>>        return nb_rx;
>>   }
>>
>>   static uint16_t
>> -clb_scale_freq(uint16_t port_id, uint16_t qidx,
>> +clb_scale_freq(uint16_t port_id __rte_unused, uint16_t qidx __rte_unused,
>>                struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
>> -             uint16_t max_pkts __rte_unused, void *_  __rte_unused)
>> +             uint16_t max_pkts __rte_unused, void *arg)
>>   {
>> -     struct pmd_queue_cfg *q_conf;
>> +     const unsigned int lcore = rte_lcore_id();
>> +     const bool empty = nb_rx == 0;
>> +     struct pmd_core_cfg *lcore_conf = &lcore_cfgs[lcore];
>> +     struct queue_list_entry *queue_conf = arg;
>>
>> -     q_conf = &port_cfg[port_id][qidx];
>> +     if (likely(!empty)) {
>> +             /* early exit */
>> +             queue_reset(lcore_conf, queue_conf);
>>
>> -     if (unlikely(nb_rx == 0)) {
>> -             q_conf->empty_poll_stats++;
>> -             if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX))
>> -                     /* scale down freq */
>> -                     rte_power_freq_min(rte_lcore_id());
>> -     } else {
>> -             q_conf->empty_poll_stats = 0;
>> -             /* scale up freq */
>> +             /* scale up freq immediately */
>>                rte_power_freq_max(rte_lcore_id());
>> +     } else {
>> +             /* can this queue sleep? */
>> +             if (!queue_can_sleep(lcore_conf, queue_conf))
>> +                     return nb_rx;
>> +
>> +             /* can this lcore sleep? */
>> +             if (!lcore_can_sleep(lcore_conf))
>> +                     return nb_rx;
>> +
>> +             rte_power_freq_min(rte_lcore_id());
>>        }
>>
>>        return nb_rx;
>> @@ -167,11 +297,80 @@ queue_stopped(const uint16_t port_id, const uint16_t queue_id)
>>        return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
>>   }
>>
>> +static int
>> +cfg_queues_stopped(struct pmd_core_cfg *queue_cfg)
>> +{
>> +     const struct queue_list_entry *entry;
>> +
>> +     TAILQ_FOREACH(entry, &queue_cfg->head, next) {
>> +             const union queue *q = &entry->queue;
>> +             int ret = queue_stopped(q->portid, q->qid);
>> +             if (ret != 1)
>> +                     return ret;
>> +     }
>> +     return 1;
>> +}
>> +
>> +static int
>> +check_scale(unsigned int lcore)
>> +{
>> +     enum power_management_env env;
>> +
>> +     /* only PSTATE and ACPI modes are supported */
>> +     if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) &&
>> +         !rte_power_check_env_supported(PM_ENV_PSTATE_CPUFREQ)) {
>> +             RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n");
>> +             return -ENOTSUP;
>> +     }
>> +     /* ensure we could initialize the power library */
>> +     if (rte_power_init(lcore))
>> +             return -EINVAL;
>> +
>> +     /* ensure we initialized the correct env */
>> +     env = rte_power_get_env();
>> +     if (env != PM_ENV_ACPI_CPUFREQ && env != PM_ENV_PSTATE_CPUFREQ) {
>> +             RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n");
>> +             return -ENOTSUP;
>> +     }
>> +
>> +     /* we're done */
>> +     return 0;
>> +}
>> +
>> +static int
>> +check_monitor(struct pmd_core_cfg *cfg, const union queue *qdata)
>> +{
>> +     struct rte_power_monitor_cond dummy;
>> +
>> +     /* check if rte_power_monitor is supported */
>> +     if (!global_data.intrinsics_support.power_monitor) {
>> +             RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
>> +             return -ENOTSUP;
>> +     }
>> +
>> +     if (cfg->n_queues > 0) {
>> +             RTE_LOG(DEBUG, POWER, "Monitoring multiple queues is not supported\n");
>> +             return -ENOTSUP;
>> +     }
>> +
>> +     /* check if the device supports the necessary PMD API */
>> +     if (rte_eth_get_monitor_addr(qdata->portid, qdata->qid,
>> +                     &dummy) == -ENOTSUP) {
>> +             RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_get_monitor_addr\n");
>> +             return -ENOTSUP;
>> +     }
>> +
>> +     /* we're done */
>> +     return 0;
>> +}
>> +
>>   int
>>   rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
>>                uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
>>   {
>> -     struct pmd_queue_cfg *queue_cfg;
>> +     const union queue qdata = {.portid = port_id, .qid = queue_id};
>> +     struct pmd_core_cfg *lcore_cfg;
>> +     struct queue_list_entry *queue_cfg;
>>        struct rte_eth_dev_info info;
>>        rte_rx_callback_fn clb;
>>        int ret;
>> @@ -202,9 +401,19 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
>>                goto end;
>>        }
>>
>> -     queue_cfg = &port_cfg[port_id][queue_id];
>> +     lcore_cfg = &lcore_cfgs[lcore_id];
>>
>> -     if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {
>> +     /* check if other queues are stopped as well */
>> +     ret = cfg_queues_stopped(lcore_cfg);
>> +     if (ret != 1) {
>> +             /* error means invalid queue, 0 means queue wasn't stopped */
>> +             ret = ret < 0 ? -EINVAL : -EBUSY;
>> +             goto end;
>> +     }
>> +
>> +     /* if callback was already enabled, check current callback type */
>> +     if (lcore_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED &&
>> +                     lcore_cfg->cb_mode != mode) {
>>                ret = -EINVAL;
>>                goto end;
>>        }
>> @@ -214,53 +423,20 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
>>
>>        switch (mode) {
>>        case RTE_POWER_MGMT_TYPE_MONITOR:
>> -     {
>> -             struct rte_power_monitor_cond dummy;
>> -
>> -             /* check if rte_power_monitor is supported */
>> -             if (!global_data.intrinsics_support.power_monitor) {
>> -                     RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
>> -                     ret = -ENOTSUP;
>> +             /* check if we can add a new queue */
>> +             ret = check_monitor(lcore_cfg, &qdata);
>> +             if (ret < 0)
>>                        goto end;
>> -             }
>>
>> -             /* check if the device supports the necessary PMD API */
>> -             if (rte_eth_get_monitor_addr(port_id, queue_id,
>> -                             &dummy) == -ENOTSUP) {
>> -                     RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_get_monitor_addr\n");
>> -                     ret = -ENOTSUP;
>> -                     goto end;
>> -             }
>>                clb = clb_umwait;
>>                break;
>> -     }
>>        case RTE_POWER_MGMT_TYPE_SCALE:
>> -     {
>> -             enum power_management_env env;
>> -             /* only PSTATE and ACPI modes are supported */
>> -             if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) &&
>> -                             !rte_power_check_env_supported(
>> -                                     PM_ENV_PSTATE_CPUFREQ)) {
>> -                     RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n");
>> -                     ret = -ENOTSUP;
>> +             /* check if we can add a new queue */
>> +             ret = check_scale(lcore_id);
>> +             if (ret < 0)
>>                        goto end;
>> -             }
>> -             /* ensure we could initialize the power library */
>> -             if (rte_power_init(lcore_id)) {
>> -                     ret = -EINVAL;
>> -                     goto end;
>> -             }
>> -             /* ensure we initialized the correct env */
>> -             env = rte_power_get_env();
>> -             if (env != PM_ENV_ACPI_CPUFREQ &&
>> -                             env != PM_ENV_PSTATE_CPUFREQ) {
>> -                     RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n");
>> -                     ret = -ENOTSUP;
>> -                     goto end;
>> -             }
>>                clb = clb_scale_freq;
>>                break;
>> -     }
>>        case RTE_POWER_MGMT_TYPE_PAUSE:
>>                /* figure out various time-to-tsc conversions */
>>                if (global_data.tsc_per_us == 0)
>> @@ -273,13 +449,23 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
>>                ret = -EINVAL;
>>                goto end;
>>        }
>> +     /* add this queue to the list */
>> +     ret = queue_list_add(lcore_cfg, &qdata);
>> +     if (ret < 0) {
>> +             RTE_LOG(DEBUG, POWER, "Failed to add queue to list: %s\n",
>> +                             strerror(-ret));
>> +             goto end;
>> +     }
>> +     /* new queue is always added last */
>> +     queue_cfg = TAILQ_LAST(&lcore_cfg->head, queue_list_head);
>>
>>        /* initialize data before enabling the callback */
>> -     queue_cfg->empty_poll_stats = 0;
>> -     queue_cfg->cb_mode = mode;
>> -     queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
>> -     queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
>> -                     clb, NULL);
>> +     if (lcore_cfg->n_queues == 1) {
>> +             lcore_cfg->cb_mode = mode;
>> +             lcore_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
>> +     }
>> +     queue_cfg->cb = rte_eth_add_rx_callback(port_id, queue_id,
>> +                     clb, queue_cfg);
>>
>>        ret = 0;
>>   end:
>> @@ -290,7 +476,9 @@ int
>>   rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
>>                uint16_t port_id, uint16_t queue_id)
>>   {
>> -     struct pmd_queue_cfg *queue_cfg;
>> +     const union queue qdata = {.portid = port_id, .qid = queue_id};
>> +     struct pmd_core_cfg *lcore_cfg;
>> +     struct queue_list_entry *queue_cfg;
>>        int ret;
>>
>>        RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>> @@ -306,24 +494,40 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
>>        }
>>
>>        /* no need to check queue id as wrong queue id would not be enabled */
>> -     queue_cfg = &port_cfg[port_id][queue_id];
>> +     lcore_cfg = &lcore_cfgs[lcore_id];
>>
>> -     if (queue_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
>> +     /* check if other queues are stopped as well */
>> +     ret = cfg_queues_stopped(lcore_cfg);
>> +     if (ret != 1) {
>> +             /* error means invalid queue, 0 means queue wasn't stopped */
>> +             return ret < 0 ? -EINVAL : -EBUSY;
>> +     }
>> +
>> +     if (lcore_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
>>                return -EINVAL;
>>
>> -     /* stop any callbacks from progressing */
>> -     queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
>> +     /*
>> +      * There is no good/easy way to do this without race conditions, so we
>> +      * are just going to throw our hands in the air and hope that the user
>> +      * has read the documentation and has ensured that ports are stopped at
>> +      * the time we enter the API functions.
>> +      */
>> +     queue_cfg = queue_list_take(lcore_cfg, &qdata);
>> +     if (queue_cfg == NULL)
>> +             return -ENOENT;
>>
>> -     switch (queue_cfg->cb_mode) {
>> +     /* if we've removed all queues from the lists, set state to disabled */
>> +     if (lcore_cfg->n_queues == 0)
>> +             lcore_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
>> +
>> +     switch (lcore_cfg->cb_mode) {
>>        case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
>>        case RTE_POWER_MGMT_TYPE_PAUSE:
>> -             rte_eth_remove_rx_callback(port_id, queue_id,
>> -                             queue_cfg->cur_cb);
>> +             rte_eth_remove_rx_callback(port_id, queue_id, queue_cfg->cb);
>>                break;
>>        case RTE_POWER_MGMT_TYPE_SCALE:
>>                rte_power_freq_max(lcore_id);
>> -             rte_eth_remove_rx_callback(port_id, queue_id,
>> -                             queue_cfg->cur_cb);
>> +             rte_eth_remove_rx_callback(port_id, queue_id, queue_cfg->cb);
>>                rte_power_exit(lcore_id);
>>                break;
>>        }
>> @@ -332,7 +536,18 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
>>         * ports before calling any of these API's, so we can assume that the
>>         * callbacks can be freed. we're intentionally casting away const-ness.
>>         */
>> -     rte_free((void *)queue_cfg->cur_cb);
>> +     rte_free((void *)queue_cfg->cb);
>> +     free(queue_cfg);
>>
>>        return 0;
>>   }
>> +
>> +RTE_INIT(rte_power_ethdev_pmgmt_init) {
>> +     size_t i;
>> +
>> +     /* initialize all tailqs */
>> +     for (i = 0; i < RTE_DIM(lcore_cfgs); i++) {
>> +             struct pmd_core_cfg *cfg = &lcore_cfgs[i];
>> +             TAILQ_INIT(&cfg->head);
>> +     }
>> +}
>> --
>> 2.25.1
> 
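
As a side note, the intended usage with this patch is to simply enable 
management once for each queue a given lcore is going to poll, before the 
port is started. A minimal illustration (the helper function and the 
choice of pause mode are made up for the example):

#include <rte_power_pmd_mgmt.h>

/*
 * Enable pause-mode PMD power management for all Rx queues of a port
 * that the given lcore will poll. Queues must not be started yet, and
 * all queues managed on one lcore must use the same mode.
 */
static int
enable_pmgmt(unsigned int lcore_id, uint16_t port_id, uint16_t nb_queues)
{
	uint16_t q;
	int ret;

	for (q = 0; q < nb_queues; q++) {
		ret = rte_power_ethdev_pmgmt_queue_enable(lcore_id,
				port_id, q, RTE_POWER_MGMT_TYPE_PAUSE);
		if (ret < 0)
			return ret;
	}
	return 0;
}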


-- 
Thanks,
Anatoly

