[dpdk-dev] [PATCH v10 1/4] lib/librte_power: traffic pattern aware power control
Hunt, David
david.hunt at intel.com
Tue Oct 2 16:22:47 CEST 2018
On 2/10/2018 2:48 PM, Liang Ma wrote:
> 1. Abstract
>
> For packet processing workloads such as DPDK, polling is continuous.
> This means CPU cores always show 100% busy, independent of how much work
> those cores are actually doing. It is critical to accurately determine
> how busy a core is, for the following reasons:
>
> * There is no indication of overload conditions.
>
> * The user does not know how much real load is on a system, resulting
> in wasted energy because no power management is used.
>
> Compared to the original l3fwd-power design, instead of going to sleep
> after detecting an empty poll, the new mechanism just lowers the core
> frequency. As a result, the application does not stop polling the device,
> which leads to improved handling of bursts of traffic.
>
> When the system becomes busy, the empty poll mechanism can also increase the
> core frequency (including turbo) to make a best effort at handling intensive
> traffic. This gives more flexible and balanced traffic awareness than the
> standard l3fwd-power application.
>
> 2. Proposed solution
>
> The proposed solution focuses on how many times empty polls are executed.
> A low number of empty polls means the current core is busy processing
> workload, so a higher frequency is needed. A high number of empty polls
> indicates the current core is not doing any real work, so the frequency
> can be lowered to save power.
>
> In the current implementation, each core has 1 empty-poll counter, which
> assumes that 1 core is dedicated to 1 queue. This will need to be expanded
> in the future to support multiple queues per core.
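The relationship described above (fewer empty polls means a busier core) can be sketched in C. This is a minimal illustration, not the patch's implementation: it assumes the idle baseline is the number of empty polls counted in one 10ms window while the core is idle, and it expresses load as the share of that baseline that was not spent in empty polls. The helper name `estimate_busy_pct` is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the proposed API): estimate how busy a
 * core is from the empty polls it executed in one sampling window.
 *
 * idle_baseline: empty polls counted in one window while the core is idle,
 *                i.e. when every poll is empty (assumed to come from training).
 * empty_polls:   empty polls counted in the current window.
 *
 * Returns 0 (idle) .. 100 (no empty polls at all, i.e. fully busy).
 */
static unsigned int
estimate_busy_pct(uint64_t idle_baseline, uint64_t empty_polls)
{
	if (idle_baseline == 0)
		return 0; /* no baseline yet: cannot judge, so report idle */
	if (empty_polls >= idle_baseline)
		return 0; /* as many empty polls as an idle core: no real work */

	/* Fewer empty polls than the idle baseline means more real work. */
	return (unsigned int)(100 - (empty_polls * 100) / idle_baseline);
}
```

With an idle baseline of 1000 empty polls per window, for example, a window containing 250 empty polls maps to 75% busy in this sketch.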
>
> 2.1 Power state definition:
>
> LOW: Not currently used, reserved for future use.
>
> MED: the frequency used to process a modest traffic workload.
>
> HIGH: the frequency used to process a busy traffic workload.
>
> 2.2 There are two phases to establish the power management system:
>
> a. Initialization/Training phase. The training phase is necessary
> in order to determine the system's polling baseline numbers from
> idle to busy. The highest poll count will be during idle, where
> all polls are empty. These poll counts will differ between
> systems due to the many possible processor micro-architectures, cache
> and device configurations, hence the training phase.
> In the training phase, traffic is blocked so the training
> algorithm can average the empty-poll numbers for the LOW, MED and
> HIGH power states in order to create a baseline.
> The core's counters are collected every 10ms, and the training
> phase takes 2 seconds.
> Training is disabled in the default configuration, and the default
> parameters are applied instead. The sample application can still
> trigger training if needed. Once the training phase has been executed
> once on a system, the application can then be started with the relevant
> thresholds provided on the command line, allowing the application
> to start passing traffic immediately.
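Sampling every 10ms over a 2 second training phase yields 2000 / 10 = 200 samples per core. A minimal sketch of averaging those samples into a baseline follows; the structure and function names are hypothetical, and the patch's actual training code may organize this differently.

```c
#include <stdint.h>

/* Values taken from the description: 10ms sampling over a 2s training phase. */
#define TRAINING_PERIOD_MS  2000u
#define SAMPLE_INTERVAL_MS  10u
#define TRAINING_SAMPLES    (TRAINING_PERIOD_MS / SAMPLE_INTERVAL_MS) /* 200 */

/* Hypothetical per-core training state. */
struct training_state {
	uint64_t sum;      /* sum of empty-poll counts seen so far */
	uint32_t samples;  /* number of 10ms windows accumulated */
};

/*
 * Record the empty-poll count of one 10ms window. Samples arriving after
 * training is complete are ignored.
 * Returns 1 once TRAINING_SAMPLES windows have been collected, else 0.
 */
static int
training_add_sample(struct training_state *ts, uint64_t empty_polls)
{
	if (ts->samples < TRAINING_SAMPLES) {
		ts->sum += empty_polls;
		ts->samples++;
	}
	return ts->samples == TRAINING_SAMPLES;
}

/* Average empty polls per window: the baseline for the trained state. */
static uint64_t
training_baseline(const struct training_state *ts)
{
	return ts->samples ? ts->sum / ts->samples : 0;
}
```

Repeating this once per power state, with traffic blocked as described above, would produce one baseline per state.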
>
> b. Normal phase. Traffic starts immediately based on the default
> thresholds, or based on the user-supplied thresholds via the
> command line parameters. The run-time poll counts are compared with
> the baseline and the decision is taken to move to the MED power
> state or the HIGH power state. The counters are calculated every 10ms.
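The normal-phase decision can be sketched as a small state machine evaluated once per 10ms window. This is an illustration under assumptions, not the patch's logic: the input is assumed to be a busyness percentage already computed against the trained baseline, and the two thresholds (above 70% moves up to HIGH, below 30% drops back to MED) are hypothetical values rather than defaults from the patch. Using two thresholds instead of one adds hysteresis, so a core hovering near a single boundary does not switch frequency every window.

```c
/* States from section 2.1; LOW is reserved and never selected here. */
enum power_state { STATE_LOW, STATE_MED, STATE_HIGH };

/* Hypothetical thresholds, in percent busy; not the patch's defaults. */
#define UP_THRESHOLD_PCT   70u
#define DOWN_THRESHOLD_PCT 30u

/*
 * Pick the next power state from the current one and the busyness
 * measured in the latest 10ms window (0 = idle, 100 = fully busy).
 */
static enum power_state
next_power_state(enum power_state cur, unsigned int busy_pct)
{
	switch (cur) {
	case STATE_HIGH:
		/* Stay at HIGH until load falls below the lower threshold. */
		return busy_pct < DOWN_THRESHOLD_PCT ? STATE_MED : STATE_HIGH;
	case STATE_MED:
	case STATE_LOW:
	default:
		/* Move up only when load exceeds the upper threshold. */
		return busy_pct > UP_THRESHOLD_PCT ? STATE_HIGH : STATE_MED;
	}
}
```

Applying the chosen state (changing the core frequency) is left out of the sketch; in the proposed API, detecting state changes and acting on them is the role of rte_empty_poll_detection().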
>
> 3. Proposed API
>
> 1. rte_power_empty_poll_stat_init(struct ep_params **eptr,
> uint8_t *freq_tlb, struct ep_policy *policy);
> which is used to initialize the power management system.
>
> 2. rte_power_empty_poll_stat_free(void);
> which is used to free the resources held by the power management system.
>
> 3. rte_power_empty_poll_stat_update(unsigned int lcore_id);
> which is used to update a specific core's empty poll counter; not thread safe.
>
> 4. rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt);
> which is used to update a specific core's valid poll counter; not thread safe.
>
> 5. rte_power_empty_poll_stat_fetch(unsigned int lcore_id);
> which is used to get a specific core's empty poll counter.
>
> 6. rte_power_poll_stat_fetch(unsigned int lcore_id);
> which is used to get a specific core's valid poll counter.
>
> 7. rte_empty_poll_detection(struct rte_timer *tim, void *arg);
> which is used to detect empty poll state changes and then take action.
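The per-core update calls are intended to be made from the polling loop itself. The sketch below shows only that calling pattern. Because it must compile without DPDK, the two counter functions are hypothetical stand-ins that mirror the roles of rte_power_empty_poll_stat_update() and rte_power_poll_stat_update() rather than the real calls, and a scripted array of burst sizes stands in for a receive call such as rte_eth_rx_burst(). Since the update calls are documented as not thread safe, each lcore in this sketch only updates its own counters.

```c
#include <stdint.h>

#define MAX_LCORES 128 /* hypothetical bound for this sketch */

/* Stand-in per-core counters: one pair per core (one queue per core). */
static uint64_t empty_polls[MAX_LCORES];
static uint64_t valid_polls[MAX_LCORES];

/* Stand-in for rte_power_empty_poll_stat_update(lcore_id). */
static void
empty_poll_stat_update(unsigned int lcore_id)
{
	empty_polls[lcore_id]++;
}

/* Stand-in for rte_power_poll_stat_update(lcore_id, nb_pkt). */
static void
poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt)
{
	(void)nb_pkt; /* this stand-in only counts polls, not packets */
	valid_polls[lcore_id]++;
}

/*
 * Run `iterations` polls on `lcore_id` (assumed < MAX_LCORES).
 * rx_script[i] plays the role of the number of packets returned by the
 * i-th receive burst.
 */
static void
poll_loop(unsigned int lcore_id, const uint16_t *rx_script,
	  unsigned int iterations)
{
	for (unsigned int i = 0; i < iterations; i++) {
		uint16_t nb_rx = rx_script[i];

		if (nb_rx == 0)
			empty_poll_stat_update(lcore_id); /* nothing received */
		else /* nb_pkt is uint8_t in the proposed API */
			poll_stat_update(lcore_id, (uint8_t)nb_rx);
	}
}
```

A timer firing every 10ms, the role of rte_empty_poll_detection() in the proposed API, would then read these counters to drive the frequency decision.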
>
> ChangeLog:
> v2: fix some coding style issues.
> v3: rename the filename, API name.
> v4: no change.
> v5: no change.
> v6: re-work the code layout, update API.
> v7: fix minor typo and lift node num limit.
> v8: disable training as default option.
> v9: minor git log update.
> v10: update based on code review comments.
>
> Signed-off-by: Liang Ma <liang.j.ma at intel.com>
>
> Reviewed-by: Lei Yao <lei.a.yao at intel.com>
> ---
Acked-by: David Hunt <david.hunt at intel.com>