[dpdk-dev] [PATCH v10 1/4] lib/librte_power: traffic pattern aware power control

Hunt, David david.hunt at intel.com
Tue Oct 2 16:22:47 CEST 2018


On 2/10/2018 2:48 PM, Liang Ma wrote:
> 1. Abstract
>
> For packet processing workloads such as DPDK, polling is continuous.
> This means CPU cores always show 100% busy independent of how much work
> those cores are doing. Accurately determining how busy a core is
> matters for the following reasons:
>
>     * No indication of overload conditions.
>
>     * User does not know how much real load is on a system, resulting
>       in wasted energy as no power management is utilized.
>
> Compared to the original l3fwd-power design, instead of going to sleep
> after detecting an empty poll, the new mechanism just lowers the core
> frequency. As a result, the application does not stop polling the device,
> which leads to improved handling of bursts of traffic.
>
> When the system becomes busy, the empty poll mechanism can also increase the
> core frequency (including turbo) to make a best effort for intensive traffic.
> This gives us more flexible and balanced traffic awareness over the
> standard l3fwd-power application.
>
> 2. Proposed solution
>
> The proposed solution focuses on how many times empty polls are executed.
> A low number of empty polls means the current core is busy processing
> its workload, so a higher frequency is needed. A high number of empty
> polls indicates the current core is not doing any real work, so we can
> lower the frequency to save power.
>
> In the current implementation, each core has 1 empty-poll counter, which
> assumes 1 core is dedicated to 1 queue. This will need to be expanded in the
> future to support multiple queues per core.
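
To make the counting scheme above concrete, here is a minimal sketch in plain C of a per-core counter pair that a polling loop could update after each RX burst. The struct and function names (poll_stats, poll_stats_record, poll_stats_empty_pct) are hypothetical and are not the librte_power data layout or API; treating "no polls yet" as idle is also an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-core poll statistics; not the librte_power layout. */
struct poll_stats {
	uint64_t empty_polls; /* polls that returned no packets */
	uint64_t valid_polls; /* polls that returned at least one packet */
};

/*
 * Record the outcome of one RX poll. Not thread safe: each core is
 * expected to update only its own counters.
 */
static void
poll_stats_record(struct poll_stats *s, uint16_t nb_pkts)
{
	assert(s != NULL);
	if (nb_pkts == 0)
		s->empty_polls++;
	else
		s->valid_polls++;
}

/* Share of empty polls in percent; 100 means the core did no real work. */
static unsigned int
poll_stats_empty_pct(const struct poll_stats *s)
{
	uint64_t total;

	assert(s != NULL);
	total = s->empty_polls + s->valid_polls;
	if (total == 0)
		return 100; /* assumption: no polls yet is treated as idle */
	return (unsigned int)(s->empty_polls * 100 / total);
}
```

With one such struct per core and one queue per core, the loop would call poll_stats_record() with the packet count returned by each burst, and a periodic timer could read the counters to judge how busy the core really is.
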
>
> 2.1 Power state definition:
>
> 	LOW:  Not currently used, reserved for future use.
>
> 	MED:  the frequency is used to process modest traffic workload.
>
> 	HIGH: the frequency is used to process busy traffic workload.
>
> 2.2 There are two phases to establish the power management system:
>
> 	a.Initialization/Training phase. The training phase is necessary
> 	  in order to figure out the system polling baseline numbers from
> 	  idle to busy. The highest poll count will be during idle, where
> 	  all polls are empty. These poll counts will be different between
> 	  systems due to the many possible processor micro-arch, cache
> 	  and device configurations, hence the training phase.
> 	  In the training phase, traffic is blocked so the training
> 	  algorithm can average the empty-poll numbers for the LOW, MED and
> 	  HIGH power states in order to create a baseline.
> 	  Each core's counters are collected every 10ms, and the training
> 	  phase takes 2 seconds.
> 	  Training is disabled by default, and default thresholds are
> 	  applied instead. The sample app can still trigger training
> 	  if needed. Once the training phase has been executed once on
> 	  a system, the application can then be started with the relevant
> 	  thresholds provided on the command line, allowing the application
> 	  to start passing traffic immediately.
>
> 	b.Normal phase. Traffic starts immediately based on the default
> 	  thresholds, or based on the user supplied thresholds via the
> 	  command line parameters. The run-time poll counts are compared with
> 	  the baseline and the decision will be taken to move to MED power
> 	  state or HIGH power state. The counters are calculated every 10ms.
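
The two phases above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patch's algorithm: it averages a single idle baseline rather than one per power state, and the busy_pct threshold parameter and all names (baseline_average, choose_state, STATE_MED, STATE_HIGH) are hypothetical. Only the 10ms sampling interval and the 2 second training time come from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state names for the sketch. */
enum power_state { STATE_MED, STATE_HIGH };

#define SAMPLE_INTERVAL_MS 10   /* counters are sampled every 10ms */
#define TRAINING_MS        2000 /* training runs for 2 seconds */
#define TRAINING_SAMPLES   (TRAINING_MS / SAMPLE_INTERVAL_MS) /* 200 */

/* Training phase: average the per-interval empty-poll counts. */
static uint64_t
baseline_average(const uint64_t *samples, size_t n)
{
	uint64_t sum = 0;
	size_t i;

	assert(samples != NULL && n > 0);
	for (i = 0; i < n; i++)
		sum += samples[i];
	return sum / n;
}

/*
 * Normal phase: compare one interval's empty-poll count with the idle
 * baseline. If it drops below busy_pct percent of the baseline, the core
 * is spending most polls on real packets, so pick the HIGH state.
 * busy_pct is an assumed tuning knob, not a value from the patch.
 */
static enum power_state
choose_state(uint64_t empty_polls, uint64_t baseline, unsigned int busy_pct)
{
	if (empty_polls * 100 < baseline * busy_pct)
		return STATE_HIGH;
	return STATE_MED;
}
```

A timer callback firing every 10ms could call choose_state() with the count accumulated since the previous tick and then request the matching frequency.
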
>
> 3. Proposed API
>
> 1.  rte_power_empty_poll_stat_init(struct ep_params **eptr,
> 		uint8_t *freq_tlb, struct ep_policy *policy);
> which is used to initialize the power management system.
>   
> 2.  rte_power_empty_poll_stat_free(void);
> which is used to free the resources held by the power management system.
>   
> 3.  rte_power_empty_poll_stat_update(unsigned int lcore_id);
> which is used to update a specific core's empty poll counter; not thread safe.
>   
> 4.  rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt);
> which is used to update a specific core's valid poll counter; not thread safe.
>   
> 5.  rte_power_empty_poll_stat_fetch(unsigned int lcore_id);
> which is used to get a specific core's empty poll counter.
>   
> 6.  rte_power_poll_stat_fetch(unsigned int lcore_id);
> which is used to get a specific core's valid poll counter.
>
> 7.  rte_empty_poll_detection(struct rte_timer *tim, void *arg);
> which is used to detect empty poll state changes and then take action.
>
> ChangeLog:
> v2: fix some coding style issues.
> v3: rename the filename, API name.
> v4: no change.
> v5: no change.
> v6: re-work the code layout, update API.
> v7: fix minor typo and lift node num limit.
> v8: disable training as default option.
> v9: minor git log update.
> v10: update due to the code review comments.
>
> Signed-off-by: Liang Ma <liang.j.ma at intel.com>
>
> Reviewed-by: Lei Yao <lei.a.yao at intel.com>
> ---


Acked-by: David Hunt <david.hunt at intel.com>

