[dpdk-dev] [PATCH v16 00/11] Add PMD power management

Burakov, Anatoly anatoly.burakov at intel.com
Thu Jan 14 11:25:11 CET 2021


On 14-Jan-21 9:36 AM, David Marchand wrote:
> On Tue, Jan 12, 2021 at 6:37 PM Anatoly Burakov
> <anatoly.burakov at intel.com> wrote:
>>
>> This patchset proposes a simple API for Ethernet drivers to cause the
>> CPU to enter a power-optimized state while waiting for packets to
>> arrive. There are multiple proposed mechanisms to achieve said power
>> savings: simple frequency scaling, idle loop, and monitoring the Rx
>> queue for incoming packages. The latter is achieved through cooperation
>> with the NIC driver that will allow us to know address of wake up event,
>> and wait for writes on that address.
>>
>> On IA, this is achieved through using UMONITOR/UMWAIT instructions. They
>> are used in their raw opcode form because there is no widespread
>> compiler support for them yet. Still, the API is made generic enough to
>> hopefully support other architectures, if they happen to implement
>> similar instructions.
>>
>> To achieve power savings, there is a very simple mechanism used: we're
>> counting empty polls, and if a certain threshold is reached, we employ
>> one of the suggested power management schemes automatically, from within
>> a Rx callback inside the PMD. Once there's traffic again, the empty poll
>> counter is reset.
>>
>> This patchset also introduces a few changes into existing power
>> management-related intrinsics, namely to provide a native way of waking
>> up a sleeping core without application being responsible for it, as well
>> as general robustness improvements. There's quite a bit of locking going
>> on, but these locks are per-thread and very little (if any) contention
>> is expected, so the performance impact shouldn't be that bad (and in any
>> case the locking happens when we're about to sleep anyway).
>>
>> Why are we putting it into ethdev as opposed to leaving this up to the
>> application? Our customers specifically requested a way to do it with
>> minimal changes to the application code. The current approach allows to
>> just flip a switch and automatically have power savings.
>>
>> Things of note:
>>
>> - Only 1:1 core to queue mapping is supported, meaning that each lcore
>>    must at most handle RX on a single queue
> 
> If we want to save power, it is likely we would poll more rxqs on a thread.

We are investigating possibilities to make that happen, but for this 
patchset, this is the limitation.

> 
> 
>> - Support 3 type policies. Monitor/Pause/Frequency Scaling
>> - Power management is enabled per-queue
>> - The API doesn't extend to other device types
>>
>> v16:
>> - Implemented Konstantin's suggestions and comments
>> - Added return values to the API
> 
> - This revision breaks SPDK build (reported by UNH):
> http://mails.dpdk.org/archives/test-report/2021-January/174069.html
> 
> 
> - Build is broken for ARM and PPC at patch:
> 86491d5bd4 - (HEAD) eal: add monitor wakeup function (25 minutes ago)
> <Anatoly Burakov>
> 
> Only pasting the ARM failure:
> ninja: Entering directory `/home/dmarchan/builds/build-arm64-host-clang'
> [1/297] Compiling C object
> 'lib/76b5a35@@rte_eal at sta/librte_eal_arm_rte_power_intrinsics.c.o'.
> FAILED: lib/76b5a35@@rte_eal at sta/librte_eal_arm_rte_power_intrinsics.c.o
> aarch64-linux-gnu-gcc -Ilib/76b5a35@@rte_eal at sta -Ilib
> -I../../dpdk/lib -I. -I../../dpdk/ -Iconfig -I../../dpdk/config
> -Ilib/librte_eal/include -I../../dpdk/lib/librte_eal/include
> -Ilib/librte_eal/linux/include
> -I../../dpdk/lib/librte_eal/linux/include -Ilib/librte_eal/arm/include
> -I../../dpdk/lib/librte_eal/arm/include -Ilib/librte_eal/common
> -I../../dpdk/lib/librte_eal/common -Ilib/librte_eal
> -I../../dpdk/lib/librte_eal -Ilib/librte_kvargs
> -I../../dpdk/lib/librte_kvargs
> -Ilib/librte_telemetry/../librte_metrics
> -I../../dpdk/lib/librte_telemetry/../librte_metrics
> -Ilib/librte_telemetry -I../../dpdk/lib/librte_telemetry
> -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall
> -Winvalid-pch -Werror -O2 -g -include rte_config.h -Wextra -Wcast-qual
> -Wdeprecated -Wformat -Wformat-nonliteral -Wformat-security
> -Wmissing-declarations -Wmissing-prototypes -Wnested-externs
> -Wold-style-definition -Wpointer-arith -Wsign-compare
> -Wstrict-prototypes -Wundef -Wwrite-strings -Wno-packed-not-aligned
> -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=armv8-a+crc
> -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -Wno-format-truncation
> '-DABI_VERSION="21.1"' -DRTE_LIBEAL_USE_GETENTROPY -MD -MQ
> 'lib/76b5a35@@rte_eal at sta/librte_eal_arm_rte_power_intrinsics.c.o' -MF
> 'lib/76b5a35@@rte_eal at sta/librte_eal_arm_rte_power_intrinsics.c.o.d'
> -o 'lib/76b5a35@@rte_eal at sta/librte_eal_arm_rte_power_intrinsics.c.o'
> -c ../../dpdk/lib/librte_eal/arm/rte_power_intrinsics.c
> ../../dpdk/lib/librte_eal/arm/rte_power_intrinsics.c:35:1: error:
> conflicting types for ‘rte_power_monitor_wakeup’
>   rte_power_monitor_wakeup(const unsigned int lcore_id)
>   ^~~~~~~~~~~~~~~~~~~~~~~~
> In file included from
> ../../dpdk/lib/librte_eal/arm/include/rte_power_intrinsics.h:14,
>                   from ../../dpdk/lib/librte_eal/arm/rte_power_intrinsics.c:5:
> ../../dpdk/lib/librte_eal/include/generic/rte_power_intrinsics.h:79:5:
> note: previous declaration of ‘rte_power_monitor_wakeup’ was here
>   int rte_power_monitor_wakeup(const unsigned int lcore_id);
>       ^~~~~~~~~~~~~~~~~~~~~~~~
> ninja: build stopped: subcommand failed.

Woops, wrong return value in the .c files. Will fix!

> 
> 
> 
> - The ABI check is still not happy as I reported earlier.
> Reproduced on v16 (GHA had a hiccup on this revision, but previous
> ones had the failure too):
> 
> 1 Changed variable:
> 
>    [C] 'rte_eth_dev rte_eth_devices[]' was changed at rte_ethdev_core.h:196:1:
>      type of variable changed:
>        array element type 'struct rte_eth_dev' changed:
>          type size hasn't changed
>          1 data member change:
>            type of 'const eth_dev_ops* rte_eth_dev::dev_ops' changed:
>              in pointed to type 'const eth_dev_ops':
>                in unqualified underlying type 'struct eth_dev_ops' at
> rte_ethdev_driver.h:789:1:
>                  type size changed from 6208 to 6272 (in bits)
>                  1 data member insertion:
>                    'eth_get_monitor_addr_t
> eth_dev_ops::get_monitor_addr', at offset 6208 (in bits) at
> rte_ethdev_driver.h:940:1
>                  no data member changes (94 filtered);
>        type size hasn't changed
> 
> Error: ABI issue reported for 'abidiff --suppr
> /home/dmarchan/dpdk/devtools/../devtools/libabigail.abignore
> --no-added-syms --headers-dir1
> /home/dmarchan/abi/v20.11/build-gcc-static/usr/local/include
> --headers-dir2 /home/dmarchan/builds/build-gcc-static/install/usr/local/include
> /home/dmarchan/abi/v20.11/build-gcc-static/dump/librte_ethdev.dump
> /home/dmarchan/builds/build-gcc-static/install/dump/librte_ethdev.dump'
> 
> ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged
> this as a potential issue).
> 
> One solution is to add an exception on the eth_dev_ops structure.
> 
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -7,3 +7,7 @@
>           symbol_version = INTERNAL
>   [suppress_variable]
>           symbol_version = INTERNAL
> +
> +; Explicit ignore for driver-only ABI
> +[suppress_type]
> +        name = eth_dev_ops
> 
> 

Right, OK. I didn't realize an "exception" is something you actually do 
in code, not an ad-hoc community process type thing :) I'll add this in 
the next revision.

-- 
Thanks,
Anatoly


More information about the dev mailing list