[dpdk-dev] [RFC 0/6] Power-optimized RX for Ethernet devices

Anatoly Burakov anatoly.burakov at intel.com
Wed May 27 19:02:00 CEST 2020


This patchset proposes a simple API for Ethernet drivers
to cause the CPU to enter a power-optimized state while
waiting for packets to arrive, along with a set of
(hopefully generic) intrinsics that facilitate that. This
is achieved through cooperation with the NIC driver, which
tells us the address of the next NIC RX ring packet
descriptor so that we can wait for writes to it.
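To make the driver side of this contract concrete, here is a minimal sketch that assumes a simplified RX descriptor whose status word the NIC updates on write-back. All `sketch_*` names and the DD ("descriptor done") bit value are hypothetical. They do not reflect the real ixgbe/i40e/ice descriptor layouts or the callback signature proposed in this patchset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "descriptor done" bit; real NICs define their own. */
#define SKETCH_RX_DESC_DD 0x1ULL

/* Hypothetical, simplified RX descriptor. */
struct sketch_rx_desc {
	uint64_t buf_addr;
	volatile uint64_t status; /* written back by the NIC */
};

/* Hypothetical, simplified RX queue as seen by the driver. */
struct sketch_rx_queue {
	struct sketch_rx_desc *ring;
	uint16_t nb_desc;
	uint16_t rx_tail; /* next descriptor the driver will read */
};

/*
 * Driver callback (hypothetical): report the address the CPU should
 * monitor, i.e. the status word of the next descriptor to be filled.
 */
static volatile void *
sketch_rxq_next_desc_addr(const struct sketch_rx_queue *rxq)
{
	assert(rxq->rx_tail < rxq->nb_desc);
	return &rxq->ring[rxq->rx_tail].status;
}

/*
 * Check to use right before sleeping: if the NIC has already written
 * this descriptor, waiting for a write to it would be pointless.
 */
static int
sketch_rxq_desc_done(const struct sketch_rx_queue *rxq)
{
	assert(rxq->rx_tail < rxq->nb_desc);
	return (rxq->ring[rxq->rx_tail].status & SKETCH_RX_DESC_DD) != 0;
}
```

Returning the status word rather than the whole descriptor keeps the monitored range to a single cache line, which is all that address-monitoring hardware typically tracks.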

On IA, this is achieved using the UMONITOR/UMWAIT
instructions. They are used in their raw opcode form
because there is no widespread compiler support for
them yet. Still, the API is intended to be generic enough
to support other architectures, should they implement
similar instructions.
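For illustration, the raw-opcode approach might look roughly like this sketch, assuming GCC or Clang on x86-64. The byte sequences encode `umonitor %rdi` and `umwait %edi`. The `sketch_*` function names are hypothetical and are not the intrinsics added by this patchset. A CPUID check for WAITPKG (leaf 7, subleaf 0, ECX bit 5) prevents executing the instructions on CPUs that lack them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <cpuid.h>
#include <x86intrin.h>
#endif

/* True if the CPU reports WAITPKG (UMONITOR/UMWAIT/TPAUSE support). */
static bool
sketch_cpu_has_waitpkg(void)
{
#if defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return (ecx & (1u << 5)) != 0;
#else
	return false;
#endif
}

#if defined(__x86_64__)
/*
 * Arm monitoring on addr, then wait until the monitored cache line is
 * written, the TSC deadline passes, or the CPU wakes for another
 * (platform-specific) reason. A real implementation would re-check the
 * monitored value between UMONITOR and UMWAIT to avoid missing a write.
 */
static void
sketch_power_monitor(const volatile void *addr, uint64_t tsc_deadline)
{
	const uint32_t tsc_l = (uint32_t)tsc_deadline;
	const uint32_t tsc_h = (uint32_t)(tsc_deadline >> 32);

	/* umonitor %rdi, emitted as raw bytes for older assemblers */
	__asm__ __volatile__(".byte 0xf3, 0x0f, 0xae, 0xf7;"
			     : : "D"(addr) : "memory");
	/* umwait %edi: EDI = 0 requests C0.2; EDX:EAX = TSC deadline */
	__asm__ __volatile__(".byte 0xf2, 0x0f, 0xae, 0xf7;"
			     : : "D"(0), "a"(tsc_l), "d"(tsc_h)
			     : "memory", "cc");
}
#endif

/* Sleep for up to `cycles` TSC ticks if supported, else return at once. */
static void
sketch_wait_for_write(const volatile void *addr, uint64_t cycles)
{
#if defined(__x86_64__)
	if (sketch_cpu_has_waitpkg()) {
		sketch_power_monitor(addr, __rdtsc() + cycles);
		return;
	}
#endif
	/* No WAITPKG: the caller simply keeps polling. */
	(void)addr;
	(void)cycles;
}
```

Falling back to a plain return keeps the polling loop correct on CPUs without WAITPKG; it just forgoes the power savings.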

To achieve power savings, a very simple mechanism is
used: we count empty polls, and once a certain threshold
is reached, we get the address of the next RX ring
descriptor from the NIC driver, arm the monitoring
hardware, and enter a power-optimized state. We then
wake up when either a timeout expires or a write to the
monitored address happens (or, more generally, whenever
the CPU decides to wake up; this is platform-specific),
and proceed as normal. The empty poll counter is reset
whenever we actually receive packets, so we only go to
sleep when we know nothing is going on.
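The empty-poll heuristic is a small state machine. Below is a minimal sketch that assumes a per-queue state struct and a threshold of 512 empty polls. Both the `sketch_*` names and the threshold value are hypothetical, since the RFC states neither. In this sketch the counter saturates instead of wrapping, and it is reset only when packets arrive, so a queue that is still idle after a wake-up goes straight back to sleep.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical threshold: the RFC does not specify a value. */
#define SKETCH_EMPTYPOLL_MAX 512

/* Hypothetical per-queue state (1:1 lcore-to-queue mapping assumed). */
struct sketch_pm_state {
	uint32_t empty_polls;
};

/*
 * Feed the result of each RX burst into the state machine.
 * Returns 1 when the caller should fetch the next RX descriptor
 * address from the driver, arm the monitor and sleep.
 */
static int
sketch_pm_should_sleep(struct sketch_pm_state *st, uint16_t nb_rx)
{
	if (nb_rx > 0) {
		/* Traffic is flowing: reset the counter, never sleep. */
		st->empty_polls = 0;
		return 0;
	}
	/* Saturate instead of wrapping around during long idle periods. */
	if (st->empty_polls < SKETCH_EMPTYPOLL_MAX)
		st->empty_polls++;
	/* Not reset after waking, so an idle queue keeps sleeping. */
	return st->empty_polls >= SKETCH_EMPTYPOLL_MAX;
}

/*
 * Hypothetical call site inside an RX burst wrapper:
 *
 *   nb_rx = driver_rx_burst(rxq, pkts, nb_pkts);
 *   if (sketch_pm_should_sleep(&st, nb_rx))
 *       monitor the driver-provided next descriptor address;
 *   return nb_rx;
 */
```

Keeping the decision separate from the actual sleep makes the policy easy to reason about and test independently of any platform-specific intrinsic.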

Why are we putting this into ethdev as opposed to leaving
it up to the application? Our customers specifically
requested a way to do it with minimal changes to the
application code. The current approach lets the
application just flip a switch and automagically get
power savings.

There are certain limitations in this patchset right now:
- Currently, only 1:1 core to queue mapping is supported,
  meaning that each lcore may handle RX on at most a
  single queue
- Currently, power management is enabled per-port, not
  per-queue
- TX latency may increase greatly if packets are being
  buffered and we go to sleep before sending them
- The API is not perfect and could use some improvement
  and discussion
- The API doesn't extend to other device types
- The intrinsics are platform-specific, so ethdev has
  some platform-specific code in it
- Support has only been implemented for devices using the
  net/ixgbe, net/i40e and net/ice drivers

Hopefully this will generate enough feedback to clear
a path forward!

Anatoly Burakov (6):
  eal: add power management intrinsics
  ethdev: add simple power management API
  net/ixgbe: implement power management API
  net/i40e: implement power management API
  net/ice: implement power management API
  app/testpmd: add command for power management on a port

 app/test-pmd/cmdline.c                        |  48 +++++++
 drivers/net/i40e/i40e_ethdev.c                |   1 +
 drivers/net/i40e/i40e_rxtx.c                  |  23 +++
 drivers/net/i40e/i40e_rxtx.h                  |   2 +
 drivers/net/ice/ice_ethdev.c                  |   1 +
 drivers/net/ice/ice_rxtx.c                    |  23 +++
 drivers/net/ice/ice_rxtx.h                    |   2 +
 drivers/net/ixgbe/ixgbe_ethdev.c              |   1 +
 drivers/net/ixgbe/ixgbe_rxtx.c                |  22 +++
 drivers/net/ixgbe/ixgbe_rxtx.h                |   2 +
 .../include/generic/rte_power_intrinsics.h    |  64 +++++++++
 lib/librte_eal/include/meson.build            |   1 +
 lib/librte_eal/x86/include/meson.build        |   1 +
 lib/librte_eal/x86/include/rte_cpuflags.h     |   1 +
 .../x86/include/rte_power_intrinsics.h        | 134 ++++++++++++++++++
 lib/librte_eal/x86/rte_cpuflags.c             |   2 +
 lib/librte_ethdev/rte_ethdev.c                |  39 +++++
 lib/librte_ethdev/rte_ethdev.h                |  70 +++++++++
 lib/librte_ethdev/rte_ethdev_core.h           |  41 +++++-
 lib/librte_ethdev/rte_ethdev_version.map      |   4 +
 20 files changed, 480 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_eal/include/generic/rte_power_intrinsics.h
 create mode 100644 lib/librte_eal/x86/include/rte_power_intrinsics.h

-- 
2.17.1

