[PATCH v2 32/33] net/ena: control path pure polling mode
    shaibran at amazon.com 
    shaibran at amazon.com
       
    Mon Mar  4 13:29:41 CET 2024
    
    
  
From: Shai Brandes <shaibran at amazon.com>
This commit implements a new operation mode that enables purely
polling-based functionality, eliminating the need for interrupts in
the control path. This mode is not activated by default and can be
toggled using the "control_poll_interval" devarg. When operating in
this mode, periodic alarms are used to monitor the control queues.
A non-zero value for this devarg is mandatory for control path
functionality when binding ports to uio_pci_generic kernel module which
lacks interrupt support.
Signed-off-by: Shai Brandes <shaibran at amazon.com>
Reviewed-by: Amit Bernstein <amitbern at amazon.com>
---
 doc/guides/nics/ena.rst                |  52 +++++++++---
 doc/guides/rel_notes/release_24_03.rst |   2 +
 drivers/net/ena/ena_ethdev.c           | 108 ++++++++++++++++++++-----
 drivers/net/ena/ena_ethdev.h           |   5 ++
 4 files changed, 133 insertions(+), 34 deletions(-)
diff --git a/doc/guides/nics/ena.rst b/doc/guides/nics/ena.rst
index 53c9341859..d2dd4fa4a0 100644
--- a/doc/guides/nics/ena.rst
+++ b/doc/guides/nics/ena.rst
@@ -109,12 +109,16 @@ Runtime Configuration
 
    * **llq_policy** (default 1)
 
-     Controls whether use device recommended header policy or override it.
+     Controls whether use device recommended header policy or override it:
+
      0 - Disable LLQ.
-         **Use with extreme caution as it leads to a huge performance
-         degradation on AWS instances from 6th generation onwards.**
+     **Use with extreme caution as it leads to a huge performance
+     degradation on AWS instances from 6th generation onwards.**
+
      1 - Accept device recommended LLQ policy (Default).
+
      2 - Enforce normal LLQ policy.
+
      3 - Enforce large LLQ policy.
 
    * **miss_txc_to** (default 5)
@@ -126,6 +130,18 @@ Runtime Configuration
      timer service. Setting this parameter to 0 disables this feature. Maximum
      allowed value is 60 seconds.
 
+   * **control_poll_interval** (default 0)
+
+     Enable polling-based functionality of the admin queues, eliminating the
+     need for interrupts in the control-path:
+
+     0 - Disable (Admin queue will work in interrupt mode).
+
+     [1..1000] - Number of milliseconds to wait between periodic inspection of the admin queues.
+
+     **A non-zero value for this devarg is mandatory for control path functionality
+     when binding ports to uio_pci_generic kernel module which lacks interrupt support.**
+
 ENA Configuration Parameters
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -164,23 +180,23 @@ Prerequisites
 #. Prepare the system as recommended by DPDK suite.  This includes environment
    variables, hugepages configuration, tool-chains and configuration.
 
-#. ENA PMD can operate with ``vfio-pci``(*) or ``igb_uio`` driver.
+#. ENA PMD can operate with ``vfio-pci`` (*), ``igb_uio``, or ``uio_pci_generic`` driver.
 
    (*) ENAv2 hardware supports Low Latency Queue v2 (LLQv2). This feature
    reduces the latency of the packets by pushing the header directly through
    the PCI to the device, before the DMA is even triggered. For proper work
-   kernel PCI driver must support write combining (WC).
+   kernel PCI driver must support write-combining (WC).
    In DPDK ``igb_uio`` it must be enabled by loading module with
    ``wc_activate=1`` flag (example below). However, mainline's vfio-pci
-   driver in kernel doesn't have WC support yet (planed to be added).
+   driver in kernel doesn't have WC support yet (planned to be added).
    If vfio-pci is used user should follow `AWS ENA PMD documentation
    <https://github.com/amzn/amzn-drivers/tree/master/userspace/dpdk/README.md>`_.
 
-#. Insert ``vfio-pci`` or ``igb_uio`` kernel module using the command
-   ``modprobe vfio-pci`` or ``modprobe uio; insmod igb_uio.ko wc_activate=1``
-   respectively.
+#. For ``igb_uio``:
+   Insert ``igb_uio`` kernel module using the command ``modprobe uio; insmod igb_uio.ko wc_activate=1``
 
-#. For ``vfio-pci`` users only:
+#. For ``vfio-pci``:
+   Insert ``vfio-pci`` kernel module using the command ``modprobe vfio-pci``
    Please make sure that ``IOMMU`` is enabled in your system,
    or use ``vfio`` driver in ``noiommu`` mode::
 
@@ -189,7 +205,17 @@ Prerequisites
    To use ``noiommu`` mode, the ``vfio-pci`` must be built with flag
    ``CONFIG_VFIO_NOIOMMU``.
 
-#. Bind the intended ENA device to ``vfio-pci`` or ``igb_uio`` module.
+#. For ``uio_pci_generic``:
+   Insert ``uio_pci_generic`` kernel module using the command ``modprobe uio_pci_generic``.
+   Make sure that the IOMMU is disabled or is in passthrough mode.
+   For example: ``modprobe uio_pci_generic intel_iommu=off``.
+
+   Note that when launching the application, the ``control_poll_interval`` devarg must be used with a non-zero value (1000 is recommended)
+   as ``uio_pci_generic`` lacks interrupt support. The control-path (admin queues) of the ENA require poll-mode
+   to process command completion and asynchronous notification from the device.
+   For example: ``dpdk-app -a "00:06.0,control_path_poll_interval=1000"``.
+
+#. Bind the intended ENA device to ``vfio-pci``, ``igb_uio``, or ``uio_pci_generic`` module.
 
 At this point the system should be ready to run DPDK applications. Once the
 application runs to completion, the ENA can be detached from attached module if
@@ -198,7 +224,7 @@ necessary.
 **Rx interrupts support**
 
 ENA PMD supports Rx interrupts, which can be used to wake up lcores waiting for
-input. Please note that it won't work with ``igb_uio``, so to use this feature,
+input. Please note that it won't work with ``igb_uio`` and ``uio_pci_generic`` so to use this feature,
 the ``vfio-pci`` should be used.
 
 ENA handles admin interrupts and AENQ notifications on separate interrupt.
@@ -209,7 +235,7 @@ will fail.
 **Note about usage on \*.metal instances**
 
 On AWS, the metal instances are supporting IOMMU for both arm64 and x86_64
-hosts.
+hosts. Note that ``uio_pci_generic`` lacks IOMMU support and cannot be used for metal instances.
 
 * x86_64 (e.g. c5.metal, i3.metal):
    IOMMU should be disabled by default. In that situation, the ``igb_uio`` can
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 9823616eeb..d01236097a 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -109,6 +109,8 @@ New Features
   * Replaced `enable_llq` and `large_llq_hdr` devargs with a new devarg `llq_policy`.
   * Added support for LLQ header size recommendation from the device.
   * Allowed large LLQ with 1024 entries when the device supports enlarged memory BAR.
+  * Added `control_poll_interval` devarg that configure control-path to work in poll-mode.
+  * Added support for binding ports to `uio_pci_generic` kernel module.
 
 * **Updated Atomic Rules' Arkville driver.**
 
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 43693ee2ee..af1f6d6d05 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -3,6 +3,7 @@
  * All rights reserved.
  */
 
+#include <rte_alarm.h>
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_version.h>
@@ -36,6 +37,8 @@
 
 #define ENA_MIN_RING_DESC	128
 
+#define USEC_PER_MSEC		1000UL
+
 #define BITS_PER_BYTE 8
 
 #define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
@@ -95,6 +98,14 @@ struct ena_stats {
  * considered as a missing.
  */
 #define ENA_DEVARG_MISS_TXC_TO "miss_txc_to"
+/*
+ * Controls the period of time (in milliseconds) between two consecutive inspections of
+ * the control queues when the driver is in poll mode and not using interrupts.
+ * By default, this value is zero, indicating that the driver will not be in poll mode and will
+ * use interrupts. A non-zero value for this argument is mandatory when using uio_pci_generic
+ * driver.
+ */
+#define ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL "control_path_poll_interval"
 
 /*
  * Each rte_memzone should have unique name.
@@ -271,7 +282,8 @@ static uint64_t ena_get_rx_queue_offloads(struct ena_adapter *adapter);
 static uint64_t ena_get_tx_queue_offloads(struct ena_adapter *adapter);
 static int ena_infos_get(struct rte_eth_dev *dev,
 			 struct rte_eth_dev_info *dev_info);
-static void ena_interrupt_handler_rte(void *cb_arg);
+static void ena_control_path_handler(void *cb_arg);
+static void ena_control_path_poll_handler(void *cb_arg);
 static void ena_timer_wd_callback(struct rte_timer *timer, void *arg);
 static void ena_destroy_device(struct rte_eth_dev *eth_dev);
 static int eth_ena_dev_init(struct rte_eth_dev *eth_dev);
@@ -882,10 +894,14 @@ static int ena_close(struct rte_eth_dev *dev)
 		ret = ena_stop(dev);
 	adapter->state = ENA_ADAPTER_STATE_CLOSED;
 
-	rte_intr_disable(intr_handle);
-	rc = rte_intr_callback_unregister_sync(intr_handle, ena_interrupt_handler_rte, dev);
-	if (unlikely(rc != 0))
-		PMD_INIT_LOG(ERR, "Failed to unregister interrupt handler\n");
+	if (!adapter->control_path_poll_interval) {
+		rte_intr_disable(intr_handle);
+		rc = rte_intr_callback_unregister_sync(intr_handle, ena_control_path_handler, dev);
+		if (unlikely(rc != 0))
+			PMD_INIT_LOG(ERR, "Failed to unregister interrupt handler\n");
+	} else {
+		rte_eal_alarm_cancel(ena_control_path_poll_handler, dev);
+	}
 
 	ena_rx_queue_release_all(dev);
 	ena_tx_queue_release_all(dev);
@@ -1889,15 +1905,33 @@ static int ena_device_init(struct ena_adapter *adapter,
 	return rc;
 }
 
-static void ena_interrupt_handler_rte(void *cb_arg)
+static void ena_control_path_handler(void *cb_arg)
 {
 	struct rte_eth_dev *dev = cb_arg;
 	struct ena_adapter *adapter = dev->data->dev_private;
 	struct ena_com_dev *ena_dev = &adapter->ena_dev;
 
-	ena_com_admin_q_comp_intr_handler(ena_dev);
-	if (likely(adapter->state != ENA_ADAPTER_STATE_CLOSED))
+	if (likely(adapter->state != ENA_ADAPTER_STATE_CLOSED)) {
+		ena_com_admin_q_comp_intr_handler(ena_dev);
 		ena_com_aenq_intr_handler(ena_dev, dev);
+	}
+}
+
+static void ena_control_path_poll_handler(void *cb_arg)
+{
+	struct rte_eth_dev *dev = cb_arg;
+	struct ena_adapter *adapter = dev->data->dev_private;
+	int rc;
+
+	if (likely(adapter->state != ENA_ADAPTER_STATE_CLOSED)) {
+		ena_control_path_handler(cb_arg);
+		rc = rte_eal_alarm_set(adapter->control_path_poll_interval,
+				       ena_control_path_poll_handler, cb_arg);
+		if (unlikely(rc != 0)) {
+			PMD_DRV_LOG(ERR, "Failed to retrigger control path alarm\n");
+			ena_trigger_reset(adapter, ENA_REGS_RESET_GENERIC);
+		}
+	}
 }
 
 static void check_for_missing_keep_alive(struct ena_adapter *adapter)
@@ -2362,20 +2396,28 @@ static int eth_ena_dev_init(struct rte_eth_dev *eth_dev)
 
 	rte_spinlock_init(&adapter->admin_lock);
 
-	rte_intr_callback_register(intr_handle,
-				   ena_interrupt_handler_rte,
-				   eth_dev);
+	if (!adapter->control_path_poll_interval) {
+		/* Control path interrupt mode */
+		rte_intr_callback_register(intr_handle, ena_control_path_handler, eth_dev);
 	rte_intr_enable(intr_handle);
 	ena_com_set_admin_polling_mode(ena_dev, false);
 	ena_com_admin_aenq_enable(ena_dev);
-
+	} else {  /* Control path polling mode */
+		rc = rte_eal_alarm_set(adapter->control_path_poll_interval,
+				       ena_control_path_poll_handler, eth_dev);
+		if (unlikely(rc != 0)) {
+			PMD_DRV_LOG(ERR, "Failed to set control path alarm\n");
+			goto err_control_path_destroy;
+		}
+	}
 	rte_timer_init(&adapter->timer_wd);
 
 	adapters_found++;
 	adapter->state = ENA_ADAPTER_STATE_INIT;
 
 	return 0;
-
+err_control_path_destroy:
+	rte_free(adapter->drv_stats);
 err_rss_destroy:
 	ena_com_rss_destroy(ena_dev);
 err_delete_debug_area:
@@ -3656,9 +3698,9 @@ static int ena_process_uint_devarg(const char *key,
 {
 	struct ena_adapter *adapter = opaque;
 	char *str_end;
-	uint64_t uint_value;
+	uint64_t uint64_value;
 
-	uint_value = strtoull(value, &str_end, DECIMAL_BASE);
+	uint64_value = strtoull(value, &str_end, DECIMAL_BASE);
 	if (value == str_end) {
 		PMD_INIT_LOG(ERR,
 			"Invalid value for key '%s'. Only uint values are accepted.\n",
@@ -3667,12 +3709,12 @@ static int ena_process_uint_devarg(const char *key,
 	}
 
 	if (strcmp(key, ENA_DEVARG_MISS_TXC_TO) == 0) {
-		if (uint_value > ENA_MAX_TX_TIMEOUT_SECONDS) {
+		if (uint64_value > ENA_MAX_TX_TIMEOUT_SECONDS) {
 			PMD_INIT_LOG(ERR,
 				"Tx timeout too high: %" PRIu64 " sec. Maximum allowed: %d sec.\n",
-				uint_value, ENA_MAX_TX_TIMEOUT_SECONDS);
+				uint64_value, ENA_MAX_TX_TIMEOUT_SECONDS);
 			return -EINVAL;
-		} else if (uint_value == 0) {
+		} else if (uint64_value == 0) {
 			PMD_INIT_LOG(INFO,
 				"Check for missing Tx completions has been disabled.\n");
 			adapter->missing_tx_completion_to =
@@ -3680,9 +3722,27 @@ static int ena_process_uint_devarg(const char *key,
 		} else {
 			PMD_INIT_LOG(INFO,
 				"Tx packet completion timeout set to %" PRIu64 " seconds.\n",
-				uint_value);
+				uint64_value);
 			adapter->missing_tx_completion_to =
-				uint_value * rte_get_timer_hz();
+				uint64_value * rte_get_timer_hz();
+		}
+	} else if (strcmp(key, ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL) == 0) {
+		if (uint64_value > ENA_MAX_CONTROL_PATH_POLL_INTERVAL_MSEC) {
+			PMD_INIT_LOG(ERR,
+				"Control path polling interval is too long: %" PRIu64 " msecs. "
+				"Maximum allowed: %d msecs.\n",
+				uint64_value, ENA_MAX_CONTROL_PATH_POLL_INTERVAL_MSEC);
+			return -EINVAL;
+		} else if (uint64_value == 0) {
+			PMD_INIT_LOG(INFO,
+				"Control path polling interval is set to zero. Operating in "
+				"interrupt mode.\n");
+				adapter->control_path_poll_interval = 0;
+		} else {
+			PMD_INIT_LOG(INFO,
+				"Control path polling interval is set to %" PRIu64 " msecs.\n",
+				uint64_value);
+				adapter->control_path_poll_interval = uint64_value * USEC_PER_MSEC;
 		}
 	}
 
@@ -3712,6 +3772,7 @@ static int ena_parse_devargs(struct ena_adapter *adapter, struct rte_devargs *de
 	static const char * const allowed_args[] = {
 		ENA_DEVARG_LLQ_POLICY,
 		ENA_DEVARG_MISS_TXC_TO,
+		ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL,
 		NULL,
 	};
 	struct rte_kvargs *kvlist;
@@ -3734,6 +3795,10 @@ static int ena_parse_devargs(struct ena_adapter *adapter, struct rte_devargs *de
 		ena_process_uint_devarg, adapter);
 	if (rc != 0)
 		goto exit;
+	rc = rte_kvargs_process(kvlist, ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL,
+		ena_process_uint_devarg, adapter);
+	if (rc != 0)
+		goto exit;
 
 exit:
 	rte_kvargs_free(kvlist);
@@ -3954,7 +4019,8 @@ RTE_PMD_REGISTER_PCI_TABLE(net_ena, pci_id_ena_map);
 RTE_PMD_REGISTER_KMOD_DEP(net_ena, "* igb_uio | uio_pci_generic | vfio-pci");
 RTE_PMD_REGISTER_PARAM_STRING(net_ena,
 	ENA_DEVARG_LLQ_POLICY "=<0|1|2|3> "
-	ENA_DEVARG_MISS_TXC_TO "=<uint>");
+	ENA_DEVARG_MISS_TXC_TO "=<uint>"
+	ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL "=<0-1000>");
 RTE_LOG_REGISTER_SUFFIX(ena_logtype_init, init, NOTICE);
 RTE_LOG_REGISTER_SUFFIX(ena_logtype_driver, driver, NOTICE);
 #ifdef RTE_ETHDEV_DEBUG_RX
diff --git a/drivers/net/ena/ena_ethdev.h b/drivers/net/ena/ena_ethdev.h
index 6716f01ba5..85e816ae72 100644
--- a/drivers/net/ena/ena_ethdev.h
+++ b/drivers/net/ena/ena_ethdev.h
@@ -44,6 +44,8 @@
 #define ENA_MONITORED_TX_QUEUES		3
 #define ENA_DEFAULT_MISSING_COMP	256U
 
+#define ENA_MAX_CONTROL_PATH_POLL_INTERVAL_MSEC 1000
+
 /* While processing submitted and completed descriptors (rx and tx path
  * respectively) in a loop it is desired to:
  *  - perform batch submissions while populating submission queue
@@ -346,6 +348,9 @@ struct ena_adapter {
 
 	uint64_t memzone_cnt;
 
+	/* Time (in microseconds) of the control path queues monitoring interval */
+	uint64_t control_path_poll_interval;
+
 	/*
 	 * Helper variables for holding the information about the supported
 	 * metrics.
-- 
2.17.1
    
    
More information about the dev
mailing list