[dpdk-dev] [PATCH v5] eal: pick IOVA as PA if IOMMU is not available

David Marchand david.marchand at redhat.com
Tue Jul 30 09:21:49 CEST 2019


On Mon, Jul 29, 2019 at 5:03 PM Anatoly Burakov
<anatoly.burakov at intel.com> wrote:
>
> When IOMMU is not available, /sys/kernel/iommu_groups will not be
> populated. This has been the case since at least Linux 3.6, when VFIO
> support was added. If the directory is empty, EAL should not pick
> IOVA as VA as the default IOVA mode.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
> Tested-by: Darek Stojaczyk <dariusz.stojaczyk at intel.com>
> Tested-by: Jerin Jacob <jerinj at marvell.com>
> Reviewed-by: Jerin Jacob <jerinj at marvell.com>
> ---
>
> Notes:
>     v5:
>     - Clarify docs on FreeBSD
>     - Move IOMMU detection code out of VFIO sources
>
>     v4:
>     - Fix indentation in release notes' known issues
>
>     v3:
>     - Add documentation changes
>     - Fix a typo pointed out by checkpatch
>
>     v2:
>     - Decouple IOMMU from VFIO
>     - Add a check for physical addresses availability
>
>  .../prog_guide/env_abstraction_layer.rst      | 27 ++++++----
>  doc/guides/rel_notes/known_issues.rst         | 26 ++++++++++
>  doc/guides/rel_notes/release_19_08.rst        | 16 ++++++
>  lib/librte_eal/linux/eal/eal.c                | 50 ++++++++++++++++++-
>  4 files changed, 107 insertions(+), 12 deletions(-)
>
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 1487ea550..94f30fd5d 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -425,7 +425,8 @@ IOVA Mode Detection
>  IOVA Mode is selected by considering what the current usable Devices on the
>  system require and/or support.
>
> -Below is the 2-step heuristic for this choice.
> +On FreeBSD, RTE_IOVA_PA is always the default. On Linux, the IOVA mode is
> +detected based on a 2-step heuristic detailed below.
>
>  For the first step, EAL asks each bus its requirement in terms of IOVA mode
>  and decides on a preferred IOVA mode.
> @@ -438,20 +439,26 @@ and decides on a preferred IOVA mode.
>    RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
>    check on Physical Addresses availability),
>
> +If the buses have expressed no preference on which IOVA mode to pick, then a
> +default is selected using the following logic:
> +
> +- if physical addresses are not available, RTE_IOVA_VA mode is used
> +- if /sys/kernel/iommu_groups is not empty, RTE_IOVA_VA mode is used
> +- otherwise, RTE_IOVA_PA mode is used
> +
> +If the buses disagreed on their preferred IOVA mode, some of the buses will
> +not work because of this decision.
> +
>  The second step checks if the preferred mode complies with the Physical
>  Addresses availability since those are only available to root user in recent
> -kernels.
> -
> -- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
> -  Addresses, then EAL init fails early, since later probing of the devices
> -  would fail anyway,
> -- if the preferred mode is RTE_IOVA_DC then EAL selects the RTE_IOVA_VA mode.
> -  In the case when the buses had disagreed on the IOVA Mode at the first step,
> -  part of the buses won't work because of this decision.
> +kernels. Namely, if the preferred mode is RTE_IOVA_PA but there is no access to
> +Physical Addresses, then EAL init fails early, since later probing of the
> +devices would fail anyway.
>
>  .. note::
>
> -    The RTE_IOVA_VA mode is selected as the default for the following reasons:
> +    The RTE_IOVA_VA mode is preferred as the default in most cases for the
> +    following reasons:
>
>      - All drivers are expected to work in RTE_IOVA_VA mode, irrespective of
>        physical address availability.
> diff --git a/doc/guides/rel_notes/known_issues.rst b/doc/guides/rel_notes/known_issues.rst
> index 276327c15..0b50c8306 100644
> --- a/doc/guides/rel_notes/known_issues.rst
> +++ b/doc/guides/rel_notes/known_issues.rst
> @@ -861,3 +861,29 @@ AVX-512 support disabled
>
>  **Driver/Module**:
>      ALL.
> +
> +
> +Unsuitable IOVA mode may be picked as the default
> +----------------------------------------------------------------
> +**Description**:
> +   Not all kernel drivers and not all devices support all IOVA modes. EAL will
> +   attempt to pick a reasonable default based on a number of factors, but there
> +   may be cases where the default is unsuitable (for example, hotplugging
> +   devices that use the `igb_uio` driver after EAL picked IOVA as VA mode at
> +   initialization).
> +
> +**Implication**:
> +   Some devices (hotplugged or otherwise) may not work because EAL
> +   automatically picked an incompatible IOVA mode.
> +
> +**Resolution/Workaround**:
> +   It is possible to force EAL to pick a particular IOVA mode by using the
> +   `--iova-mode` command-line parameter. If conflicting requirements are present
> +   (such as one device requiring IOVA as PA and one requiring IOVA as VA mode),
> +   there is no workaround.
> +
> +**Affected Environment/Platform**:
> +   Linux.
> +
> +**Driver/Module**:
> +   ALL.
> diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
> index c9bd3ce18..b399ca536 100644
> --- a/doc/guides/rel_notes/release_19_08.rst
> +++ b/doc/guides/rel_notes/release_19_08.rst
> @@ -56,6 +56,12 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =========================================================
>
> +* **EAL will now pick IOVA as VA mode as the default in most cases.**
> +
> +  Previously, the preferred default IOVA mode was IOVA as PA. IOVA mode
> +  detection now takes more factors into account and defaults to IOVA as VA in
> +  most cases.
> +
>  * **Added MCS lock.**
>
>    MCS lock provides scalability by spinning on a CPU/thread local variable
> @@ -436,6 +442,16 @@ Known Issues
>     =========================================================
>
>
> +* **Unsuitable IOVA mode may be picked as the default**
> +
> +  Not all kernel drivers and not all devices support all IOVA modes. EAL will
> +  attempt to pick a reasonable default based on a number of factors, but
> +  there may be cases where the default is unsuitable.
> +
> +  It is recommended to use the `--iova-mode` command-line parameter if the
> +  default is not suitable.
> +
> +
>  Tested Platforms
>  ----------------
>
> diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
> index 34db78753..6ed602c90 100644
> --- a/lib/librte_eal/linux/eal/eal.c
> +++ b/lib/librte_eal/linux/eal/eal.c
> @@ -66,6 +66,8 @@
>
>  #define SOCKET_MEM_STRLEN (RTE_MAX_NUMA_NODES * 10)
>
> +#define KERNEL_IOMMU_GROUPS_PATH "/sys/kernel/iommu_groups"
> +
>  /* Allow the application to print its usage message too if set */
>  static rte_usage_hook_t        rte_application_usage_hook = NULL;
>
> @@ -951,6 +953,33 @@ static void rte_eal_init_alert(const char *msg)
>         RTE_LOG(ERR, EAL, "%s\n", msg);
>  }
>
> +/*
> + * on Linux 3.6+, even if VFIO is not loaded, whenever IOMMU is enabled in the
> + * BIOS and in the kernel, /sys/kernel/iommu_groups path will contain kernel
> + * IOMMU groups. If IOMMU is not enabled, that path would be empty. Therefore,
> + * checking if the path is empty will tell us if IOMMU is enabled.
> + */
> +static bool
> +is_iommu_enabled(void)
> +{
> +       DIR *dir = opendir(KERNEL_IOMMU_GROUPS_PATH);
> +       struct dirent *d;
> +       int n = 0;
> +
> +       /* if directory doesn't exist, assume IOMMU is not enabled */
> +       if (dir == NULL)
> +               return false;
> +
> +       while ((d = readdir(dir)) != NULL) {
> +               /* more than two entries ('.' and '..') means not empty */
> +               if (++n > 2)
> +                       break;
> +       }
> +       closedir(dir);
> +
> +       return n > 2;
> +}
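
For readers following the thread: the emptiness check above counts entries
and relies on "." and ".." always being returned. An equivalent check can
skip those two entries by name instead. What follows is only a sketch and not
part of the patch; dir_has_entries() is a hypothetical helper name, and it
takes the path as a parameter (an assumption made here) so that it can be
tried on any directory.

```c
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): return true if the directory at
 * 'path' contains at least one entry other than "." and "..".
 * A directory that cannot be opened is treated as empty, mirroring the
 * patch's "assume IOMMU is not enabled" fallback.
 */
static bool
dir_has_entries(const char *path)
{
	DIR *dir;
	struct dirent *d;
	bool found = false;

	assert(path != NULL);

	dir = opendir(path);
	if (dir == NULL)
		return false;

	while ((d = readdir(dir)) != NULL) {
		/* skip "." and ".." by name instead of by count */
		if (strcmp(d->d_name, ".") == 0 ||
				strcmp(d->d_name, "..") == 0)
			continue;
		found = true;
		break;
	}
	closedir(dir);

	return found;
}
```

With this helper, the check in the patch would correspond to
dir_has_entries(KERNEL_IOMMU_GROUPS_PATH). Both forms should give the same
answer on Linux; the by-name form simply does not depend on how many special
entries readdir() reports.
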
> +
>  /* Launch threads, called at application init(). */
>  int
>  rte_eal_init(int argc, char **argv)
> @@ -1061,8 +1090,25 @@ rte_eal_init(int argc, char **argv)
>                 enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
>
>                 if (iova_mode == RTE_IOVA_DC) {
> -                       iova_mode = RTE_IOVA_VA;
> -                       RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
> +                       RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode.\n");
> +
> +                       if (!phys_addrs) {
> +                               /* if we have no access to physical addresses,
> +                                * pick IOVA as VA mode.
> +                                */
> +                               iova_mode = RTE_IOVA_VA;
> +                               RTE_LOG(DEBUG, EAL, "Physical addresses are unavailable, selecting IOVA as VA mode.\n");
> +                       } else if (is_iommu_enabled()) {
> +                               /* we have an IOMMU, pick IOVA as VA mode */
> +                               iova_mode = RTE_IOVA_VA;
> +                               RTE_LOG(DEBUG, EAL, "IOMMU is available, selecting IOVA as VA mode.\n");
> +                       } else {
> +                               /* physical addresses available, and no IOMMU
> +                                * found, so pick IOVA as PA.
> +                                */
> +                               iova_mode = RTE_IOVA_PA;
> +                               RTE_LOG(DEBUG, EAL, "IOMMU is not available, selecting IOVA as PA mode.\n");
> +                       }
>                 }
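
The decision made in this hunk when the buses return RTE_IOVA_DC can be
summarised as a pure function of two inputs. This is an illustrative sketch
only: pick_default_iova_mode() and the local enum iova_mode are hypothetical
stand-ins so that the example builds without DPDK headers. The patch itself
uses enum rte_iova_mode and assigns iova_mode inline in rte_eal_init().

```c
#include <stdbool.h>

/* Local stand-in for enum rte_iova_mode, so the sketch builds without DPDK. */
enum iova_mode {
	IOVA_DC,	/* don't care */
	IOVA_PA,	/* IOVA as physical addresses */
	IOVA_VA,	/* IOVA as virtual addresses */
};

/*
 * Hypothetical helper: default IOVA mode when no bus expressed a preference.
 * phys_addrs    - true if physical addresses are accessible
 * iommu_enabled - true if /sys/kernel/iommu_groups is not empty
 */
static enum iova_mode
pick_default_iova_mode(bool phys_addrs, bool iommu_enabled)
{
	/* no access to physical addresses: IOVA as PA cannot work */
	if (!phys_addrs)
		return IOVA_VA;
	/* an IOMMU is present: IOVA as VA is usable */
	if (iommu_enabled)
		return IOVA_VA;
	/* physical addresses available and no IOMMU found */
	return IOVA_PA;
}
```

The only combination that yields IOVA as PA is "physical addresses available,
no IOMMU", which matches the three-item list added to the programmer's guide
in this patch.
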
>  #ifdef RTE_LIBRTE_KNI
>                 /* Workaround for KNI which requires physical address to work */
> --
> 2.17.1

Reviewed-by: David Marchand <david.marchand at redhat.com>


-- 
David Marchand

