[dpdk-dev] [PATCH v7 2/5] vfio: add multi container support

Burakov, Anatoly anatoly.burakov at intel.com
Mon Apr 16 12:03:01 CEST 2018


On 15-Apr-18 4:33 PM, Xiao Wang wrote:
> This patch adds APIs to support container create/destroy and device
> bind/unbind with a container. It also provides API for IOMMU programing
> on a specified container.
> 
> A driver could use "rte_vfio_create_container" helper to create a

^^ wrong API name in commit message :)

> new container from eal, use "rte_vfio_bind_group" to bind a device
> to the newly created container. During rte_vfio_setup_device the
> container bound with the device will be used for IOMMU setup.
> 
> Signed-off-by: Junjie Chen <junjie.j.chen at intel.com>
> Signed-off-by: Xiao Wang <xiao.w.wang at intel.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin at redhat.com>
> Reviewed-by: Ferruh Yigit <ferruh.yigit at intel.com>
> ---
>   lib/librte_eal/bsdapp/eal/eal.c          |  52 +++++
>   lib/librte_eal/common/include/rte_vfio.h | 119 ++++++++++++
>   lib/librte_eal/linuxapp/eal/eal_vfio.c   | 316 +++++++++++++++++++++++++++++++
>   lib/librte_eal/rte_eal_version.map       |   6 +
>   4 files changed, 493 insertions(+)
> 
> diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
> index 727adc5d2..c5106d0d6 100644
> --- a/lib/librte_eal/bsdapp/eal/eal.c
> +++ b/lib/librte_eal/bsdapp/eal/eal.c
> @@ -769,6 +769,14 @@ int rte_vfio_noiommu_is_enabled(void);
>   int rte_vfio_clear_group(int vfio_group_fd);
>   int rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
>   int rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);
> +int rte_vfio_container_create(void);
> +int rte_vfio_container_destroy(int container_fd);
> +int rte_vfio_bind_group(int container_fd, int iommu_group_no);
> +int rte_vfio_unbind_group(int container_fd, int iommu_group_no);

Maybe have these under "container" too? e.g. 
rte_vfio_container_group_bind/unbind? Seems like it would be more 
consistent that way - anything to do with custom containers would be 
under rte_vfio_container_* namespace.

> +int rte_vfio_container_dma_map(int container_fd, uint64_t vaddr,
> +		uint64_t iova, uint64_t len);
> +int rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr,
> +		uint64_t iova, uint64_t len);
>   
>   int rte_vfio_setup_device(__rte_unused const char *sysfs_base,
>   		      __rte_unused const char *dev_addr,
> @@ -818,3 +826,47 @@ rte_vfio_dma_unmap(uint64_t __rte_unused vaddr, uint64_t __rte_unused iova,
>   {
>   	return -1;
>   }
> +

<...>

> diff --git a/lib/librte_eal/common/include/rte_vfio.h b/lib/librte_eal/common/include/rte_vfio.h
> index d26ab01cb..0c1509b29 100644
> --- a/lib/librte_eal/common/include/rte_vfio.h
> +++ b/lib/librte_eal/common/include/rte_vfio.h
> @@ -168,6 +168,125 @@ rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
>   int __rte_experimental
>   rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Create a new container for device binding.

I would add a note that any newly allocated DPDK memory will not be 
mapped into these containers by default.

> + *
> + * @return
> + *   the container fd if successful
> + *   <0 if failed
> + */
> +int __rte_experimental
> +rte_vfio_container_create(void);
> +

<...>

> + *    0 if successful
> + *   <0 if failed
> + */
> +int __rte_experimental
> +rte_vfio_unbind_group(int container_fd, int iommu_group_no);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Perform dma mapping for devices in a conainer.

Here and in other places: "dma" should be DMA, and typo: "conainer" :)

I think you should also add a note to the original API (not this one, 
but the old one) that DMA maps done via that API will only apply to 
default container and will not apply to any of the containers created 
via container_create(). IOW, documentation should make it clear that if 
you use this functionality, you're on your own and you have to manage 
your own DMA mappings for any containers you create.

> + *
> + * @param container_fd
> + *   the specified container fd
> + *
> + * @param vaddr
> + *   Starting virtual address of memory to be mapped.
> + *

<...>

> +
> +int __rte_experimental
> +rte_vfio_container_dma_map(int container_fd, uint64_t vaddr, uint64_t iova,
> +		uint64_t len)
> +{
> +	struct user_mem_map *new_map;
> +	struct vfio_config *vfio_cfg;
> +	struct user_mem_maps *user_mem_maps;
> +	int ret = 0;
> +
> +	if (len == 0) {
> +		rte_errno = EINVAL;
> +		return -1;
> +	}
> +
> +	vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
> +	if (vfio_cfg == NULL) {
> +		RTE_LOG(ERR, EAL, "Invalid container fd\n");
> +		return -1;
> +	}
> +
> +	user_mem_maps = &vfio_cfg->mem_maps;
> +	rte_spinlock_recursive_lock(&user_mem_maps->lock);
> +	if (user_mem_maps->n_maps == VFIO_MAX_USER_MEM_MAPS) {
> +		RTE_LOG(ERR, EAL, "No more space for user mem maps\n");
> +		rte_errno = ENOMEM;
> +		ret = -1;
> +		goto out;
> +	}
> +	/* map the entry */
> +	if (vfio_dma_mem_map(vfio_cfg, vaddr, iova, len, 1)) {
> +		/* technically, this will fail if there are currently no devices
> +		 * plugged in, even if a device were added later, this mapping
> +		 * might have succeeded. however, since we cannot verify if this
> +		 * is a valid mapping without having a device attached, consider
> +		 * this to be unsupported, because we can't just store any old
> +		 * mapping and pollute list of active mappings willy-nilly.
> +		 */
> +		RTE_LOG(ERR, EAL, "Couldn't map new region for DMA\n");
> +		ret = -1;
> +		goto out;
> +	}
> +	/* create new user mem map entry */
> +	new_map = &user_mem_maps->maps[user_mem_maps->n_maps++];
> +	new_map->addr = vaddr;
> +	new_map->iova = iova;
> +	new_map->len = len;
> +
> +	compact_user_maps(user_mem_maps);
> +out:
> +	rte_spinlock_recursive_unlock(&user_mem_maps->lock);
> +	return ret;

Please correct me if i'm wrong, but it looks like you've just duplicated 
the code for rte_vfio_dma_map() here and made a few small changes. It 
would be better if you moved most of this into a static function (e.g. 
static int container_dma_map(vfio_cfg, vaddr, iova, len)) and called it 
with either default vfio_cfg from rte_vfio_dma_map, or found vfio_cfg 
from rte_vfio_container_dma_map. Same applies to function below.

> +}
> +
> +int __rte_experimental
> +rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr, uint64_t iova,
> +		uint64_t len)
> +{
> +	struct user_mem_map *map, *new_map = NULL;
> +	struct vfio_config *vfio_cfg;
> +	struct user_mem_maps *user_mem_maps;
> +	int ret = 0;
> +
> +	if (len == 0) {
> +		rte_errno = EINVAL;
> +		return -1;
> +	}
> +

<...>

-- 
Thanks,
Anatoly


More information about the dev mailing list