[dpdk-dev] [PATCH v6 1/4] eal/vfio: add multiple container support

Burakov, Anatoly anatoly.burakov at intel.com
Thu Apr 12 16:03:38 CEST 2018


On 12-Apr-18 8:19 AM, Xiao Wang wrote:
> Currently the EAL VFIO framework binds a VFIO group fd to the default
> container fd during rte_vfio_setup_device, while in some cases,
> e.g. vDPA (vhost data path acceleration), we want to put a VFIO group
> into a separate container and program the IOMMU via that container.
> 
> This patch adds some APIs to support creating containers and binding
> devices to a container.
> 
> A driver can use the "rte_vfio_create_container" helper to create a
> new container from EAL, and use "rte_vfio_bind_group" to bind a device
> to the newly created container.
> 
> During rte_vfio_setup_device, the container bound to the device
> will be used for IOMMU setup.
> 
> Signed-off-by: Junjie Chen <junjie.j.chen at intel.com>
> Signed-off-by: Xiao Wang <xiao.w.wang at intel.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin at redhat.com>
> Reviewed-by: Ferruh Yigit <ferruh.yigit at intel.com>
> ---
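For reference, the driver-side flow the commit message describes would look roughly like the sketch below. The proposed EAL calls are stubbed out here so the sequence is self-contained and compilable; the exact signatures and return values are assumptions based on the patch, not the real EAL implementation.

```c
#include <assert.h>

/* Stubs standing in for the patch's proposed EAL API -- the real
 * functions live in EAL; these only make the sequence compilable. */
static int
rte_vfio_create_container(void)
{
	return 42; /* pretend this is a fresh container fd */
}

static int
rte_vfio_bind_group(int container_fd, int iommu_group_no)
{
	(void)iommu_group_no;
	return container_fd >= 0 ? 0 : -1;
}

/* Driver-side flow: create a separate container and bind the device's
 * IOMMU group to it; a later rte_vfio_setup_device() would then
 * program the IOMMU through this container, not the default one. */
static int
attach_device_to_new_container(int iommu_group_no)
{
	int container_fd = rte_vfio_create_container();

	if (container_fd < 0)
		return -1;
	if (rte_vfio_bind_group(container_fd, iommu_group_no) < 0)
		return -1;
	return container_fd;
}
```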

Apologies for the late review. Some comments below.

<...>

>   
> +struct rte_memseg;
> +
>   /**
>    * Setup vfio_cfg for the device identified by its address.
>    * It discovers the configured I/O MMU groups or sets a new one for the device.
> @@ -131,6 +133,117 @@ rte_vfio_clear_group(int vfio_group_fd);
>   }
>   #endif
>   

<...>

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Perform DMA mapping for devices in a container.
> + *
> + * @param container_fd
> + *   the specified container fd
> + *
> + * @param dma_type
> + *   the dma map type
> + *
> + * @param ms
> + *   the dma address region to map
> + *
> + * @return
> + *    0 if successful
> + *   <0 if failed
> + */
> +int __rte_experimental
> +rte_vfio_dma_map(int container_fd, int dma_type, const struct rte_memseg *ms);
> +

First of all, why a memseg, instead of va/iova/len? This seems like an 
unnecessary attachment to the internals of DPDK's memory representation. 
Not all memory comes in memsegs, so this makes the API unnecessarily 
specific to DPDK memory.

Also, why provide the DMA type? There's already a VFIO type pointer in 
vfio_config - you can set this pointer for every newly created container, 
so the user wouldn't have to care about the IOMMU type. Is it not possible 
to figure out the DMA type from within EAL VFIO? If not, maybe provide an 
API to do so, e.g. rte_vfio_container_set_dma_type()?

This will also need to be rebased on top of latest HEAD because there 
already is a similar DMA map/unmap API added, only without the container 
parameter. Perhaps rename these new functions to 
rte_vfio_container_(create|destroy|dma_map|dma_unmap)?
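To illustrate the suggestion, a container-scoped map/unmap taking va/iova/len could look like the toy model below. The bookkeeping is purely illustrative (a real implementation would issue VFIO_IOMMU_MAP_DMA ioctls against the container fd), and the struct and function names are hypothetical; the point is only that the DMA type becomes a property of the container rather than a per-call argument.

```c
#include <stdint.h>

#define MAX_MAPS 8

/* Toy container: the IOMMU/DMA type is stored once at creation time,
 * so callers never pass it to map/unmap. */
struct toy_container {
	int dma_type; /* set when the container is created */
	struct { uint64_t va, iova, len; } maps[MAX_MAPS];
	int nb_maps;
};

/* DMA regions are described by va/iova/len, with no dependency on
 * struct rte_memseg or any other DPDK memory internals. */
static int
rte_vfio_container_dma_map(struct toy_container *c,
		uint64_t vaddr, uint64_t iova, uint64_t len)
{
	if (c->nb_maps == MAX_MAPS)
		return -1;
	c->maps[c->nb_maps].va = vaddr;
	c->maps[c->nb_maps].iova = iova;
	c->maps[c->nb_maps].len = len;
	c->nb_maps++;
	return 0;
}

static int
rte_vfio_container_dma_unmap(struct toy_container *c,
		uint64_t vaddr, uint64_t iova, uint64_t len)
{
	int i;

	for (i = 0; i < c->nb_maps; i++) {
		if (c->maps[i].va == vaddr && c->maps[i].iova == iova &&
				c->maps[i].len == len) {
			/* swap-remove: order of mappings doesn't matter */
			c->maps[i] = c->maps[--c->nb_maps];
			return 0;
		}
	}
	return -1; /* no such mapping */
}
```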

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Perform DMA unmapping for devices in a container.
> + *
> + * @param container_fd
> + *   the specified container fd
> + *
> + * @param dma_type
> + *    the dma map type
> + *
> + * @param ms
> + *   the dma address region to unmap
> + *
> + * @return
> + *    0 if successful
> + *   <0 if failed
> + */
> +int __rte_experimental
> +rte_vfio_dma_unmap(int container_fd, int dma_type, const struct rte_memseg *ms);
> +
>   #endif /* VFIO_PRESENT */
>   

<...>

> @@ -75,8 +53,8 @@ vfio_get_group_fd(int iommu_group_no)
>   		if (vfio_group_fd < 0) {
>   			/* if file not found, it's not an error */
>   			if (errno != ENOENT) {
> -				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", filename,
> -						strerror(errno));
> +				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> +					filename, strerror(errno));

This looks like unintended change.

>   				return -1;
>   			}
>   
> @@ -86,8 +64,10 @@ vfio_get_group_fd(int iommu_group_no)
>   			vfio_group_fd = open(filename, O_RDWR);
>   			if (vfio_group_fd < 0) {
>   				if (errno != ENOENT) {
> -					RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", filename,
> -							strerror(errno));
> +					RTE_LOG(ERR, EAL,
> +						"Cannot open %s: %s\n",
> +						filename,
> +						strerror(errno));

This looks like unintended change.

>   					return -1;
>   				}
>   				return 0;
> @@ -95,21 +75,19 @@ vfio_get_group_fd(int iommu_group_no)
>   			/* noiommu group found */
>   		}
>   
> -		cur_grp->group_no = iommu_group_no;
> -		cur_grp->fd = vfio_group_fd;
> -		vfio_cfg.vfio_active_groups++;
>   		return vfio_group_fd;
>   	}
> -	/* if we're in a secondary process, request group fd from the primary
> +	/*
> +	 * if we're in a secondary process, request group fd from the primary
>   	 * process via our socket
>   	 */

This looks like unintended change.

>   	else {
> -		int socket_fd, ret;
> -
> -		socket_fd = vfio_mp_sync_connect_to_primary();
> +		int ret;
> +		int socket_fd = vfio_mp_sync_connect_to_primary();
>   
>   		if (socket_fd < 0) {
> -			RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
> +			RTE_LOG(ERR, EAL,
> +				"  cannot connect to primary process!\n");

This looks like unintended change.

>   			return -1;
>   		}
>   		if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_GROUP) < 0) {
> @@ -122,6 +100,7 @@ vfio_get_group_fd(int iommu_group_no)
>   			close(socket_fd);
>   			return -1;
>   		}
> +
>   		ret = vfio_mp_sync_receive_request(socket_fd);

This looks like unintended change.

(hint: "git revert -n HEAD && git add -p" is your friend :) )

>   		switch (ret) {
>   		case SOCKET_NO_FD:
> @@ -132,9 +111,6 @@ vfio_get_group_fd(int iommu_group_no)
>   			/* if we got the fd, store it and return it */
>   			if (vfio_group_fd > 0) {
>   				close(socket_fd);
> -				cur_grp->group_no = iommu_group_no;
> -				cur_grp->fd = vfio_group_fd;
> -				vfio_cfg.vfio_active_groups++;
>   				return vfio_group_fd;
>   			}
>   			/* fall-through on error */
> @@ -147,70 +123,349 @@ vfio_get_group_fd(int iommu_group_no)
>   	return -1;

<...>

> +int __rte_experimental
> +rte_vfio_create_container(void)
> +{
> +	struct vfio_config *vfio_cfg;
> +	int i;
> +
> +	/* Find an empty slot to store new vfio config */
> +	for (i = 1; i < VFIO_MAX_CONTAINERS; i++) {
> +		if (vfio_cfgs[i] == NULL)
> +			break;
> +	}
> +
> +	if (i == VFIO_MAX_CONTAINERS) {
> +		RTE_LOG(ERR, EAL, "exceed max vfio container limit\n");
> +		return -1;
> +	}
> +
> +	vfio_cfgs[i] = rte_zmalloc("vfio_container", sizeof(struct vfio_config),
> +		RTE_CACHE_LINE_SIZE);
> +	if (vfio_cfgs[i] == NULL)
> +		return -ENOMEM;

Is there a specific reason why 1) dynamic allocation is used (as opposed 
to just storing a static array), and 2) DPDK memory allocation is used? 
This seems like unnecessary complication.

Even if you were to decide to allocate memory instead of having a static 
array, you'd have to register with rte_eal_cleanup() to delete any 
allocated containers on DPDK exit. But, as I said, I think it would be 
better to keep this as a static array.
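For comparison, the static-array approach could look like the minimal sketch below. The field names are simplified, and the "active" flag stands in for whatever in-use marker the real vfio_config would carry; the point is that slots need no rte_zmalloc()/rte_free() and leave nothing to tear down at DPDK exit.

```c
#include <stddef.h>

#define VFIO_MAX_CONTAINERS 64

/* Simplified stand-in for struct vfio_config. */
struct toy_vfio_config {
	int active;            /* slot in use? */
	int vfio_container_fd;
	int vfio_active_groups;
};

/* All container slots live in a fixed array -- no dynamic allocation,
 * no cleanup handler needed. */
static struct toy_vfio_config vfio_cfgs[VFIO_MAX_CONTAINERS];

static struct toy_vfio_config *
vfio_get_free_slot(void)
{
	int i;

	/* slot 0 is reserved for the default container */
	for (i = 1; i < VFIO_MAX_CONTAINERS; i++) {
		if (!vfio_cfgs[i].active) {
			vfio_cfgs[i].active = 1;
			return &vfio_cfgs[i];
		}
	}
	return NULL; /* all slots taken */
}
```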

> +
> +	RTE_LOG(INFO, EAL, "alloc container at slot %d\n", i);
> +	vfio_cfg = vfio_cfgs[i];
> +	vfio_cfg->vfio_active_groups = 0;
> +	vfio_cfg->vfio_container_fd = vfio_get_container_fd();
> +
> +	if (vfio_cfg->vfio_container_fd < 0) {
> +		rte_free(vfio_cfgs[i]);
> +		vfio_cfgs[i] = NULL;
> +		return -1;
> +	}
> +
> +	for (i = 0; i < VFIO_MAX_GROUPS; i++) {
> +		vfio_cfg->vfio_groups[i].group_no = -1;
> +		vfio_cfg->vfio_groups[i].fd = -1;
> +		vfio_cfg->vfio_groups[i].devices = 0;
> +	}

<...>

> @@ -665,41 +931,80 @@ vfio_get_group_no(const char *sysfs_base,
>   }
>   
>   static int
> -vfio_type1_dma_map(int vfio_container_fd)
> +do_vfio_type1_dma_map(int vfio_container_fd, const struct rte_memseg *ms)

<...>


> +static int
> +do_vfio_type1_dma_unmap(int vfio_container_fd, const struct rte_memseg *ms)

APIs such as these two were recently added to DPDK.

-- 
Thanks,
Anatoly

