[dpdk-dev] [PATCH v6 1/4] eal/vfio: add multiple container support

Wang, Xiao W xiao.w.wang at intel.com
Thu Apr 12 18:07:41 CEST 2018


Hi Anatoly,

> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Thursday, April 12, 2018 10:04 PM
> To: Wang, Xiao W <xiao.w.wang at intel.com>; Yigit, Ferruh
> <ferruh.yigit at intel.com>
> Cc: dev at dpdk.org; maxime.coquelin at redhat.com; Wang, Zhihong
> <zhihong.wang at intel.com>; Bie, Tiwei <tiwei.bie at intel.com>; Tan, Jianfeng
> <jianfeng.tan at intel.com>; Liang, Cunming <cunming.liang at intel.com>; Daly,
> Dan <dan.daly at intel.com>; thomas at monjalon.net; gaetan.rivet at 6wind.com;
> hemant.agrawal at nxp.com; Chen, Junjie J <junjie.j.chen at intel.com>
> Subject: Re: [PATCH v6 1/4] eal/vfio: add multiple container support
> 
> On 12-Apr-18 8:19 AM, Xiao Wang wrote:
> > Currently the EAL VFIO framework binds the VFIO group fd to the
> > default container fd during rte_vfio_setup_device, while in some
> > cases, e.g. vDPA (vhost data path acceleration), we want to put a
> > VFIO group into a separate container and program the IOMMU via that
> > container.
> >
> > This patch adds APIs to support creating containers and binding
> > devices to a container.
> >
> > A driver could use the "rte_vfio_create_container" helper to create
> > a new container from EAL, and "rte_vfio_bind_group" to bind a device
> > to the newly created container.
> >
> > During rte_vfio_setup_device, the container bound to the device
> > will be used for IOMMU setup.
> >
> > Signed-off-by: Junjie Chen <junjie.j.chen at intel.com>
> > Signed-off-by: Xiao Wang <xiao.w.wang at intel.com>
> > Reviewed-by: Maxime Coquelin <maxime.coquelin at redhat.com>
> > Reviewed-by: Ferruh Yigit <ferruh.yigit at intel.com>
> > ---
> 
> Apologies for the late review. Some comments below.
> 
> <...>
> 
> >
> > +struct rte_memseg;
> > +
> >   /**
> >    * Setup vfio_cfg for the device identified by its address.
> >    * It discovers the configured I/O MMU groups or sets a new one for the device.
> > @@ -131,6 +133,117 @@ rte_vfio_clear_group(int vfio_group_fd);
> >   }
> >   #endif
> >
> 
> <...>
> 
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * Perform DMA mapping for devices in a container.
> > + *
> > + * @param container_fd
> > + *   the specified container fd
> > + *
> > + * @param dma_type
> > + *   the dma map type
> > + *
> > + * @param ms
> > + *   the dma address region to map
> > + *
> > + * @return
> > + *    0 if successful
> > + *   <0 if failed
> > + */
> > +int __rte_experimental
> > +rte_vfio_dma_map(int container_fd, int dma_type, const struct rte_memseg *ms);
> > +
> 
> First of all, why memseg, instead of va/iova/len? This seems like
> unnecessary attachment to internals of DPDK memory representation. Not
> all memory comes in memsegs; this makes the API unnecessarily specific
> to DPDK memory.

Agreed, will use va/iova/len instead of struct rte_memseg.
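
The prototypes would then look roughly like this (just a sketch; the
exact names will be settled during the rebase):

int __rte_experimental
rte_vfio_dma_map(int container_fd, uint64_t vaddr, uint64_t iova,
		uint64_t len);

int __rte_experimental
rte_vfio_dma_unmap(int container_fd, uint64_t vaddr, uint64_t iova,
		uint64_t len);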

> 
> Also, why providing DMA type? There's already a VFIO type pointer in
> vfio_config - you can set this pointer for every newly created container,
> so the user wouldn't have to care about IOMMU type. Is it not possible
> to figure out DMA type from within EAL VFIO? If not, maybe provide an
> API to do so, e.g. rte_vfio_container_set_dma_type()?

It's possible; EAL VFIO should be able to figure out a container's DMA type
on its own, so the dma_type parameter can be dropped.
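
A sketch of how EAL could detect the type per container, loosely
modelled on the existing vfio_set_iommu_type() path (iommu_types[] is
the table EAL already keeps; VFIO_CHECK_EXTENSION returns >0 for a
supported type; the helper name below is tentative):

static const struct vfio_iommu_type *
vfio_detect_iommu_type(int vfio_container_fd)
{
	unsigned int idx;

	/* probe each known IOMMU type against this container */
	for (idx = 0; idx < RTE_DIM(iommu_types); idx++) {
		if (ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION,
				iommu_types[idx].type_id) > 0)
			return &iommu_types[idx];
	}
	return NULL;
}

The detected type can then be stored in the container's vfio_config,
so map/unmap calls no longer need a dma_type argument.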

> 
> This will also need to be rebased on top of the latest HEAD because there
> already is a similar DMA map/unmap API added, only without the container
> parameter. Perhaps rename these new functions to
> rte_vfio_container_(create|destroy|dma_map|dma_unmap)?

OK, will check the latest HEAD and rebase on that.
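
With that rename, the experimental set would read something like this
(direction only, not final signatures):

int rte_vfio_container_create(void);
int rte_vfio_container_destroy(int container_fd);
int rte_vfio_container_dma_map(int container_fd, uint64_t vaddr,
		uint64_t iova, uint64_t len);
int rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr,
		uint64_t iova, uint64_t len);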

> 
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * Perform DMA unmapping for devices in a container.
> > + *
> > + * @param container_fd
> > + *   the specified container fd
> > + *
> > + * @param dma_type
> > + *   the dma map type
> > + *
> > + * @param ms
> > + *   the dma address region to unmap
> > + *
> > + * @return
> > + *    0 if successful
> > + *   <0 if failed
> > + */
> > +int __rte_experimental
> > +rte_vfio_dma_unmap(int container_fd, int dma_type, const struct rte_memseg *ms);
> > +
> >   #endif /* VFIO_PRESENT */
> >
> 
> <...>
> 
> > @@ -75,8 +53,8 @@ vfio_get_group_fd(int iommu_group_no)
> >   		if (vfio_group_fd < 0) {
> >   			/* if file not found, it's not an error */
> >   			if (errno != ENOENT) {
> > -				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", filename,
> > -						strerror(errno));
> > +				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> > +					filename, strerror(errno));
> 
> This looks like unintended change.
> 
> >   				return -1;
> >   			}
> >
> > @@ -86,8 +64,10 @@ vfio_get_group_fd(int iommu_group_no)
> >   			vfio_group_fd = open(filename, O_RDWR);
> >   			if (vfio_group_fd < 0) {
> >   				if (errno != ENOENT) {
> > -					RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", filename,
> > -							strerror(errno));
> > +					RTE_LOG(ERR, EAL,
> > +						"Cannot open %s: %s\n",
> > +						filename,
> > +						strerror(errno));
> 
> This looks like unintended change.
> 
> >   					return -1;
> >   				}
> >   				return 0;
> > @@ -95,21 +75,19 @@ vfio_get_group_fd(int iommu_group_no)
> >   			/* noiommu group found */
> >   		}
> >
> > -		cur_grp->group_no = iommu_group_no;
> > -		cur_grp->fd = vfio_group_fd;
> > -		vfio_cfg.vfio_active_groups++;
> >   		return vfio_group_fd;
> >   	}
> > -	/* if we're in a secondary process, request group fd from the primary
> > +	/*
> > +	 * if we're in a secondary process, request group fd from the primary
> >   	 * process via our socket
> >   	 */
> 
> This looks like unintended change.
> 
> >   	else {
> > -		int socket_fd, ret;
> > -
> > -		socket_fd = vfio_mp_sync_connect_to_primary();
> > +		int ret;
> > +		int socket_fd = vfio_mp_sync_connect_to_primary();
> >
> >   		if (socket_fd < 0) {
> > -			RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
> > +			RTE_LOG(ERR, EAL,
> > +				"  cannot connect to primary process!\n");
> 
> This looks like unintended change.
> 
> >   			return -1;
> >   		}
> >   		if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_GROUP) < 0) {
> > @@ -122,6 +100,7 @@ vfio_get_group_fd(int iommu_group_no)
> >   			close(socket_fd);
> >   			return -1;
> >   		}
> > +
> >   		ret = vfio_mp_sync_receive_request(socket_fd);
> 
> This looks like unintended change.
> 
> (hint: "git revert -n HEAD && git add -p" is your friend :) )

Thanks, will remove these unintended diffs.

> 
> >   		switch (ret) {
> >   		case SOCKET_NO_FD:
> > @@ -132,9 +111,6 @@ vfio_get_group_fd(int iommu_group_no)
> >   			/* if we got the fd, store it and return it */
> >   			if (vfio_group_fd > 0) {
> >   				close(socket_fd);
> > -				cur_grp->group_no = iommu_group_no;
> > -				cur_grp->fd = vfio_group_fd;
> > -				vfio_cfg.vfio_active_groups++;
> >   				return vfio_group_fd;
> >   			}
> >   			/* fall-through on error */
> > @@ -147,70 +123,349 @@ vfio_get_group_fd(int iommu_group_no)
> >   	return -1;
> 
> <...>
> 
> > +int __rte_experimental
> > +rte_vfio_create_container(void)
> > +{
> > +	struct vfio_config *vfio_cfg;
> > +	int i;
> > +
> > +	/* Find an empty slot to store new vfio config */
> > +	for (i = 1; i < VFIO_MAX_CONTAINERS; i++) {
> > +		if (vfio_cfgs[i] == NULL)
> > +			break;
> > +	}
> > +
> > +	if (i == VFIO_MAX_CONTAINERS) {
> > +		RTE_LOG(ERR, EAL, "exceed max vfio container limit\n");
> > +		return -1;
> > +	}
> > +
> > +	vfio_cfgs[i] = rte_zmalloc("vfio_container", sizeof(struct vfio_config),
> > +		RTE_CACHE_LINE_SIZE);
> > +	if (vfio_cfgs[i] == NULL)
> > +		return -ENOMEM;
> 
> Is there a specific reason why 1) dynamic allocation is used (as opposed
> to just storing a static array), and 2) DPDK memory allocation is used?
> This seems like unnecessary complication.
> 
> Even if you were to decide to allocate memory instead of having a static
> array, you'll have to register for rte_eal_cleanup() to delete any
> allocated containers on DPDK exit. But, as i said, i think it would be
> better to keep it as static array.
>

Thanks for the suggestion; a static array looks simpler and cleaner.
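
For v7 the allocation could become something like the below (a sketch;
it assumes the container fds in the static table are initialized to -1
at startup, with slot 0 reserved for the default container):

static struct vfio_config vfio_cfgs[VFIO_MAX_CONTAINERS];

int __rte_experimental
rte_vfio_container_create(void)
{
	int i;

	/* find a free slot; fd == -1 marks an unused entry */
	for (i = 1; i < VFIO_MAX_CONTAINERS; i++) {
		if (vfio_cfgs[i].vfio_container_fd == -1)
			break;
	}
	if (i == VFIO_MAX_CONTAINERS) {
		RTE_LOG(ERR, EAL, "exceed max vfio container limit\n");
		return -1;
	}

	vfio_cfgs[i].vfio_container_fd = vfio_get_container_fd();
	if (vfio_cfgs[i].vfio_container_fd < 0)
		return -1;

	return vfio_cfgs[i].vfio_container_fd;
}

This way there is no heap allocation to release at rte_eal_cleanup()
time.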
 
> > +
> > +	RTE_LOG(INFO, EAL, "alloc container at slot %d\n", i);
> > +	vfio_cfg = vfio_cfgs[i];
> > +	vfio_cfg->vfio_active_groups = 0;
> > +	vfio_cfg->vfio_container_fd = vfio_get_container_fd();
> > +
> > +	if (vfio_cfg->vfio_container_fd < 0) {
> > +		rte_free(vfio_cfgs[i]);
> > +		vfio_cfgs[i] = NULL;
> > +		return -1;
> > +	}
> > +
> > +	for (i = 0; i < VFIO_MAX_GROUPS; i++) {
> > +		vfio_cfg->vfio_groups[i].group_no = -1;
> > +		vfio_cfg->vfio_groups[i].fd = -1;
> > +		vfio_cfg->vfio_groups[i].devices = 0;
> > +	}
> 
> <...>
> 
> > @@ -665,41 +931,80 @@ vfio_get_group_no(const char *sysfs_base,
> >   }
> >
> >   static int
> > -vfio_type1_dma_map(int vfio_container_fd)
> > +do_vfio_type1_dma_map(int vfio_container_fd, const struct rte_memseg *ms)
> 
> <...>
> 
> 
> > +static int
> > +do_vfio_type1_dma_unmap(int vfio_container_fd, const struct rte_memseg *ms)
> 
> APIs such as these two were recently added to DPDK.

Will check and rebase.
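
If I remember correctly, the API added on HEAD operates on the default
container and takes (vaddr, iova, len) directly, something like (from
memory, to be confirmed during the rebase):

int rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
int rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);

so the container variants here would just add a container_fd parameter
in front.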

BRs,
Xiao

> 
> --
> Thanks,
> Anatoly

