[dpdk-dev] [PATCH v6 1/4] eal/vfio: add multiple container support
Wang, Xiao W
xiao.w.wang at intel.com
Thu Apr 12 18:07:41 CEST 2018
Hi Anatoly,
> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Thursday, April 12, 2018 10:04 PM
> To: Wang, Xiao W <xiao.w.wang at intel.com>; Yigit, Ferruh
> <ferruh.yigit at intel.com>
> Cc: dev at dpdk.org; maxime.coquelin at redhat.com; Wang, Zhihong
> <zhihong.wang at intel.com>; Bie, Tiwei <tiwei.bie at intel.com>; Tan, Jianfeng
> <jianfeng.tan at intel.com>; Liang, Cunming <cunming.liang at intel.com>; Daly,
> Dan <dan.daly at intel.com>; thomas at monjalon.net; gaetan.rivet at 6wind.com;
> hemant.agrawal at nxp.com; Chen, Junjie J <junjie.j.chen at intel.com>
> Subject: Re: [PATCH v6 1/4] eal/vfio: add multiple container support
>
> On 12-Apr-18 8:19 AM, Xiao Wang wrote:
> > Currently the EAL VFIO framework binds a VFIO group fd to the default
> > container fd during rte_vfio_setup_device(), while in some cases,
> > e.g. vDPA (vhost data path acceleration), we want to put the VFIO
> > group into a separate container and program the IOMMU via that
> > container.
> >
> > This patch adds APIs to support creating a container and binding
> > devices to a container.
> >
> > A driver can use the "rte_vfio_create_container" helper to create a
> > new container from EAL, and "rte_vfio_bind_group" to bind a device
> > to the newly created container.
> >
> > During rte_vfio_setup_device(), the container bound to the device
> > is used for IOMMU setup.
> >
> > Signed-off-by: Junjie Chen <junjie.j.chen at intel.com>
> > Signed-off-by: Xiao Wang <xiao.w.wang at intel.com>
> > Reviewed-by: Maxime Coquelin <maxime.coquelin at redhat.com>
> > Reviewed-by: Ferruh Yigit <ferruh.yigit at intel.com>
> > ---
>
> Apologies for late review. Some comments below.
>
> <...>
>
> >
> > +struct rte_memseg;
> > +
> > /**
> > * Setup vfio_cfg for the device identified by its address.
> > * It discovers the configured I/O MMU groups or sets a new one for the device.
> > @@ -131,6 +133,117 @@ rte_vfio_clear_group(int vfio_group_fd);
> > }
> > #endif
> >
>
> <...>
>
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * Perform DMA mapping for devices in a container.
> > + *
> > + * @param container_fd
> > + * the specified container fd
> > + *
> > + * @param dma_type
> > + * the dma map type
> > + *
> > + * @param ms
> > + * the dma address region to map
> > + *
> > + * @return
> > + * 0 if successful
> > + * <0 if failed
> > + */
> > +int __rte_experimental
> > +rte_vfio_dma_map(int container_fd, int dma_type, const struct rte_memseg *ms);
> > +
>
> First of all, why memseg, instead of va/iova/len? This seems like
> unnecessary attachment to internals of DPDK memory representation. Not
> all memory comes in memsegs, this makes the API unnecessarily specific
> to DPDK memory.
Agree, will use va/iova/len.
>
> Also, why providing DMA type? There's already a VFIO type pointer in
> vfio_config - you can set this pointer for every new created container,
> so the user wouldn't have to care about IOMMU type. Is it not possible
> to figure out DMA type from within EAL VFIO? If not, maybe provide an
> API to do so, e.g. rte_vfio_container_set_dma_type()?
It's possible, EAL VFIO should be able to figure out a container's DMA type.
>
> This will also need to be rebased on top of latest HEAD because there
> already is a similar DMA map/unmap API added, only without the container
> parameter. Perhaps rename these new functions to
> rte_vfio_container_(create|destroy|dma_map|dma_unmap)?
OK, will check the latest HEAD and rebase on that.
>
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * Perform DMA unmapping for devices in a container.
> > + *
> > + * @param container_fd
> > + * the specified container fd
> > + *
> > + * @param dma_type
> > + * the dma map type
> > + *
> > + * @param ms
> > + * the dma address region to unmap
> > + *
> > + * @return
> > + * 0 if successful
> > + * <0 if failed
> > + */
> > +int __rte_experimental
> > +rte_vfio_dma_unmap(int container_fd, int dma_type, const struct rte_memseg *ms);
> > +
> > #endif /* VFIO_PRESENT */
> >
>
> <...>
>
> > @@ -75,8 +53,8 @@ vfio_get_group_fd(int iommu_group_no)
> > if (vfio_group_fd < 0) {
> > /* if file not found, it's not an error */
> > if (errno != ENOENT) {
> > - RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", filename,
> > - strerror(errno));
> > + RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> > + filename, strerror(errno));
>
> This looks like unintended change.
>
> > return -1;
> > }
> >
> > @@ -86,8 +64,10 @@ vfio_get_group_fd(int iommu_group_no)
> > vfio_group_fd = open(filename, O_RDWR);
> > if (vfio_group_fd < 0) {
> > if (errno != ENOENT) {
> > - RTE_LOG(ERR, EAL, "Cannot
> open %s: %s\n", filename,
> > - strerror(errno));
> > + RTE_LOG(ERR, EAL,
> > + "Cannot open %s: %s\n",
> > + filename,
> > + strerror(errno));
>
> This looks like unintended change.
>
> > return -1;
> > }
> > return 0;
> > @@ -95,21 +75,19 @@ vfio_get_group_fd(int iommu_group_no)
> > /* noiommu group found */
> > }
> >
> > - cur_grp->group_no = iommu_group_no;
> > - cur_grp->fd = vfio_group_fd;
> > - vfio_cfg.vfio_active_groups++;
> > return vfio_group_fd;
> > }
> > - /* if we're in a secondary process, request group fd from the primary
> > + /*
> > + * if we're in a secondary process, request group fd from the primary
> > * process via our socket
> > */
>
> This looks like unintended change.
>
> > else {
> > - int socket_fd, ret;
> > -
> > - socket_fd = vfio_mp_sync_connect_to_primary();
> > + int ret;
> > + int socket_fd = vfio_mp_sync_connect_to_primary();
> >
> > if (socket_fd < 0) {
> > - RTE_LOG(ERR, EAL, " cannot connect to primary process!\n");
> > + RTE_LOG(ERR, EAL,
> > + " cannot connect to primary process!\n");
>
> This looks like unintended change.
>
> > return -1;
> > }
> > if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_GROUP) < 0) {
> > @@ -122,6 +100,7 @@ vfio_get_group_fd(int iommu_group_no)
> > close(socket_fd);
> > return -1;
> > }
> > +
> > ret = vfio_mp_sync_receive_request(socket_fd);
>
> This looks like unintended change.
>
> (hint: "git revert -n HEAD && git add -p" is your friend :) )
Thanks, will remove these diffs.
>
> > switch (ret) {
> > case SOCKET_NO_FD:
> > @@ -132,9 +111,6 @@ vfio_get_group_fd(int iommu_group_no)
> > /* if we got the fd, store it and return it */
> > if (vfio_group_fd > 0) {
> > close(socket_fd);
> > - cur_grp->group_no = iommu_group_no;
> > - cur_grp->fd = vfio_group_fd;
> > - vfio_cfg.vfio_active_groups++;
> > return vfio_group_fd;
> > }
> > /* fall-through on error */
> > @@ -147,70 +123,349 @@ vfio_get_group_fd(int iommu_group_no)
> > return -1;
>
> <...>
>
> > +int __rte_experimental
> > +rte_vfio_create_container(void)
> > +{
> > + struct vfio_config *vfio_cfg;
> > + int i;
> > +
> > + /* Find an empty slot to store new vfio config */
> > + for (i = 1; i < VFIO_MAX_CONTAINERS; i++) {
> > + if (vfio_cfgs[i] == NULL)
> > + break;
> > + }
> > +
> > + if (i == VFIO_MAX_CONTAINERS) {
> > + RTE_LOG(ERR, EAL, "exceed max vfio container limit\n");
> > + return -1;
> > + }
> > +
> > + vfio_cfgs[i] = rte_zmalloc("vfio_container", sizeof(struct vfio_config),
> > + RTE_CACHE_LINE_SIZE);
> > + if (vfio_cfgs[i] == NULL)
> > + return -ENOMEM;
>
> Is there a specific reason why 1) dynamic allocation is used (as opposed
> to just storing a static array), and 2) DPDK memory allocation is used?
> This seems like unnecessary complication.
>
> Even if you were to decide to allocate memory instead of having a static
> array, you'll have to register for rte_eal_cleanup() to delete any
> allocated containers on DPDK exit. But, as i said, i think it would be
> better to keep it as static array.
>
Thanks for the suggestion, a static array looks simpler and cleaner.
> > +
> > + RTE_LOG(INFO, EAL, "alloc container at slot %d\n", i);
> > + vfio_cfg = vfio_cfgs[i];
> > + vfio_cfg->vfio_active_groups = 0;
> > + vfio_cfg->vfio_container_fd = vfio_get_container_fd();
> > +
> > + if (vfio_cfg->vfio_container_fd < 0) {
> > + rte_free(vfio_cfgs[i]);
> > + vfio_cfgs[i] = NULL;
> > + return -1;
> > + }
> > +
> > + for (i = 0; i < VFIO_MAX_GROUPS; i++) {
> > + vfio_cfg->vfio_groups[i].group_no = -1;
> > + vfio_cfg->vfio_groups[i].fd = -1;
> > + vfio_cfg->vfio_groups[i].devices = 0;
> > + }
>
> <...>
>
> > @@ -665,41 +931,80 @@ vfio_get_group_no(const char *sysfs_base,
> > }
> >
> > static int
> > -vfio_type1_dma_map(int vfio_container_fd)
> > +do_vfio_type1_dma_map(int vfio_container_fd, const struct rte_memseg *ms)
>
> <...>
>
>
> > +static int
> > +do_vfio_type1_dma_unmap(int vfio_container_fd, const struct rte_memseg *ms)
>
> API's such as these two were recently added to DPDK.
Will check and rebase.
BRs,
Xiao
>
> --
> Thanks,
> Anatoly