[dpdk-dev] [PATCH v3] vfio: fix sPAPR IOMMU DMA window size

Jonas Pfefferle1 JPF at zurich.ibm.com
Tue Aug 8 11:29:45 CEST 2017


"Burakov, Anatoly" <anatoly.burakov at intel.com> wrote on 08/08/2017 11:15:24
AM:

> From: "Burakov, Anatoly" <anatoly.burakov at intel.com>
> To: Jonas Pfefferle <jpf at zurich.ibm.com>
> Cc: "dev at dpdk.org" <dev at dpdk.org>, "aik at ozlabs.ru" <aik at ozlabs.ru>
> Date: 08/08/2017 11:18 AM
> Subject: RE: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
>
> From: Jonas Pfefferle [mailto:jpf at zurich.ibm.com]
> > Sent: Tuesday, August 8, 2017 9:41 AM
> > To: Burakov, Anatoly <anatoly.burakov at intel.com>
> > Cc: dev at dpdk.org; aik at ozlabs.ru; Jonas Pfefferle <jpf at zurich.ibm.com>
> > Subject: [PATCH v3] vfio: fix sPAPR IOMMU DMA window size
> >
> > DMA window size needs to be big enough to span all memory segments'
> > physical addresses. We do not need multiple levels of IOMMU tables
> > as we already span ~70TB of physical memory with 16MB hugepages.
> >
> > Signed-off-by: Jonas Pfefferle <jpf at zurich.ibm.com>
> > ---
> > v2:
> > * Round up to the next power of 2 without a loop.
> >
> > v3:
> > * Replace roundup_next_pow2 with rte_align64pow2
> >
> >  lib/librte_eal/linuxapp/eal/eal_vfio.c | 13 ++++++++++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > index 946df7e..550c41c 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
> > @@ -759,10 +759,12 @@ vfio_spapr_dma_map(int vfio_container_fd)
> >        return -1;
> >     }
> >
> > -   /* calculate window size based on number of hugepages configured */
> > -   create.window_size = rte_eal_get_physmem_size();
> > +   /* physical pages are sorted descending, i.e. ms[0].phys_addr is max */
>
> Do we always expect that to be the case in the future? Maybe it
> would be safer to walk the memsegs list.
>
> Thanks,
> Anatoly

I had this loop in before but removed it in favor of simplicity.
If we believe the ordering is going to change in the future,
I'm happy to bring the loop back. Is there other code that relies
on the memsegs being sorted by their physical addresses?

>
> > +   /* create DMA window from 0 to max(phys_addr + len) */
> > +   /* sPAPR requires window size to be a power of 2 */
> > +   create.window_size = rte_align64pow2(ms[0].phys_addr + ms[0].len);
> >     create.page_shift = __builtin_ctzll(ms->hugepage_sz);
> > -   create.levels = 2;
> > +   create.levels = 1;
> >
> >     ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE,
> > &create);
> >     if (ret) {
> > @@ -771,6 +773,11 @@ vfio_spapr_dma_map(int vfio_container_fd)
> >        return -1;
> >     }
> >
> > +   if (create.start_addr != 0) {
> > +      RTE_LOG(ERR, EAL, "  DMA window start address != 0\n");
> > +      return -1;
> > +   }
> > +
> >     /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
> >     for (i = 0; i < RTE_MAX_MEMSEG; i++) {
> >        struct vfio_iommu_type1_dma_map dma_map;
> > --
> > 2.7.4
>

