[dpdk-dev] eal/pci: Improve automatic selection of IOVA mode

David Marchand david.marchand at redhat.com
Fri Jun 14 10:42:22 CEST 2019

On Mon, Jun 3, 2019 at 6:44 PM Walker, Benjamin <benjamin.walker at intel.com>
wrote:

> On Mon, 2019-06-03 at 12:48 +0200, David Marchand wrote:
> > Hello,
> >
> > On Thu, May 30, 2019 at 7:48 PM Ben Walker <benjamin.walker at intel.com>
> > wrote:
> > > In SPDK, not all drivers are registered with DPDK at start up time.
> > > Previously, that meant DPDK always chose to set itself up in IOVA_PA
> > > mode. Instead, when the correct iova choice is unclear based on the
> > > devices and drivers known to DPDK at start up time, use other
> > > heuristics (such as whether /proc/self/pagemap is accessible) to make
> > > a better choice.
> > >
> > > This enables SPDK to run as an unprivileged user again without
> > > requiring users to explicitly set the iova mode on the command line.
> > >
> >
> > Interesting, I got a bz on something similar the day you sent this
> > patchset ;-)
> >
> >
> > - When a dpdk process is started, either it has access to physical
> > addresses or not, and this won't change for the rest of its life.
> > Your fix on defaulting to VA based on a rte_eal_using_phys_addrs() check
> > makes sense to me.
> > It is the most encountered situation when running ovs as non root on
> > recent kernels.
> >
> >
> > - However, I fail to see the need for all of this detection code wrt
> > drivers and devices.
> >
> > On one side of the equation, when dpdk starts, it checks physical address
> > availability.
> > On the other side of the equation, we have the drivers that will be
> > invoked when probing devices (either at dpdk init, or when hotplugging a
> > device).
> >
> > At this point, the probing call should check the driver requirement wrt
> > the kernel driver the device is attached to.
> > If this requirement is not fulfilled, then the probing fails.
> >
> >
> > - This leaves the --iova-va forcing option.
> > Why do we need it?
> > If we don't have access to physical addresses, no choice but to run in
> > VA mode.
> > If we have access to physical addresses, the only case would be that you
> > want to downgrade from PA to VA.
> > But well, your process can still access it, not sure what the benefit is.
>
> All of the complexity here, at least as far as I understand it, stems from
> supporting hot insert of devices. This is very important to SPDK because
> storage devices get hot inserted all the time, so we very much appreciate
> that DPDK has put in so much effort in this area and continues to accept
> our patches to improve it. I know hot insert is not nearly as important
> for network devices.
>
> When DPDK starts up, it needs to select whether to use virtual addresses or
> physical addresses in its memory maps. It can do that by answering the
> following questions:
>
> 1. Does the system only have buses that support an IOMMU?
> 2. Is the IOMMU sufficiently fast for the use case?
> 3. Will all of the devices that will be used with DPDK throughout the
>    application's lifetime work with an IOMMU?
>
> If these three things are true, then the best choice is to use virtual
> addresses in the memory translations. However, if any of the above are not
> true it needs to fall back to physical addresses.
>
> #1 is checked by simply asking all of the buses, which are known up front.
> #2 is just assumed to be true. But #3 is not possible to check fully
> because of hot insert.
>
> The code currently approximates the #3 check by looking at the devices
> present at initialization time. If a device exists that's bound to
> vfio-pci, and no other devices exist that are bound to a uio driver, and
> DPDK has a registered driver that's actually going to load against the
> vfio-pci devices, then it will elect to use virtual addresses. This is
> purely a heuristic - it's not a definitive answer because the user could
> later hot insert a device that gets bound to uio.
>
> The user, of course, knows the answer to which addressing scheme to use
> typically. For example, these checks assume #2 is true, but there may be
> hardware implementations where it is not and the user wants to force
> physical addresses. Or the user may know that they are going to hot insert
> a device at run time that doesn't work with the IOMMU. That's why it's
> important to maintain the ability for the user to override the default
> heuristic's decision via the command line.
>
> My patch series is simply improving the heuristic in a few ways. First,
> previously each bus when queried would return either virtual or physical
> addresses as its choice. However, often the bus just does not have enough
> information to formulate any preference at all (and PCI was defaulting to
> physical addresses in this case). Instead, I made it so that the bus can
> return that it doesn't care, which pushes the decision up to a higher
> level. That higher level then makes the decision by checking whether it
> can access /proc/self/pagemap. Second, I narrowed the uio check such that
> physical addresses will only be selected if a device bound to uio exists
> and there is a driver registered to use it. Previously if any device was
> bound to uio it would select physical addresses, even if DPDK never ended
> up loading against that device.
>
> I think these two things make the heuristic choose the right thing more
> often, but it still won't always get it right so the command line option
> needs to remain.
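
To make the selection logic described in the quoted text concrete, here is
a minimal C sketch. All names in it (iova_pref, combine_bus_prefs,
phys_addrs_available, select_iova_mode) are hypothetical and are not the
DPDK API. The tie-breaking rule (any bus that needs PA wins over buses
asking for VA) and the fallback (PA when physical addresses are readable,
VA otherwise) are assumptions made for the example, not necessarily what
the patches implement.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical names for this sketch; they are not the DPDK definitions. */
enum iova_pref { PREF_DONT_CARE, PREF_PA, PREF_VA };
enum iova_mode { MODE_PA, MODE_VA };

/* Combine per-bus answers. Assumption: one bus that needs PA overrides
 * any number of buses asking for VA. */
static enum iova_pref combine_bus_prefs(const enum iova_pref *prefs, size_t n)
{
    enum iova_pref result = PREF_DONT_CARE;

    for (size_t i = 0; i < n; i++) {
        if (prefs[i] == PREF_PA)
            return PREF_PA;
        if (prefs[i] == PREF_VA)
            result = PREF_VA;
    }
    return result;
}

/* Returns true if /proc/self/pagemap yields a non-zero page frame number
 * for one of our own pages. On recent Linux kernels an unprivileged
 * reader sees the frame number zeroed, so this returns false. */
static bool phys_addrs_available(void)
{
    static volatile char probe;
    uint64_t entry = 0;
    long page = sysconf(_SC_PAGESIZE);
    off_t off;
    bool ok;
    int fd;

    if (page <= 0)
        return false;
    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return false;

    probe = 1; /* touch the page so that it is resident */
    /* One 64-bit entry per virtual page, indexed by virtual page number. */
    off = (off_t)(((uintptr_t)&probe / (uintptr_t)page) * sizeof(entry));
    ok = lseek(fd, off, SEEK_SET) == off &&
         read(fd, &entry, sizeof(entry)) == (ssize_t)sizeof(entry);
    close(fd);

    /* Bit 63: page present. Bits 0-54: page frame number. */
    return ok && (entry >> 63) != 0 &&
           (entry & ((UINT64_C(1) << 55) - 1)) != 0;
}

/* Final choice: honour an explicit bus preference, otherwise fall back
 * to whether physical addresses can be obtained at all (assumption). */
static enum iova_mode select_iova_mode(const enum iova_pref *prefs, size_t n,
                                       bool phys_ok)
{
    switch (combine_bus_prefs(prefs, n)) {
    case PREF_PA:
        return MODE_PA;
    case PREF_VA:
        return MODE_VA;
    default:
        return phys_ok ? MODE_PA : MODE_VA;
    }
}
```

Passing phys_addrs_available() as phys_ok, a process that cannot read
frame numbers and whose buses all answer "don't care" ends up in VA mode
in this sketch, which is the unprivileged SPDK case from the original
patch description.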

After some exchanges offlist, on irc and taking some time looking at the
code, here are my conclusions.
Copying bus drivers maintainers/connoisseurs.

We have cases where we prefer using VA even if PA is available (for fslmc,
where translating from iova as PA to VA is more costly).

I worked on Ben's patches and summarised the problems as two main issues
with the current code:
- physical address availability is not taken into account early enough in
  EAL init, so the memory subsystem ends up complaining later, which is not
  very user friendly.
  A side effect is that the init could have fallen back to using VA in most
  cases if there was no strong requirement for PA.
- the pci bus driver looks at all devices on the system, without considering
  the pci white/blacklist or whether dpdk has a driver that supports the
  device.
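
The second issue can be illustrated with a short C sketch of a PCI bus
"vote" that ignores devices DPDK will never use. The structure and names
(pci_dev_info, pci_iova_pref, the kdrv enum) are hypothetical and only
stand in for the real bus code; the mapping of uio to PA and vfio to VA
follows the description earlier in the thread and assumes an IOMMU is
present for the vfio case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for this sketch; not the DPDK PCI bus structures. */
enum kdrv { KDRV_NONE, KDRV_UIO, KDRV_VFIO };
enum iova_pref { PREF_DONT_CARE, PREF_PA, PREF_VA };

struct pci_dev_info {
    enum kdrv kdrv;       /* kernel driver the device is bound to */
    bool allowed;         /* passes the pci white/blacklist */
    bool has_dpdk_driver; /* a registered dpdk driver supports it */
};

/* Only devices that are allowed and have a matching dpdk driver get a
 * say. A relevant uio device forces PA; otherwise a relevant vfio device
 * asks for VA; with no relevant device the bus does not care. */
static enum iova_pref pci_iova_pref(const struct pci_dev_info *devs, size_t n)
{
    bool want_va = false;

    for (size_t i = 0; i < n; i++) {
        if (!devs[i].allowed || !devs[i].has_dpdk_driver)
            continue; /* dpdk will never probe this device */
        if (devs[i].kdrv == KDRV_UIO)
            return PREF_PA;
        if (devs[i].kdrv == KDRV_VFIO)
            want_va = true;
    }
    return want_va ? PREF_VA : PREF_DONT_CARE;
}
```

In this sketch a blacklisted uio device, or a uio device that no dpdk
driver supports, no longer pushes the whole process into PA mode.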

I prepared a new series that I will send shortly.
I am currently considering the backport potential for it.

In the meantime, reviews are welcome.


David Marchand
