[dpdk-dev] eal/pci: Improve automatic selection of IOVA mode
david.marchand at redhat.com
Fri Jun 14 10:42:22 CEST 2019
On Mon, Jun 3, 2019 at 6:44 PM Walker, Benjamin <benjamin.walker at intel.com> wrote:
> On Mon, 2019-06-03 at 12:48 +0200, David Marchand wrote:
> > Hello,
> > On Thu, May 30, 2019 at 7:48 PM Ben Walker <benjamin.walker at intel.com> wrote:
> > > In SPDK, not all drivers are registered with DPDK at start up time.
> > > Previously, that meant DPDK always chose to set itself up in IOVA_PA
> > > mode. Instead, when the correct iova choice is unclear based on the
> > > devices and drivers known to DPDK at start up time, use other heuristics
> > > (such as whether /proc/self/pagemap is accessible) to make a better
> > > choice.
> > >
> > > This enables SPDK to run as an unprivileged user again without requiring
> > > users to explicitly set the iova mode on the command line.
> > >
> > Interesting, I got a bz on something similar the day you sent this
> > patchset ;-)
> > - When a dpdk process is started, either it has access to physical addresses
> > or not, and this won't change for the rest of its life.
> > Your fix on defaulting to VA based on a rte_eal_using_phys_addrs() check makes
> > sense to me.
> > It is the most encountered situation when running ovs as non root on
> > kernels.
> > - However, I fail to see the need for all of this detection code wrt drivers
> > and devices.
> > On one side of the equation, when dpdk starts, it checks physical address
> > availability.
> > On the other side of the equation, we have the drivers that will be used
> > when probing devices (either at dpdk init, or when hotplugging a device).
> > At this point, the probing call should check the driver requirement wrt to the
> > kernel driver the device is attached to.
> > If this requirement is not fulfilled, then the probing fails.
> > - This leaves the --iova-va forcing option.
> > Why do we need it?
> > If we don't have access to physical addresses, no choice but run in VA mode.
> > If we have access to physical addresses, the only case would be that you want
> > to downgrade from PA to VA.
> > But well, your process can still access it, not sure what the benefit is.
> All of the complexity here, at least as far as I understand it, stems from
> supporting hot insert of devices. This is very important to SPDK because
> devices get hot inserted all the time, so we very much appreciate that DPDK has
> put in so much effort in this area and continues to accept our patches to
> improve it. I know hot insert is not nearly as important for network
> When DPDK starts up, it needs to select whether to use virtual addresses or
> physical addresses in its memory maps. It can do that by answering the
> following questions:
> 1. Does the system only have buses that support an IOMMU?
> 2. Is the IOMMU sufficiently fast for the use case?
> 3. Will all of the devices that will be used with DPDK throughout the
> application's lifetime work with an IOMMU?
> If these three things are true, then the best choice is to use virtual addresses
> in the memory translations. However, if any of the above are not true it needs
> to fall back to physical addresses.
> #1 is checked by simply asking all of the buses, which are known up front.
> #2 is just assumed to be true. But #3 is not possible to check fully because
> of hot insert.
> The code currently approximates the #3 check by looking at the devices present
> at initialization time. If a device exists that's bound to vfio-pci, and no
> other devices exist that are bound to a uio driver, and DPDK has a
> driver that's actually going to load against the vfio-pci devices, then it
> will elect to use virtual addresses. This is purely a heuristic - it's not a
> definitive answer because the user could later hot insert a device that is
> bound to uio.
> The user, of course, knows the answer to which addressing scheme to use
> typically. For example, these checks assume #2 is true, but there may be
> hardware implementations where it is not and the user wants to force physical
> addresses. Or the user may know that they are going to hot insert a device at
> run time that doesn't work with the IOMMU. That's why it's important to keep
> the ability for the user to override the default heuristic's decision via the
> command line.
> My patch series is simply improving the heuristic in a few ways. First,
> previously each bus when queried would return either virtual or physical
> addresses as its choice. However, often the bus just does not have enough
> information to formulate any preference at all (and PCI was defaulting to
> physical addresses in this case). Instead, I made it so that the bus can say
> that it doesn't care, which pushes the decision up to a higher level. That
> higher level then makes the decision by checking whether it can access
> /proc/self/pagemap. Second, I narrowed the uio check such that physical
> addresses will only be selected if a device bound to uio exists and there is a
> driver registered to use it. Previously if any device was bound to uio it would
> select physical addresses, even if DPDK never ended up loading against that
> device.
> I think these two things make the heuristic choose the right thing more often,
> but it still won't always get it right so the command line option needs to
> stay.
After some exchanges off-list and on IRC, and after taking some time to look
at the code, here are my conclusions.
Copying bus driver maintainers/connoisseurs.
We have cases where we prefer using VA even if PA is available (for example
fslmc, where translating from IOVA as PA to VA is more costly).
I worked on Ben's patches and summarised the problem as two main issues with
the current code:
- physical address availability is not taken into account early enough in
EAL init, and we end up with the memory subsystem complaining later, which is
not that user friendly.
A related point is that the init could have fallen back to using VA in most
cases if there were no strong requirement on PA.
- the pci bus driver looks at all devices on the system, with no consideration
for the pci white/blacklist and no consideration for whether dpdk has a
driver that supports the device.
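To make the two points above concrete, here is a rough C sketch of the kind
of decision flow they suggest. Everything in it is made up for illustration
(enum iova_mode, struct dev, bus_vote, select_mode); it is neither the DPDK
API nor the series itself. In particular, resolving a "don't care" vote to PA
whenever physical addresses are available is only an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the DPDK API. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

struct dev {
    bool allowed;    /* device passes the white/blacklist */
    bool has_driver; /* a registered driver supports this device */
    bool needs_pa;   /* e.g. the device is bound to a uio kernel driver */
};

/* Bus-level vote: only devices that would actually be probed count. */
static enum iova_mode bus_vote(const struct dev *devs, size_t n)
{
    enum iova_mode vote = IOVA_DC;

    for (size_t i = 0; i < n; i++) {
        if (!devs[i].allowed || !devs[i].has_driver)
            continue;            /* ignored: never probed */
        if (devs[i].needs_pa)
            return IOVA_PA;      /* a strong requirement on PA wins */
        vote = IOVA_VA;
    }
    return vote;                 /* IOVA_DC when no device expressed a need */
}

/*
 * Init-level decision, taken early. Returns false when the bus requires PA
 * but physical addresses are not available, so the failure is reported
 * up front instead of surfacing later in the memory subsystem.
 */
static bool select_mode(enum iova_mode vote, bool phys_addrs_available,
                        enum iova_mode *out)
{
    assert(out != NULL);

    if (vote == IOVA_PA && !phys_addrs_available)
        return false;
    if (vote == IOVA_DC) /* assumption: prefer PA when it is usable */
        vote = phys_addrs_available ? IOVA_PA : IOVA_VA;
    *out = vote;
    return true;
}
```

With this shape, a uio-bound device that is blacklisted or has no matching
driver no longer forces PA, and an unprivileged process with no strong
requirement on PA ends up in VA.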
I prepared a new series that I will send shortly.
I am currently considering the backport potential for it.
In the meantime, reviews are welcome.
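For illustration, physical address availability can be probed through
/proc/self/pagemap roughly as below. The function name phys_addrs_visible is
hypothetical and this is not the EAL code. The sketch relies on the
documented pagemap entry layout (bit 63 is "page present", bits 0-54 hold the
page frame number) and on Linux 4.2 and later reporting the page frame number
as zero to callers without CAP_SYS_ADMIN.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a non-zero page frame number can be read
 * for one of our own pages, 0 otherwise (pagemap not accessible, or the
 * kernel hides the page frame number from this process).
 */
static int phys_addrs_visible(void)
{
    static volatile char probe;
    uint64_t entry = 0;
    long page_size = sysconf(_SC_PAGESIZE);
    off_t offset;
    ssize_t n;
    int fd;

    if (page_size <= 0)
        return 0;

    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return 0;

    probe = 1; /* write to the page so that it is resident */

    /* One 64-bit entry per virtual page, indexed by virtual page number. */
    offset = (off_t)((uintptr_t)&probe / (uintptr_t)page_size) *
             (off_t)sizeof(entry);
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return 0;
    }
    n = read(fd, &entry, sizeof(entry));
    close(fd);
    if (n != (ssize_t)sizeof(entry))
        return 0;

    if ((entry >> 63) == 0) /* bit 63 clear: page not present */
        return 0;

    /* Bits 0-54: page frame number, zero when hidden by the kernel. */
    return (entry & ((UINT64_C(1) << 55) - 1)) != 0;
}
```

Run as a regular user on a recent kernel this is expected to return 0, which
is the case where falling back to VA makes sense.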