[dpdk-dev] eal/pci: Improve automatic selection of IOVA mode

Walker, Benjamin benjamin.walker at intel.com
Mon Jun 3 18:44:25 CEST 2019


On Mon, 2019-06-03 at 12:48 +0200, David Marchand wrote:
> Hello, 
> 
> On Thu, May 30, 2019 at 7:48 PM Ben Walker <benjamin.walker at intel.com> wrote:
> > In SPDK, not all drivers are registered with DPDK at start up time.
> > Previously, that meant DPDK always chose to set itself up in IOVA_PA
> > mode. Instead, when the correct iova choice is unclear based on the
> > devices and drivers known to DPDK at start up time, use other heuristics
> > (such as whether /proc/self/pagemap is accessible) to make a better
> > choice.
> > 
> > This enables SPDK to run as an unprivileged user again without requiring
> > users to explicitly set the iova mode on the command line.
> > 
> 
> Interesting, I got a bz on something similar the day you sent this patchset ;-)
> 
> 
> - When a dpdk process is started, either it has access to physical addresses
> or not, and this won't change for the rest of its life.
> Your fix on defaulting to VA based on a rte_eal_using_phys_addrs() check makes
> sense to me.
> It is the most encountered situation when running ovs as non root on recent
> kernels.
> 
> 
> - However, I fail to see the need for all of this detection code wrt drivers
> and devices.
> 
> On one side of the equation, when dpdk starts, it checks physical address
> availability.
> On the other side of the equation, we have the drivers that will be invoked
> when probing devices (either at dpdk init, or when hotplugging a device).
> 
> At this point, the probing call should check the driver requirement wrt the
> kernel driver the device is attached to.
> If this requirement is not fulfilled, then the probing fails.
> 
> 
> - This leaves the --iova-va forcing option. 
> Why do we need it?
> If we don't have access to physical addresses, no choice but run in VA mode.
> If we have access to physical addresses, the only case would be that you want
> to downgrade from PA to VA.
> But well, your process can still access it, not sure what the benefit is.

All of the complexity here, at least as far as I understand it, stems from
supporting hot insert of devices. This is very important to SPDK because storage
devices get hot inserted all the time, so we very much appreciate that DPDK has
put in so much effort in this area and continues to accept our patches to
improve it. I know hot insert is not nearly as important for network devices.

When DPDK starts up, it needs to select whether to use virtual addresses or
physical addresses in its memory maps. It can do that by answering the following
questions:

1. Does the system only have buses that support an IOMMU?
2. Is the IOMMU sufficiently fast for the use case?
3. Will all of the devices that will be used with DPDK throughout the
application's lifetime work with an IOMMU?

If all three of these are true, then the best choice is to use virtual addresses
in the memory translations. However, if any of them is not true, DPDK needs
to fall back to physical addresses.

#1 is checked by simply asking all of the buses, which are known up front. #2 is
just assumed to be true. But #3 is not possible to check fully because of hot
insert.
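
As a rough sketch of that decision (the type and function names below are
hypothetical, chosen for illustration, and are not DPDK's actual EAL code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration only; not DPDK's real types. */
enum iova_choice { IOVA_CHOICE_PA, IOVA_CHOICE_VA };

struct iova_inputs {
    bool all_buses_support_iommu;     /* question 1: asked of every bus */
    bool iommu_fast_enough;           /* question 2: currently just assumed */
    bool all_devices_work_with_iommu; /* question 3: unknowable with hot insert */
};

/* Virtual addresses only if all three answers are yes; otherwise physical. */
enum iova_choice choose_iova(const struct iova_inputs *in)
{
    assert(in != NULL);
    if (in->all_buses_support_iommu &&
        in->iommu_fast_enough &&
        in->all_devices_work_with_iommu)
        return IOVA_CHOICE_VA;
    return IOVA_CHOICE_PA;
}
```

The third field is the one that can only be approximated at start-up time.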

The code currently approximates the #3 check by looking at the devices present
at initialization time. If a device exists that's bound to vfio-pci, and no
other devices exist that are bound to a uio driver, and DPDK has a registered
driver that's actually going to load against the vfio-pci devices, then it will
elect to use virtual addresses. This is purely a heuristic - it's not a
definitive answer because the user could later hot insert a device that gets
bound to uio.
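
A minimal sketch of that approximation, assuming a simplified device record
(the struct, enum, and function names are hypothetical and do not match the
real PCI bus code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a device seen at initialization time. */
enum kdrv { KDRV_NONE, KDRV_VFIO, KDRV_UIO };

struct dev_info {
    enum kdrv kdrv;        /* kernel driver the device is bound to */
    bool has_dpdk_driver;  /* a registered DPDK driver would load against it */
};

/*
 * Returns true when the heuristic described above would elect virtual
 * addresses: at least one vfio-pci device that a DPDK driver will use,
 * and no device at all bound to a uio driver.
 */
bool heuristic_prefers_va(const struct dev_info *devs, size_t n)
{
    bool usable_vfio = false;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].kdrv == KDRV_UIO)
            return false; /* any uio-bound device rules out VA */
        if (devs[i].kdrv == KDRV_VFIO && devs[i].has_dpdk_driver)
            usable_vfio = true;
    }
    return usable_vfio;
}
```

Only devices present in the array are considered, which is exactly why a
later hot insert can invalidate the answer.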

The user, of course, typically knows which addressing scheme to use. For
example, these checks assume #2 is true, but there may be
hardware implementations where it is not and the user wants to force physical
addresses. Or the user may know that they are going to hot insert a device at
run time that doesn't work with the IOMMU. That's why it's important to maintain
the ability for the user to override the default heuristic's decision via the
command line.

My patch series is simply improving the heuristic in a few ways. First,
previously each bus when queried would return either virtual or physical
addresses as its choice. However, often the bus just does not have enough
information to formulate any preference at all (and PCI was defaulting to
physical addresses in this case). Instead, I made it so that the bus can return
that it doesn't care, which pushes the decision up to a higher level. That
higher level then makes the decision by checking whether it can access
/proc/self/pagemap. Second, I narrowed the uio check such that physical
addresses will only be selected if a device bound to uio exists and there is a
driver registered to use it. Previously if any device was bound to uio it would
select physical addresses, even if DPDK never ended up loading against that
device.
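
The two changes can be sketched roughly as follows. This is an illustration
of the description above under my own naming, not the patch itself: the
enum, struct, and function names are hypothetical, and the pagemap check is
reduced to a boolean that the caller supplies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; a bus may now answer "don't care". */
enum iova_pref { IOVA_PREF_DC, IOVA_PREF_PA, IOVA_PREF_VA };

enum bound_to { BOUND_NONE, BOUND_VFIO, BOUND_UIO };

struct bus_dev {
    enum bound_to kdrv;    /* kernel driver the device is bound to */
    bool has_dpdk_driver;  /* a registered DPDK driver would use it */
};

/*
 * Bus-level preference: devices that no DPDK driver will load against are
 * ignored, so a uio-bound device only forces PA if it will actually be used.
 */
enum iova_pref bus_iova_pref(const struct bus_dev *devs, size_t n)
{
    bool want_va = false;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!devs[i].has_dpdk_driver)
            continue;
        if (devs[i].kdrv == BOUND_UIO)
            return IOVA_PREF_PA;
        if (devs[i].kdrv == BOUND_VFIO)
            want_va = true;
    }
    return want_va ? IOVA_PREF_VA : IOVA_PREF_DC;
}

/*
 * Higher-level decision: when the bus does not care, fall back on whether
 * physical addresses are obtainable (e.g. /proc/self/pagemap is accessible).
 */
enum iova_pref resolve_iova(enum iova_pref bus, bool phys_addrs_available)
{
    if (bus != IOVA_PREF_DC)
        return bus;
    return phys_addrs_available ? IOVA_PREF_PA : IOVA_PREF_VA;
}
```

With no usable devices known at start-up, an unprivileged process that
cannot read physical addresses ends up with virtual addresses rather than
the old default of physical addresses.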

I think these two changes make the heuristic choose the right thing more often,
but it still won't always get it right, so the command line option needs to
remain.

Thanks,
Ben

> 
> 
> Jerin, I can see in the history you worked on this.
> What did I miss?
> Is there something wrong with dropping the detection code?
> 
> 
> 
> -- 
> David Marchand


