[dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

Burakov, Anatoly anatoly.burakov at intel.com
Tue Jul 9 16:37:05 CEST 2019


On 09-Jul-19 3:00 PM, Jerin Jacob Kollanukkaran wrote:
>> -----Original Message-----
>> From: Burakov, Anatoly <anatoly.burakov at intel.com>
>> Sent: Tuesday, July 9, 2019 7:00 PM
>> To: Jerin Jacob Kollanukkaran <jerinj at marvell.com>; David Marchand
>> <david.marchand at redhat.com>
>> Cc: dev <dev at dpdk.org>; Thomas Monjalon <thomas at monjalon.net>; Ben
>> Walker <benjamin.walker at intel.com>
>> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
>> selection
>>
>> On 09-Jul-19 1:11 PM, Jerin Jacob Kollanukkaran wrote:
>>>> -----Original Message-----
>>>> From: Burakov, Anatoly <anatoly.burakov at intel.com>
>>>> Sent: Tuesday, July 9, 2019 5:10 PM
>>>> To: Jerin Jacob Kollanukkaran <jerinj at marvell.com>; David Marchand
>>>> <david.marchand at redhat.com>
>>>> Cc: dev <dev at dpdk.org>; Thomas Monjalon <thomas at monjalon.net>;
>>>> Ben Walker <benjamin.walker at intel.com>
>>>> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA
>>>> mode selection
>>>>>>> ________________________________________
>>>>>>>
>>>>>>> On Mon, Jul 8, 2019 at 4:25 PM <jerinj at marvell.com> wrote:
>>>>>>> From: Jerin Jacob <jerinj at marvell.com>
>>>>>>>
>>>>>>> Existing logic fails to select IOVA as VA mode when the driver
>>>>>>> requests IOVA as VA.
>>>>>>>
>>>>>>> IOVA as VA has stricter requirements than the other modes, so use
>>>>>>> positive logic for IOVA as VA selection.
>>>>>>>
>>>>>>> This patch also changes the default IOVA mode for PCI devices to
>>>>>>> PA, since PCI devices have to deal with DMA engines, unlike virtual
>>>>>>> devices, which may only need IOVA as DC (don't care).
>>>>>>>
>>>>>>> We have three cases:
>>>>>>> - driver/hw supports IOVA as PA only
>>>>>>>
>>>>>>> [Jerin] It is not a driver capability; it is more of a system
>>>>>>> capability (IOMMU vs. non-IOMMU). We are already addressing that case.
>>>>>>
>>>>>> I don't get how this works. How does "system capability" affect
>>>>>> what the device itself supports? Are we to assume that *all*
>>>>>> hardware supports IOVA as VA by default? "System capability" is more
>>>>>> of a bus issue than an individual device issue, is it not?
>>>>>
>>>>> What I meant is that supporting VA vs. PA is a function of the
>>>>> IOMMU (not a device attribute). I.e., the device makes the bus
>>>>> master request; if an IOMMU is available and enabled in the system,
>>>>> the request goes through the IOMMU, which translates the IOVA to a
>>>>> physical address.
>>>>>
>>>>> Another way to put it: is there any _PCIe_ device which
>>>>> needs/requires RTE_PCI_DRV_NEED_IOVA_AS_PA in
>>>>> rte_pci_driver.drv_flags?
>>>>>
>>>>>
>>>>
>>>> Previously, as far as I can tell, the flag was used to indicate
>>>> support for IOVA as VA mode, not a *requirement* for IOVA as VA mode.
>>>> For example, there are multiple patches [1][2][3][4] (I'm sure I can
>>>> find more!) that added IOVA as VA support to various drivers, and
>>>> they were all worded in this exact way: "support for IOVA as VA
>>>> mode", not "require IOVA as VA mode". As far as I can tell, none of
>>>> these drivers *require* IOVA as VA mode; they merely use this flag
>>>> to indicate support for it.
>>>
>>> Some classes of devices NEED IOVA as VA for performance reasons,
>>> especially devices that have HW mempool allocators. On those devices,
>>> if we don't use IOVA as VA, then upon receiving a packet from the
>>> device the driver has to call rte_mem_iova2virt() for every packet
>>> (see drivers/net/dpaa2), which is a real performance issue.
>>
>> I wouldn't classify this as "needing" IOVA as VA. "Need" implies it
>> cannot work without it, whereas in this case it's more "highly
>> recommended" than "needed".
> 
> It is a "need", as performance is horrible without it because of the per-packet SW translation.
> A "need" from the DPDK performance perspective.

Would the driver fail to initialize if it detects that it is running in IOVA as PA mode?
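To make the performance argument quoted above concrete, here is a simplified, self-contained model of per-packet address translation. It is not DPDK's rte_mem_iova2virt() implementation: the segment table (struct seg) and both functions are hypothetical. In IOVA as VA mode the IOVA the hardware hands back already is the virtual address, so translation is a cast; in IOVA as PA mode the driver has to look the address up for every buffer it receives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment descriptor: one contiguous chunk of DMA memory. */
struct seg {
    uintptr_t va;   /* virtual address of the segment start */
    uint64_t iova;  /* IOVA (here: physical address) of the segment start */
    size_t len;     /* segment length in bytes */
};

/* IOVA as VA: IOVA and VA are identical, so translation is a cast. */
static inline void *iova2virt_va_mode(uint64_t iova)
{
    return (void *)(uintptr_t)iova;
}

/* IOVA as PA: every translation is a search over the segment table.
 * A linear search keeps the sketch short; a smarter lookup structure
 * still costs far more than a cast when done once per packet. */
static void *iova2virt_pa_mode(const struct seg *segs, size_t nb_segs,
                               uint64_t iova)
{
    assert(segs != NULL || nb_segs == 0);

    for (size_t i = 0; i < nb_segs; i++) {
        if (iova >= segs[i].iova && iova - segs[i].iova < segs[i].len)
            return (void *)(segs[i].va + (uintptr_t)(iova - segs[i].iova));
    }
    return NULL; /* IOVA not backed by any known segment */
}
```

The lookup may be cheap per call, but it sits on the receive fast path, which is why a driver with a HW mempool may want IOVA as VA even though IOVA as PA would still function.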

> 
>>
>>>>
>>>> Now suddenly it turns out that someone somewhere "knew" that the
>>>> "IOVA as VA" flag in PCI drivers is supposed to indicate a
>>>> *requirement* and not support, and it appears that this knowledge
>>>> was neither communicated nor documented anywhere, and is now treated
>>>> as common knowledge.
>>>
>>> I think the confusion here is that I was under the impression that if
>>> the device supports IOVA as VA and the system runs with an IOMMU, then
>>> DPDK should run in IOVA as VA mode.
>>> If the above statement is true, then we don't really need a new flag.
>>
>> Exactly. And the flag used to indicate that the device *supports* IOVA as VA,
>> not that it *requires* it.
>>
>>>
>>> A couple of points to make forward progress:
>>> # If we think there is a use case where the device supports IOVA as VA
>>> and the system runs with an IOMMU, but for some reason DPDK needs to run
>>> in PA mode, then we need to create two flags: RTE_PCI_DRV_IOVA_AS_VA -
>>> it can run in either mode
>>
>> There are use cases - KNI and igb_uio come to mind. Whether the IOMMU
>> uses VA or PA is a different question from whether an IOMMU is in use;
>> there is no law that states that, when using an IOMMU, IOVAs have to
>> have a 1:1 mapping with VAs. An IOMMU requirement does not necessarily
>> imply IOVA as VA - it is perfectly legal to program the IOMMU to use
>> IOVA as PA (which we currently do when we e.g. use VFIO for some
>> devices and igb_uio for others).
> 
> For KNI, we already submitted a patch to support IOVA as VA.

Yep, the point being that it *didn't work* before, hence we may want to
account for possible future use cases like this (however hacky they
admittedly are). There are valid use cases for enforcing IOVA as VA
only (such as for performance reasons, even though it would be
technically possible to use IOVA as PA), and there could be valid use
cases for enforcing IOVA as PA only (for example, I seem to
remember that the crypto/qat driver at one point didn't support VFIO,
effectively rendering it unable to use IOVA as VA).

> I don't know about igb_uio; if it uses IOVA as PA, we might as well disable the IOMMU.
> Does igb_uio enable the IOMMU? I don't see any reference:
> grep -ri "iommu" kernel/linux/igb_uio/

igb_uio can work with an IOMMU in pass-through mode. When Linux is booted
in pass-through mode, the IOMMU is enabled and igb_uio will work; VFIO can
then use both IOVA as PA and IOVA as VA, while igb_uio can only use IOVA
as PA. So yes, igb_uio does enable the IOMMU in a very limited way, but
only to set up a 1:1 mapping of IOVA to PA.

Also, some other use cases will require IOVA as PA while having
full IOMMU support. An example of this would be systems with limited
IOMMU address width (such as VMs): even though the IOMMU is technically
supported, we may not have the necessary address width to run all
devices in IOVA as VA mode, and would need to fall back to IOVA as PA.
Since we cannot *require* IOVA as VA in the current codebase, any driver
that expects IOVA as VA to always be enabled will presumably not work.
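A minimal sketch of the address-width fallback described above. It assumes the caller already knows the IOMMU's input address width in bits and the highest virtual address DPDK may hand to a device; both inputs, the helper names and the enum are hypothetical, and the corresponding check on physical addresses is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for RTE_IOVA_DC, RTE_IOVA_PA and RTE_IOVA_VA. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* True if 'addr' can be expressed with 'width' address bits. */
static bool fits_in_width(uint64_t addr, unsigned int width)
{
    if (width >= 64)
        return true;
    return addr < (UINT64_C(1) << width);
}

/* Use IOVA as VA only if the highest virtual address in use fits the
 * IOMMU input width; otherwise fall back to IOVA as PA.
 * 'max_va' and 'iommu_width' are assumed inputs. */
static enum iova_mode pick_mode_for_width(uint64_t max_va,
                                          unsigned int iommu_width)
{
    return fits_in_width(max_va, iommu_width) ? IOVA_VA : IOVA_PA;
}
```

For example, a 47-bit user-space address does not fit a 39-bit IOMMU, so such a system would end up in IOVA as PA even though the IOMMU is present.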

> 
> Again, it is not a device attribute, it is a system attribute.

If it's a system attribute, why is it a device driver flag then? The
system may or may not support an IOMMU, and the device itself probably
doesn't care, since bus addresses look the same in both cases, *but the
driver might* (as in your case: requiring IOVA as VA and disallowing
IOVA as PA for performance reasons).

> In the current KNI case, it
> falls back to PA irrespective of device capability, so we don't need any
> separate flag from the driver.
> 
> Even if we introduce a flag, what is it supposed to do?

The same thing you are suggesting for one of your HW mempool drivers: a 
"need" to only run in IOVA as VA mode.

The logic in this case (as far as the driver is concerned, disregarding 
the kernel driver issue for now) would be:

has_pa = (drv->flags & SUPPORTS_PA) != 0;
has_va = (drv->flags & SUPPORTS_VA) != 0;
if (has_va && has_pa)
     return RTE_IOVA_DC; // don't care, supports both
if (has_va)
     return RTE_IOVA_VA; // only supports VA - your case
return RTE_IOVA_PA; // only supports PA
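The pseudo-code above can be written out as a self-contained C function. The flag names DRV_SUPPORTS_PA and DRV_SUPPORTS_VA, the trimmed-down driver struct and the enum are hypothetical stand-ins that model the proposal; they are not existing DPDK definitions. As in the pseudo-code, a driver that sets neither flag falls through to IOVA as PA.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability flags from the proposal (not real DPDK flags). */
#define DRV_SUPPORTS_PA (UINT32_C(1) << 0)
#define DRV_SUPPORTS_VA (UINT32_C(1) << 1)

/* Local stand-ins for RTE_IOVA_DC, RTE_IOVA_PA and RTE_IOVA_VA. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Minimal model of a PCI driver: only the flags field matters here. */
struct pci_drv {
    uint32_t drv_flags;
};

/* Map a driver's declared capabilities to the IOVA mode it asks for. */
static enum iova_mode drv_iova_mode(const struct pci_drv *drv)
{
    int has_pa = (drv->drv_flags & DRV_SUPPORTS_PA) != 0;
    int has_va = (drv->drv_flags & DRV_SUPPORTS_VA) != 0;

    if (has_va && has_pa)
        return IOVA_DC; /* supports both: let the caller decide */
    if (has_va)
        return IOVA_VA; /* VA only, e.g. a HW mempool driver */
    return IOVA_PA;     /* PA only (or, as in the pseudo-code, no flag) */
}
```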

Currently (again, disregarding your interpretation of how IOVA as VA 
works and looking at the actual commit history), we always seem to imply 
that IOVA as PA works for all devices, and we use IOVA_AS_VA flag to 
indicate that the device *also* supports IOVA as VA mode.

But we don't have any way to express a *requirement* for IOVA as VA mode 
- only for IOVA as PA mode. That is the purpose of the new flag. You are 
stating that the IOVA_AS_VA drv flag is an expression of that 
requirement, but that is not reflected in the codebase - our commit 
history indicates that we don't treat IOVA as VA as a hard requirement
whenever this flag is specified (and I would argue that we shouldn't).

> 
>>
>>> RTE_PCI_DRV_NEED_IOVA_AS_VA - it can only run in IOVA as VA mode
>>
>> If we're adding a flag, we might as well not create a confusion and do it
>> consistently. If IOVA as PA is supported, have a flag to indicate that. If IOVA
>> as VA is supported, have a flag to indicate that. Absence of either flag implies
> 
> So which category does the i40e driver fall into? By default, the PCI bus should return PA for the class.
> If VA is supported, then return VA.
> So how will a new flag help?

We seem to be in agreement that we need *two* flags to express all three
of the above. The question is, which flags. You suggest having
"supports IOVA as VA" and "requires IOVA as VA" as the two options, while I
am suggesting having "supports IOVA as PA" and "supports IOVA as VA" as
flags. This requires modifying all existing drivers and is perhaps
undesirable from that point of view (this isn't my decision), but it's
less confusing than having two IOVA-as-VA flags that differ slightly in
their meaning (supports VA vs. requires VA).
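For comparison with the pseudo-code earlier in this message, the alternative pair of flags ("supports IOVA as VA" plus "requires IOVA as VA") would be evaluated roughly as follows. The flag names and the enum are hypothetical stand-ins modelled on RTE_PCI_DRV_IOVA_AS_VA and the proposed RTE_PCI_DRV_NEED_IOVA_AS_VA, not existing DPDK definitions. Note that "requires VA" has to be read as also meaning "PA not supported", which the flag name does not state on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for RTE_IOVA_DC, RTE_IOVA_PA and RTE_IOVA_VA. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Hypothetical flags for the "supports VA" + "requires VA" scheme. */
#define DRV_IOVA_AS_VA      (UINT32_C(1) << 0) /* can run in either mode */
#define DRV_NEED_IOVA_AS_VA (UINT32_C(1) << 1) /* can run only in VA mode */

static enum iova_mode drv_iova_mode_need_scheme(uint32_t drv_flags)
{
    /* "Requires VA" wins and implicitly means "PA not supported". */
    if (drv_flags & DRV_NEED_IOVA_AS_VA)
        return IOVA_VA;
    /* "Supports VA" alone means both modes work: don't care. */
    if (drv_flags & DRV_IOVA_AS_VA)
        return IOVA_DC;
    /* No flag: PA support is assumed, as for existing drivers. */
    return IOVA_PA;
}
```

This yields the same three outcomes as the "supports PA"/"supports VA" pair; the difference is only in how self-explanatory each flag is on its own.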

Going back to your i40e example, AFAIK i40e supports both IOVA as VA and
IOVA as PA, so in this case it should return RTE_IOVA_DC (i.e. use
whatever's available). If other devices also don't care, then push the
decision to the upper layers rather than deciding anything at the bus level [1].

[1] http://patchwork.dpdk.org/patch/54801/
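A sketch of what pushing the decision upwards could look like when the bus combines per-device answers. This is an illustration under assumptions, not necessarily what the patch referenced in [1] does: each device reports DC, PA or VA; DC entries are ignored; if the remaining devices agree, that mode is returned; if nobody expresses a preference, or the devices conflict, the bus returns DC and leaves the choice (and any failure) to EAL. The function and enum are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for RTE_IOVA_DC, RTE_IOVA_PA and RTE_IOVA_VA. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Combine per-device preferences into one bus-level answer. */
static enum iova_mode bus_iova_mode(const enum iova_mode *dev_modes, size_t n)
{
    int want_pa = 0, want_va = 0;

    assert(dev_modes != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (dev_modes[i] == IOVA_PA)
            want_pa = 1;
        else if (dev_modes[i] == IOVA_VA)
            want_va = 1;
        /* IOVA_DC: this device does not constrain the choice */
    }

    if (want_pa && !want_va)
        return IOVA_PA;
    if (want_va && !want_pa)
        return IOVA_VA;
    return IOVA_DC; /* nobody cares, or conflict: decide higher up */
}
```

With i40e reporting DC, a system containing only i40e ports would leave the decision entirely to the upper layers.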

> 
>> inability to work in that mode. I don't see how this is less clear and self-
>> documenting than having two IOVA as VA-related flags that have slightly
>> different meaning and imply things not otherwise stated explicitly.
>>
>>> # With top of tree, it currently never runs in IOVA as VA mode.
>>> That's a separate problem to fix, which affects all the devices
>>> currently supporting RTE_PCI_DRV_IOVA_AS_VA. I.e., even though a device
>>> supports RTE_PCI_DRV_IOVA_AS_VA, it is not running with IOMMU
>>> protection, and/or root privileges are required to run DPDK.
> 
> What's your view on this existing problem?

My view would be to always run in IOVA as VA by default and only fall
back to IOVA as PA if there is a need to do that. Yet it seems that
whenever I try to bring this up, the response (not necessarily from you,
so this is not directed at you specifically) seems to be that because of
hotplug, we have to start in the "safest" (from a device support point of
view) mode, that is, IOVA as PA. Given that, as you claim, some
devices require IOVA as VA, IOVA as PA is no longer the "safe"
default that all devices will support. Perhaps we can use this
opportunity to finally make IOVA as VA the default :)
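A rough sketch of the default suggested above: prefer IOVA as VA, and fall back to IOVA as PA only when a device requires it or the IOMMU cannot be used for VA. The function and its inputs (whether a usable IOMMU exists, whether its address width covers the virtual addresses in use, and the combined bus-level preference) are hypothetical, not existing EAL code.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for RTE_IOVA_DC, RTE_IOVA_PA and RTE_IOVA_VA. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Hypothetical EAL policy: VA by default, PA only when there is a reason.
 * Returns the chosen mode, or -1 if a device requires VA but VA cannot
 * be provided on this system. */
static int eal_pick_iova_mode(enum iova_mode bus_pref, bool iommu_usable,
                              bool iommu_width_ok)
{
    bool va_possible = iommu_usable && iommu_width_ok;

    if (bus_pref == IOVA_PA)
        return IOVA_PA;                     /* some device needs PA */
    if (bus_pref == IOVA_VA)
        return va_possible ? IOVA_VA : -1;  /* VA required: fail if impossible */
    return va_possible ? IOVA_VA : IOVA_PA; /* don't care: prefer VA */
}
```

With VA as the default, the risky case for hotplug becomes a PA-only device added later, rather than a VA-only one.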

> 
> 
>>>
>>>
>>>>
>>>> [1] http://patchwork.dpdk.org/patch/53206/
>>>> [2] http://patchwork.dpdk.org/patch/50274/
>>>> [3] http://patchwork.dpdk.org/patch/50991/
>>>> [4] http://patchwork.dpdk.org/patch/46134/
>>>>
>>>> --
>>>> Thanks,
>>>> Anatoly
>>
>>
>> --
>> Thanks,
>> Anatoly


-- 
Thanks,
Anatoly

