[dpdk-dev] [PATCH v6 08/11] eal: pci: introduce RTE_KDRV_VFIO_NOIOMMU driver mode

Santosh Shukla sshukla at mvista.com
Tue Jan 26 15:05:46 CET 2016


On Tue, Jan 26, 2016 at 6:30 PM, Thomas Monjalon
<thomas.monjalon at 6wind.com> wrote:
> 2016-01-26 15:56, Santosh Shukla:
>> On Mon, Jan 25, 2016 at 8:59 PM, Thomas Monjalon
>> <thomas.monjalon at 6wind.com> wrote:
>> > 2016-01-21 22:47, Santosh Shukla:
>> >> On Thu, Jan 21, 2016 at 8:16 PM, Thomas Monjalon
>> >> <thomas.monjalon at 6wind.com> wrote:
>> >> > 2016-01-21 17:34, Santosh Shukla:
>> >> >> On Thu, Jan 21, 2016 at 4:58 PM, Thomas Monjalon
>> >> >> <thomas.monjalon at 6wind.com> wrote:
>> >> >> > 2016-01-21 16:43, Santosh Shukla:
>> >> >> >> David Marchand <david.marchand at 6wind.com> wrote:
>> >> >> >> > This is a mode (specific to vfio), not a new kernel driver.
>> >> >> >> >
>> >> >> >> Yes, it is specific to VFIO, and that is why noiommu is appended
>> >> >> >> after vfio, i.e. __VFIO and __VFIO_NOIOMMU.
>> >> >> >
>> >> >> > Woaaa! Your logic is really disappointing :)
>> >> >> > Specific to VFIO => append _NOIOMMU
>> >> >> > If it's for VFIO, it should be called VFIO (that's my logic).
>> >> >> >
>> >> >> I am confused by your comment. vfio works with the default iommu
>> >> >> and now also without one (noiommu). drv->kdrv needs to carry the
>> >> >> driver mode for the vfio case, so that a user can simply read the
>> >> >> drv->kdrv value in their driver and accordingly use the vfio
>> >> >> read/write APIs, e.g. pread()/pwrite(). This is how
>> >> >> rte_eal_pci_vfio_read/write_bar() is implemented.
>> >> >
>> >> > Sorry, I don't understand. Why should the EAL read/write functions
>> >> > be different depending on the VFIO mode?
>> >>
>> >> No, the EAL read/write functions are not different across vfio modes
>> >> (they are the same for iommu and noiommu). Please see the
>> >> pci_eal_read/write_bar() APIs. Those APIs are currently used for
>> >> VFIO, irrespective of the vfio mode. If required, we can add UIO
>> >> BAR read/write APIs too. pci_eal_read/write_bar() are abstract
>> >> APIs; the underlying implementation can be of vfio or uio type.
>> >
>> > Does it mean you agree the suffix _NOIOMMU is not needed?
>> > It seems we are going nowhere in this discussion. You said
>> > "drv->kdrv needs to carry the driver mode for the vfio case"
>>
>> In my observation, virtio currently works with vfio-noiommu; that is
>> why I said drv->kdrv needs to know the vfio mode.
>
> It is your observation. It may change in the near future.
>

So does that mean that, until then, virtio support for non-x86 archs
has to wait? We have a working model with vfio-noiommu. Don't you think
it makes sense to let the vfio_noiommu implementation exist now and,
later, in case virtio+iommu gets mainlined, switch to a vfio __mode__
agnostic approach? All that switch takes is replacing the __noiommu
suffix with the default.

>> > and after
>> > "Those apis currently used for VFIO, Irrespective of vfio mode"
>> > That's why I assume your first assumption was wrong.
>> >
>>
>> The newly introduced DPDK global APIs pci_eal_read/write_bar() can be
>> used for both vfio and uio, and for vfio both with and without an
>> IOMMU.
>>
>> >> >> > Why do we care to parse noiommu only?
>> >> >>
>> >> >> Because PMDs such as virtio can work with vfio only in
>> >> >> _noiommu_ mode; in particular, virtio spec 0.95 / legacy virtio.
>> >> >
>> >> > Please could you explain the limitation (apart from IOMMU availability)?
>> >>
>> >> Ok.
>> >>
>> >> I believe we both agree that noiommu mode is needed for PMDs like
>> >> virtio, right? If so, the other reason is implementation driven,
>> >
>> > No, noiommu is a need for some environments having no IOMMU.
>> > But in my understanding, virtio could run with a nested IOMMU.
>>
>> Interesting; I would like to understand the nested case. I tried it in
>> the past by passing "iommu=pt intel_iommu=on kvm-intel.nested=1" on the
>> command line for x86 (for both guest and host), but binding the virtio
>> pci device to the vfio-pci driver fails. I tried on a 4.2 kernel (qemu
>> version 2.5); does it work for kernels > 4.2 / newer qemu versions?
>
> I haven't tried.
>
>> >> i.e.:
>> >>
>> >> Please look at the VIRTIO_RD/WR_1/2/4() implementation in
>> >> virtio_pci.c in this patch. Those macros are in use and applicable
>> >> to virtio spec 0.95, and so far support uio/ioport-style reads and
>> >> writes. Now, to support vfio-style reads/writes, they need to check
>> >> the drv->kdrv value, and that value should be of the vfio_noiommu
>> >> type, __not__ the generic vfio type.
>> >
>> > I still don't understand why it would not work with VFIO w/IOMMU.
>>
>> With vfio+iommu, binding the virtio pci device to the vfio-pci driver
>> fails, giving the error below:
>> [   53.053464] VFIO - User Level meta-driver version: 0.3
>> [   73.077805] vfio-pci: probe of 0000:00:03.0 failed with error -22
>> [   73.077852] vfio-pci: probe of 0000:00:03.0 failed with error -22
>>
>> vfio_pci_probe() --> vfio_iommu_group_get() --> iommu_group_get()
>> fails: the iommu does not have a group for the virtio pci device.
>
> Yes it fails when binding.
> So the later check in the virtio PMD is useless.
>

Which check?

>> In the noiommu case, it prepares the group / adds the device to an
>> iommu group, so it passes.
>>
>> Jason mentioned in another thread that he is working on a virtio+iommu
>> approach [1]. The patches are not merged, and I have yet to evaluate
>> them for the virtio PMD with iommu(+vfio). So I am wondering how a
>> virtio pci device could work unless Jason's patches are used?
>>
>> [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg337079.html
>
> I haven't tried nested IOMMU.
> All this thread was about the kernel module in use, i.e. VFIO.
> We are saying that virtio could work in both VFIO modes.

I agree, and I am looking at what works today: vfio-noiommu is in
linux-next and will soon be in mainline. This series will let non-x86
archs use the virtio PMD in the first place, and so take advantage of,
for example, vhost, ovs-dpdk-vhost-user, etc.

> Furthermore, restricting virtio to no-iommu mode doesn't bring
> any improvement.

We're not __restricting__ anything: as soon as virtio+iommu reaches a
working state, we'll simply replace __noiommu with the default. Then it's
up to the user to try out virtio with default vfio or vfio_noiommu.

> That's why I suggest to keep the initial semantic of kdrv and
> not pollute it with VFIO modes.
>

I am okay to live with the default and forget the __noiommu suffix, but
there are implementation problems, which were discussed in another
thread:

- The virtio PMD should avoid interface parsing, i.e.
virtio_resource_init_uio/vfio() etc. For the vfio case we could easily
get rid of it by moving the /sys parsing to the pci_eal layer, right?
Since virtio currently works with vfio-noiommu, it makes sense to me
that the pci_eal layer does that parsing for the PMD before the PMD
gets initialized.

- Another case could be an iommu-less PMD: the EAL layer would do the
parsing before updating drv->kdrv, as in the sketch below.
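A rough, self-contained sketch of that EAL-side parsing (names are
assumptions, modelled on the existing driver detection in eal_pci.c):

    #include <limits.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Resolve a device's "driver" symlink, e.g.
     * /sys/bus/pci/devices/0000:00:03.0/driver, into a driver name. */
    static int
    pci_get_kernel_driver_by_path(const char *filename, char *dri_name,
                                  size_t len)
    {
            char path[PATH_MAX];
            char *name;
            ssize_t count;

            count = readlink(filename, path, PATH_MAX);
            if (count >= PATH_MAX)
                    return -1;
            if (count <= 0) {
                    dri_name[0] = '\0';     /* no driver bound */
                    return 1;
            }
            path[count] = '\0';

            name = strrchr(path, '/');      /* keep last path component */
            if (name == NULL)
                    return -1;
            snprintf(dri_name, len, "%s", name + 1);
            return 0;
    }

The EAL would map "vfio-pci" to the vfio kdrv value and, for the noiommu
case, additionally check the vfio module parameter under
/sys/module/vfio/parameters/ (the exact parameter name is an assumption)
before setting drv->kdrv to the noiommu value.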

>> >> >> So at the initialization of such a PMD (for example virtio-net),
>> >> >> the PMD should know whether vfio-with-noiommu mode is enabled or
>> >> >> not; for that, the PMD simply checks the drv->kdrv value.
>> >> >
>> >> > If a check is needed, I would prefer using your function
>> >> > pci_vfio_is_noiommu() and removing the driver modes from enum
>> >> > rte_kernel_driver.
>> >>
>> >> I don't think calling pci_vfio_is_noiommu() inside
>> >> virtio_reg_rd/wr_1/2/4() would be a good idea.
>> >
>> > Why? The value may be cached in the priv properties.
>> >
>> pci_vfio_is_noiommu() parses /sys to check
>> - the enable_noiommu parameter, and
>> - whether the attached driver name is vfio-noiommu or not.
>>
>> It does file operations for that; I meant to say that calling this API
>> within the register read/write functions is not right. It would be
>> better if those low-level register read/write APIs only checked driver
>> types.
>
> Yes, that's why I said the return of pci_vfio_is_noiommu() may be cached
> for efficiency.

I am not convinced, though. I still find the PMD checking driver types
via drv->kdrv a better approach than introducing a new global variable,
which might look something like this:

At the pci_eal layer:

    bool vfio_mode;

    vfio_mode = pci_vfio_is_noiommu();

At the virtio PMD layer: check the value of the vfio_mode variable
before doing a virtio read/write over the vfio interface.

Instead, the virtio PMD doing

    virtio_reg_rd/wr_1/2/4()
    {
            if (drv->kdrv == VFIO)
                    pread()/pwrite();
            else
                    in()/out();
    }

is the better approach.
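Concretely, a minimal sketch of one such accessor (assuming the enum
value from this series; virtio_read_reg_1(), struct virtio_hw and its
fields are hypothetical placeholders, and the exact
rte_eal_pci_read_bar() signature is an assumption):

    #include <stdint.h>
    #include <sys/io.h>   /* inb(): x86 ioport access */
    #include <rte_pci.h>  /* struct rte_pci_device, enum rte_kernel_driver */

    /* Hypothetical, pared-down device state for illustration only. */
    struct virtio_hw {
            struct rte_pci_device *dev; /* ->kdrv set by the EAL */
            uint16_t io_base;           /* ioport BAR base (uio path) */
    };

    static uint8_t
    virtio_read_reg_1(struct virtio_hw *hw, uint64_t off)
    {
            uint8_t val;

            if (hw->dev->kdrv == RTE_KDRV_VFIO_NOIOMMU)
                    /* vfio path: EAL issues pread() on the device fd */
                    rte_eal_pci_read_bar(hw->dev, &val, sizeof(val), off, 0);
            else
                    /* legacy uio/ioport path */
                    val = inb(hw->io_base + off);
            return val;
    }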

Let me know if you still think the former is better than the latter,
and I'll send a patch revision right away.

