[dpdk-dev] [PATCH v1 1/1] kernel/linux: introduce vfio_pf kernel module

Thomas Monjalon thomas at monjalon.net
Thu Oct 31 18:03:53 CET 2019


This topic is not getting enough attention.
Let me restate the issue and the proposals, with more people Cc'ed.

We are talking about SR-IOV VFs in VMs
with a PF managed on the host by DPDK.
The PF is either (1) managed by a bifurcated driver (the Mellanox case),
(2) bound to UIO with igb_uio, or (3) bound to VFIO.

In case 1, the PF is still managed by a kernel driver, so no issue.

In case 2, the PF is managed by UIO.
There is no SR-IOV support in upstream UIO,
but the out-of-tree module igb_uio works.
However, we would like to drop this legacy module from DPDK,
and many (if not most) Linux distributions do not package igb_uio anyway.
The other issue is that igb_uio uses physical addressing,
which is not acceptable on OCTEON TX2 for performance reasons.

In case 3, the PF is managed by VFIO. This is the case we want to fix.
VFIO does not allow creating VFs.
The workaround is to create the VFs before binding the PF to VFIO,
but since Linux 4.19, VFIO forbids any SR-IOV VF management.
The security concern is about letting userspace handle the SR-IOV
VF (mailbox) messages and take responsibility for the VFs in the guest.
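The first half of that workaround (creating the VFs) goes through the standard PCI sysfs attribute sriov_numvfs, which only works while the PF is still bound to a kernel driver that supports SR-IOV configuration. A minimal C sketch of this step, assuming a hypothetical PF at 0000:01:00.0; the helper names are illustrative only, not an existing DPDK or kernel API:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/bus/pci/devices/<bdf>/<attr>" into buf.
 * Returns 0 on success, -1 if the path does not fit. */
int pci_attr_path(char *buf, size_t len, const char *bdf, const char *attr)
{
    assert(bdf != NULL && attr != NULL);
    int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/%s", bdf, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a string to a (sysfs) file. Returns 0 on success, -1 on error.
 * sysfs errors (e.g. -EBUSY) surface when the buffer is flushed,
 * so the fclose() result is checked as well. */
int write_attr(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(val, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: create num_vfs VFs on the PF while it is still
 * bound to its kernel driver. The kernel refuses to change a non-zero
 * VF count directly, so the count is reset to 0 first. */
int pf_create_vfs(const char *bdf, unsigned int num_vfs)
{
    char path[256], val[16];

    if (pci_attr_path(path, sizeof path, bdf, "sriov_numvfs") != 0)
        return -1;
    if (write_attr(path, "0") != 0)
        return -1;
    snprintf(val, sizeof val, "%u", num_vfs);
    return write_attr(path, val);
}
```

For example, pf_create_vfs("0000:01:00.0", 4) would be run as root before rebinding the PF. Whether the subsequent bind of the PF to vfio-pci is accepted depends on the kernel version, as described above.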

The desired approach is to let the system admin decide the security level
by adding a flag to VFIO: "let me manage VFs, I know what I am doing".
Reference to a "recent" discussion: https://lkml.org/lkml/2018/3/6/855
So far, no upstream solution has been merged.

This patch proposes a solution using an out-of-tree module.
In this case, the admin explicitly decides to bind the PF to vfio_pf.
Unfortunately, this solution won't work in environments that
forbid out-of-tree modules.
Another concern is that it looks like a DPDK-only solution.
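Binding one specific PF to a chosen driver is typically done through the PCI sysfs files driver_override, driver/unbind and drivers_probe. A minimal C sketch, assuming the proposed module registers a PCI driver named "vfio_pf" (that registered name is an assumption, and the helper names and BDF are hypothetical):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/bus/pci/devices/<bdf>/<attr>"; returns 0, or -1 on truncation. */
int pci_dev_attr_path(char *buf, size_t len, const char *bdf, const char *attr)
{
    assert(bdf != NULL && attr != NULL);
    int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/%s", bdf, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write val to path; checking fclose() catches errors reported on flush. */
static int write_str(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(val, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: rebind the PCI device <bdf> to <driver>. */
int pci_rebind(const char *bdf, const char *driver)
{
    char path[256];

    /* 1. Restrict driver matching for this one device to <driver>. */
    if (pci_dev_attr_path(path, sizeof path, bdf, "driver_override") != 0 ||
        write_str(path, driver) != 0)
        return -1;

    /* 2. Detach from the current driver; a failure is ignored because
     *    the device may not be bound to any driver at all. */
    if (pci_dev_attr_path(path, sizeof path, bdf, "driver/unbind") == 0)
        (void)write_str(path, bdf);

    /* 3. Ask the PCI core to probe the device again. */
    return write_str("/sys/bus/pci/drivers_probe", bdf);
}
```

For example, pci_rebind("0000:01:00.0", "vfio_pf") run as root. driver_override is used here rather than new_id so that only this one PF is claimed, not every device sharing its vendor/device ID.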

We have an issue, but we do not want to propose a half-solution
that would harm other projects and users.
So the question is:
Do we accept this patch as a temporary solution?
Or can we get an agreement soon for an upstream kernel solution?

Thanks for reading and giving your (clear) opinion.


06/09/2019 15:27, Jerin Jacob Kollanukkaran:
> From: Thomas Monjalon <thomas at monjalon.net>
> > 06/09/2019 11:12, vattunuru at marvell.com:
> > > From: Vamsi Attunuru <vattunuru at marvell.com>
> > >
> > > DPDK use cases such as VF representors or OVS offload require
> > > both the PF and VF PCIe devices to be bound to the vfio-pci module
> > > to enable IOMMU protection.
> > >
> > > In addition to the vSwitch use case, and unlike other PCI device
> > > classes, network-class PCIe devices place additional responsibilities
> > > on the PF, such as promiscuous mode support.
> > >
> > > The above use cases require VFIO to be bound to the PF and its VFs.
> > > This use case is not supported by the Linux kernel because of a
> > > security issue: a DoS is possible when a VF is attached to a guest
> > > over vfio-pci and a netdev kernel driver runs on it, which is
> > > something the VF representor would like to enable.
> > >
> > > Since we cannot tell whether a vfio-pci-bound VF runs a DPDK
> > > application or a netdev driver in the guest, we cannot introduce
> > > any scheme to fix the DoS case, and therefore there is no proper
> > > support for this in the upstream kernel.
> > >
> > > igb_uio provides such PF and VF binding support for non-IOMMU
> > > devices, so VF representors or OVS offload can run on non-IOMMU
> > > devices, accepting the DoS vulnerability when a netdev driver runs
> > > on the VF.
> > >
> > > This kernel module makes it possible to enable SR-IOV on PF devices,
> > > and therefore to run both PF and VF devices in VFIO mode, with the
> > > same known impact as the igb_uio driver has on non-IOMMU devices.
> > >
> > > Signed-off-by: Vamsi Attunuru <vattunuru at marvell.com>
> > > Signed-off-by: Jerin Jacob <jerinj at marvell.com>
> > 
> > Sorry, I fail to properly understand the explanation above.
> > Please try to split it into shorter sentences.
> > 
> > About the request to add an out-of-tree Linux kernel driver, I guess Jerin is well
> > aware that we don't want such drivers anymore.
> 
> Yes, I am aware of it. I don't like out-of-tree modules either. But in this case,
> I suggested that Vamsi use an out-of-tree module.
> 
> Let me describe the issue, and let us discuss how to tackle the problem:
> 
> # The Linux kernel won't allow a VFIO PF to have SR-IOV enabled.
> 
> Patches and ongoing discussion are here:
> https://patchwork.kernel.org/patch/10522381/
> https://lwn.net/Articles/748526/
> 
> Based on my understanding, the reason for NOT allowing a
> VFIO PF to have SR-IOV enabled is genuine from the kernel point
> of view, but not from the DPDK point of view.
> 
> Here is the sequence that describes the problem:
> 1) Consider that the Linux kernel allowed enabling SR-IOV on vfio-pci.
> 2) The PF is bound to vfio-pci.
> 3) Using the SR-IOV infrastructure of the vfio-pci PF driver,
> VFs are created.
> 4) A DPDK application is bound to both the PF and the VF. No issue here.
> 5) Assume a DPDK application is bound to the PF, and the VF is bound
> to the netdev kernel driver. Now there is a genuine concern
> from the kernel point of view: the DPDK PF can intercept
> VF mailbox messages (or similar) and deny the kernel's requests.
> Or what if the DPDK PF application crashes?
> 
> To avoid case (5), step (3) is not allowed in the stock kernel,
> which makes sense IMO.
> 
> Now, from the DPDK PoV, step (5) is valid, since rte_flow's
> VF action etc. is used to enable such a case:
> the user can program rte_flow rules on the PF to steer
> some traffic to a VF, where the VF can be driven by a DPDK
> application or by the Linux kernel netdev driver.
> 
> This patch enables step (3) in order to enable step (5) from the
> DPDK PoV, i.e. DPDK needs to allow the PF to be bound to DPDK with VFs.
> 
> Why this issue now:
> - The igb_uio kernel driver is used to enable step (3);
>   see store_max_vfs() in kernel/linux/igb_uio/igb_uio.c.
>   This is fine for non-IOMMU devices, but IOMMU devices
>   need VFIO.
> - We would like to support VFIO for IOMMU protection
>   and enable step (5), as DPDK supports it at the spec level,
>   i.e. we need to fix the feature disparity between IOMMU and
>   non-IOMMU based devices.
> 
> Note:
> We may not need a brand new kernel module; we could move
> this logic to igb_uio if maintenance is a concern.
