Re: [PATCH v1 0/8] Support VFIO cdev API in DPDK

Dimon dimon.zhao at nebula-matrix.com
Wed Oct 29 10:50:01 CET 2025


Hello Anatoly,
I tested this patch series and hit the same error on both Intel E810 and Nebulamatrix NICs (see the attached screenshots).
I traced it with GDB, and it looks like there may be a small issue in the vfio_group_assign_device function, though I am not certain:
(1) vfio_device_create allocates a vfio_device, dev.
(2) vfio_group_setup_device_fd sets dev->fd.
(3) DEVICE_FOREACH_ACTIVE(cfg, idev) walks each idev in cfg and checks whether idev->fd is the same as dev->fd, but at this point idev is actually dev itself, so the check matches.
As a result, it reports the error "Device 0000:08:00.0 already assigned to this container".
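To make the self-match in steps (1)-(3) concrete, here is a minimal C sketch. It is not the patch's code: `toy_device`, `toy_config`, and both check functions are hypothetical stand-ins, and skipping the device's own slot is only one possible way to avoid the false positive, not necessarily the right fix for the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DEVICES 4

/* Hypothetical stand-in for a per-container device entry. */
struct toy_device {
    bool active;
    int fd;
};

/* Hypothetical stand-in for the container config that owns the entries. */
struct toy_config {
    struct toy_device devs[TOY_MAX_DEVICES];
};

/* Step (1): reserve a free slot inside cfg. Returns NULL when cfg is full. */
struct toy_device *toy_device_create(struct toy_config *cfg)
{
    assert(cfg != NULL);
    for (size_t i = 0; i < TOY_MAX_DEVICES; i++) {
        if (!cfg->devs[i].active) {
            cfg->devs[i].active = true;
            cfg->devs[i].fd = -1;
            return &cfg->devs[i];
        }
    }
    return NULL;
}

/* Step (3) as reported: every active slot is compared, including dev's own. */
bool toy_fd_in_use_naive(const struct toy_config *cfg,
                         const struct toy_device *dev)
{
    for (size_t i = 0; i < TOY_MAX_DEVICES; i++) {
        const struct toy_device *idev = &cfg->devs[i];

        if (idev->active && idev->fd == dev->fd)
            return true; /* also true when idev == dev */
    }
    return false;
}

/* Same scan, but the slot being set up is excluded from the comparison. */
bool toy_fd_in_use_skip_self(const struct toy_config *cfg,
                             const struct toy_device *dev)
{
    for (size_t i = 0; i < TOY_MAX_DEVICES; i++) {
        const struct toy_device *idev = &cfg->devs[i];

        if (idev == dev)
            continue;
        if (idev->active && idev->fd == dev->fd)
            return true;
    }
    return false;
}
```

With a single device in the toy config, the naive check reports a duplicate as soon as the fd is set, while the skip-self variant only reports one when a second slot carries the same fd.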
------------------------------------------------------------------
From: Anatoly Burakov <anatoly.burakov at intel.com>
Sent: Wednesday, October 29, 2025 00:43
To: dev <dev at dpdk.org>
Subject: [PATCH v1 0/8] Support VFIO cdev API in DPDK
This patchset introduces a major refactor of the VFIO subsystem in DPDK to
support the character device (cdev) interface introduced in the Linux kernel,
as well as to make the API more streamlined and useful. The goal is to simplify
device management, improve compatibility, and clarify API responsibilities.
The following sections outline the key issues addressed by this patchset and the
corresponding changes introduced.
1. Only group mode is supported
===============================
Since kernel version 4.14.327 (LTS), VFIO supports the new character device
(cdev)-based way of working with VFIO devices (otherwise known as IOMMUFD). This
is a device-centric mode and does away with all the complexity regarding groups
and IOMMU types, delegating it all to the kernel, and exposes a much simpler
interface to userspace.
The old group interface is still around, and will need to be kept in DPDK both
for compatibility reasons and to support special cases (FSLMC bus, NBL driver,
etc.).
To enable this, VFIO is heavily refactored, so that the code can support both
modes while relying on (mostly) common infrastructure.
Note that the existing `rte_vfio_device_setup/release` model is fundamentally
incompatible with cdev mode, because for custom container cases, the expected
flow is that the user binds the IOMMU group (and thus, implicitly, the device
itself) to a specific container using `rte_vfio_container_group_bind`, whereas
this step is not needed for cdev as the device fd is assigned to the container
straight away.
Therefore, what we do instead is introduce a new API for container device
assignment which, semantically, will assign a device to a specified container,
so that when it is mapped using `rte_pci_map_device`, the appropriate container
is selected. Under the hood though, we essentially transition to getting the
device fd straight away at the assign stage, so that by the time the PCI bus
attempts to map the device, it is already mapped and we just return an fd.
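The assign-then-map flow described here can be illustrated with a small toy model. Everything in it is an assumption made for illustration: `toy_container_assign_device` and `toy_map_device` are hypothetical names, not functions from this series, fds are plain counters rather than real kernel descriptors, and the fallback to a default container for unassigned devices is not modelled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_DEVS 8
#define TOY_NAME_LEN 32

/* One assigned device: which container it belongs to and its device fd. */
struct toy_assignment {
    int used;
    char name[TOY_NAME_LEN];
    int container_fd;
    int dev_fd;
};

static struct toy_assignment toy_table[TOY_MAX_DEVS];
static int toy_next_fd = 100; /* stand-in for fds the kernel would hand out */

/*
 * Assign stage: the device fd is obtained immediately and remembered
 * together with the container. Returns 0 on success, -1 on failure.
 */
int toy_container_assign_device(int container_fd, const char *name)
{
    assert(name != NULL);
    if (strlen(name) >= TOY_NAME_LEN)
        return -1;

    /* A device can only be assigned once. */
    for (int i = 0; i < TOY_MAX_DEVS; i++)
        if (toy_table[i].used && strcmp(toy_table[i].name, name) == 0)
            return -1;

    for (int i = 0; i < TOY_MAX_DEVS; i++) {
        if (!toy_table[i].used) {
            toy_table[i].used = 1;
            snprintf(toy_table[i].name, TOY_NAME_LEN, "%s", name);
            toy_table[i].container_fd = container_fd;
            toy_table[i].dev_fd = toy_next_fd++;
            return 0;
        }
    }
    return -1; /* table full */
}

/*
 * Map stage: nothing is opened here. The fd obtained at assign time is
 * looked up and returned, along with the container it was assigned to.
 */
int toy_map_device(const char *name, int *container_fd)
{
    assert(name != NULL && container_fd != NULL);
    for (int i = 0; i < TOY_MAX_DEVS; i++) {
        if (toy_table[i].used && strcmp(toy_table[i].name, name) == 0) {
            *container_fd = toy_table[i].container_fd;
            return toy_table[i].dev_fd;
        }
    }
    return -1; /* not assigned (default-container fallback not modelled) */
}
```

In this model, mapping the same device twice returns the same fd, because the only place an fd is created is the assign stage.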
Additionally, a new `rte_vfio_get_mode` API is added for those cases that need
some introspection into VFIO's internals, with three new modes: group
(old-style), no-iommu (old-style but without IOMMU), and cdev (the new mode).
Although no-IOMMU is technically a variant of group mode, the distinction is
largely irrelevant to the user, as all usages of no-IOMMU checks in our codebase
are for deciding whether to use IOVA or PA, not anything to do with managing
groups. The current plan of the kernel community is to *not* introduce a
no-IOMMU cdev implementation, which is why this mode will be kept for
compatibility for these use cases.
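As a rough sketch of how a caller might consume such a mode query, the snippet below models the two decisions mentioned in this cover letter: whether physical addresses must be used, and whether the deprecated group-based calls can still work. The enum and function names are hypothetical and chosen for illustration only; the actual `rte_vfio_get_mode` return type and constants are defined by the series, not here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three modes described in the cover letter. */
enum toy_vfio_mode {
    TOY_VFIO_MODE_GROUP = 0, /* old-style, IOMMU-backed */
    TOY_VFIO_MODE_NOIOMMU,   /* old-style, without IOMMU */
    TOY_VFIO_MODE_CDEV,      /* new device-centric mode */
};

/* Without an IOMMU there is no translation, so PA must be used. */
bool toy_must_use_physical_addresses(enum toy_vfio_mode mode)
{
    assert((int)mode <= TOY_VFIO_MODE_CDEV);
    return mode == TOY_VFIO_MODE_NOIOMMU;
}

/* Group-based calls are described as working only in group/no-IOMMU mode. */
bool toy_group_api_usable(enum toy_vfio_mode mode)
{
    assert((int)mode <= TOY_VFIO_MODE_CDEV);
    return mode == TOY_VFIO_MODE_GROUP || mode == TOY_VFIO_MODE_NOIOMMU;
}
```

Keeping no-IOMMU as its own value means the address-mode decision is a single comparison and does not require the caller to know anything about groups.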
As for special cases that rely on group mode, the old group-based API calls are
kept, but will be marked as deprecated, and will only work in group/no-IOMMU
mode. This has few practical consequences, as even users such as NBL or the
FSLMC bus do not actually use any VFIO functionality; they just create a
container and proceed to do their own thing.
2. There is duplication among APIs
==================================
Over time, several VFIO APIs have been added that perform overlapping
functions:
* `rte_vfio_get_group_fd` does the same thing as `rte_vfio_container_group_bind`
* `rte_vfio_clear_group` does the same thing as `rte_vfio_container_group_unbind`
The only difference between them is that for the former APIs, the container
selection is implicit (create in the default container if it doesn't exist,
delete from any container). It really only makes sense to keep the container
versions around, but because we don't really need any of them any more, all of
them will be deprecated.
3. The API responsibilities aren't clear and bleed into each other
==================================================================
Some APIs do multiple things at once. In particular:
* `rte_vfio_get_group_fd` opens a new group if it doesn't exist
* `rte_vfio_container_group_bind/unbind` return group fd
* `rte_vfio_get_device_info` will setup the device
These APIs have been adjusted as follows:
* `rte_vfio_get_group_fd` will *not* open any fd's, it will *only* return those
 previously bound to a container by `rte_vfio_container_group_bind`
* `rte_vfio_container_group_bind` will *not* return any fd's (users should call
 `rte_vfio_get_group_fd` to get it)
* `rte_vfio_get_device_info` will *not* set up the device (users should call
 `rte_vfio_container_device_setup` prior to calling this API)
All current users of these APIs were adjusted, and group-related APIs were
marked as deprecated.
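The adjusted split between binding a group and fetching its fd can be pictured with the toy model below. It is a sketch under stated assumptions: the `toy_*` functions are hypothetical and only mimic the responsibilities listed above (bind returns a status, get returns a previously bound fd and never opens one); they are not the signatures of the real `rte_vfio_*` calls.

```c
#include <assert.h>

#define TOY_MAX_GROUPS 8

/* One bound IOMMU group: its number, owning container, and group fd. */
struct toy_group {
    int used;
    int group_num;
    int container_fd;
    int fd;
};

static struct toy_group toy_groups[TOY_MAX_GROUPS];
static int toy_next_group_fd = 200; /* stand-in for kernel-provided fds */

static struct toy_group *toy_find_group(int group_num)
{
    for (int i = 0; i < TOY_MAX_GROUPS; i++)
        if (toy_groups[i].used && toy_groups[i].group_num == group_num)
            return &toy_groups[i];
    return 0;
}

/* Bind: opens the group for the container, returns only a status code. */
int toy_container_group_bind(int container_fd, int group_num)
{
    assert(group_num >= 0);
    if (toy_find_group(group_num) != 0)
        return -1; /* already bound somewhere */

    for (int i = 0; i < TOY_MAX_GROUPS; i++) {
        if (!toy_groups[i].used) {
            toy_groups[i].used = 1;
            toy_groups[i].group_num = group_num;
            toy_groups[i].container_fd = container_fd;
            toy_groups[i].fd = toy_next_group_fd++;
            return 0;
        }
    }
    return -1; /* no free slot */
}

/* Get: pure lookup, never opens anything. Returns -1 when not bound. */
int toy_get_group_fd(int group_num)
{
    struct toy_group *g = toy_find_group(group_num);

    return g != 0 ? g->fd : -1;
}

/* Unbind: forgets the group if it belongs to the given container. */
int toy_container_group_unbind(int container_fd, int group_num)
{
    struct toy_group *g = toy_find_group(group_num);

    if (g == 0 || g->container_fd != container_fd)
        return -1;
    g->used = 0;
    return 0;
}
```

In this model a caller that wants the fd always performs two steps, bind and then get, which is the ordering the adjusted API descriptions imply.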
Feedback and suggestions are welcome, especially from maintainers of drivers
that depend on VFIO infrastructure.
Anatoly Burakov (8):
 uapi: update to v6.17 and add iommufd.h
 vfio: add container device assignment API
 vhost: remove group-related API from drivers
 vfio: do not setup the device on get device info
 vfio: cleanup and refactor
 vfio: introduce cdev mode
 doc: deprecate VFIO group-based APIs
 vfio: deprecate group-based API
 config/arm/meson.build | 1 +
 config/meson.build | 1 +
 doc/guides/rel_notes/deprecation.rst | 26 +
 drivers/bus/cdx/cdx_vfio.c | 13 +-
 drivers/bus/fslmc/fslmc_bus.c | 10 +-
 drivers/bus/fslmc/fslmc_vfio.c | 2 +-
 drivers/bus/pci/linux/pci.c | 2 +-
 drivers/bus/pci/linux/pci_vfio.c | 17 +-
 drivers/crypto/bcmfs/bcmfs_vfio.c | 6 +-
 drivers/net/hinic3/base/hinic3_hwdev.c | 2 +-
 drivers/net/nbl/nbl_common/nbl_userdev.c | 18 +-
 drivers/net/nbl/nbl_include/nbl_include.h | 1 +
 drivers/net/ntnic/ntnic_ethdev.c | 2 +-
 drivers/net/ntnic/ntnic_vfio.c | 30 +-
 drivers/vdpa/ifc/ifcvf_vdpa.c | 34 +-
 drivers/vdpa/mlx5/mlx5_vdpa.c | 1 -
 drivers/vdpa/nfp/nfp_vdpa.c | 37 +-
 drivers/vdpa/sfc/sfc_vdpa.c | 39 +-
 drivers/vdpa/sfc/sfc_vdpa.h | 2 -
 kernel/linux/uapi/linux/iommufd.h | 1292 ++++++++++
 kernel/linux/uapi/linux/vduse.h | 2 +-
 kernel/linux/uapi/linux/vfio.h | 12 +-
 kernel/linux/uapi/version | 2 +-
 lib/eal/freebsd/eal.c | 36 +
 lib/eal/include/rte_vfio.h | 414 +++-
 lib/eal/linux/eal_vfio.c | 2640 +++++++++------------
 lib/eal/linux/eal_vfio.h | 170 +-
 lib/eal/linux/eal_vfio_cdev.c | 387 +++
 lib/eal/linux/eal_vfio_group.c | 981 ++++++++
 lib/eal/linux/eal_vfio_mp_sync.c | 91 +-
 lib/eal/linux/meson.build | 2 +
 lib/vhost/vdpa_driver.h | 3 -
 32 files changed, 4484 insertions(+), 1792 deletions(-)
 create mode 100644 kernel/linux/uapi/linux/iommufd.h
 create mode 100644 lib/eal/linux/eal_vfio_cdev.c
 create mode 100644 lib/eal/linux/eal_vfio_group.c
-- 
2.47.3
-------------- next part --------------
A non-text attachment was scrubbed...
Name: temp4cj.png
Type: application/octet-stream
Size: 57368 bytes
Desc: not available
URL: <http://mails.dpdk.org/archives/dev/attachments/20251029/572bca6c/attachment-0005.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: temp4cj.png
Type: application/octet-stream
Size: 25935 bytes
Desc: not available
URL: <http://mails.dpdk.org/archives/dev/attachments/20251029/572bca6c/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: temp4cj.png
Type: application/octet-stream
Size: 40546 bytes
Desc: not available
URL: <http://mails.dpdk.org/archives/dev/attachments/20251029/572bca6c/attachment-0007.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: temp4cj.png
Type: application/octet-stream
Size: 24221 bytes
Desc: not available
URL: <http://mails.dpdk.org/archives/dev/attachments/20251029/572bca6c/attachment-0008.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: temp4cj.png
Type: application/octet-stream
Size: 60955 bytes
Desc: not available
URL: <http://mails.dpdk.org/archives/dev/attachments/20251029/572bca6c/attachment-0009.obj>