[dpdk-dev] [RFC PATCH 0/6] Restructure EAL device model for bus support
David Marchand
david.marchand at 6wind.com
Sun Nov 20 16:30:06 CET 2016
On Thu, Nov 17, 2016 at 6:29 AM, Shreyansh Jain <shreyansh.jain at nxp.com> wrote:
> DPDK has been inherently a PCI inclined framework. Because of this, the
> design of device tree (or list) within DPDK is also PCI inclined. A non-PCI
> device doesn't have a way of being expressed without using hooks started from
> EAL to PMD.
>
> With this cover letter, some patches are presented which try to break this
> strict linkage of EAL with PCI devices. Aim is to generalize the device
> hierarchy on the lines of how Linux handles it:
>
> device A1
> |
> +==.===='==============.============+ Bus A.
> | `--> driver A11 \
> device A2 `-> driver A12 \______
> |CPU |
> /`````
> device A1 /
> | /
> +==.===='==============.============+ Bus A`
> | `--> driver B11
> device A2 `-> driver B12
>
> Simply put:
> - a bus is connect to CPU (or core)
> - devices are conneted to Bus
> - drivers are running instances which manage one or more devices
> - bus is responsible for identifying devices (and interrupt propogation)
> - driver is responsible for initializing the device
>
> (*Reusing text from email [1])
> In context of DPDK EAL:
> - a generic bus (not a driver, not a device). I don't know how to categorize
> a bus. It is certainly not a device, and then handler for a bus (physical)
> can be considered a 'bus driver'. So, just 'rte_bus'.
> - there is a bus for each physical implementation (or virtual). So, a rte_bus
> Object for PCI, VDEV, ABC, DEF and so on.
> - Buses are registered just like a PMD - RTE_PMD_BUS_REGISTER()
> - Each registered bus is part of a doubly list.
> -- Each device inherits rte_bus
> -- Each driver inherits rte_bus
> -- Device and Drivers lists are part of rte_bus
> - eth_driver is no more required - it was just a holder for PMDs to register
> themselves. It can be replaced with rte_xxx_driver and corresponding init/
> uninit moved to rte_driver
> - rte_eth_dev modified to disassociate itself from rte_pci_device and connect
> to generic rte_device
>
> Once again, improvising from [1]:
>
> __ rte_bus_list
> /
> +----------'---+
> |rte_bus |
> | driver_list------> List of rte_bus specific
> | device_list---- devices
> | scan | `-> List of rte_bus associated
> | match | drivers
> | dump |
> | ..some refcnt| (#)
> +--|------|----+
> _________/ \_________
> +--------/----+ +-\---------------+
> |rte_device | |rte_driver |
> | rte_bus | | rte_bus |
> | rte_driver |(#) | init |
> | | | uninit |
> | devargs | | dev_private_size|
> +---||--------+ | drv_flags |(#)
> || | intr_handle(2*) |(#)
> | \ +----------\\\----+
> | \_____________ \\\
> | \ |||
> +------|---------+ +----|----------+ |||
> |rte_pci_device | |rte_xxx_device | (4*) |||
> | PCI specific | | xxx device | |||
> | info (mem,) | | specific fns | / | \
> +----------------+ +---------------+ / | \
> _____________________/ / \
> / ___/ \
> +-------------'--+ +------------'---+ +--'------------+
> |rte_pci_driver | |rte_vdev_driver | |rte_xxx_driver |
> | PCI id table, | | <probably, | | .... |
> | other driver | | nothing> | +---------------+
> | data | +----------------+
> +----------------+
>
> (1*) Problem is that probe functions have different arguments. So,
> generalizing them might be some rework in the respective device
> layers
> (2*) Interrupt handling for each driver type might be different. I am not
> sure how to generalize that either. This is grey area for me.
> (3*) Probably exposing a bitmask for device capabilities. Nothing similar
> exists now to relate it. Don't know if that is useful. Allowing
> applications to question a device about what it supports and what it
> doesn't - making it more flexible at application layer (but more code
> as well.)
> (4*) Even vdev would be an instantiated as a device. It is not being done
> currently.
> (#) Items which are still pending
>
> With this cover letter, some patches have been posted. These are _only_ for
> discussion purpose. They are not complete and neither compilable.
> All the patches, except 0001, have sufficient context about what the changes
> are and rationale for same. Obviously, code is best answer.
As said by Jan B., I too think that splitting the patches into smaller
patchsets is important.
Looking at your datamodel, some quick comments :
- Why init/uninit in rte_driver ? Why not probe/remove ?
- I don't think dev_private_size makes sense in rte_driver. The bus is
responsible for creating rte_device objects (embedded or not in
rte_$bus_device objects). If a driver needs to allocate some priv (for
whatever needs), the driver should do the allocation itself (or maybe
a helper for current eth_driver), then reference the original
rte_$bus_device object it got from the probe method.
For a first patchset, I would see:
- introduce the rte_bus object. In rte_eal_init, for each bus, we call
the scan method. Then, for each bus, we find the appropriate
rte_driver using the bus match method then call the probe method. If
the probe succeeds, the rte_device points to the associated
rte_driver,
- migrate the pci scan code to a pci bus (scan looks at sysfs for
linux / ioctl for bsd + devargs for blacklist / whitelist ?), match is
the same at what is done in rte_eal_pci_probe_one_driver() at the
moment,
- migrate the vdev init code to a vdev bus (scan looks at devargs):
this is new, we must create rte_device objects for vdev drivers to use
later
Then we can talk about the next steps once the bus is in place.
--
David Marchand
More information about the dev
mailing list