[dpdk-dev] [RFC PATCH 0/6] Restructure EAL device model for bus support
shreyansh.jain at nxp.com
Wed Nov 23 10:45:02 CET 2016
I should have replied to this earlier, apologies.
On Sunday 20 November 2016 09:00 PM, David Marchand wrote:
> On Thu, Nov 17, 2016 at 6:29 AM, Shreyansh Jain <shreyansh.jain at nxp.com> wrote:
>> DPDK has been inherently a PCI inclined framework. Because of this, the
>> design of device tree (or list) within DPDK is also PCI inclined. A non-PCI
>> device doesn't have a way of being expressed without using hooks started from
>> EAL to PMD.
>> With this cover letter, some patches are presented which try to break this
>> strict linkage of EAL with PCI devices. Aim is to generalize the device
>> hierarchy on the lines of how Linux handles it:
>> device A1
>> +==.===='==============.============+ Bus A.
>> | `--> driver A11 \
>> device A2 `-> driver A12 \______
>> |CPU |
>> device A1 /
>> | /
>> +==.===='==============.============+ Bus A`
>> | `--> driver B11
>> device A2 `-> driver B12
>> Simply put:
>> - a bus is connect to CPU (or core)
>> - devices are conneted to Bus
>> - drivers are running instances which manage one or more devices
>> - bus is responsible for identifying devices (and interrupt propogation)
>> - driver is responsible for initializing the device
>> (*Reusing text from email )
>> In context of DPDK EAL:
>> - a generic bus (not a driver, not a device). I don't know how to categorize
>> a bus. It is certainly not a device, and then handler for a bus (physical)
>> can be considered a 'bus driver'. So, just 'rte_bus'.
>> - there is a bus for each physical implementation (or virtual). So, a rte_bus
>> Object for PCI, VDEV, ABC, DEF and so on.
>> - Buses are registered just like a PMD - RTE_PMD_BUS_REGISTER()
>> - Each registered bus is part of a doubly list.
>> -- Each device inherits rte_bus
>> -- Each driver inherits rte_bus
>> -- Device and Drivers lists are part of rte_bus
>> - eth_driver is no more required - it was just a holder for PMDs to register
>> themselves. It can be replaced with rte_xxx_driver and corresponding init/
>> uninit moved to rte_driver
>> - rte_eth_dev modified to disassociate itself from rte_pci_device and connect
>> to generic rte_device
>> Once again, improvising from :
>> __ rte_bus_list
>> |rte_bus |
>> | driver_list------> List of rte_bus specific
>> | device_list---- devices
>> | scan | `-> List of rte_bus associated
>> | match | drivers
>> | dump |
>> | ..some refcnt| (#)
>> _________/ \_________
>> +--------/----+ +-\---------------+
>> |rte_device | |rte_driver |
>> | rte_bus | | rte_bus |
>> | rte_driver |(#) | init |
>> | | | uninit |
>> | devargs | | dev_private_size|
>> +---||--------+ | drv_flags |(#)
>> || | intr_handle(2*) |(#)
>> | \ +----------\\\----+
>> | \_____________ \\\
>> | \ |||
>> +------|---------+ +----|----------+ |||
>> |rte_pci_device | |rte_xxx_device | (4*) |||
>> | PCI specific | | xxx device | |||
>> | info (mem,) | | specific fns | / | \
>> +----------------+ +---------------+ / | \
>> _____________________/ / \
>> / ___/ \
>> +-------------'--+ +------------'---+ +--'------------+
>> |rte_pci_driver | |rte_vdev_driver | |rte_xxx_driver |
>> | PCI id table, | | <probably, | | .... |
>> | other driver | | nothing> | +---------------+
>> | data | +----------------+
>> (1*) Problem is that probe functions have different arguments. So,
>> generalizing them might be some rework in the respective device
>> (2*) Interrupt handling for each driver type might be different. I am not
>> sure how to generalize that either. This is grey area for me.
>> (3*) Probably exposing a bitmask for device capabilities. Nothing similar
>> exists now to relate it. Don't know if that is useful. Allowing
>> applications to question a device about what it supports and what it
>> doesn't - making it more flexible at application layer (but more code
>> as well.)
>> (4*) Even vdev would be an instantiated as a device. It is not being done
>> (#) Items which are still pending
>> With this cover letter, some patches have been posted. These are _only_ for
>> discussion purpose. They are not complete and neither compilable.
>> All the patches, except 0001, have sufficient context about what the changes
>> are and rationale for same. Obviously, code is best answer.
> As said by Jan B., I too think that splitting the patches into smaller
> patchsets is important.
> Looking at your datamodel, some quick comments :
> - Why init/uninit in rte_driver ? Why not probe/remove ?
No specific reason. At first I wasn't planning to replace driver->init
with driver->probe (which rte_pci_driver) is doing. But, I will do this
> - I don't think dev_private_size makes sense in rte_driver. The bus is
> responsible for creating rte_device objects (embedded or not in
> rte_$bus_device objects). If a driver needs to allocate some priv (for
> whatever needs), the driver should do the allocation itself (or maybe
> a helper for current eth_driver), then reference the original
> rte_$bus_device object it got from the probe method.
first off, this came as a suggestion on the ML somewhere as far as I
Secondly, your point makes sense. I will move it back.
> For a first patchset, I would see:
> - introduce the rte_bus object. In rte_eal_init, for each bus, we call
> the scan method. Then, for each bus, we find the appropriate
> rte_driver using the bus match method then call the probe method. If
> the probe succeeds, the rte_device points to the associated
> - migrate the pci scan code to a pci bus (scan looks at sysfs for
> linux / ioctl for bsd + devargs for blacklist / whitelist ?), match is
> the same at what is done in rte_eal_pci_probe_one_driver() at the
> - migrate the vdev init code to a vdev bus (scan looks at devargs):
> this is new, we must create rte_device objects for vdev drivers to use
Thanks for outlay - I agree with that.
> Then we can talk about the next steps once the bus is in place.
I will post the new set probably tomorrow or day-after.
More information about the dev