[dpdk-dev] Proposal for a big eal / ethdev cleanup

Declan Doherty declan.doherty at intel.com
Mon Jan 18 15:54:32 CET 2016


On 14/01/16 10:38, David Marchand wrote:
> Hello all,
>
> Here is a proposal of a big cleanup in ethdev (cryptodev would have to
> follow) and eal structures.
> This is something I wanted to do for quite some time and the arrival of
> a new bus makes me think we need it.
>


> This is an alternative to what Jan proposed [1].
>
> ABI is most likely broken with this, but I think this discussion can come later.
>
>
> First some context on how dpdk is initialized at the moment :
>
> Let's imagine a system with one ixgbe pci device and take some
> part of ixgbe driver as an example.
>
> static struct eth_driver rte_ixgbe_pmd = {
>          .pci_drv = {
>                  .name = "rte_ixgbe_pmd",
>                  .id_table = pci_id_ixgbe_map,
>                  .drv_flags = RTE_PCI_DRV_NEED_MAPPING |
> RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_DETACHABLE,
>          },
>          .eth_dev_init = eth_ixgbe_dev_init,
>          .eth_dev_uninit = eth_ixgbe_dev_uninit,
>          .dev_private_size = sizeof(struct ixgbe_adapter),
> };
>
> static int
> rte_ixgbe_pmd_init(const char *name __rte_unused, const char *params
> __rte_unused)
> {
>          PMD_INIT_FUNC_TRACE();
>          rte_eth_driver_register(&rte_ixgbe_pmd);
>          return 0;
> }
>
> static struct rte_driver rte_ixgbe_driver = {
>          .type = PMD_PDEV,
>          .init = rte_ixgbe_pmd_init,
> };
>
> PMD_REGISTER_DRIVER(rte_ixgbe_driver)
>
>
> DPDK initialisation goes as follows (focusing on ixgbe driver):
>
> PMD_REGISTER_DRIVER(rte_ixgbe_driver) which adds it to dev_driver_list
>
> rte_eal_init()
>   -> rte_eal_pci_init()
>    -> rte_eal_pci_scan() which fills pci_device_list
>
>   -> rte_eal_dev_init()
>    -> for each rte_driver (first vdev, then pdev), call driver->init()
>       so here rte_ixgbe_pmd_init(NULL, NULL)
>     -> rte_eth_driver_register(&rte_ixgbe_pmd);
>      -> fills rte_ixgbe_pmd.pci_drv.devinit = rte_eth_dev_init
>      -> call rte_eal_pci_register() which adds it to pci_driver_list
>
>   -> rte_eal_pci_probe()
>    -> for each rte_pci_device found in rte_eal_pci_scan(), and for all
>       rte_pci_driver registered, call devinit(dr, dev),
>       so here rte_eth_dev_init(dr, dev)
>     -> creates a new eth_dev (which is a rte_eth_dev), then adds
>        reference to passed dev pointer (which is a rte_pci_device), to
>        passed dr pointer (which is a rte_pci_driver cast as a eth_driver)
>     -> call eth_drv->eth_dev_init(eth_dev)
>        so here eth_ixgbe_dev_init(eth_dev)
>      -> fills other parts of eth_dev
>      -> rte_eth_copy_pci_info(eth_dev, pci_dev)
>
>
> By the way, when invoking ethdev init, the pci-specific stuff is only
> handled in functions called from the drivers themselves, which already
> know that they are dealing with pci resources.
>
>
> Later in the life of the application, ethdev api is called for hotplug.
>
> int rte_eth_dev_attach(const char *devargs, uint8_t *port_id);
>
> A devargs is used to identify a vdev/pdev driver and call it to create a
> new rte_eth_dev.
> Current code goes as far as parsing devargs to understand if this is a
> pci device or a vdev.
> This part should be moved to eal since this is where all the "bus" logic
> is.
>
>
>
> So now, what I had in mind is something like this.
> It is far from perfect and I certainly did some shortcuts in my
> reasoning.
>
>
> Generic device/driver
>
> - introduce a rte_device structure,
> - a rte_device has a name, that identifies it in a unique way across
> all buses, maybe something like pci:0000:00:01.0, and for vdev,
> vdev:name
> - a rte_device references a rte_driver,
> - a rte_device references devargs
> - a rte_device embeds a intr_handle
> - rte_device objects are created by "buses"
> - a function to retrieve rte_device objects based on name is added
>
> - current rte_driver does not need to know about the pmd_type
> (pdev/vdev), this is only a way to order drivers init in eal, we could
> use the rte_driver names to order them or maybe remove this ordering
> - init and uninit functions are changed to take a pointer to a
> rte_device
>
> rte_device and rte_driver structures are at the "bus" level.
> Those are the basic structures we will build the other objects on.
>
>
> Impact on PCI device/driver
>
> - rte_pci_device is modified to embed a rte_device (embedding makes it
> possible later to cast the rte_device and get the rte_pci_device in pci
> specific functions)
> - no need for a rte_pci_driver reference in rte_pci_device, since we
> have the rte_device driver
>
> - rte_pci_driver is modified to embed a rte_driver
> - no more devinit and devuninit functions in rte_pci_driver, they can
> be moved as init / uninit functions in rte_driver
>
> - pci scan code creates rte_pci_device objects, associates them to
> rte_pci_driver, fills the driver field of the rte_driver then pass
> them to rte_driver init function.
>
> rte_pci_device and rte_pci_driver are specific implementation of
> rte_device and rte_driver.
> There are there to maintain pci private methods, create upper layer
> objects (ethdev / crypto) etc..
>
>
> Impact on vdev
>
> - introduce a rte_vdev_driver structure
> - a rte_vdev_driver embeds a rte_driver
> - a rte_vdev_driver has a priv size for vdev objects creation
>
> - no need for a vdev device object, this is specific to vdev drivers
>
> - eal init code associates devargs to vdev drivers, creates a
> rte_device object (using the priv size), fills the driver field then
> pass them to rte_driver init function.
>
>
> Impact on ethdev
>
> - a rte_eth_dev object references a rte_device in place of
> rte_pci_device
> - no more information about a driver in rte_eth_dev, this is the
> rte_device that has a reference to its rte_driver
> - eth_driver can be removed, it is only a wrapper of a rte_pci_driver
> at the moment, maybe some init function wrappers can stay in ethdev
> with dev_private_size to be handled in the rte_driver
> - rte_eth_dev objects are created by rte_drivers during probe (pci
> scan, vdev init, hotplug)
> - ethdev ops are populated by rte_drivers
>
>
> Impact on hotplug
>
> - hotplug moves from ethdev to eal
> - a notification framework is added to ethdev when hotplugging
>
> - eal uses the name (remember the pci:0000:00:01.0 / vdev:name
> example) in devargs to identify the right bus (pci / vdev)
> - pci scan code is reused to create a rte_pci_device object etc...
> - vdev init code is reused
>
>
> We end up with something like this.
> An arrow means that the structure contains a pointer to an object of the
> other struct.
> I only wrote the fields I mentioned in this mail, for pci device a lot
> of other fields are omitted.
>
> - for ethdev on top of pci devices
>
>                 +------------------+ +-------------------------------+
>                 |                  | |                               |
>                 | rte_pci_device   | | rte_pci_driver                |
>                 |                  | |                               |
> +-------------+ | +--------------+ | | +---------------------------+ |
> |             | | |              | | | |                           | |
> | rte_eth_dev +---> rte_device   +-----> rte_driver                | |
> |             | | |  char name[] | | | |  char name[]              | |
> +-------------+ | |              | | | |  int init(rte_device *)   | |
>                 | +--------------+ | | |  int uninit(rte_device *) | |
>                 |                  | | |                           | |
>                 +------------------+ | +---------------------------+ |
>                                      |                               |
>                                      +-------------------------------+
>
> - for ethdev on top of vdev devices
>
>                 +------------------+ +-------------------------------+
>                 |                  | |                               |
>                 | drv specific     | | rte_vdev_driver               |
>                 |                  | |                               |
> +-------------+ | +--------------+ | | +---------------------------+ |
> |             | | |              | | | |                           | |
> | rte_eth_dev +---> rte_device   +-----> rte_driver                | |
> |             | | |  char name[] | | | |  char name[]              | |
> +-------------+ | |              | | | |  int init(rte_device *)   | |
>                 | +--------------+ | | |  int uninit(rte_device *) | |
>                 |                  | | |                           | |
>                 +------------------+ | +---------------------------+ |
>                                      |                               |
>                                      |   int priv_size               |
>                                      |                               |
>                                      +-------------------------------+
>
>
> Thanks for reading.
> Comments ?
>
>

Hey David,

form the work we done on the cryptodev I definitely agree this is 
something we need to look at. There is some duplication of code between 
ethdev and cryptodev which would disappear with a common rte_device 
abstraction. I just haven't had time to write anything down and didn't 
want to further complicate the cryptodev patches by introducing EAL 
changes with it as well.

In your proposal above, having an bus type enumeration in the rte_device 
which specifies the bus type might be simpler than having to parse a 
specific name formatting.

Moving the hot-plugging infrastructure to the EAL would be great as this 
is an element we didn't address in the cryptodev layer as currently we 
would need to re-implement a lot of the same logic which is done in 
lib_ethdev. Again this was something on our list of things to look at.

One thing that we are working on at the moment and it will affect your 
proposed solution above quite a lot is the ability to support devices 
with share-able buses in DPDK, we will have a RFC for this proposal in 
the next few days but I'll give a quick overview now. The Quick Assist 
device which we currently support a symmetric crypto cryptodev PMD for 
in DPDK is a mufti-function device, in that it supports symmetric and 
asymmetric crypto and compression functionality. These features are 
supported from a single device instance through different hardware 
queues. We want to provide each service through as a separate dpdk 
device but with and underlying shared bus (in our case PCI), this way a 
device/queue will only provide 1 service type. What we are going to 
propose is to allow a rte_pci_device to be shared my multiple pdevs, to 
do this we would register multiple drivers against the same pci_id and 
with a small modification to the EAL rte_eal_pci_probe_all()/ 
rte_eal_pci_probe() functions we create a device instance for each 
matched driver as long as the diver has a share-able flag.

So with the current DPDK architecture where the rte_cryptodev (also the 
same for the rte_eth_dev)  has a pointer from the rte_pci_driver to the 
rte_pci_device and a back pointer from the rte_pci_device to the 
rte_pci_driver, we are proposing to remove the back pointer from 
rte_pci_device and then allow multiple rte_pci_driver. This will require 
an API change to the PCI device uninit function. We'll outline all of 
this in the RFC, but I just wanted to make you aware of it, as something 
to keep in mind. Our solution will logically look something a bit like 
below.



+-----------------------+  +------------------------+
|      qat_sym_pmd      |  |      qat_asym_pmd      |
+-----------------------+  +------------------------+
             |                          |
             V                          V
+-----------------------+  +------------------------+
|    rte_cryptodev      |  |      rte_cryptodev     |
+-----------------------+  +------------------------+
             |                           |
             V                           V
    +----------------+           +----------------+
    | rte_pci_driver |           | rte_pci_driver |
    +----------------+           +----------------+
         ^     |                      |     ^
         |     ------------------------     |
         /                |                 /
         /                V                 /
         |        +----------------+        |
         ---------| rte_pci_device |---------
                  +----------------+


I guess the main change from your proposal to allow similar 
functionality would be that dependency chain is reversed with the
rte_driver having the reference to the rte_device which could be shared 
by multiple drivers.

                 +----------------------------+   +----------------+
                 |                            |   |                |
                 | drv specific               |   | bus specific   |
                 |                            |   |                |
+-------------+ |+--------------------------+|   |+--------------+|
|             | ||                          ||   ||              ||
| rte_eth_dev +--> rte_driver               +--+--> rte_device   ||
|             | ||  char name[]             || | ||  char name[] ||
+-------------+ ||  int init(rte_device *)  || | ||  bool shared ||
                 ||  int uninit(rte_device *)|| | ||  mutex lock  ||
                 ||                          || | ||              ||
                 |+--------------------------+| | |+--------------+|
                 |                            | | +----------------+
                 +----------------------------+ |
                                                |
                                                |
                 +----------------------------+ |
                 |                            | |
                 | drv specific               | |
                 |                            | |
+-------------+ |+--------------------------+| |
|             | ||                          || |
| rte_eth_dev +--> rte_driver               +--+
|             | || char name[]              ||
+-------------+ || int init(rte_device *)   ||
                 || int uninit(rte_device *) ||
                 ||                          ||
                 |+--------------------------+|
                 |                            |
                 +----------------------------+


More information about the dev mailing list