[dpdk-dev] [dpdk-dev, RFC] drivers: advertise kmod dependencies in pmdinfo

Neil Horman nhorman at tuxdriver.com
Fri Sep 2 15:33:27 CEST 2016


On Fri, Sep 02, 2016 at 09:19:26AM +0000, Trahe, Fiona wrote:
> 
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > Sent: Thursday, September 1, 2016 8:16 PM
> > To: Stephen Hemminger <stephen at networkplumber.org>
> > Cc: Trahe, Fiona <fiona.trahe at intel.com>; dev at dpdk.org; Olivier Matz
> > <olivier.matz at 6wind.com>; Thomas Monjalon
> > <thomas.monjalon at 6wind.com>
> > Subject: Re: [dpdk-dev] [dpdk-dev, RFC] drivers: advertise kmod dependencies
> > in pmdinfo
> > 
> > On Thu, Sep 01, 2016 at 10:41:22AM -0700, Stephen Hemminger wrote:
> > > On Thu, 1 Sep 2016 13:35:19 -0400
> > > Neil Horman <nhorman at tuxdriver.com> wrote:
> > >
> > > > On Thu, Sep 01, 2016 at 12:55:27PM +0000, Trahe, Fiona wrote:
> > > > > Hi Neil and Olivier,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier
> > > > > > Matz
> > > > > > Sent: Wednesday, August 31, 2016 2:40 PM
> > > > > > To: Neil Horman <nhorman at tuxdriver.com>
> > > > > > Cc: dev at dpdk.org; thomas.monjalon at 6wind.com
> > > > > > Subject: Re: [dpdk-dev] [dpdk-dev, RFC] drivers: advertise kmod
> > > > > > dependencies in pmdinfo
> > > > > >
> > > > > > Hi Neil,
> > > > > >
> > > > > > On 08/31/2016 03:27 PM, Neil Horman wrote:
> > > > > > > On Wed, Aug 31, 2016 at 11:21:18AM +0200, Olivier Matz wrote:
> > > > > > >> Hi Neil,
> > > > > > >>
> > > > > > >> On 08/30/2016 03:23 PM, Neil Horman wrote:
> > > > > > >>> On Fri, Aug 26, 2016 at 03:20:46PM +0200, Olivier Matz wrote:
> > > > > > >>>> Add a new macro DRIVER_REGISTER_KMOD_DEP() that allows a
> > > > > > >>>> driver to declare the list of kernel modules required to run properly.
> > > > > > >>>>
> > > > > > >>>> Today, most PCI drivers require uio/vfio.
> > > > > > >>>>
> > > > > > >>>> Signed-off-by: Olivier Matz <olivier.matz at 6wind.com>
> > > > > > >>>>
> > > > > > >>>> ---
> > > > > > >>>> In this RFC, I supposed that all PCI drivers require a the
> > > > > > >>>> loading of a uio/vfio module (except mlx*), this may be wrong.
> > > > > > >>>> Comments are welcome!
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>>  buildtools/pmdinfogen/pmdinfogen.c      |  1 +
> > > > > > >>>>  buildtools/pmdinfogen/pmdinfogen.h      |  1 +
> > > > > > >>>>  drivers/crypto/qat/rte_qat_cryptodev.c  |  2 ++
> > > > > > >>>>  drivers/net/bnx2x/bnx2x_ethdev.c        |  4 ++++
> > > > > > >>>>  drivers/net/bnxt/bnxt_ethdev.c          |  2 ++
> > > > > > >>>>  drivers/net/cxgbe/cxgbe_ethdev.c        |  2 ++
> > > > > > >>>>  drivers/net/e1000/em_ethdev.c           |  2 ++
> > > > > > >>>>  drivers/net/e1000/igb_ethdev.c          |  4 ++++
> > > > > > >>>>  drivers/net/ena/ena_ethdev.c            |  2 ++
> > > > > > >>>>  drivers/net/enic/enic_ethdev.c          |  2 ++
> > > > > > >>>>  drivers/net/fm10k/fm10k_ethdev.c        |  2 ++
> > > > > > >>>>  drivers/net/i40e/i40e_ethdev.c          |  2 ++
> > > > > > >>>>  drivers/net/i40e/i40e_ethdev_vf.c       |  2 ++
> > > > > > >>>>  drivers/net/ixgbe/ixgbe_ethdev.c        |  4 ++++
> > > > > > >>>>  drivers/net/mlx4/mlx4.c                 |  2 ++
> > > > > > >>>>  drivers/net/mlx5/mlx5.c                 |  3 +++
> > > > > > >>>>  drivers/net/nfp/nfp_net.c               |  2 ++
> > > > > > >>>>  drivers/net/qede/qede_ethdev.c          |  4 ++++
> > > > > > >>>>  drivers/net/szedata2/rte_eth_szedata2.c |  2 ++
> > > > > > >>>>  drivers/net/thunderx/nicvf_ethdev.c     |  2 ++
> > > > > > >>>>  drivers/net/virtio/virtio_ethdev.c      |  2 ++
> > > > > > >>>>  drivers/net/vmxnet3/vmxnet3_ethdev.c    |  2 ++
> > > > > > >>>>  lib/librte_eal/common/include/rte_dev.h | 14 ++++++++++++++
> > > > > > >>>>  tools/dpdk-pmdinfo.py                   |  5 ++++-
> > > > > > >>>>  24 files changed, 69 insertions(+), 1 deletion(-)
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>> Generally speaking, I like the idea, it makes sense to me in
> > > > > > >>> terms of using pmdinfo to export this information
> > > > > > >>>
> > > > > > > >>> That said, this may need to be a set of macros.  By that I
> > > > > > > >>> mean (and correct me if I'm wrong here) that the
> > > > > > > >>> relationship between PMDs and kernel modules is, in some
> > > > > > > >>> cases, more complex than a 'requires' or 'depends'
> > > > > > > >>> relationship.  That is to say, some PMDs may need user space
> > > > > > > >>> hardware access, but can use either uio OR vfio, don't need
> > > > > > > >>> both, and can continue to function if only one is available.
> > > > > > > >>> Other PMDs may be able to use vfio or uio, but can still
> > > > > > > >>> function without either.  And some, as your patch
> > > > > > > >>> implements, simply require one or the other to function.  As
> > > > > > > >>> such it seems like you may want a few macros, in the form
> > > > > > > >>> of:
> > > > > > > >>>
> > > > > > > >>> DRIVER_REGISTER_KMOD_REQUEST - List of modules to attempt
> > > > > > > >>> loading, ignore any failures
> > > > > > > >>> DRIVER_REGISTER_KMOD_REQUIRE - List of modules required to
> > > > > > > >>> be loaded after request macro completes, fail if any are
> > > > > > > >>> not loaded
> > > > > > > >>>
> > > > > > > >>> That's just spitballing, mind you; there's probably a better
> > > > > > > >>> way to do it, but the idea is to list a set of modules you
> > > > > > > >>> would like to have, and then create a parsable syntax to
> > > > > > > >>> describe the modules that need to be loaded after the
> > > > > > > >>> request is complete so that you can accurately codify the
> > > > > > > >>> situations I described above.
> > > > > > >>
> > > > > > >> Thank you for your feedback.
> > > > > > >> However, I'm not sure I'm perfectly getting what you suggest.
> > > > > > >>
> > > > > > >> Do you think some PMDs could request a kernel module without
> > > > > > >> really requiring it? Do you have an example in mind?
> > > > > > >>
> > > > > > > > Yes, that's precisely it.  The clearest example I could think
> > > > > > > > of (though I'm not sure if any PMD currently supports this)
> > > > > > > > is a PMD that supports both UIO and VFIO communication with
> > > > > > > > the kernel.  Such a PMD requires that one of those two
> > > > > > > > modules be loaded, but only one (i.e. both are not required).
> > > > > > > > So if only the uio kernel module loads, that is a success
> > > > > > > > case; likewise, if only the vfio module loads, that can be
> > > > > > > > treated as success.  Both loading is clearly successful.
> > > > > > > > Only if neither loads do we have a failure case.  I'm
> > > > > > > > suggesting that the grammar that your exports define should
> > > > > > > > take those cases into account.  It's not always as simple as
> > > > > > > > "I must have the following modules".
> > > > > > >
> > > > > > > >> The syntax I've submitted lets you define several lists of
> > > > > > > >> modules, so that the user or the script that starts the
> > > > > > > >> application can decide which kmod list is better according
> > > > > > > >> to the environment.
> > > > > > >>
> > > > > > > > If you have a human intervening in the module load process,
> > > > > > > > sure, then it's fine.  But it seems that this particular
> > > > > > > > feature that you're implementing might have automated uses.
> > > > > > > > That is to say, the DPDK core library might be interested in
> > > > > > > > parsing this particular information to direct module
> > > > > > > > autoloading, and if that's desirable then you need to define
> > > > > > > > these lists such that you can codify failure and success
> > > > > > > > conditions.
> > > > > > >
> > > > > > >> For example, most drivers will advertise
> > > > > > >> "uio,igb_uio:uio,uio_pci_generic:vfio,vfio-pci", and the user
> > > > > > >> or script will have to choose between loading:
> > > > > > >> - uio igb_uio
> > > > > > >> - uio uio_pci_generic
> > > > > > >> - vfio vfio-pci
> > > > > > >>
> > > > > > > > Oh, I see, so your list is a colon-delimited list of module
> > > > > > > > load sets, where at least one set must succeed by loading
> > > > > > > > all modules in its set, but the failure of any one set isn't
> > > > > > > > fatal to the process?  E.g. a string like this:
> > > > > > > >
> > > > > > > > uio,igb_uio:vfio,vfio-pci
> > > > > > > >
> > > > > > > > could be interpreted to mean "I must load (uio AND igb_uio) OR
> > > > > > > > (vfio AND vfio-pci)".  If the evaluation of that statement
> > > > > > > > results in false, then the operation fails, otherwise it
> > > > > > > > succeeds.
> > > > > > > >
> > > > > > > > If that's the case, then, apologies, we're on the same page,
> > > > > > > > and this will work just fine.
> > > > > >
> > > > > > Yep, that's the idea.
> > > > > >
> > > > > > Colon and commas are the best separators I've thought about, but
> > > > > > any idea to make the syntax clearer is welcome ;)
> > > > > >
> > > > > > > Maybe a syntax like this is clearer:
> > > > > > >   "(mod1 & mod2)|(mod3 & mod4)" ?
> > > > > > > But it would let the user think that more complex expressions
> > > > > > > are valid, like "(mod1 & (mod2 | mod3)) | mod4", which is
> > > > > > > probably overkill.
> > > > > >
> > > > > > Regards,
> > > > > > Olivier
> > > > >
> > > > > > This RFC seems like a good idea - and something the Intel
> > > > > > QuickAssist PMD could benefit from.
> > > > > > However the (mod1 & mod2) syntax can handle the QAT case better
> > > > > > in my opinion.
> > > > > i.e.
> > > > > as well as needing one of
> > > > > * uio igb_uio
> > > > > * uio uio_pci_generic
> > > > > * vfio vfio-pci
> > > > > QAT PMD also needs one of (depending on which physical device is
> > > > > plugged)
> > > > >  * qat_dh895xcc
> > > > >  * qat_c62x
> > > > >  * qat_c3xxx
> > > > >
> > > > > So the original syntax would result in a very long list of possible variations.
> > > > > What really reflects the dependencies would be ((uio & igb_uio) |
> > > > > (uio & uio_pci_generic) | (vfio & vfio_pci)) & (qat_dh895xcc |
> > > > > qat_c62x | qat_c3xxx)
> > > > >
> > > > Ah, I didn't consider that hardware specifics might create a use
> > > > case where a pmd must have one or more kernel modules available for
> > > > hw support.  Perhaps it is worthwhile to automate hardware support -
> > > > that is to say, any module loading script should automatically look
> > > > at the pci table exported from a pmd, and, if found, load any
> > > > modules that claim support for that device:vendor tuple?  Though
> > > > > that might break in the case of uio, if there are separate driver
> > > > > modules that support native hardware and uio access.
> 
> Actually if the script output was intended to be used to auto-load dependent kmods, 
> then even the above would not suffice for the QAT driver (and presumably for other
> PMDs with specific HW dependencies). i.e. the qat_dhxxxx modules have further dependencies 
> themselves on an intel_qat module, and there are other steps documented in the 
But any dependency chain such as what you describe is covered in the next step
of the chain.  That is to say if the qat pmd has a hardware dependency on
qat_dhxxx (or qat_cxxx, etc), and those modules depend on intel_qat, the pmd
doesn't need to know that, because qat_dhxxx and companions should all list
intel_qat as a dependency that modprobe will resolve when installing the kernel
module.
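
As a minimal sketch of why only the top-level modules need to be listed, the
check below evaluates the colon/comma string from the RFC (e.g.
"uio,igb_uio:vfio,vfio-pci") against a set of loaded module names.  The
function names are hypothetical and not part of the patch; anything a listed
module itself depends on is assumed to be pulled in by modprobe.

```python
def parse_kmod_deps(spec):
    """Split 'a,b:c,d' into [['a', 'b'], ['c', 'd']] (colon = OR, comma = AND)."""
    return [
        [name.strip() for name in group.split(",") if name.strip()]
        for group in spec.split(":")
        if group.strip()
    ]


def normalize(name):
    # modprobe treats '-' and '_' in module names as interchangeable.
    return name.replace("-", "_")


def deps_satisfied(spec, loaded):
    """True if at least one comma-separated set is fully present in `loaded`."""
    groups = parse_kmod_deps(spec)
    if not groups:
        # Assumption: an empty string means the PMD declares no dependencies.
        return True
    present = {normalize(name) for name in loaded}
    return any(all(normalize(m) in present for m in group) for group in groups)
```

Transitive dependencies such as intel_qat never appear in the string; the
script would only ever ask for the modules the PMD names directly.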

> guide which must be taken after loading the kmods. 
I'm not sure what you mean by this.  Are you referring to the QAT documentation
that comes with DPDK?  I only see three additional items there to address:

1) Removing other modules when using the 01.org kernel modules

2) installation of firmware

3) Binding of the device to user space for VFIO/UIO

All three of these tasks fall outside the scope of what this macro is meant to
do.  We could try to create macros for them to export information for use in a
loading script if you like, but I wouldn't.  All three of the above items fall,
in my mind, under the category of administrative responsibilities.  That is to
say, they are orthogonal to defining a module dependency structure, and if
they aren't properly completed, the module dependency chain won't matter
anyway.

> The use-case I'd addressed was for the script to identify and just throw an error where 
> dependent modules are missing. 
> 

That doesn't really add much value then, since missing modules already result in
errors when the PMD tries to initialize.

> I don't see a simple solution, but also don't see a strong need to find one. 
> Documentation and if necessary a driver-specific script seem sufficient to me.
> 
> My conclusion is the RFC is a nice feature for some drivers, but if introduced needs 
> to be optional as it doesn't handle the complexities of all drivers. 
> 

I agree it's an optional export.  If there are no dependencies, or if the author
simply wishes not to supply any, that's fine, and the results will be in
accordance with that.  But I strongly disagree that its being optional implies
that we can ignore the complexities of the dependencies that can be exported.

The more I think about it the more I like Stephen's idea, possibly with some
macro assistance.  That is to say:

1) Start by loading hardware-specific modules, the information for which is
already available.  You can parse the PCI table that a PMD exports and match it
with the PCI aliases retrieved via modinfo.

2) Load a special virt driver if no hardware is found on the system in (1).
Special virt drivers might be worth tagging with a VIRT/VFIO/UIO tag export for
pmdinfo.

That allows us to set aside the complexities of our dependency chain, as we can
assume hardware support modules will codify any real dependencies there, and a
VIRT tag will let us find any modules needed for hardware that is assigned into
our guest.
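
The two steps above could be sketched roughly as follows.  This is only an
illustration under stated assumptions: the PMD's PCI table is taken as a list
of (vendor, device) pairs, the alias strings are assumed to have been collected
beforehand (for instance with "modinfo -F alias <module>"), and every function
name here is hypothetical.  Checking which devices are actually present on the
system is left out.

```python
import re

# Kernel PCI aliases look like "pci:v00008086d00000435sv*sd*bc*sc*i*".
ALIAS_RE = re.compile(r"^pci:v([0-9A-Fa-f]{8}|\*)d([0-9A-Fa-f]{8}|\*)sv")


def alias_claims(alias, vendor, device):
    """True if a module alias claims the given vendor/device pair."""
    match = ALIAS_RE.match(alias)
    if not match:
        return False
    alias_vendor, alias_device = match.groups()
    if alias_vendor == "*":
        # Class-based aliases match any vendor; skip them in this sketch.
        return False
    if int(alias_vendor, 16) != vendor:
        return False
    return alias_device == "*" or int(alias_device, 16) == device


def modules_to_load(pci_ids, module_aliases, virt_fallback=()):
    """Step 1: modules whose aliases match the PMD's PCI table.
    Step 2: fall back to the tagged virt modules if nothing matched."""
    hits = sorted(
        module
        for module, aliases in module_aliases.items()
        if any(alias_claims(a, v, d) for a in aliases for v, d in pci_ids)
    )
    return hits if hits else list(virt_fallback)
```

Each module returned would then be handed to modprobe, which resolves whatever
further dependencies that module declares.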

Neil

> > >
> > > I ended up writing a script that went the other way.
> > > First look at the hardware and load VFIO if IOMMU is available.
> > > Then look for special driver needed for Xen and HyperV Lastly fallback
> > > to loading igb_uio if no VFIO and PCI device present.
> > >
> > > In other words it is a system not driver issue.
> > >
> > That sounds like a reasonable approach, yes.
> > Neil
> > 
> > >
> 

