[dpdk-dev] [PATCH v3 2/5] Link Bonding PMD Library (librte_eal/librte_ether link bonding support changes)

Doherty, Declan declan.doherty at intel.com
Mon Jun 16 10:59:25 CEST 2014


> -----Original Message-----
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Friday, June 13, 2014 8:38 PM
> To: Doherty, Declan
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 2/5] Link Bonding PMD Library
> (librte_eal/librte_ether link bonding support changes)
> 
> On Fri, Jun 13, 2014 at 06:34:04PM +0000, Doherty, Declan wrote:
> > > -----Original Message-----
> > > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > Sent: Friday, June 13, 2014 5:08 PM
> > > To: Doherty, Declan
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v3 2/5] Link Bonding PMD Library
> > > (librte_eal/librte_ether link bonding support changes)
> > >
> > > On Fri, Jun 13, 2014 at 03:41:59PM +0100, Declan Doherty wrote:
> > > > Updating functionality in EAL to support adding link bonding
> > > > devices via –vdev option. Link bonding devices will be
> > > > initialized after all physical devices have been probed and
> > > > initialized.
> > > >
> > > > Signed-off-by: Declan Doherty <declan.doherty at intel.com>
> > > > ---
> > > >  lib/librte_eal/common/eal_common_dev.c      | 66
> > > +++++++++++++++++++++++++++--
> > > >  lib/librte_eal/common/eal_common_pci.c      |  6 +++
> > > >  lib/librte_eal/common/include/eal_private.h |  7 +++
> > > >  lib/librte_eal/common/include/rte_dev.h     |  1 +
> > > >  lib/librte_ether/rte_ethdev.c               | 34 +++++++++++++--
> > > >  lib/librte_ether/rte_ethdev.h               |  7 ++-
> > > >  lib/librte_pmd_pcap/rte_eth_pcap.c          | 22 +++++-----
> > > >  lib/librte_pmd_ring/rte_eth_ring.c          | 32 +++++++-------
> > > >  lib/librte_pmd_ring/rte_eth_ring.h          |  3 +-
> > > >  lib/librte_pmd_xenvirt/rte_eth_xenvirt.c    |  2 +-
> > > >  10 files changed, 144 insertions(+), 36 deletions(-)
> > > >
> > > > diff --git a/lib/librte_eal/common/eal_common_dev.c
> > > b/lib/librte_eal/common/eal_common_dev.c
> > > > index eae5656..b50c908 100644
> > > > --- a/lib/librte_eal/common/eal_common_dev.c
> > > > +++ b/lib/librte_eal/common/eal_common_dev.c
> > > > @@ -75,14 +75,28 @@ rte_eal_dev_init(void)
> > > >
> > > >  	/* call the init function for each virtual device */
> > > >  	TAILQ_FOREACH(devargs, &devargs_list, next) {
> > > > +		uint8_t bdev = 0;
> > > >
> > > >  		if (devargs->type != RTE_DEVTYPE_VIRTUAL)
> > > >  			continue;
> > > >
> > > >  		TAILQ_FOREACH(driver, &dev_driver_list, next) {
> > > > -			if (driver->type != PMD_VDEV)
> > > > +			/* RTE_DEVTYPE_VIRTUAL can only be a virtual or bonded
> > > device*/
> > > > +			if (driver->type != PMD_VDEV && driver->type !=
> > > PMD_BDEV)
> > > >  				continue;
> > > >
> > > > +			/*
> > > > +			 * Bonded devices are not initialize here, we do it later in
> > > > +			 * rte_eal_bonded_dev_init() after all physical devices
> > > have been
> > > > +			 * probed and initialized
> > > > +			 */
> > > > +			if (driver->type == PMD_BDEV &&
> > > > +					!strncmp(driver->name, devargs-
> > > >virtual.drv_name,
> > > > +							strlen(driver->name))) {
> > > > +				bdev = 1;
> > > > +				break;
> > > > +			}
> > > > +
> > > I really don't think you need to add a new device type for bonded devs.  Its got
> > > no specific hardware that it drives, and you configure it with a --vdev
> command,
> > > so treat it as one here.  I understand that you need to pass additional
> > > information about slaves to a bonded device, which is fine, but you can do that
> > > with kvargs pretty easily, at which point its just another vdev. The only other
> > > requirement is that you initilize the bonded vdev after the slave vdevs have
> > > been created, which you can do by any of several methods (a priority field to
> > > indicate that bonded drivers should be initilized last/later, a deferral return
> > > code from the init routine, or by dead reckoning via the careful construction
> of
> > > the application command line (placed the bonded --vdev option last on the
> > > command line argument list at run time).
> > >
> > It was my initial intent to do as you have describe above, but the physical devices
> > cause a real issue here, physical devices don't call through to
> rte_eth_dev_allocate until
> > during rte_eal_pci_probe call, so it's not possible to initialize the bonded device
> from
> > within rte_eal_dev_init as the physical devices have not been fully initialized at
> this
> > point, as a port_id has not been allocated and can't be added as bonding slaves. I
> don't
> > see away around this without changing the EAL API, which I've tried to avoid with
> this solution.
> >
> > Ordering isn't an issue, and can easily be solved if the above problem didn't
> exist, and although
> > a new device type isn't technically required, I think it's a cleaner solution than
> doing
> > string comparisons.
> >
> 
> 
> Ugh, you're right. That stinks.  We still shouldn't just drop the init code in
> with the pci probe code though, especially if dpdk starts supporting more
> stacked network devices. A notification system for interface creation that you
> could monitor for pci_device addresses (B:D:F tupples) would be nice, but might
> be overkill since dpdk doesn't really support device freeing yet.  what might be
> useful is a generic ordering mechanism for rte_eal_dev_init.  If you added a
> priority field to the rte_driver structure, you could create an enumeration for
> the field to define when the init routine for that driver was called something
> like:
> typdef enum {
> 	PMD_INIT_FIRST = 0,
> 	PMD_INIT_POST_PROBE = 1,
> }
> 
> Then you could make multiple calls to rte_eal_dev_init from rte_eal_init at the
> appropriate places in the initalization process, each time passing one of the
> above values.  rte_eal_dev_init would only initalize those drivers whos priority
> matched the passed in priority.  That would be helpful for interfaces like
> vlans, vxlans, tunnels, etc.
> 
I'm not sure I understand how you intend this to work, are you proposing that 
rte_eal_dev_init is call multiple times by the user pre and post rte_eal_pci_probe
this doesn't seem like an idea solution either. I'm not 100% clear why 
rte_eal_pci_probe is currently called by  the application code and not initiated
 from within rte_eal_dev_init, if this was the case we would be able to figure out 
a dependency tree for initialization of devices, based on the parsed input 
arguments without having extra user input to or multiple calls of rte_eal_dev_init.

Either way I don't think that any of these solution are trivial and they would 
need to be thoroughly tested, which I don't think will be possible in the time
frame for 1.7. If my current implementation calling rte_eal_bonded_dev_init
from within rte_eal_pci_probe is unacceptable, then I propose that I remove 
the EAL changes for --vdev initialization of bonding devices from this release and
only support the C API, and that an overall strategy is discussed for static device
 initialization that is developed for 1.7.1 or 1.8.

> > > >
> > > > +static int
> > > > +rte_eth_dev_name_unique(const char* name)
> > > > +{
> > > > +	unsigned i;
> > > > +
> > > > +	for (i = 0; i < nb_ports; i++) {
> > > > +		if (strcmp(rte_eth_devices[i].data->name, name) == 0)
> > > > +			return -1;
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > >  struct rte_eth_dev *
> > > > -rte_eth_dev_allocate(void)
> > > > +rte_eth_dev_allocate(const char* name)
> > > >  {
> > > >  	struct rte_eth_dev *eth_dev;
> > > >
> > > > @@ -165,23 +179,37 @@ rte_eth_dev_allocate(void)
> > > >  	if (rte_eth_dev_data == NULL)
> > > >  		rte_eth_dev_data_alloc();
> > > >
> > > > +	if (rte_eth_dev_name_unique(name)) {
> > > > +		PMD_DEBUG_TRACE("Ethernet Device with name %s already
> > > allocated!\n");
> > > > +		return NULL;
> > > > +	}
> > > > +
> > > This seems fairly racy if you allow dynamic device creation at run time from
> the
> > > application, if multiple threads attempt to create bonds in parallel.
> >
> > True but if this is an issue then there probably should be some locking around
> this allocation anyway.
> Agreed the allocation of the port id is also racy.
> 
> > >
> > > >  /**
> > > > diff --git a/lib/librte_pmd_pcap/rte_eth_pcap.c
> > > b/lib/librte_pmd_pcap/rte_eth_pcap.c
> > > Hmm, we're modifying other pmds for the naming feature, I think it would be
> best
> > > split out into a separate patch.  Something entitled "support unique interface
> > > naming for virtual pmds" or something.
> >
> > True, but if I do that, I will need to break support for virtual devices in this patch
> set, and
> > I would prefer to keep this functionality working initial release of link bonding.
> Also another
> > patchset will be required at some point in the near future to tidy up/rationalize
> the current
> > handling of virtual devices identification and these changes do not change
> functionally how
> > these devices work.
> 
> I'm not sure I'm following what you're saying here.  If you split this into
> another patch set, and order earlier in this series, then you won't break
> anything.  I'm not suggesting a functional change here, just a organizational
> one, so that this change doesn't do more than the changelog indicates.
> 

Sorry I misunderstood, I thought that you where proposing that I create a completely new 
patchset separate to the link bonding release set for these changes.

> Neil

Declan


More information about the dev mailing list