[dpdk-dev] [PATCH v2 2/6] ethdev: add port ownership

Ananyev, Konstantin konstantin.ananyev at intel.com
Fri Jan 12 01:02:00 CET 2018


Hi Matan,

> 
> Hi Konstantin
> 
> From: Ananyev, Konstantin, Thursday, January 11, 2018 2:40 PM
> > Hi Matan,
> >
> > >
> > > Hi Konstantin
> > >
> > > From: Ananyev, Konstantin, Wednesday, January 10, 2018 3:36 PM
> > > > Hi Matan,
> > > >
> > > > Few comments from me below.
> > > > BTW, do you plan to add ownership mandatory check in control path
> > > > functions that change port configuration?
> > >
> > > No.
> >
> > So it still totally voluntary usage and application nneds to be changed to
> > exploit it?
> > Apart from RTE_FOR_EACH_DEV() change proposed by Gaetan?
> >
> 
> Also RTE_FOR_EACH_DEV() change proposed by Gaetan is not protected because 2 DPDK entities can get the same port while using it.

I am not talking about racing condition here.
Right now even from the same thread - I can call dev_configure()
for the port which I don't own (let say it belongs to failsafe port),
and that would remain, correct?
 
> As I wrote in the log\docs and as discussed a lot in the first version:
> The new synchronization rules are:
> 1. The port allocation and port release synchronization will be
>    managed by ethdev.
> 2. The port usage synchronization will be managed by the port owner.
> 3. The port ownership API synchronization(also with port creation) will be managed by ethdev.
> 5. DPDK entity which want to use a port must take ownership before.
> 
> Ethdev should not protect 2 and 4 according these rules.
> 
> > > > Konstantin
> > > >
> > > > > -----Original Message-----
> > > > > From: Matan Azrad [mailto:matan at mellanox.com]
> > > > > Sent: Sunday, January 7, 2018 9:46 AM
> > > > > To: Thomas Monjalon <thomas at monjalon.net>; Gaetan Rivet
> > > > > <gaetan.rivet at 6wind.com>; Wu, Jingjing <jingjing.wu at intel.com>
> > > > > Cc: dev at dpdk.org; Neil Horman <nhorman at tuxdriver.com>;
> > Richardson,
> > > > > Bruce <bruce.richardson at intel.com>; Ananyev, Konstantin
> > > > > <konstantin.ananyev at intel.com>
> > > > > Subject: [PATCH v2 2/6] ethdev: add port ownership
> > > > >
> > > > > The ownership of a port is implicit in DPDK.
> > > > > Making it explicit is better from the next reasons:
> > > > > 1. It will define well who is in charge of the port usage synchronization.
> > > > > 2. A library could work on top of a port.
> > > > > 3. A port can work on top of another port.
> > > > >
> > > > > Also in the fail-safe case, an issue has been met in testpmd.
> > > > > We need to check that the application is not trying to use a port
> > > > > which is already managed by fail-safe.
> > > > >
> > > > > A port owner is built from owner id(number) and owner name(string)
> > > > > while the owner id must be unique to distinguish between two
> > > > > identical entity instances and the owner name can be any name.
> > > > > The name helps to logically recognize the owner by different DPDK
> > > > > entities and allows easy debug.
> > > > > Each DPDK entity can allocate an owner unique identifier and can
> > > > > use it and its preferred name to owns valid ethdev ports.
> > > > > Each DPDK entity can get any port owner status to decide if it can
> > > > > manage the port or not.
> > > > >
> > > > > The mechanism is synchronized for both the primary process threads
> > > > > and the secondary processes threads to allow secondary process
> > > > > entity to be a port owner.
> > > > >
> > > > > Add a sinchronized ownership mechanism to DPDK Ethernet devices to
> > > > > avoid multiple management of a device by different DPDK entities.
> > > > >
> > > > > The current ethdev internal port management is not affected by
> > > > > this feature.
> > > > >
> > > > > Signed-off-by: Matan Azrad <matan at mellanox.com>
> > > > > ---
> > > > >  doc/guides/prog_guide/poll_mode_drv.rst |  14 ++-
> > > > >  lib/librte_ether/rte_ethdev.c           | 206
> > > > ++++++++++++++++++++++++++++++--
> > > > >  lib/librte_ether/rte_ethdev.h           |  89 ++++++++++++++
> > > > >  lib/librte_ether/rte_ethdev_version.map |  12 ++
> > > > >  4 files changed, 311 insertions(+), 10 deletions(-)
> > > >
> > > >
> > > > >
> > > > >
> > > > > diff --git a/lib/librte_ether/rte_ethdev.c
> > > > > b/lib/librte_ether/rte_ethdev.c index 684e3e8..0e12452 100644
> > > > > --- a/lib/librte_ether/rte_ethdev.c
> > > > > +++ b/lib/librte_ether/rte_ethdev.c
> > > > > @@ -70,7 +70,10 @@
> > > > >
> > > > >  static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";
> > > > > struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
> > > > > +/* ports data array stored in shared memory */
> > > > >  static struct rte_eth_dev_data *rte_eth_dev_data;
> > > > > +/* next owner identifier stored in shared memory */ static
> > > > > +uint16_t *rte_eth_next_owner_id;
> > > > >  static uint8_t eth_dev_last_created_port;
> > > > >
> > > > >  /* spinlock for eth device callbacks */ @@ -82,6 +85,9 @@
> > > > >  /* spinlock for add/remove tx callbacks */  static rte_spinlock_t
> > > > > rte_eth_tx_cb_lock = RTE_SPINLOCK_INITIALIZER;
> > > > >
> > > > > +/* spinlock for eth device ownership management stored in shared
> > > > > +memory */ static rte_spinlock_t *rte_eth_dev_ownership_lock;
> > > > > +
> > > > >  /* store statistics names and its offset in stats structure  */
> > > > > struct rte_eth_xstats_name_off {
> > > > >  	char name[RTE_ETH_XSTATS_NAME_SIZE]; @@ -153,14 +159,18 @@
> > > > enum {  }
> > > > >
> > > > >  static void
> > > > > -rte_eth_dev_data_alloc(void)
> > > > > +rte_eth_dev_share_data_alloc(void)
> > > > >  {
> > > > >  	const unsigned flags = 0;
> > > > >  	const struct rte_memzone *mz;
> > > > > +	const unsigned int data_size = RTE_MAX_ETHPORTS *
> > > > > +						sizeof(*rte_eth_dev_data);
> > > > >
> > > > >  	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> > > > > +		/* Allocate shared memory for port data and ownership */
> > > > >  		mz = rte_memzone_reserve(MZ_RTE_ETH_DEV_DATA,
> > > > > -				RTE_MAX_ETHPORTS *
> > > > sizeof(*rte_eth_dev_data),
> > > > > +				data_size + sizeof(*rte_eth_next_owner_id)
> > > > +
> > > > > +				sizeof(*rte_eth_dev_ownership_lock),
> > > > >  				rte_socket_id(), flags);
> > > > >  	} else
> > > > >  		mz = rte_memzone_lookup(MZ_RTE_ETH_DEV_DATA);
> > > > > @@ -168,9 +178,17 @@ enum {
> > > > >  		rte_panic("Cannot allocate memzone for ethernet port
> > > > data\n");
> > > > >
> > > > >  	rte_eth_dev_data = mz->addr;
> > > > > -	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> > > > > -		memset(rte_eth_dev_data, 0,
> > > > > -				RTE_MAX_ETHPORTS *
> > > > sizeof(*rte_eth_dev_data));
> > > > > +	rte_eth_next_owner_id = (uint16_t *)((uintptr_t)mz->addr +
> > > > > +					     data_size);
> > > > > +	rte_eth_dev_ownership_lock = (rte_spinlock_t *)
> > > > > +		((uintptr_t)rte_eth_next_owner_id +
> > > > > +		 sizeof(*rte_eth_next_owner_id));
> > > >
> > > >
> > > > I think that might make  rte_eth_dev_ownership_lock location not 4B
> > > > aligned...
> > >
> > > Where can I find the documentation about it?
> >
> > That's in your code above - data_size and mz_->addr are both at least 4B
> > aligned - rte_eth_dev_ownership_lock = mz->addr + data_size + 2; You can
> > align it manually, but as discussed below it is probably easier to group related
> > fields into the same struct.
> >
> I mean the documentation about the needed alignment for spinlock. Where is it?

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0008a/CJAGCFAF.html

Might be ARM and PPC guys can provide you some more complete/recent docs. 


> 
> > >
> > > > Why just not to put all data that you are trying to allocate as one
> > > > chunck into the same struct:
> > > > static struct {
> > > >         uint16_t next_owner_id;
> > > >         /* spinlock for eth device ownership management stored in
> > > > shared memory */
> > > >         rte_spinlock_t dev_ownership_lock;
> > > >         rte_eth_dev_data *data;
> > > > } rte_eth_dev_data;
> > > > and allocate/use it everywhere?
> > > > That would simplify allocation/management stuff.
> > > >
> > > I don't understand what exactly do you mean. ?
> > > If you mean to group all in one struct like:
> > >
> > > static struct {
> > >         uint16_t next_owner_id;
> > >         rte_spinlock_t dev_ownership_lock;
> > >         rte_eth_dev_data  data[];
> > > } rte_eth_dev_share_data;
> > >
> > > Just to simplify the addresses calculation above,
> >
> > Yep, that's exactly what I meant.
> > As you said it would help with bulk allocation/alignment stuff, plus IMO it is
> > better and easier to group several related global together - Improve code
> > quality, will make it easier to read & maintain in future.
> >
> > > It will change more code in ethdev relative to the old rte_eth_dev_data
> > global array and will be more intrusive.
> > > Stay it as is, focuses the change only here.
> >
> > Yes it would require few more changes, though I think it worth it.
> >
> 
> Ok, Got you and agree.
> 
> > >
> > > I can just move the spinlock memory allocation to be at the beginning of
> > the memzone(to be sure about the alignment).
> > >
> > > > It is good to see that now scanning/updating rte_eth_dev_data[] is
> > > > lock protected, but it might be not very plausible to protect both
> > > > data[] and next_owner_id using the same lock.
> > >
> > > I guess you mean to the owner structure in rte_eth_dev_data[port_id].
> > > The next_owner_id is read by ownership APIs(for owner validation), so it
> > makes sense to use the same lock.
> > > Actually, why not?
> >
> > Well to me next_owner_id and rte_eth_dev_data[] are not directly related.
> > You may create new owner_id but it doesn't mean you would update
> > rte_eth_dev_data[] immediately.
> > And visa-versa - you might just want to update rte_eth_dev_data[].name or
> > .owner_id.
> > It is not very good coding practice to use same lock for non-related data
> > structures.
> >
> I see the relation like next:
> Since the ownership mechanism synchronization is in ethdev responsibility,
> we must protect against user mistakes as much as we can by using the same lock.
> So, if user try to set by invalid owner (exactly the ID which currently is allocated) we can protect on it.

Hmm, not sure why you can't do same checking with different lock or atomic variable?

> 
> > >
> > > > In fact, for next_owner_id, you don't need a lock - just
> > > > rte_atomic_t should be enough.
> > >
> > > I don't think so, it is problematic in next_owner_id wraparound and may
> > complicate the code in other places which read it.
> >
> > IMO it is not that complicated, something like that should work I think.
> >
> > /* init to 0 at startup*/
> > rte_atomic32_t *owner_id;
> >
> > int new_owner_id(void)
> > {
> >     int32_t x;
> >     x = rte_atomic32_add_return(&owner_id, 1);
> >     if (x > UINT16_MAX) {
> >        rte_atomic32_dec(&owner_id);
> >        return -EOVERWLOW;
> >     } else
> >         return x;
> > }
> >
> >
> > > Why not just to keep it simple and using the same lock?
> >
> > Lock is also fine, I just think it better be a separate one - that would protext
> > just next_owner_id.
> > Though if you are going to use uuid here - all that probably not relevant any
> > more.
> >
> 
> I agree about the uuid but still think the same lock should be used for both.

But with uuid you don't need next_owner_id at all, right?
So lock will only be used for rte_eth_dev_data[] fields anyway.

> 
> > >
> > > > Another alternative would be to use 2 locks - one for next_owner_id
> > > > second for actual data[] protection.
> > > >
> > > > Another thing - you'll probably need to grab/release a lock inside
> > > > rte_eth_dev_allocated() too.
> > > > It is a public function used by drivers, so need to be protected too.
> > > >
> > >
> > > Yes, I thought about it, but decided not to use lock in next:
> > > rte_eth_dev_allocated
> > > rte_eth_dev_count
> > > rte_eth_dev_get_name_by_port
> > > rte_eth_dev_get_port_by_name
> > > maybe more...
> >
> > As I can see in patch #3 you protect by lock access to
> > rte_eth_dev_data[].name (which seems like a good  thing).
> > So I think any other public function that access rte_eth_dev_data[].name
> > should be protected by the same lock.
> >
> 
> I don't think so, I can understand to use the ownership lock here(as in port creation) but I don't think it is necessary too.
> What are we exactly protecting here?
> Don't you think it is just timing?(ask in the next moment and you
>  may get another answer) I don't see optional crash.

Not sure what you mean here by timing...
As I understand rte_eth_dev_data[].name unique identifies device and is used
by  port allocation/release/find functions.
As you stated above:
"1. The port allocation and port release synchronization will be  managed by ethdev."
To me it means that ethdev layer has to make sure that all accesses to
rte_eth_dev_data[].name are atomic.
Otherwise what would prevent the situation when one process does
rte_eth_dev_allocate()->snprintf(rte_eth_dev_data[x].name, ...)
while second one does rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...)
?

> 
> > > > > +
> > > > > +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> > > > > +		memset(rte_eth_dev_data, 0, data_size);
> > > > > +		*rte_eth_next_owner_id = RTE_ETH_DEV_NO_OWNER + 1;
> > > > > +		rte_spinlock_init(rte_eth_dev_ownership_lock);
> > > > > +	}
> > > > >  }
> > > > >
> > > > >  struct rte_eth_dev *
> > > > > @@ -225,7 +243,7 @@ struct rte_eth_dev *
> > > > >  	}
> > > > >
> > > > >  	if (rte_eth_dev_data == NULL)
> > > > > -		rte_eth_dev_data_alloc();
> > > > > +		rte_eth_dev_share_data_alloc();
> > > > >
> > > > >  	if (rte_eth_dev_allocated(name) != NULL) {
> > > > >  		RTE_PMD_DEBUG_TRACE("Ethernet Device with name %s
> > > > already
> > > > > allocated!\n", @@ -253,7 +271,7 @@ struct rte_eth_dev *
> > > > >  	struct rte_eth_dev *eth_dev;
> > > > >
> > > > >  	if (rte_eth_dev_data == NULL)
> > > > > -		rte_eth_dev_data_alloc();
> > > > > +		rte_eth_dev_share_data_alloc();
> > > > >
> > > > >  	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> > > > >  		if (strcmp(rte_eth_dev_data[i].name, name) == 0) @@ -
> > > > 278,8 +296,12
> > > > > @@ struct rte_eth_dev *
> > > > >  	if (eth_dev == NULL)
> > > > >  		return -EINVAL;
> > > > >
> > > > > -	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
> > > > > +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > > > > +
> > > > >  	eth_dev->state = RTE_ETH_DEV_UNUSED;
> > > > > +	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
> > > > > +
> > > > > +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> > > > >  	return 0;
> > > > >  }
> > > > >
> > > > > @@ -294,6 +316,174 @@ struct rte_eth_dev *
> > > > >  		return 1;
> > > > >  }
> > > > >
> > > > > +static int
> > > > > +rte_eth_is_valid_owner_id(uint16_t owner_id) {
> > > > > +	if (owner_id == RTE_ETH_DEV_NO_OWNER ||
> > > > > +	    (*rte_eth_next_owner_id > RTE_ETH_DEV_NO_OWNER &&
> > > > > +	     *rte_eth_next_owner_id <= owner_id)) {
> > > > > +		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > > > > +		return 0;
> > > > > +	}
> > > > > +	return 1;
> > > > > +}
> > > > > +
> > > > > +uint16_t
> > > > > +rte_eth_find_next_owned_by(uint16_t port_id, const uint16_t
> > > > owner_id)
> > > > > +{
> > > > > +	while (port_id < RTE_MAX_ETHPORTS &&
> > > > > +	       (rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED ||
> > > > > +	       rte_eth_devices[port_id].data->owner.id != owner_id))
> > > > > +		port_id++;
> > > > > +
> > > > > +	if (port_id >= RTE_MAX_ETHPORTS)
> > > > > +		return RTE_MAX_ETHPORTS;
> > > > > +
> > > > > +	return port_id;
> > > > > +}
> > > > > +
> > > > > +int
> > > > > +rte_eth_dev_owner_new(uint16_t *owner_id) {
> > > > > +	int ret = 0;
> > > > > +
> > > > > +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > > > > +
> > > > > +	if (*rte_eth_next_owner_id == RTE_ETH_DEV_NO_OWNER) {
> > > > > +		/* Counter wrap around. */
> > > > > +		RTE_PMD_DEBUG_TRACE("Reached maximum number of
> > > > Ethernet port owners.\n");
> > > > > +		ret = -EUSERS;
> > > > > +	} else {
> > > > > +		*owner_id = (*rte_eth_next_owner_id)++;
> > > > > +	}
> > > > > +
> > > > > +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> > > > > +	return ret;
> > > > > +}
> > > > > +
> > > > > +int
> > > > > +rte_eth_dev_owner_set(const uint16_t port_id,
> > > > > +		      const struct rte_eth_dev_owner *owner)
> > > >
> > > > As a nit - if you'll have rte_eth_dev_owner_set(port_id, old_owner,
> > > > new_owner)
> > > > - that might be more plausible for user, and would greatly simplify
> > > > unset()
> > > > part:
> > > > just set(port_id, cur_owner, zero_owner);
> > > >
> > >
> > > How the user should know the old owner?
> >
> > By dev_owner_get() or it might have it stored somewhere already (or
> > constructed on the fly in case of NO_OWNER).
> >
> It complicates the usage.
> What's about creating an internal API  _rte_eth_dev_owner_set(port_id, old_owner,
> new_owner) and using it by the current exposed set\unset APIs?

Sounds good to me.

> 
> > >
> > > > > +{
> > > > > +	struct rte_eth_dev_owner *port_owner;
> > > > > +	int ret = 0;
> > > > > +	int sret;
> > > > > +
> > > > > +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > > > > +
> > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > +		RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> > > > > +		ret = -ENODEV;
> > > > > +		goto unlock;
> > > > > +	}
> > > > > +
> > > > > +	if (!rte_eth_is_valid_owner_id(owner->id)) {
> > > > > +		ret = -EINVAL;
> > > > > +		goto unlock;
> > > > > +	}
> > > > > +
> > > > > +	port_owner = &rte_eth_devices[port_id].data->owner;
> > > > > +	if (port_owner->id != RTE_ETH_DEV_NO_OWNER &&
> > > > > +	    port_owner->id != owner->id) {
> > > > > +		RTE_LOG(ERR, EAL,
> > > > > +			"Cannot set owner to port %d already owned by
> > > > %s_%05d.\n",
> > > > > +			port_id, port_owner->name, port_owner->id);
> > > > > +		ret = -EPERM;
> > > > > +		goto unlock;
> > > > > +	}
> > > > > +
> > > > > +	sret = snprintf(port_owner->name,
> > > > RTE_ETH_MAX_OWNER_NAME_LEN, "%s",
> > > > > +			owner->name);
> > > > > +	if (sret < 0 || sret >= RTE_ETH_MAX_OWNER_NAME_LEN) {
> > > >
> > > > Personally, I don't see any reason to fail if description was truncated...
> > > > Another alternative - just use rte_malloc() here to allocate big
> > > > enough buffer to hold the description.
> > > >
> > >
> > > But it is static allocation like in the device name, why to allocate it
> > differently?
> >
> > Static allocation is fine by me - I just said there is probably no need to fail if
> > description provide by use will be truncated in that case.
> > Though if used description is *that* important - rte_malloc() can help here.
> >
> Again, what is the difference between port name and owner name regarding the allocations?

As I understand rte_eth_dev_data[].name unique identifies device and always has to be consistent.
owner.name is not critical for system operation, and I don't see a big deal if it would be truncated.

> The advantage of static allocation:
> 1. Not use protected malloc\free functions in other protected code.

You can call malloc/free before/after grabbing the lock.
But as I said - I am fine with static array here too - I just don't think
truncating user description should cause a failure.  

> 2.  Easier to the user.
> 
> > >
> > > > > +		memset(port_owner->name, 0,
> > > > RTE_ETH_MAX_OWNER_NAME_LEN);
> > > > > +		RTE_LOG(ERR, EAL, "Invalid owner name.\n");
> > > > > +		ret = -EINVAL;
> > > > > +		goto unlock;
> > > > > +	}
> > > > > +
> > > > > +	port_owner->id = owner->id;
> > > > > +	RTE_PMD_DEBUG_TRACE("Port %d owner is %s_%05d.\n", port_id,
> > > > > +			    owner->name, owner->id);
> > > > > +
> > > >
> > > > As another nit - you can avoid all these gotos by restructuring code a bit:
> > > >
> > > > rte_eth_dev_owner_set(const uint16_t port_id, const struct
> > > > rte_eth_dev_owner *owner) {
> > > >     rte_spinlock_lock(...);
> > > >     ret = _eth_dev_owner_set_unlocked(port_id, owner);
> > > >     rte_spinlock_unlock(...);
> > > >     return ret;
> > > > }
> > > >
> > > Don't you like gotos? :)
> >
> > Not really :)
> >
> > > I personally use it only in error\performance scenarios.
> >
> > Same here - prefer to avoid them if possible.
> >
> > > Do you think it worth the effort?
> >
> > IMO - yes, well structured code is much easier to understand and maintain.
> I don't think so in error cases(and performance), It is really clear here, but if you are insisting, I will change it.
> Are you?

Yes, that would be my preference.
Why otherwise I would bother to write all this? :)

> (If the community thinks like you I think "goto" check should be added to checkpatch).

Might be there are pieces of code there goto are really hard to avoid,
and/or using goto would provide some performance benefit or so...
But that case definitely doesn't look like that.
Konstantin



More information about the dev mailing list