[dpdk-dev] [PATCH v6 3/3] net/failsafe: fix hotplug races

Matan Azrad matan at mellanox.com
Mon Feb 12 21:35:52 CET 2018


Hi Gaetan

From: Gaëtan Rivet, Sent: Monday, February 12, 2018 8:33 PM
> Hi Matan,
> 
> On Sun, Feb 11, 2018 at 05:24:32PM +0000, Matan Azrad wrote:
> > Fail-safe uses periodic alarm mechanism, running from the host thread,
> > to manage the hot-plug events of its sub-devices. This management
> > requires a lot of sub-devices PMDs operations (stop,close,start,etc).
> >
> > While the hot-plug alarm runs in the host thread, the application may
> > call fail-safe operations which directly trigger the sub-devices PMDs
> > operations too, This call may occur from any thread decided by the
> > application (probably the master thread).
> >
> > So, more than one operation can execute to a sub-device in same time
> > what can cause a lot of races in the sub-PMDs.
> >
> > Moreover, some control operations update the fail-safe internal
> > databases which can be used by the alarm mechanism in the same time,
> > what also can cause to races and crashes.
> >
> > Fail-safe is the owner of its sub-devices and must to synchronize
> > their use according to the ETHDEV ownership rules.
> >
> > Synchronize hot-plug management by a new lock mechanism uses a mutex
> > to atomically defend each critical section in the fail-safe hot-plug
> > mechanism and control operations to prevent any races between them.
> >
> > Fixes: a46f8d5 ("net/failsafe: add fail-safe PMD")
> > Cc: stable at dpdk.org
> >
> > Signed-off-by: Matan Azrad <matan at mellanox.com>
> > ---
> >  drivers/net/failsafe/Makefile           |   1 +
> >  drivers/net/failsafe/failsafe.c         |  35 ++++++++
> >  drivers/net/failsafe/failsafe_ether.c   |   6 +-
> >  drivers/net/failsafe/failsafe_flow.c    |  20 ++++-
> >  drivers/net/failsafe/failsafe_ops.c     | 148
> ++++++++++++++++++++++++++------
> >  drivers/net/failsafe/failsafe_private.h |  62 +++++++++++--
> >  6 files changed, 239 insertions(+), 33 deletions(-)
> >
> > diff --git a/drivers/net/failsafe/Makefile
> > b/drivers/net/failsafe/Makefile index d1ae899..bd2f019 100644
> > --- a/drivers/net/failsafe/Makefile
> > +++ b/drivers/net/failsafe/Makefile
> > @@ -68,5 +68,6 @@ CFLAGS += -pedantic
> >  LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring  LDLIBS +=
> > -lrte_ethdev -lrte_net -lrte_kvargs  LDLIBS += -lrte_bus_vdev
> > +LDLIBS += -lpthread
> >
> >  include $(RTE_SDK)/mk/rte.lib.mk
> > diff --git a/drivers/net/failsafe/failsafe.c
> > b/drivers/net/failsafe/failsafe.c index 7b2cdbb..c499bfb 100644
> > --- a/drivers/net/failsafe/failsafe.c
> > +++ b/drivers/net/failsafe/failsafe.c
> > @@ -113,17 +113,46 @@
> >  			break;
> >  	/* if we have non-probed device */
> >  	if (i != PRIV(dev)->subs_tail) {
> > +		if (fs_lock(dev, 1) != 0)
> > +			goto reinstall;
> 
> You have left a few operations unlocked further down the call stack.
> With these discrepancies fixed, the RECURSIVE attribute could be removed,
> and this lock as well.
> 
> >  		ret = failsafe_eth_dev_state_sync(dev);
> > +		fs_unlock(dev, 1);
> 
> Compared to the first version of these changes, I much prefer having a
> wrapper for locking. However, I dislike having the arguably unnecessary
> additional argument (alarm_lock).
> 
> I guess you added this for debugging purpose, but in the end either the
> design is simple and clear, and you have a proper model, or you don't, and
> that's an issue.

Not for debug, the debug is by the way,
Actually it is just will be nice to know if the alarm is running in the critical sections and may be used in future.
Actually, following patch "fix reconfiguration" is using it. 

> And having the RECURSIVE attribute "just in case", is not reassuring.

You know, there are a lot of pros and cons to the RECURSIVE usage and I can understand your concern.
Just not to create a bigger patch this time I think it can be changed in the next release.

> >  		if (ret)
> >  			ERROR("Unable to synchronize sub_device state");
> >  	}
> >  	failsafe_dev_remove(dev);
> > +reinstall:
> >  	ret = failsafe_hotplug_alarm_install(dev);
> >  	if (ret)
> >  		ERROR("Unable to set up next alarm");  }
> >
> >  static int
> > +fs_mutex_init(struct fs_priv *priv)
> > +{
> > +	int ret;
> > +	pthread_mutexattr_t attr;
> > +
> > +	ret = pthread_mutexattr_init(&attr);
> > +	if (ret) {
> > +		ERROR("Cannot initiate mutex attributes - %s", strerror(ret));
> > +		return ret;
> > +	}
> > +	/* Allow mutex relocks for the thread holding the mutex. */
> > +	ret = pthread_mutexattr_settype(&attr,
> PTHREAD_MUTEX_RECURSIVE);
> > +	if (ret) {
> 
> Just to emphasize, I think this should be removed.
> 
> Please explain why you thought it was necessary.
> 

To simplify the code:
1. Allow to use less calls to lock operations.
2. Allow to differentiate between alarm time to app time in the shared code(dev_configure(),dev_start()) easily.
3. Allow easily to use try_lock in the alarm thread.
4. Defend from some kinds of deadlock.

Actually the alternative way to use simple lock is more complicated.

> > +		ERROR("Cannot set mutex type - %s", strerror(ret));
> > +		return ret;
> > +	}
> > +	ret = pthread_mutex_init(&priv->hotplug_mutex, &attr);
> > +	if (ret) {
> > +		ERROR("Cannot initiate mutex - %s", strerror(ret));
> > +		return ret;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static int
> >  fs_eth_dev_create(struct rte_vdev_device *vdev)  {
> >  	struct rte_eth_dev *dev;
> > @@ -176,6 +205,9 @@
> >  	ret = failsafe_eal_init(dev);
> >  	if (ret)
> >  		goto free_args;
> > +	ret = fs_mutex_init(priv);
> > +	if (ret)
> > +		goto free_args;
> >  	ret = failsafe_hotplug_alarm_install(dev);
> >  	if (ret) {
> >  		ERROR("Could not set up plug-in event detection"); @@ -
> 250,6 +282,9
> > @@
> >  		ERROR("Error while uninitializing sub-EAL");
> >  	failsafe_args_free(dev);
> >  	fs_sub_device_free(dev);
> > +	ret = pthread_mutex_destroy(&PRIV(dev)->hotplug_mutex);
> > +	if (ret)
> > +		ERROR("Error while destroying hotplug mutex");
> >  	rte_free(PRIV(dev));
> >  	rte_eth_dev_release_port(dev);
> >  	return ret;
> > diff --git a/drivers/net/failsafe/failsafe_ether.c
> > b/drivers/net/failsafe/failsafe_ether.c
> > index d820faf..8672819 100644
> > --- a/drivers/net/failsafe/failsafe_ether.c
> > +++ b/drivers/net/failsafe/failsafe_ether.c
> 
> Locking fs_eth_dev_conf_apply should allow to remove the lock in
> fs_hotplug_alarm, as long as we make sure only public rte_ether API is used
> in fs_eth_dev_conf_apply and its callee.

No, all the state updates(failsafe state and the sub-devices states) in fs_hotplug_alarm stuck should be defended by lock (or any other atomic mechanism). 

> > @@ -328,8 +328,11 @@
> >
> >  	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
> >  		if (sdev->remove && fs_rxtx_clean(sdev)) {
> > +			if (fs_lock(dev, 1) != 0)
> > +				return;
> >  			fs_dev_stats_save(sdev);
> >  			fs_dev_remove(sdev);
> > +			fs_unlock(dev, 1);
> >  		}
> >  }
> >
> > @@ -428,7 +431,7 @@
> >  				void *cb_arg, void *out __rte_unused)  {
> >  	struct sub_device *sdev = cb_arg;
> > -
> 
> This line should remain.

Sure.
 
> > +	fs_lock(sdev->fs_dev, 0);
> >  	/* Switch as soon as possible tx_dev. */
> >  	fs_switch_dev(sdev->fs_dev, sdev);
> >  	/* Use safe bursts in any case. */
> > @@ -438,6 +441,7 @@
> >  	 * the callback at the source of the current thread context.
> >  	 */
> >  	sdev->remove = 1;
> > +	fs_unlock(sdev->fs_dev, 0);
> >  	return 0;
> >  }
> >
> 
> <snip>
> 
> > diff --git a/drivers/net/failsafe/failsafe_private.h
> > b/drivers/net/failsafe/failsafe_private.h
> > index f3be152..ef1f63b 100644
> > --- a/drivers/net/failsafe/failsafe_private.h
> > +++ b/drivers/net/failsafe/failsafe_private.h
> > @@ -7,6 +7,7 @@
> >  #define _RTE_ETH_FAILSAFE_PRIVATE_H_
> >
> >  #include <sys/queue.h>
> > +#include <pthread.h>
> >
> >  #include <rte_atomic.h>
> >  #include <rte_dev.h>
> > @@ -161,6 +162,9 @@ struct fs_priv {
> >  	 * appropriate failsafe Rx queue.
> >  	 */
> >  	struct rx_proxy rxp;
> > +	pthread_mutex_t hotplug_mutex;
> > +	/* Hot-plug mutex is locked by the alarm mechanism. */
> > +	volatile unsigned int alarm_lock:1;
> 
> Without the RECURSIVE attribute, I believe this becomes unnecessary.

I explained the potential usage above.

> 
> >  	unsigned int pending_alarm:1; /* An alarm is pending */
> >  	/* flow isolation state */
> >  	int flow_isolated:1;
> > @@ -255,12 +259,6 @@ int failsafe_eth_lsc_event_callback(uint16_t
> port_id,
> >  	     s != NULL;					\
> >  	     s = fs_find_next((dev), i + 1, state, &i))
> >
> > -/**
> > - * Iterator construct over fail-safe sub-devices:
> > - * s:   (struct sub_device *), iterator
> > - * i:   (uint8_t), increment
> > - * dev: (struct rte_eth_dev *), fail-safe ethdev
> > - */
> 
> Editing mistake I think here.

Sure.

> >  #define FOREACH_SUBDEV(s, i, dev)			\
> >  	FOREACH_SUBDEV_STATE(s, i, dev, DEV_UNDEFINED)
> >
> > @@ -347,6 +345,58 @@ int failsafe_eth_lsc_event_callback(uint16_t
> > port_id,  }
> >
> >  /*
> > + * Lock hot-plug mutex.
> > + * is_alarm means that the caller is, for sure, the hot-plug alarm
> mechanism.
> > + */
> > +static inline int
> > +fs_lock(struct rte_eth_dev *dev, unsigned int is_alarm)
> 
> The "is_alarm" should be removed without RECURSIVE.

Not sure.
 
> > +{
> > +	int ret;
> > +
> > +	if (is_alarm) {
> > +		ret = pthread_mutex_trylock(&PRIV(dev)->hotplug_mutex);
> > +		if (ret) {
> > +			DEBUG("Hot-plug mutex lock trying failed(%s), will
> try"
> > +			      " again later...", strerror(ret));
> > +			return ret;
> > +		}
> > +		PRIV(dev)->alarm_lock = 1;
> > +	} else {
> > +		ret = pthread_mutex_lock(&PRIV(dev)->hotplug_mutex);
> > +		if (ret) {
> > +			ERROR("Cannot lock mutex(%s)", strerror(ret));
> > +			return ret;
> > +		}
> > +	}
> > +	DEBUG("Hot-plug mutex was locked by thread %lu%s",
> pthread_self(),
> > +	      PRIV(dev)->alarm_lock ? " by the hot-plug alarm" : "");
> > +	return ret;
> > +}
> > +
> > +/*
> > + * Unlock hot-plug mutex.
> > + * is_alarm means that the caller is, for sure, the hot-plug alarm
> mechanism.
> > + */
> > +static inline void
> > +fs_unlock(struct rte_eth_dev *dev, unsigned int is_alarm)
> 
> ditto

Ditto

> > +{
> > +	int ret;
> > +	unsigned int prev_alarm_lock = PRIV(dev)->alarm_lock;
> > +
> > +	if (is_alarm) {
> > +		RTE_ASSERT(PRIV(dev)->alarm_lock == 1);
> > +		PRIV(dev)->alarm_lock = 0;
> > +	}
> > +	ret = pthread_mutex_unlock(&PRIV(dev)->hotplug_mutex);
> > +	if (ret)
> > +		ERROR("Cannot unlock hot-plug mutex(%s)", strerror(ret));
> > +	else
> > +		DEBUG("Hot-plug mutex was unlocked by thread %lu%s",
> > +		      pthread_self(),
> > +		      prev_alarm_lock ? " by the hot-plug alarm" : ""); }
> 
> I know that using a RECURSIVE lock allows you having an implementation
> quicker.

Yes.
 
> So this choice of implementation is only done due to the impending release,
> not because it is the right one. I think it should work, and I heard that it was
> heavily tested internally.

It is right but can be implemented in other way that have another pros and cons.

> So I guess this patch can go in with the few other nits (removed blank line,
> removed macro doc), as long as it is reworked soon after.

Sure will send V7 for it.

> On this matter, I do not think that blindly testing implementations that all
> either copied each other or weren't too complicated does the trick regarding
> concurrency issues.
> 
> You were thinking about an example app for your ownership library, in order
> to validate its implementation. I think this could work nicely as a torture
> instrument for this patchset as well, with some care.

Can be, Yes.

Thanks.
> Regards,
> --
> Gaëtan Rivet
> 6WIND


More information about the dev mailing list