[dpdk-dev] [PATCH v3] net/failsafe: fix calling device during RMV events

Ophir Munk ophirmu at mellanox.com
Fri Oct 6 00:42:08 CEST 2017


This commit prevents control path operations from failing after a sub
device removal.

Following are the failure steps:
1. The physical device is removed due to change in one of PF parameters
(e.g. MTU)
2. The interrupt thread flags the device
3. Within 2 seconds Interrupt thread initializes the actual device removal,
then every 2 seconds it tries to re-sync (plug in) the device. The trials
fail as long as VF parameter mismatches the PF parameter.
4. A control thread initiates a control operation on failsafe which
initiates this operation on the device.
5. A race condition occurs between the control thread and interrupt thread
when accessing the device data structures.

This commit prevents the race condition in step 5. Before this commit if a
device was removed and then a control thread operation was initiated on
failsafe - in some cases failsafe called the sub device operation instead
of avoiding it. Such cases could lead to operations failures.

This commit fixes failsafe criteria to determine when the device is removed
such that it will avoid calling the sub device operations during that time
and will only call them otherwise.

Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
Cc: stable at dpdk.org

Signed-off-by: Ophir Munk <ophirmu at mellanox.com>
---
v3:
1. Rebase v2

2. Please ignore checkpatch checks on arguments re-usage - they are confirmed.
	CHECK:MACRO_ARG_REUSE: Macro argument reuse ... possible side-effects?
	#217: FILE: drivers/net/failsafe/failsafe_private.h:241:

3. Add rationales (copy from an email which accompanied v2):

On Monday, September 11, 2017 11:31 AM, Gaetan Rivet wrote:
> 
> Hi Ophir,
> 
> On Sat, Sep 09, 2017 at 07:27:11PM +0000, Ophir Munk wrote:
> > This commit prevents control path operations from failing after a 
> > sub device has informed failsafe it has been removed.
> >
> > Before this commit if a device was removed and then a control path
> 
> Here are the steps if I understood correctly:
> 
> 0. The physical device is removed
> 1. The interrupt thread flags the device 2. A control lcore initiates 
> a control operation 3. The alarm triggers, waking up the eal-intr-thread,
>    initiating the actual device removal.
> 4. Race condition occurs between control lcore and interrupt thread.
> 
> "if a device was removed" is ambiguous I think (are we speaking about 
> the physical port? Is it only flagged? Is it after the removal of the device itself?).
> From the context I gather that you mean the device is flagged to be 
> removed, but it won't be as clear in a few month when we revisit this bug :) .
> 
> Could you please rephrase this so that the whole context of the issue 
> is available?
> 

Done. Commit message was rephrased based on your comments 

> > operations was initiated on failsafe - in some cases failsafe called 
> > the sub device operation instead of avoiding it. Such cases could 
> > lead to operations failures.
> >
> > This commit fixes failsafe criteria to determine when the device is 
> > removed such that it will avoid calling the sub device operations 
> > during that time and will only call them otherwise.
> >
> 
> This commit mitigates the race condition, reducing the probability for 
> it to have an effect. It does not, however, remove this race 
> condition, which is inherent to the DPDK architecture at the moment.
> 
> A proper fix, a more detailed workaround and additional documentation 
> warning users writing applications to mind their threads could be interesting.
> 

The race condition occurs in the last step and may lead to segmentation faults (accessing data structures 
of the same device by 2 threads) The previous steps ("the physical device is removed", etc) were not 
recreated and tested but probably cannot lead to segmentation fault. 

> But let's focus on this patch for the time being.
> 
> > Fixes: a46f8d584eb8 ("net/failsafe: add fail-safe PMD")
> > Cc: stable at dpdk.org
> >
> > Signed-off-by: Ophir Munk <ophirmu at mellanox.com>
> > ---
> >  drivers/net/failsafe/failsafe_ether.c |  1 +
> >  drivers/net/failsafe/failsafe_ops.c   | 52
> +++++++++++++++++++++++++++++------
> >  2 files changed, 45 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/net/failsafe/failsafe_ether.c
> > b/drivers/net/failsafe/failsafe_ether.c
> > index a3a8cce..1def110 100644
> > --- a/drivers/net/failsafe/failsafe_ether.c
> > +++ b/drivers/net/failsafe/failsafe_ether.c
> > @@ -378,6 +378,7 @@
> 
> Could you please generate your patches with the function name in the diff?

Done 

> 
> >  				      i);
> >  				goto err_remove;
> >  			}
> > +			sdev->remove = 0;
> 
> You are adding this here, within failsafe_eth_dev_state_sync, and 
> removing it from the dev_configure ops.
> 
> 10 lines above, the call to dev_configure is done, meaning that the 
> remove flag was resetted at this point.
> 
> Can you explain why you prefer resetting the flag here?
> 
> The position of this flag reset will be dependent upon my subsequent 
> remarks anyway, so hold that thought :) .
> 

The motivation for resetting the "remove" flag within failsafe_eth_dev_state_sync is as follows:
Previously to this patch the "remove" flag was designed to signal the need to remove the sub device. 
Once the sub device was removed and before being reconfigured the "remove" flag was reset. 

After this patch the scope of the "remove" flag was *extended* to indicate the sub device status as 
being "plugged out" by resetting this flag only after a successful call to failsafe_eth_dev_state_sync(). 
The "plug out" status could last a very long time (seconds, minutes, days, weeks, ...).

Previously to this patch failsafe based the "plugged out" status on the sub device state as being below 
ACTIVE however every 2 seconds dev_configure() was called where the sub device was assigned sdev-
>state = DEV_ACTIVE; therefore the sub device state became ACTIVE for some time every 2 seconds. 
This is where the race condition occurred: failsafe considered the sub device as "Plugged in" for some 
time every 2 seconds (based on its ACTIVE state) while it was actually plugged out. 

After this patch the "Plugged out" status is based on the "remove" flag.

> >  		}
> >  	}
> >  	/*
> > diff --git a/drivers/net/failsafe/failsafe_ops.c
> > b/drivers/net/failsafe/failsafe_ops.c
> > index ff9ad15..314d53d 100644
> > --- a/drivers/net/failsafe/failsafe_ops.c
> > +++ b/drivers/net/failsafe/failsafe_ops.c
> > @@ -232,7 +232,6 @@
> >  			dev->data->dev_conf.intr_conf.lsc = 0;
> >  		}
> >  		DEBUG("Configuring sub-device %d", i);
> > -		sdev->remove = 0;
> >  		ret = rte_eth_dev_configure(PORT_ID(sdev),
> >  					dev->data->nb_rx_queues,
> >  					dev->data->nb_tx_queues,
> > @@ -311,6 +310,8 @@
> >  	int ret;
> >
> >  	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> > +		if (sdev->remove)
> > +			continue;
> >  		DEBUG("Calling rte_eth_dev_set_link_up on sub_device %d",
> i);
> >  		ret = rte_eth_dev_set_link_up(PORT_ID(sdev));
> >  		if (ret) {
> > @@ -330,6 +331,8 @@
> >  	int ret;
> >
> >  	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> > +		if (sdev->remove)
> > +			continue;
> 
> For this change and all the others:
> 
> I think it might be best to have this check added to fs_find_next directly.
> 
> Most of the call to the iterators are done within dev_ops, so it makes 
> sense I think to have it there.
> 
> But then there'd be an issue with the sub-EAL iterations done on 
> previously- removed ports, as the removed flag is precisely resetted 
> too late. The function failsafe_dev_remove would also need to have a 
> manual iteration upon the sub-devices instead of using the macro.
> 
> I think you can actually reset this flag within fs_dev_remove, instead 
> of the next plug-in, then having this check within fs_find_next 
> *should* not be a problem.
> 

With the new scope of "remove" flag (remaining set to 1 as long as the sub device is "plugged out" 
which may last for a very long time) we cannot reset it in fs_dev_remove which is called every 2 
seconds.

> I think you should break up those changes in two: first move the flag 
> reset to fs_dev_remove instead of fs_dev_configure, then add this 
> check to the iterator.
> 
> This way, a git bisect should allow us to pinpoint more easily any new 
> bug as both changes have the potential to introduce subtle ones.
> 

I suggest defining a new macro 

FOREACH_SUBDEV_ACTIVE(sdev, i, dev)  { ...

that will replace all cases of:

FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
		if (sdev->remove)
			continue;

In order to support the new macro I added a "check_remove" flag to fs_find_next (which is based on 
your idea above: "I think it might be best to have this check added to fs_find_next directly"). 

> >  		DEBUG("Calling rte_eth_dev_set_link_down on sub_device
> %d", i);
> >  		ret = rte_eth_dev_set_link_down(PORT_ID(sdev));
> >  		if (ret) {
> > @@ -517,8 +520,11 @@
> >  	struct sub_device *sdev;
> >  	uint8_t i;
> >
> > -	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
> > +	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> > +		if (sdev->remove)
> > +			continue;
> >  		rte_eth_promiscuous_enable(PORT_ID(sdev));
> > +	}
> >  }
> >
> >  static void
> 
> <snip>
> 
> > --
> > 1.8.3.1
> >
> 
> Thanks,
> --
> Gaetan Rivet
> 6WIND

 drivers/net/failsafe/failsafe_ether.c   |  1 +
 drivers/net/failsafe/failsafe_ops.c     | 31 +++++++++++++++----------------
 drivers/net/failsafe/failsafe_private.h | 26 ++++++++++++++++++++++----
 3 files changed, 38 insertions(+), 20 deletions(-)

diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index 0c0748f..42e9808 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -389,6 +389,7 @@ failsafe_eth_dev_state_sync(struct rte_eth_dev *dev)
 				      i);
 				goto err_remove;
 			}
+			sdev->remove = 0;
 		}
 	}
 	/*
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index e0f1b0b..b3cac40 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -232,7 +232,6 @@ fs_dev_configure(struct rte_eth_dev *dev)
 			dev->data->dev_conf.intr_conf.lsc = 0;
 		}
 		DEBUG("Configuring sub-device %d", i);
-		sdev->remove = 0;
 		ret = rte_eth_dev_configure(PORT_ID(sdev),
 					dev->data->nb_rx_queues,
 					dev->data->nb_tx_queues,
@@ -310,7 +309,7 @@ fs_dev_set_link_up(struct rte_eth_dev *dev)
 	uint8_t i;
 	int ret;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
 		DEBUG("Calling rte_eth_dev_set_link_up on sub_device %d", i);
 		ret = rte_eth_dev_set_link_up(PORT_ID(sdev));
 		if (ret) {
@@ -329,7 +328,7 @@ fs_dev_set_link_down(struct rte_eth_dev *dev)
 	uint8_t i;
 	int ret;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
 		DEBUG("Calling rte_eth_dev_set_link_down on sub_device %d", i);
 		ret = rte_eth_dev_set_link_down(PORT_ID(sdev));
 		if (ret) {
@@ -517,7 +516,7 @@ fs_promiscuous_enable(struct rte_eth_dev *dev)
 	struct sub_device *sdev;
 	uint8_t i;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev)
 		rte_eth_promiscuous_enable(PORT_ID(sdev));
 }
 
@@ -527,7 +526,7 @@ fs_promiscuous_disable(struct rte_eth_dev *dev)
 	struct sub_device *sdev;
 	uint8_t i;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev)
 		rte_eth_promiscuous_disable(PORT_ID(sdev));
 }
 
@@ -537,7 +536,7 @@ fs_allmulticast_enable(struct rte_eth_dev *dev)
 	struct sub_device *sdev;
 	uint8_t i;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev)
 		rte_eth_allmulticast_enable(PORT_ID(sdev));
 }
 
@@ -547,7 +546,7 @@ fs_allmulticast_disable(struct rte_eth_dev *dev)
 	struct sub_device *sdev;
 	uint8_t i;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev)
 		rte_eth_allmulticast_disable(PORT_ID(sdev));
 }
 
@@ -559,7 +558,7 @@ fs_link_update(struct rte_eth_dev *dev,
 	uint8_t i;
 	int ret;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
 		DEBUG("Calling link_update on sub_device %d", i);
 		ret = (SUBOPS(sdev, link_update))(ETH(sdev), wait_to_complete);
 		if (ret && ret != -1) {
@@ -602,7 +601,7 @@ fs_stats_reset(struct rte_eth_dev *dev)
 	struct sub_device *sdev;
 	uint8_t i;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
 		rte_eth_stats_reset(PORT_ID(sdev));
 		memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats));
 	}
@@ -700,7 +699,7 @@ fs_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 	uint8_t i;
 	int ret;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
 		DEBUG("Calling rte_eth_dev_set_mtu on sub_device %d", i);
 		ret = rte_eth_dev_set_mtu(PORT_ID(sdev), mtu);
 		if (ret) {
@@ -719,7 +718,7 @@ fs_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
 	uint8_t i;
 	int ret;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
 		DEBUG("Calling rte_eth_dev_vlan_filter on sub_device %d", i);
 		ret = rte_eth_dev_vlan_filter(PORT_ID(sdev), vlan_id, on);
 		if (ret) {
@@ -753,7 +752,7 @@ fs_flow_ctrl_set(struct rte_eth_dev *dev,
 	uint8_t i;
 	int ret;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
 		DEBUG("Calling rte_eth_dev_flow_ctrl_set on sub_device %d", i);
 		ret = rte_eth_dev_flow_ctrl_set(PORT_ID(sdev), fc_conf);
 		if (ret) {
@@ -774,7 +773,7 @@ fs_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
 	/* No check: already done within the rte_eth_dev_mac_addr_remove
 	 * call for the fail-safe device.
 	 */
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev)
 		rte_eth_dev_mac_addr_remove(PORT_ID(sdev),
 				&dev->data->mac_addrs[index]);
 	PRIV(dev)->mac_addr_pool[index] = 0;
@@ -791,7 +790,7 @@ fs_mac_addr_add(struct rte_eth_dev *dev,
 	uint8_t i;
 
 	RTE_ASSERT(index < FAILSAFE_MAX_ETHADDR);
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
 		ret = rte_eth_dev_mac_addr_add(PORT_ID(sdev), mac_addr, vmdq);
 		if (ret) {
 			ERROR("Operation rte_eth_dev_mac_addr_add failed for sub_device %"
@@ -813,7 +812,7 @@ fs_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
 	struct sub_device *sdev;
 	uint8_t i;
 
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev)
 		rte_eth_dev_default_mac_addr_set(PORT_ID(sdev), mac_addr);
 }
 
@@ -832,7 +831,7 @@ fs_filter_ctrl(struct rte_eth_dev *dev,
 		*(const void **)arg = &fs_flow_ops;
 		return 0;
 	}
-	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+	FOREACH_SUBDEV_ACTIVE(sdev, i, dev) {
 		DEBUG("Calling rte_eth_dev_filter_ctrl on sub_device %d", i);
 		ret = rte_eth_dev_filter_ctrl(PORT_ID(sdev), type, op, arg);
 		if (ret) {
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index d2d92af..03e1f58 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -225,10 +225,23 @@ extern int mac_from_arg;
  * dev:   (struct rte_eth_dev *), fail-safe ethdev
  * state: (enum dev_state), minimum acceptable device state
  */
+
 #define FOREACH_SUBDEV_STATE(s, i, dev, state)		\
-	for (s = fs_find_next((dev), 0, state, &i);	\
+	for (s = fs_find_next((dev), 0, state, 0, &i);	\
 	     s != NULL;					\
-	     s = fs_find_next((dev), i + 1, state, &i))
+	     s = fs_find_next((dev), i + 1, state, 0, &i))
+
+/**
+ * Stateful iterator construct over fail-safe sub-devices
+ * in ACTIVE state and not removed due to RMV event
+ * s:     (struct sub_device *), iterator
+ * i:     (uint8_t), increment
+ * dev:   (struct rte_eth_dev *), fail-safe ethdev
+ */
+#define FOREACH_SUBDEV_ACTIVE(s, i, dev)				\
+	for (s = fs_find_next((dev), 0, DEV_ACTIVE, 1, &i);	\
+	     s != NULL;						\
+	     s = fs_find_next((dev), i + 1, DEV_ACTIVE, 1, &i))
 
 /**
  * Iterator construct over fail-safe sub-devices:
@@ -303,6 +316,7 @@ static inline struct sub_device *
 fs_find_next(struct rte_eth_dev *dev,
 	     uint8_t sid,
 	     enum dev_state min_state,
+		 uint8_t check_remove,
 	     uint8_t *sid_out)
 {
 	struct sub_device *subs;
@@ -311,8 +325,12 @@ fs_find_next(struct rte_eth_dev *dev,
 	subs = PRIV(dev)->subs;
 	tail = PRIV(dev)->subs_tail;
 	while (sid < tail) {
-		if (subs[sid].state >= min_state)
-			break;
+		if (subs[sid].state >= min_state) {
+			if (check_remove == 0)
+				break;
+			if (PRIV(dev)->subs[sid].remove == 0)
+				break;
+		}
 		sid++;
 	}
 	*sid_out = sid;
-- 
2.7.4



More information about the dev mailing list