[dpdk-dev] [PATCH] service: fix stop API to wait for service thread

Harry van Haaren harry.van.haaren at intel.com
Mon Jul 20 14:09:38 CEST 2020


This commit improves the service_lcore_stop() implementation,
waiting for the service core in question to return. The service
thread itself now has a variable to indicate if its thread is
active. When zero the service thread has completed its service,
and has returned from the service_runner_func() function.

This fixes a race condition observed in the DPDK CI, where the
statistics of the service were not consistent with the expectation
due to the service thread still running, and incrementing a stat
after stop was called.

Signed-off-by: Harry van Haaren <harry.van.haaren at intel.com>

---

This is one possible solution, that avoids a class of race-conditions
based on stop() api and following behaviours. Without a change in
implementation of the service core thread, we could not detect when
the thread was actually finished. This is now possible, and the stop
api makes use of it to wait for 1000x one millisecond, or log a warning
that a service core didn't return quickly.

Thanks for the discussion/debug on list - I'm not sure how to add
reported-by/suggested-by etc tags: but I'll resend a V2 (or can add
on apply).

---
 lib/librte_eal/common/rte_service.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c
index 6a0e0ff65..d2255587d 100644
--- a/lib/librte_eal/common/rte_service.c
+++ b/lib/librte_eal/common/rte_service.c
@@ -65,6 +65,7 @@ struct core_state {
 	/* map of services IDs are run on this core */
 	uint64_t service_mask;
 	uint8_t runstate; /* running or stopped */
+	uint8_t thread_active; /* indicates when the thread is in service_run() */
 	uint8_t is_service_core; /* set if core is currently a service core */
 	uint8_t service_active_on_lcore[RTE_SERVICE_NUM_MAX];
 	uint64_t loops;
@@ -457,6 +458,8 @@ service_runner_func(void *arg)
 	const int lcore = rte_lcore_id();
 	struct core_state *cs = &lcore_states[lcore];
 
+	__atomic_store_n(&cs->thread_active, 1, __ATOMIC_RELAXED);
+
 	/* runstate act as the guard variable. Use load-acquire
 	 * memory order here to synchronize with store-release
 	 * in runstate update functions.
@@ -475,6 +478,7 @@ service_runner_func(void *arg)
 		cs->loops++;
 	}
 
+	__atomic_store_n(&cs->thread_active, 0, __ATOMIC_RELAXED);
 	return 0;
 }
 
@@ -765,6 +769,26 @@ rte_service_lcore_stop(uint32_t lcore)
 	__atomic_store_n(&lcore_states[lcore].runstate, RUNSTATE_STOPPED,
 		__ATOMIC_RELEASE);
 
+	/* wait for service lcore to return */
+	i = 0;
+	uint8_t active;
+	uint64_t start = rte_rdtsc();
+	do {
+		active = __atomic_load_n(&lcore_states[lcore].thread_active,
+					 __ATOMIC_RELAXED);
+		if (active == 0)
+			break;
+		rte_delay_ms(1);
+		i++;
+	} while (i < 1000);
+
+	if (active != 0) {
+		uint64_t end = rte_rdtsc();
+		RTE_LOG(WARNING, EAL,
+			"service lcore stop() failed, waited for %ld cycles\n",
+			end - start);
+	}
+
 	return 0;
 }
 
-- 
2.17.1



More information about the dev mailing list