[dpdk-dev] [PATCH] service: fix stop API to wait for service thread
Lukasz Wojciechowski
l.wojciechow at partner.samsung.com
Mon Jul 20 14:51:56 CEST 2020
On 20.07.2020 at 14:09, Harry van Haaren wrote:
> This commit improves the service_lcore_stop() implementation,
> waiting for the service core in question to return. The service
> thread itself now has a variable to indicate if its thread is
> active. When zero the service thread has completed its service,
> and has returned from the service_runner_func() function.
>
> This fixes a race condition observed in the DPDK CI, where the
> statistics of the service were not consistent with the expectation
> due to the service thread still running, and incrementing a stat
> after stop was called.
>
> Signed-off-by: Harry van Haaren <harry.van.haaren at intel.com>
>
> ---
>
> This is one possible solution that avoids a class of race conditions
> based on the stop() API and subsequent behaviour. Without a change in
> the implementation of the service core thread, we could not detect
> when the thread had actually finished. This is now possible, and the
> stop API makes use of it to wait up to 1000 one-millisecond intervals,
> or log a warning if a service core didn't return in time.
>
> Thanks for the discussion/debug on list - I'm not sure how to add
> reported-by/suggested-by etc tags: but I'll resend a V2 (or can add
> on apply).
>
> ---
> lib/librte_eal/common/rte_service.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c
> index 6a0e0ff65..d2255587d 100644
> --- a/lib/librte_eal/common/rte_service.c
> +++ b/lib/librte_eal/common/rte_service.c
> @@ -65,6 +65,7 @@ struct core_state {
> /* map of services IDs are run on this core */
> uint64_t service_mask;
> uint8_t runstate; /* running or stopped */
> + uint8_t thread_active; /* indicates when the thread is in service_run() */
> uint8_t is_service_core; /* set if core is currently a service core */
> uint8_t service_active_on_lcore[RTE_SERVICE_NUM_MAX];
> uint64_t loops;
> @@ -457,6 +458,8 @@ service_runner_func(void *arg)
> const int lcore = rte_lcore_id();
> struct core_state *cs = &lcore_states[lcore];
>
> + __atomic_store_n(&cs->thread_active, 1, __ATOMIC_RELAXED);
> +
> /* runstate act as the guard variable. Use load-acquire
> * memory order here to synchronize with store-release
> * in runstate update functions.
> @@ -475,6 +478,7 @@ service_runner_func(void *arg)
> cs->loops++;
> }
>
> + __atomic_store_n(&cs->thread_active, 0, __ATOMIC_RELAXED);
> return 0;
> }
>
> @@ -765,6 +769,26 @@ rte_service_lcore_stop(uint32_t lcore)
> __atomic_store_n(&lcore_states[lcore].runstate, RUNSTATE_STOPPED,
> __ATOMIC_RELEASE);
>
> + /* wait for service lcore to return */
> + i = 0;
> + uint8_t active;
> + uint64_t start = rte_rdtsc();
> + do {
> + active = __atomic_load_n(&lcore_states[lcore].thread_active,
> + __ATOMIC_RELAXED);
> + if (active == 0)
> + break;
> + rte_delay_ms(1);
> + i++;
> + } while (i < 1000);
> +
> + if (active != 0) {
> + uint64_t end = rte_rdtsc();
> + RTE_LOG(WARNING, EAL,
> "service lcore stop() failed, waited for %" PRIu64 " cycles\n",
> + end - start);
> + }
> +
> return 0;
> }
>
I don't like the idea of inserting this polling loop inside the API
call, and I don't like hard-coding a 1000-iteration limit.
How about keeping the thread_active flag, but moving the check of this
flag into a separate function? That way users of the API could write
their own loop. They might want a custom loop because they:
* are waiting for more than one core,
* would like to wait longer,
* would like to check whether the service has finished less often...
--
Lukasz Wojciechowski
Principal Software Engineer
Samsung R&D Institute Poland
Samsung Electronics
Office +48 22 377 88 25
l.wojciechow at partner.samsung.com