[dpdk-dev] [PATCH v7] eal: fix race in ctrl thread creation

Honnappa Nagarahalli Honnappa.Nagarahalli at arm.com
Wed Apr 7 19:15:07 CEST 2021


<snip>

> Subject: [PATCH v7] eal: fix race in ctrl thread creation
> 
> The creation of control threads uses a pthread barrier for synchronization.
> This patch fixes a race condition where the pthread barrier could get
> destroyed while one of the threads has not yet returned from the
> pthread_barrier_wait function, which could result in undefined behaviour.
> 
> Fixes: 3a0d465d4c53 ("eal: fix use-after-free on control thread creation")
> Cc: jianfeng.tan at intel.com
> Cc: stable at dpdk.org
> 
> Signed-off-by: Luc Pelletier <lucp.at.work at gmail.com>
Overall, this LGTM.

Just wondering if this should be split into 2 commits.
1) Fix the race condition with refcnt.
2) Remove the use of pthread_cancel

I have some observations below.
> ---
> 
> Hi Olivier,
> 
> Olivier, thanks for pointing out that pthread_barrier_wait acts as a full
> memory barrier; I didn't know that was explicitly specified in the
"full memory barrier" just says that the memory operations prior and after the barrier will not be mixed. It does not guarantee that the modification done by one core are 'visible' to other cores. Additional code is required to ensure that the modifications are visible. For ex: in the following code setting start_routine=NULL is not guaranteed to be visible to other cores if pthread_barrier_wait is just a full barrier.
The particular wording used in the reference is "synchronize memory with respect to other threads". I am not exactly sure what is means, but may be safe to assume it means the updates are 'visible' to other cores (as the locking APIs are part of the list and they guarantee the visibility aspects).

May be we can simplify this during 21.11 release as we will have the ability to make more invasive changes.

> documentation. I've changed it to use a regular read/write.
> 
> Hi Honnappa,
> 
> I have not changed the code to use pthread_attr_setaffinity_np because the
> attr parameter is const. I could make a copy of the provided attributes and
> use that instead, but I think this would still violate the spirit of the API and,
> AFAICT, there's no official mechanism for copying a pthread_attr_t.
Ok, thank you.

> 
> Please let me know what you think.
> 
>  lib/librte_eal/common/eal_common_thread.c | 63 +++++++++++++----------
>  1 file changed, 35 insertions(+), 28 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_thread.c
> b/lib/librte_eal/common/eal_common_thread.c
> index 73a055902..d4e09f84b 100644
> --- a/lib/librte_eal/common/eal_common_thread.c
> +++ b/lib/librte_eal/common/eal_common_thread.c
> @@ -170,25 +170,34 @@ struct rte_thread_ctrl_params {
>  	void *(*start_routine)(void *);
>  	void *arg;
>  	pthread_barrier_t configured;
> +	unsigned int refcnt;
>  };
> 
> +static void ctrl_params_free(struct rte_thread_ctrl_params *params) {
> +	if (__atomic_sub_fetch(&params->refcnt, 1, __ATOMIC_ACQ_REL) ==
> 0) {
> +		pthread_barrier_destroy(&params->configured);
> +		free(params);
> +	}
> +}
> +
>  static void *ctrl_thread_init(void *arg)  {
> -	int ret;
>  	struct internal_config *internal_conf =
>  		eal_get_internal_configuration();
>  	rte_cpuset_t *cpuset = &internal_conf->ctrl_cpuset;
>  	struct rte_thread_ctrl_params *params = arg;
> -	void *(*start_routine)(void *) = params->start_routine;
> +	void *(*start_routine)(void *);
>  	void *routine_arg = params->arg;
> 
>  	__rte_thread_init(rte_lcore_id(), cpuset);
> 
> -	ret = pthread_barrier_wait(&params->configured);
> -	if (ret == PTHREAD_BARRIER_SERIAL_THREAD) {
> -		pthread_barrier_destroy(&params->configured);
> -		free(params);
> -	}
> +	pthread_barrier_wait(&params->configured);
> +	start_routine = params->start_routine;
> +	ctrl_params_free(params);
> +
> +	if (start_routine == NULL)
> +		return NULL;
> 
>  	return start_routine(routine_arg);
>  }
> @@ -210,14 +219,15 @@ rte_ctrl_thread_create(pthread_t *thread, const
> char *name,
> 
>  	params->start_routine = start_routine;
>  	params->arg = arg;
> +	params->refcnt = 2;
> 
> -	pthread_barrier_init(&params->configured, NULL, 2);
> +	ret = pthread_barrier_init(&params->configured, NULL, 2);
> +	if (ret != 0)
> +		goto fail_no_barrier;
> 
>  	ret = pthread_create(thread, attr, ctrl_thread_init, (void *)params);
> -	if (ret != 0) {
> -		free(params);
> -		return -ret;
> -	}
> +	if (ret != 0)
> +		goto fail_with_barrier;
> 
>  	if (name != NULL) {
>  		ret = rte_thread_setname(*thread, name); @@ -227,25
> +237,22 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
>  	}
> 
>  	ret = pthread_setaffinity_np(*thread, sizeof(*cpuset), cpuset);
> -	if (ret)
> -		goto fail;
> +	if (ret != 0)
> +		params->start_routine = NULL;
> +	pthread_barrier_wait(&params->configured);
> +	ctrl_params_free(params);
> 
> -	ret = pthread_barrier_wait(&params->configured);
> -	if (ret == PTHREAD_BARRIER_SERIAL_THREAD) {
> -		pthread_barrier_destroy(&params->configured);
> -		free(params);
> -	}
> +	if (ret != 0)
A comment here on why the pthread_join will not wait indefinitely will help. Splitting the patch will also help understand the changes better.

> +		pthread_join(*thread, NULL);
> 
> -	return 0;
> +	return -ret;
> +
> +fail_with_barrier:
> +	pthread_barrier_destroy(&params->configured);
> +
> +fail_no_barrier:
> +	free(params);
> 
> -fail:
> -	if (PTHREAD_BARRIER_SERIAL_THREAD ==
> -	    pthread_barrier_wait(&params->configured)) {
> -		pthread_barrier_destroy(&params->configured);
> -		free(params);
> -	}
> -	pthread_cancel(*thread);
> -	pthread_join(*thread, NULL);
>  	return -ret;
>  }
> 
> --
> 2.25.1



More information about the dev mailing list