[dpdk-dev] [EXTERNAL] Re: [PATCH v6 06/10] eal: add thread lifetime management

Dmitry Malloy dmitrym at microsoft.com
Thu Apr 29 23:31:52 CEST 2021


Thread cancellation is a pain point. Emulating it properly is nearly impossible (without hooking into various OS calls which are supposed to be "cancellation points"). I do like the cancellation token idea, but I'm not sure how existing clients, who rely on existing phtread_cancel() semantic, will react to that - it will require a code re-write. 

How about we defer fixing this to another follow-up change? 

-----Original Message-----
From: Dmitry Kozlyuk <dmitry.kozliuk at gmail.com> 
Sent: Thursday, April 29, 2021 1:44 PM
To: Narcisa Ana Maria Vasile <navasile at linux.microsoft.com>
Cc: dev at dpdk.org; thomas <thomas at monjalon.net>; Khoa To <khot at microsoft.com>; Narcisa Ana Maria Vasile <Narcisa.Vasile at microsoft.com>; Dmitry Malloy <dmitrym at microsoft.com>; Tyler Retzlaff <roretzla at microsoft.com>; talshn at nvidia.com; Omar Cardona <ocardona at microsoft.com>; bruce.richardson at intel.com; david.marchand at redhat.com; Kadam, Pallavi <pallavi.kadam at intel.com>
Subject: [EXTERNAL] Re: [PATCH v6 06/10] eal: add thread lifetime management

2021-04-02 18:39 (UTC-0700), Narcisa Ana Maria Vasile:
[...]
> diff --git a/lib/librte_eal/windows/rte_thread.c 
> b/lib/librte_eal/windows/rte_thread.c
> index f61103bbc..86bbd7bc2 100644
> --- a/lib/librte_eal/windows/rte_thread.c
> +++ b/lib/librte_eal/windows/rte_thread.c
> @@ -329,6 +329,131 @@ rte_thread_attr_set_priority(rte_thread_attr_t *thread_attr,
>  	return 0;
>  }
>  
> +int
> +rte_thread_create(rte_thread_t *thread_id,
> +		  const rte_thread_attr_t *thread_attr,
> +		  void *(*thread_func)(void *), void *args) {
> +	int ret = 0;
> +	HANDLE thread_handle = NULL;
> +	GROUP_AFFINITY thread_affinity;
> +
> +	thread_handle = CreateThread(NULL, 0,
> +				(LPTHREAD_START_ROUTINE)(ULONG_PTR)thread_func,

thread_func returns void* (8 bytes), LPTHREAD_START_ROUTING returns DWORD (4 byte), is this cast safe?
It seems to be on x86, can't say for ARM64.

> +				args, 0, thread_id);
> +	if (thread_handle == NULL) {
> +		ret = rte_thread_translate_win32_error();
> +		RTE_LOG_WIN32_ERR("CreateThread()");
> +		goto cleanup;
> +	}

After this point, in case of errors the thread remains running, while the function reports failure. It's better to create the thread suspended, configure its attributes (destroying the thread on failure), then resume (start) the thread.

> +
> +	if (thread_attr != NULL) {
> +		if (CPU_COUNT(&thread_attr->cpuset) > 0) {
> +			ret = rte_convert_cpuset_to_affinity(&thread_attr->cpuset, &thread_affinity);
> +			if (ret != 0) {
> +				RTE_LOG(DEBUG, EAL, "Unable to convert cpuset to thread affinity\n");
> +				goto cleanup;
> +			}
> +
> +			if (!SetThreadGroupAffinity(thread_handle, &thread_affinity, NULL)) {
> +				ret = rte_thread_translate_win32_error();
> +				RTE_LOG_WIN32_ERR("SetThreadGroupAffinity()");
> +				goto cleanup;
> +			}
> +		}
> +		ret = rte_thread_set_priority(*thread_id, thread_attr->priority);
> +		if (ret != 0) {
> +			RTE_LOG(DEBUG, EAL, "Unable to set thread priority\n");
> +			goto cleanup;
> +		}
> +	}
> +
> +	return 0;
> +
> +cleanup:
> +	if (thread_handle != NULL) {
> +		CloseHandle(thread_handle);
> +		thread_handle = NULL;
> +	}
> +	return ret;
> +}

[...]
> +
> +int
> +rte_thread_cancel(rte_thread_t thread_id) {
> +	int ret = 0;
> +	HANDLE thread_handle = NULL;
> +
> +	thread_handle = OpenThread(THREAD_TERMINATE, FALSE, thread_id);
> +	if (thread_handle == NULL) {
> +		ret = rte_thread_translate_win32_error();
> +		RTE_LOG_WIN32_ERR("OpenThread()");
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * TODO: Behavior is different between POSIX and Windows threads.
> +	 * POSIX threads wait for a cancellation point.
> +	 * Current Windows emulation kills thread at any point.
> +	 */
> +	ret = TerminateThread(thread_handle, 0);

This is not acceptable.
There is a handful of pthread_cancel() usages in DPDK.
The threads they cancel take spinlocks and mutexes, so interrupt at arbitrary point may cause a deadlock.

User		Cancellation points	Notes
----		-------------------	-----
net/ipn3ke	libfdt functions?
net/kni		usleep			Linux-only
raw/ifpga	open/read/write/close
vdpa/ifc	read
vdpa/mlx5	usleep, nanosleep
lib/eventdev	rte_epoll_wait		no pthread CP

Possible solutions:

1. Make specific DPDK functions cancellation points for this API and ensure DPDK users of pthread_cancel() call it.  This way is almost non-invasive, but it's more work in EAL.

2. Divert from pthread style: add cancellation token concept, so that DPDK users can check themselves if they're cancelled. This is invasive, but very explicit and flexible.

> +	if (ret != 0) {
> +		ret = rte_thread_translate_win32_error();
> +		RTE_LOG_WIN32_ERR("TerminateThread()");
> +		goto cleanup;
> +	}
> +
> +cleanup:
> +	if (thread_handle != NULL) {
> +		CloseHandle(thread_handle);
> +		thread_handle = NULL;

This line is not needed.

> +	}
> +	return ret;
> +}
> +
>  int
>  rte_thread_key_create(rte_thread_key *key,
>  		__rte_unused void (*destructor)(void *))




More information about the dev mailing list