[dpdk-dev] Windows DPDK real-time priority threads causing thread starvation

John Alexander John.Alexander at datapath.co.uk
Wed Dec 9 17:08:07 CET 2020


Hi,

I tend to run with a WinDbg kernel debugger (KDNET) connected to my debug target machines. It quite often reports a detected deadlock when we have such "real-time" threads that never yield a core. If we hog core-0 in particular, dwm.exe never gets a look-in, so the desktop stops being drawn too.

John.

> -----Original Message-----
> From: dev <dev-bounces at dpdk.org> On Behalf Of Tal Shnaiderman
> Sent: 09 December 2020 14:16
> To: Dmitry Kozlyuk <dmitry.kozliuk at gmail.com>; Dmitry Malloy
> (MESHCHANINOV) <dmitrym at microsoft.com>; Narcisa Ana Maria Vasile
> <Narcisa.Vasile at microsoft.com>
> Cc: Eilon Greenstein <eilong at nvidia.com>; Omar Cardona
> <ocardona at microsoft.com>; Rani Sharoni <ranish at nvidia.com>; Odi Assli
> <odia at nvidia.com>; Harini Ramakrishnan
> <Harini.Ramakrishnan at microsoft.com>; NBU-Contact-Thomas Monjalon
> <thomas at monjalon.net>; dev at dpdk.org
> Subject: [dpdk-dev] Windows DPDK real-time priority threads causing thread
> starvation
> 
> Hi,
> 
> During our verification tests on Windows DPDK we've noticed that DPDK
> polling threads, which run in REALTIME_PRIORITY_CLASS, starve other OS
> threads that need to change their affinity and run at lower priority.
> 
> After running an application for a while, we see an OS thread wait for
> 2 minutes 30 seconds, after which the system raises a bugcheck. Below is an
> example of such a flow:
> 
> 1) A DPDK thread runs on core-0 in polling mode at real-time priority (24).
> 2) The thread prevents the system function NtSetSystemInformation
>    (ExpUpdateTimerConfiguration), running in another thread, from
>    switching to core-0 via KeSetSystemGroupAffinityThread, because the
>    calling thread is at priority 15.
> 3) NtSetSystemInformation has exclusively acquired a system-wide lock
>    (ExpTimeRefreshLock), so it blocks other threads (e.g. those calling
>    NtQuerySystemInformation).
> 
> We've seen this behavior only on Windows Server 2019 VMs. Perhaps on native
> machines the OS schedules such a flow differently?
> 
> Below is the usage note from the SetPriorityClass documentation [1]:
> 
> - REALTIME_PRIORITY_CLASS
> Process that has the highest possible priority. The threads of the process
> preempt the threads of all other processes, including operating system
> processes performing important tasks. For example, a real-time process that
> executes for more than a very brief interval can cause disk caches not to
> flush or cause the mouse to be unresponsive.
> 
> So I assume that running threads of this class for long periods, as we do,
> can cause unstable behavior.
> 
> How do you think we can resolve this? Are there similar cases on Linux?
> 
> [1] - https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-setpriorityclass
> 
> Thanks,
> 
> Tal.

