[dpdk-dev] [RFC] service: stop lcore threads before 'finalize'

David Marchand david.marchand at redhat.com
Mon Feb 10 15:42:21 CET 2020


On Mon, Feb 10, 2020 at 3:16 PM Van Haaren, Harry
<harry.van.haaren at intel.com> wrote:
> I haven't easily reproduced this yet - so I'll investigate a way to
> reproduce with close to 100% rate, then we can identify the root cause
> and actually get a clean fix. If you have pointers to reproduce easily,
> please let me know.

- In shell #1:

$ git reset --hard v20.02-rc2
HEAD is now at 2636c2a23 version: 20.02-rc2
$ rm -rf build

$ git diff
diff --git a/app/test/meson.build b/app/test/meson.build
index 3675ffb5c..23c00a618 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -400,7 +400,7 @@ timeout_seconds = 600
 timeout_seconds_fast = 10

 get_coremask = find_program('get-coremask.sh')
-num_cores_arg = '-l ' + run_command(get_coremask).stdout().strip()
+num_cores_arg = '-l 0,1'

 test_args = [num_cores_arg]
 foreach arg : fast_test_names

$ meson --werror --buildtype=debugoptimized build
The Meson build system
Version: 0.47.2
Source dir: /home/dmarchan/dpdk
Build dir: /home/dmarchan/dpdk/build
Build type: native build
Program cat found: YES (/usr/bin/cat)
Project name: DPDK
Project version: 20.02.0-rc2
...

$ ninja-build -C build
ninja: Entering directory `build'
[2081/2081] Linking target app/test/dpdk-test.

$ taskset -pc 1 $$
pid 11143's current affinity list: 0-7
pid 11143's new affinity list: 1

$ while true; do true; done


- Now, in shell #2, as root:

# taskset -pc 0,1 $$
pid 22233's current affinity list: 0-7
pid 22233's new affinity list: 0,1

# meson test --gdb --repeat=10000 service_autotest
...

 + ------------------------------------------------------- +
 + Test Suite Summary
 + Tests Total :       16
 + Tests Skipped :      3
 + Tests Executed :    16
 + Tests Unsupported:   0
 + Tests Passed :      13
 + Tests Failed :       0
 + ------------------------------------------------------- +

Test OK
RTE>>
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff4922700 (LWP 31194)]
rte_service_runner_func (arg=<optimized out>) at ../lib/librte_eal/common/rte_service.c:453
453            cs->loops++;
A debugging session is active.

    Inferior 1 [process 31187] will be killed.

Quit anyway? (y or n)


I get the crash within about 30 seconds, often sooner.
In the test I just ran, it crashed on the 3rd try.
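
For illustration only, here is a minimal sketch of the ordering that the subject of this thread suggests: stop the worker thread, wait for it to exit, and only then release the state it touches. It assumes (this is not confirmed by the trace above) that the SIGSEGV on cs->loops++ comes from a runner thread still executing after its state has gone away. struct core_state, runner_func and run_then_finalize are hypothetical names written with plain pthreads and C11 atomics; they are not the DPDK code. Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-lcore service state; not the DPDK struct. */
struct core_state {
	atomic_int running; /* worker keeps looping while this is non-zero */
	uint64_t loops;     /* written only by the worker thread */
};

/* Worker loop: touches cs on every iteration, like the runner in the trace. */
static void *runner_func(void *arg)
{
	struct core_state *cs = arg;

	while (atomic_load_explicit(&cs->running, memory_order_acquire))
		cs->loops++;
	return NULL;
}

/* Start a worker, then tear down in the safe order. Returns 0 on success. */
static int run_then_finalize(void)
{
	pthread_t tid;
	struct core_state *cs = calloc(1, sizeof(*cs));

	if (cs == NULL)
		return -1;
	atomic_init(&cs->running, 1);
	if (pthread_create(&tid, NULL, runner_func, cs) != 0) {
		free(cs);
		return -1;
	}

	/* 1. Ask the worker to stop. */
	atomic_store_explicit(&cs->running, 0, memory_order_release);

	/* 2. Wait until it has really left its loop. On failure, cs is
	 * deliberately leaked rather than freed under a live thread. */
	if (pthread_join(tid, NULL) != 0)
		return -1;
	assert(atomic_load(&cs->running) == 0);

	/* 3. Only now release the state the worker was using. */
	free(cs);
	return 0;
}
```

Swapping steps 2 and 3 (freeing before the join) would leave a window where the worker can still execute cs->loops++ on freed memory. That is the kind of window a busy core, as in the setup above, would be expected to widen.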



-- 
David Marchand


