How to make DPDK processes tolerant of segmentation faults?
Dmitry Kozlyuk
dmitry.kozliuk at gmail.com
Thu Nov 30 17:24:01 CET 2023
2023-11-30 13:45 (UTC+0600), Fuji Nafiul:
> In a normal C program, I saw that a segmentation fault in one loosely
> coupled thread doesn't necessarily affect other threads or the main
> program. There, I can check all the threads by process ID at regular
> intervals, and if some unexpected segmentation fault occurs or a thread
> gets killed, I can rerun the thread and it works fine. I can later monitor
> the logs and inspect the situation.
>
> But I saw that a segmentation fault or other unexpected error in remotely
> launched (using DPDK) functions on a different core affects the whole DPDK
> process, and the whole DPDK program crashes. Why is that?
>
> Is there any alternative way to handle this scenario? How can I take
> measures against unexpected future errors, so that I can automatically
> rerun DPDK remote processes in a live system?
Please consider running the buggy code that causes SIGSEGV
in a separate process rather than a thread.
If it must use DPDK, can it be made an independent app?
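
The separate-process approach can be sketched roughly as follows. This is
only an illustration using plain POSIX calls (fork, waitpid), not DPDK API;
the names supervise, worker_fn and demo_worker are hypothetical. In a real
deployment the child would more likely exec an independent DPDK application
than call a function, which is an assumption about your setup.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker type: the return value becomes the child's exit code. */
typedef int (*worker_fn)(int attempt);

/*
 * Run `worker` in a forked child and restart it whenever it is killed by a
 * signal (for example SIGSEGV). Returns the number of restarts that were
 * needed before the worker exited normally, or -1 if fork/waitpid failed
 * or the restart budget was exhausted.
 */
int supervise(worker_fn worker, int max_restarts)
{
    for (int attempt = 0; attempt <= max_restarts; attempt++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(worker(attempt)); /* child: never returns to the loop */

        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status))
            return attempt;         /* worker finished on its own */
        if (WIFSIGNALED(status))
            fprintf(stderr, "worker killed by signal %d, restarting\n",
                    WTERMSIG(status));
    }
    return -1;
}

/* Demo worker: crashes on the first attempt, succeeds afterwards. */
int demo_worker(int attempt)
{
    if (attempt == 0)
        raise(SIGSEGV); /* default action terminates only the child */
    return 0;
}
```

Calling supervise(demo_worker, 3) lets the first child die from SIGSEGV and
starts a second one; the supervising process itself is never affected,
because the crash happens in a different address space.
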
DPDK is unlikely to ever support the described scenario.
Continuing to run the process after SIGSEGV is inherently unsafe.
Specifically, DPDK communicates with its lcore threads
using pipes allocated at startup.
If such a thread crashed and a SIGSEGV handler that does not kill the app
was installed, the communication would hang.
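
To illustrate why the pipe communication would hang, here is a minimal
sketch. It is not DPDK's actual code; wait_for_ack and demo_missing_ack are
hypothetical names. It simulates a worker that dies before acknowledging:
since both ends of the pipe belong to the same process, the write end stays
open, so the reader sees neither data nor end-of-file. The sketch uses
poll() with a timeout only to make this observable; a plain blocking read()
would wait forever.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for a one-byte acknowledgement on a pipe.
 * Returns 1 if the ack arrived, 0 on timeout, -1 on error.
 */
int wait_for_ack(int read_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = read_fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready <= 0)
        return ready; /* 0: timed out, -1: poll failed */

    char ack;
    return read(read_fd, &ack, 1) == 1 ? 1 : -1;
}

/*
 * Simulate a crashed worker thread: nothing is ever written to the pipe,
 * but its write end remains open because the process is still alive.
 */
int demo_missing_ack(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    int result = wait_for_ack(fds[0], 100);

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Here demo_missing_ack() times out and returns 0. With a separate worker
process instead, the kernel closes the worker's end of the pipe when it
dies, so the reader gets end-of-file and can react.
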
Generally, DPDK employs user-space synchronization primitives,
which cannot recover if one of the threads using them crashes.