how to make dpdk processes tolerant to segmentation fault?
Stephen Hemminger
stephen at networkplumber.org
Thu Nov 30 22:19:46 CET 2023
On Thu, 30 Nov 2023 19:24:01 +0300
Dmitry Kozlyuk <dmitry.kozliuk at gmail.com> wrote:
> 2023-11-30 13:45 (UTC+0600), Fuji Nafiul:
> > In a normal C program, I saw that a segmentation fault in one loosely
> > coupled thread doesn't necessarily affect other threads or the main
> > program. There, I can check all the threads by process ID at regular
> > intervals, and if one hits an unexpected segmentation fault or
> > gets killed I can rerun it and it works fine. I can later monitor
> > the logs and inspect the situation.
> >
> > But I saw that a segmentation fault or other unexpected error in remotely
> > launched (using DPDK) functions on a different core affects the whole DPDK
> > process, and the whole DPDK program crashes. Why is that?
> >
> > Is there any alternative way to handle this scenario? How can I take
> > measures against unexpected future errors, so that I can automatically
> > rerun DPDK remote processes in a live system?
>
> Please consider running the buggy code that causes SIGSEGV
> in a separate process rather than a thread.
> If it must use DPDK, can it be made an independent app?
>
> DPDK is unlikely to ever support the described scenario.
> Continuing to run the process after SIGSEGV is inherently unsafe.
> Specifically, DPDK communicates with its lcore threads
> using pipes allocated at startup.
> If such a thread crashed and a SIGSEGV handler that does not kill the app
> were installed, the communication would hang.
> Generally, DPDK employs user-space synchronization primitives,
> which cannot recover if one of the threads using them crashes.
A couple of things you can do:
- run your DPDK application as a systemd service, which will be restarted
when it crashes.
- catch SIGSEGV in the application and print a backtrace, then abort.