[PATCH] eal: clean runtime directory files on exit
Stephen Hemminger
stephen at networkplumber.org
Fri Mar 6 17:42:59 CET 2026
On Fri, 6 Mar 2026 06:08:22 +0000
Mounika Kalamandala <mounikax.kalamandala at intel.com> wrote:
> Add cleanup of EAL runtime directory artifacts during rte_eal_cleanup().
> This removes runtime files created during initialization and ensures
> that no stale files remain under /var/run/dpdk/<file-prefix> after the
> application terminates.
>
> Update the documentation in multi_proc_support.rst to describe the
> secondary process lifetime requirement when the primary process exits.
>
> Signed-off-by: Mounika Kalamandala <mounikax.kalamandala at intel.com>
> ---
AI has more corrections here. It does raise the point that primary
process does know if secondary is still running. Also the more unfortunate
and common case happens when primary crashes.
Normally eal_clean_runtime_dir() is called on eal_init, not exit.
Also, clean_runtime_dir does a test flock() of the file and will
fail if it is in use, so it may leave things on shutdown.
Overall, this patch is a no from me.
---
**Patch: eal: clean runtime directory files on exit**
## Errors
**1. Runtime directory cleanup while secondary processes may still be running (potential correctness issue, ~60% confidence)**
The code unconditionally calls `eal_clean_runtime_dir()` in the primary process cleanup path. While the new documentation states secondaries "must be terminated" before the primary exits, the code does not enforce or verify this. If a secondary process is still running when the primary calls `rte_eal_cleanup()`, this removes runtime files (the config, hugepage info, socket files) from under the secondary process, which could cause crashes, SIGBUS on hugepage access, or silent corruption. The documentation addition describes the requirement but the code provides no defense.
Consider at minimum logging a warning if secondary processes are detected as still active, or skipping the directory cleanup when secondaries are still connected. A best-effort check via the mp socket or `/proc` scan for processes with the same file-prefix would make this safer.
**2. Ordering: `eal_clean_runtime_dir()` placed after memory detach**
`eal_clean_runtime_dir()` is called after `rte_eal_memory_detach()` and `rte_eal_malloc_heap_cleanup()`. This means the hugepage mapping files are already detached from this process, which is fine for the primary itself. However, the runtime directory also contains the mp socket and config file. Verify that `eal_clean_runtime_dir()` does not rely on any state cleaned up by the preceding two calls. If it only uses `internal_conf->runtime_dir` (a stack/heap string, not hugepage-backed), the ordering is acceptable — but this deserves confirmation since the function wasn't previously called at this point in the sequence.
## Warnings
**1. Documentation says "must be terminated" but the patch creates a new failure mode rather than preventing it**
The commit message says the patch "ensures that no stale files remain," but it also creates a new hazard: a clean `rte_eal_cleanup()` in the primary now destroys the runtime directory that running secondaries depend on. Previously, stale files were a nuisance; now, a primary exiting with secondaries still running actively destroys their runtime state. The commit message and documentation should acknowledge this behavioral change more clearly, since operators who previously relied on secondaries surviving a primary restart will now face a harder failure.
**2. RST indentation suggests this is inside the existing `.. note::` block — verify placement**
The added text is indented with 4 spaces, matching the indentation of the preceding `.. note::` directive content. If intentional, this is fine. But the content describes a general lifecycle requirement, not just a note about device access options. It might be clearer as a standalone paragraph or its own admonition (e.g., `.. warning::`) rather than appended to the note about allow/block options.
**3. Commit body starts with "Add" — consider whether this needs a "why"**
The commit body describes *what* the patch does but not *why*. What problem was observed with stale runtime files? Under what circumstances do they cause issues? A sentence of motivation (e.g., "After unclean shutdown, stale files in /var/run/dpdk prevent re-initialization") would help reviewers and future readers understand the necessity.
## Info
The subject line is 42 characters, well within the 60-character limit. Tag ordering and format are correct. Copyright year update to 2026 is within valid range.
More information about the dev
mailing list