|WARNING| [v3,5/5] eal: avoid deadlock in async IPC alarm callback

dpdklab at iol.unh.edu dpdklab at iol.unh.edu
Thu Jun 4 23:55:45 CEST 2026


Test-Label: ai-code-review
Test-Status: WARNING
https://dpdk.org/patch/164998

_AI Code Review_

=== Patch Review: series.patch (via Claude) ===
Review date: 2026-06-04

# DPDK Patch Review

## PATCH 1/5: eal: fix wrong log message in async IPC request

**No issues found.**

---

## PATCH 2/5: eal: fix async IPC callback not fired when no peers

### Errors

**Resource leak on error path (rte_eal_alarm_set failure, secondary process path)**

Lines 1198-1204: When `rte_eal_alarm_set()` fails after inserting dummy into the queue, the code removes dummy from the queue and sets `dummy_used = false`, but does not set `ret = -1` before the error path checks. The function will return 0 (success) at line 1209 while the alarm setup failed, and dummy will be freed at line 1212-1213 even though the operation should be considered failed.

```c
/* Current code - wrong */
if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0) {
	EAL_LOG(ERR, "Fail to set alarm for dummy request");
	TAILQ_REMOVE(&pending_requests.requests, dummy, next);
	dummy_used = false;
	ret = -1;  /* This line IS present - my mistake, see correction below */
}
```

*Correction: The code does set `ret = -1`. The actual issue is that when `rte_eal_alarm_set()` fails in the primary process path (line 1251), there is no rollback and no error return set.*

**Missing error handling (rte_eal_alarm_set failure, primary process path)**

Line 1251: When `rte_eal_alarm_set()` fails after inserting dummy into the queue in the primary process path, the code only logs an error but does not:
- Remove dummy from the queue
- Set `dummy_used = false` 
- Set `ret = -1`

The function will return 0 (success) while the alarm did not get set, causing the callback to never fire.

Suggested fix:
```c
/* Line ~1250 */
if (rte_eal_alarm_set(1, async_reply_handle,
		(void *)(uintptr_t)dummy->id) < 0) {
	EAL_LOG(ERR, "Fail to set alarm for dummy request");
	TAILQ_REMOVE(&pending_requests.requests, dummy, next);
	dummy_used = false;
	ret = -1;
}
```

---

## PATCH 3/5: eal: fix memory leak in async IPC secondary path

**No issues found.**

---

## PATCH 4/5: eal: fix async IPC resource leaks on partial failure

### Errors

**Resource leak on alarm setup failure**

Lines 916-922: When `rte_eal_alarm_set()` fails, the function returns -1 but the allocated `id` is already reserved (`++next_request_id` at line 916) and an alarm was attempted to be set for it. The `pending_req` and `reply_msg` allocated later (lines 925-926) will be freed by the fail path, but if a stale alarm callback fires for this `id` in the future (unlikely but possible if alarm setup partially succeeded), `find_async_request_by_id()` will not find it and the callback will be a noop. This is actually safe given the new ID-based design, so this is not a bug.

*Withdrawn - this is safe by design.*

### Warnings

**Misleading comment**

Lines 1272-1276: The comment says "Stale alarms will fire and harmlessly find nothing via ID-based lookup," but this is only true after patch 5/5 removes the `rte_eal_alarm_cancel()` call. In this patch alone, the alarm cancel is still attempted in `async_reply_handle_thread_unsafe()` (line 552), so stale alarms could still cause issues if they race with the rollback. The comment should clarify that the ID-based safety relies on the next patch, or be reworded to match the current state.

Suggested rewording:
```c
/*
 * On partial failure, roll back all queued requests. Stale alarm
 * callbacks may still fire but will fail ID-based lookup after
 * these requests are removed.
 */
```

---

## PATCH 5/5: eal: avoid deadlock in async IPC alarm callback

**No issues found.**

---

## Summary

- **Patch 1/5**: Clean fix, no issues.
- **Patch 2/5**: Missing error handling when `rte_eal_alarm_set()` fails in the primary process path (line ~1251).
- **Patch 3/5**: Clean fix, no issues.
- **Patch 4/5**: Comment in rollback logic should clarify dependency on patch 5/5 or be reworded to match current behavior (Warning).
- **Patch 5/5**: Clean fix, no issues.

The most critical issue is the missing error handling in patch 2/5 (primary process path). The series overall addresses real correctness bugs (callback not firing, memory leaks, use-after-free risk, deadlock) with a sound design using numeric request IDs.


More information about the test-report mailing list