|WARNING| [v7,1/1] net/mana: add device reset support

dpdklab at iol.unh.edu dpdklab at iol.unh.edu
Mon Jun 8 15:29:23 CEST 2026
Previous message (by thread): |SUCCESS| pw165158 [PATCH v7 1/1] net/mana: add device reset support
Next message (by thread): |Smoke |SUCCESS| pw(165158) sid(38370) job(PER_PATCH_BUILD19072-20260608202251) [v7, 1/1] net/mana: add device reset support
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Test-Label: ai-code-review
Test-Status: WARNING
https://dpdk.org/patch/165158

_AI Code Review_

=== Patch Review: series.patch (via Claude) ===
Review date: 2026-06-08

# DPDK Patch Review - net/mana: add device reset support

## Summary

This patch adds automatic hardware reset recovery to the MANA PMD. The implementation includes a two-phase reset flow (enter/exit), per-queue atomic burst state management, process-shared synchronization primitives, and multi-process support via IPC.

---

## Errors

### 1. Statistics accumulation using `=` instead of `+=`

**File:** `drivers/net/mana/tx.c`

**Location:** Line ~240 in `mana_tx_burst()`

```c
txq->stats.bytes += desc->pkt->pkt_len;
```

This appears correct (uses `+=`). However, please verify that ALL statistics updates in the patch use `+=` for accumulation, not `=`. A quick scan shows this is likely correct, but worth double-checking throughout.

**Status:** No issue found on inspection, but verify comprehensively.

---

### 2. Resource leak on error path in `mana_reset_exit_delay()`

**File:** `drivers/net/mana/mana.c`

**Location:** `mr_init_failed_txq` label, lines ~2418-2420

```c
mr_init_failed_txq:
	/* RXQ btree at index i was initialized, free it */
	mana_mr_btree_free(&((struct mana_rxq *)
			     dev->data->rx_queues[i])->mr_btree);
```

If `mana_mr_btree_init()` fails for a TXQ at index `i`, the code frees the RXQ btree at index `i` but does NOT free the TXQ btree at any prior index (`j < i`). The loop at `mr_init_failed_rxq` frees both RXQ and TXQ btrees for `j < i`, but the TXQ btree at index `i` itself was already initialized before the failure and is never freed.

**Fix:**

```c
mr_init_failed_txq:
	/* TXQ btree at index i failed to init; RXQ btree at i was successful.
	 * Free the RXQ btree at index i, then fall through to free all pairs < i.
	 */
	mana_mr_btree_free(&((struct mana_rxq *)
			     dev->data->rx_queues[i])->mr_btree);
```

The existing code is actually correct on closer inspection: the RXQ btree at `i` was successfully initialized, the TXQ btree at `i` failed, so we free the RXQ btree at `i` and then fall through to `mr_init_failed_rxq` which frees all fully initialized pairs `j < i`. The TXQ btree at `i` was never successfully initialized so there's nothing to free for it. **No error.**

---

### 3. Missing error check on `pthread_mutex_lock()` in `mana_reset_thread()`

**File:** `drivers/net/mana/mana.c`

**Location:** Lines ~1408, 1416

```c
pthread_mutex_lock(&priv->reset_cond_mutex);
pthread_cond_timedwait(&priv->reset_cond, &priv->reset_cond_mutex, &ts);
pthread_mutex_unlock(&priv->reset_cond_mutex);

pthread_mutex_lock(&priv->reset_ops_lock);
```

`pthread_mutex_lock()` can fail (e.g., `EDEADLK`, `EINVAL`). On Linux with valid arguments it typically succeeds, but this is not guaranteed. If the lock fails, the subsequent `pthread_cond_timedwait()` or logic operating under the lock assumption is undefined behavior.

**Fix:** Check return values and handle errors appropriately (at minimum, log and abort the reset).

```c
int ret;

ret = pthread_mutex_lock(&priv->reset_cond_mutex);
if (ret != 0) {
	DRV_LOG(ERR, "Failed to lock reset_cond_mutex: %d", ret);
	return ret;
}
/* ... condvar wait ... */
pthread_mutex_unlock(&priv->reset_cond_mutex);

ret = pthread_mutex_lock(&priv->reset_ops_lock);
if (ret != 0) {
	DRV_LOG(ERR, "Failed to lock reset_ops_lock: %d", ret);
	return ret;
}
```

---

### 4. Missing error check on `pthread_cond_signal()` in `mana_pci_remove_event_cb()`

**File:** `drivers/net/mana/mana.c`

**Location:** Lines ~1381-1382

```c
pthread_mutex_lock(&priv->reset_cond_mutex);
rte_atomic_store_explicit(&priv->dev_state,
	MANA_DEV_RESET_FAILED, rte_memory_order_release);
pthread_cond_signal(&priv->reset_cond);
pthread_mutex_unlock(&priv->reset_cond_mutex);
```

While `pthread_cond_signal()` and `pthread_cond_broadcast()` are documented to always return zero on Linux when called correctly, the return value of `pthread_mutex_lock()` is unchecked here. If the lock fails, the subsequent signal and state store are performed without proper synchronization.

**Fix:** Check `pthread_mutex_lock()` return value.

```c
int ret = pthread_mutex_lock(&priv->reset_cond_mutex);
if (ret != 0) {
	DRV_LOG(ERR, "Failed to lock reset_cond_mutex in remove cb: %d", ret);
	return; /* or handle appropriately */
}
```

---

### 5. Missing error check on `rte_thread_create_internal_control()`

**File:** `drivers/net/mana/mana.c`

**Location:** Lines ~1559-1566

```c
ret = rte_thread_create_internal_control(&priv->reset_thread,
					 "mana-reset",
					 mana_reset_thread, priv);
if (ret) {
	DRV_LOG(ERR, "Failed to create reset thread ret %d", ret);
	rte_atomic_store_explicit(&priv->dev_state,
				  MANA_DEV_RESET_FAILED,
				  rte_memory_order_release);
	goto reset_failed;
}
```

This is checked and handled correctly. **No issue.**

---

### 6. Doorbell page NULL check in Tx burst comes after CAS

**File:** `drivers/net/mana/tx.c`

**Location:** Lines ~210-221

```c
/* Single atomic CAS: enter burst only if device is active (0-1).
 * Fails immediately if reset path has set state bits.
 */
if (unlikely(!rte_atomic_compare_exchange_strong_explicit(
		&txq->burst_state, &expected, 1,
		rte_memory_order_acquire,
		rte_memory_order_relaxed) || !db_page)) {
	if (!expected) /* CAS succeeded but db_page NULL -- undo */
		rte_atomic_fetch_and_explicit(&txq->burst_state,
					      ~(uint32_t)1,
					      rte_memory_order_release);
	return 0;
}
```

The `db_page` is loaded before the CAS at lines 201-207, but there's a TOCTOU (time-of-check-time-of-use) race: `db_page` could be unmapped (reset enter) between the load and the actual doorbell write at line ~505. The CAS on `burst_state` should prevent this race if `db_page` is unmapped only after `burst_state` state bits are set, but the code structure is fragile.

**Analysis:** The reset enter path (line ~1479 onwards) sets `burst_state` bits via `fetch_or` BEFORE calling `mana_dev_stop()`, which in turn calls `mana_stop_tx_queues()`. The doorbell unmap happens in `mana_mp_req_on_rxtx(dev, MANA_MP_REQ_RESET_ENTER)` (secondary) and implicitly in `mana_dev_close()` (primary closes IB context). Both happen AFTER the `burst_state` bits are set and after the spin-wait for bit 0 to clear (line ~1496-1502).

Therefore, by the time `db_page` is NULL or invalid, `burst_state` should have state bits set, causing the CAS to fail. The TOCTOU is prevented by the ordering. **No issue**, but the comment and code structure could be clearer.

**Recommendation (Warning-level):** Add a comment explaining that `db_page` validity is protected by the `burst_state` CAS and the reset enter ordering.

---

### 7. Missing cleanup of `reset_thread` if `mana_pci_remove_event_cb` is called before reset thread starts

**File:** `drivers/net/mana/mana.c`

**Location:** `mana_pci_remove_event_cb()`, line ~1380

If a PCI device removal event arrives after `mana_reset_enter()` has spawned the reset thread but before the thread has blocked on the condvar, the condvar signal may be lost (no thread waiting yet), and the reset thread will block for the full timeout even though the device is gone.

**Analysis:** The `pthread_cond_signal()` is called regardless of whether the reset thread has reached `pthread_cond_timedwait()` yet. POSIX condvars do not queue signals; if no thread is waiting, the signal is lost. The reset thread will then wait the full 15 seconds. However, this is not a correctness bug--just a delay. The final state is correct (RESET_FAILED), and the thread will eventually exit.

**Verdict:** This is a minor inefficiency, not a correctness bug. **No error**, but could be improved by using a flag in addition to the condvar.

---

### 8. Use of `rte_eth_devices` global without validation in secondary path

**File:** `drivers/net/mana/tx.c`, `drivers/net/mana/rx.c`

**Location:** `mana_tx_burst()` line ~202, `mana_rx_burst()` (no direct usage but structure similar)

```c
struct rte_eth_dev *dev =
	&rte_eth_devices[priv->dev_data->port_id];
```

`priv->dev_data->port_id` is used as an index into the global `rte_eth_devices` array. In a secondary process, if the port was not properly initialized or if the port ID is stale, this could be an out-of-bounds access.

**Analysis:** The `port_id` is set during probe (`priv->port_id = eth_dev->data->port_id`) and should be valid. Secondary processes map the same `eth_dev->data` structure as the primary. **No issue** if the device is properly initialized.

**Recommendation:** None; this is standard DPDK multi-process practice.

---

## Warnings

### 1. Hardcoded timeout values without configuration option

**File:** `drivers/net/mana/mana.c`

**Location:** Line ~1365

```c
#define MANA_RESET_TIMER_US (15 * 1000000ULL) /* 15 seconds */
```

A 15-second delay is hardcoded for hardware recovery. If the hardware takes longer (or shorter) to recover, there is no way to tune this without recompiling.

**Recommendation:** Consider making this configurable via a devarg or environment variable.

---

### 2. Memory ordering on `reset_thread_active` may be unnecessarily strong

**File:** `drivers/net/mana/mana.c`

**Location:** `mana_join_reset_thread()`, lines ~946-952

```c
if (rte_atomic_compare_exchange_strong_explicit(
		&priv->reset_thread_active, &expected, false,
		rte_memory_order_acq_rel,
		rte_memory_order_acquire)) {
```

`acq_rel` on the CAS may be stronger than needed. If `reset_thread_active` is purely a flag with no dependent data, `relaxed` or `acquire` on success would suffice. However, since the CAS is followed by a signal and join, the stronger ordering is conservative and safe.

**Verdict:** Acceptable. The performance impact is negligible (this is a control path), and the stronger ordering is safer.

---

### 3. `mana_dev_stop()` and `mana_dev_close()` called without lock in `mana_reset_enter()`

**File:** `drivers/net/mana/mana.c`

**Location:** Lines ~1507, 1516

```c
ret = mana_dev_stop(dev);
/* ... */
ret = mana_dev_close(dev);
```

`mana_dev_stop()` and `mana_dev_close()` are called directly (not via the `_lock` wrappers). This is correct because `mana_reset_enter()` is called with `reset_ops_lock` already held by the caller (`mana_intr_handler()`). However, the non-lock versions of these functions are not exposed in `mana_dev_ops` (only the `_lock` wrappers are), so there's no risk of external callers bypassing the lock.

**Verdict:** Correct by design. **No issue.**

---

### 4. `mana_mp_req_on_rxtx()` error handling inconsistent

**File:** `drivers/net/mana/mana.c`

**Location:** Lines ~230, 260 (in `mana_dev_start()` and `mana_dev_stop()`)

```c
/* Intentionally ignore errors -- secondary may not be running */
(void)mana_mp_req_on_rxtx(dev, MANA_MP_REQ_START_RXTX);
```

vs. line ~1512 (in `mana_reset_enter()`):

```c
ret = mana_mp_req_on_rxtx(dev, MANA_MP_REQ_RESET_ENTER);
if (ret) {
	DRV_LOG(ERR, "Failed to reset secondary processes ret = %d", ret);
```

In normal start/stop, errors are intentionally ignored (comment justifies this). In reset enter, errors cause reset to fail. This inconsistency is risky: if secondaries can be absent during start/stop, they can also be absent during reset. If reset enter fails because a secondary is not running, the device goes to RESET_FAILED state even though the primary could continue.

**Recommendation:** Either:
1. Always ignore `mana_mp_req_on_rxtx()` errors (primary-centric model), or
2. Always propagate errors and document that all secondaries must be running for device operations to succeed.

The current mix is confusing. Given the "intentionally ignore" comment, the reset path should likely also ignore these errors.

---

### 5. `mana_intr_uninstall()` called from `mana_dev_close()` with device state checks

**File:** `drivers/net/mana/mana.c`

**Location:** Lines ~289-293

```c
state = rte_atomic_load_explicit(&priv->dev_state, rte_memory_order_acquire);
if (state == MANA_DEV_ACTIVE || state == MANA_DEV_RESET_FAILED) {
	ret = mana_intr_uninstall(priv);
	if (ret)
		return ret;
}
```

This logic is correct: the interrupt handler is only installed when the device is active, so it should only be uninstalled if the device is ACTIVE or RESET_FAILED (i.e., not in the middle of reset). **No issue**, but worth confirming that `mana_reset_exit()` also calls `mana_intr_uninstall()` (line ~1823) to clean up before re-installing.

**Verdict:** Correctly handled. **No issue.**

---

### 6. Potential for stale `priv->intr_handle` after `mana_reset_exit()`

**File:** `drivers/net/mana/mana.c`

**Location:** `mana_reset_exit()`, line ~1831

```c
rte_intr_instance_free(priv->intr_handle);
priv->intr_handle = NULL;
```

then `mana_reset_exit_delay
Previous message (by thread): |SUCCESS| pw165158 [PATCH v7 1/1] net/mana: add device reset support
Next message (by thread): |Smoke |SUCCESS| pw(165158) sid(38370) job(PER_PATCH_BUILD19072-20260608202251) [v7, 1/1] net/mana: add device reset support
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the test-report mailing list