|WARNING| [v8,1/1] net/mana: add device reset support

dpdklab at iol.unh.edu dpdklab at iol.unh.edu
Wed Jun 10 20:26:05 CEST 2026


Test-Label: ai-code-review
Test-Status: WARNING
https://dpdk.org/patch/165257

_AI Code Review_

=== Patch Review: series.patch (via Claude) ===
Review date: 2026-06-10

# DPDK Patch Review: net/mana Device Reset Support

## Summary
This patch adds automatic recovery from hardware service reset events to the MANA PMD. The implementation includes a two-phase reset flow (enter/teardown/exit), per-queue atomic burst state management, process-shared synchronization primitives, multi-process IPC for secondary doorbell remapping, and ethdev recovery event emission.

---

## Correctness Issues

### Errors

1. **Use-after-free risk in `mana_reset_exit_delay` when `mana_pci_probe` fails**

   **drivers/net/mana/mana.c:1651-1652**
   ```c
   ret = mana_pci_probe(NULL, pci_dev);
   if (ret) {
       DRV_LOG(ERR, "Failed to probe mana pci dev ret %d", ret);
       rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_RESET_FAILED,
               rte_memory_order_release);
       goto out;
   }
   ```
   If `mana_pci_probe` fails partway through (after allocating some resources but before completing), subsequent accesses to `priv` or `dev->data->rx_queues[i]` in the MR initialization loop may be invalid. The probe failure path in `mana_probe_port` does not free `priv` on reset (`if (!is_reset)`), so `priv` remains valid, but queue structures may be partially initialized or freed by the failed probe.

   **Suggested fix:** After `mana_pci_probe` fails, verify that queues are in a safe state before attempting to free MR btrees. Consider setting `priv->num_queues = 0` in the probe failure path when `is_reset == true`, or add a validation check here.

2. **Resource leak: `priv->intr_handle` not freed if `rte_dev_event_callback_register` fails**

   **drivers/net/mana/mana.c:1964-1970**
   ```c
   ret = rte_dev_event_callback_register(eth_dev->device->name,
                         mana_pci_remove_event_cb, priv);
   if (ret)
       DRV_LOG(WARNING, "Failed to register PCI remove event callback");

   eth_dev->intr_handle = priv->intr_handle;
   return 0;
   ```
   If `rte_dev_event_callback_register` fails, the function returns success but `priv->intr_handle` is not freed. On subsequent `mana_intr_uninstall` or `mana_reset_exit`, the callback unregister will fail with `-ENOENT` (callback was never registered), but the `intr_handle` itself is leaked.

   **Suggested fix:** Change the warning to an error and call `rte_intr_instance_free` + `goto free_intr` on failure, or document that this is acceptable (callback registration failure is non-fatal).

3. **Potential double-join of reset thread in rare race**

   **drivers/net/mana/mana.c:988-998** (`mana_join_reset_thread`)
   ```c
   if (rte_atomic_compare_exchange_strong_explicit(
           &priv->reset_thread_active, &expected, false,
           rte_memory_order_acq_rel,
           rte_memory_order_acquire)) {
       pthread_mutex_lock(&priv->reset_cond_mutex);
       rte_atomic_store_explicit(&priv->dev_state,
           MANA_DEV_ACTIVE, rte_memory_order_release);
       pthread_cond_signal(&priv->reset_cond);
       pthread_mutex_unlock(&priv->reset_cond_mutex);
       rte_thread_join(priv->reset_thread, NULL);
   }
   ```
   Between the CAS clearing `reset_thread_active` and the `rte_thread_join`, if another thread calls `mana_join_reset_thread` concurrently, the second thread will not enter the `if` block (CAS fails) but the reset thread may not have fully exited yet. This is safe only if no code path can call `mana_join_reset_thread` concurrently on the same `priv`. However, the patch does not document this requirement.

   **Analysis:** All call sites (`mana_dev_stop_lock`, `mana_dev_close_lock`, `mana_dev_uninit`) are either under `reset_ops_lock` or serialized by EAL port teardown. The pattern is safe but not obviously so.

   **Suggested fix:** Add a comment above `mana_join_reset_thread` stating that callers must ensure serialization (e.g., via `reset_ops_lock` or EAL port lifecycle serialization).

4. **Missing error check: `rte_thread_create_internal_control` failure leaves state inconsistent**

   **drivers/net/mana/mana.c:1540-1552**
   ```c
   ret = rte_thread_create_internal_control(&priv->reset_thread,
                        "mana-reset",
                        mana_reset_thread, priv);
   if (ret) {
       DRV_LOG(ERR, "Failed to create reset thread ret %d", ret);
       rte_atomic_store_explicit(&priv->dev_state,
                     MANA_DEV_RESET_FAILED,
                     rte_memory_order_release);
       goto reset_failed;
   }
   rte_atomic_store_explicit(&priv->reset_thread_active,
       true, rte_memory_order_release);
   ```
   If thread creation fails, `dev_state` is set to `RESET_FAILED` and the function jumps to `reset_failed`, which clears `burst_state` and emits `RECOVERY_FAILED`. However, `reset_thread_active` is never set to `true`, so any subsequent call to `mana_join_reset_thread` will not join a nonexistent thread. This is safe but subtle.

   **Correctness verdict:** Not a bug--`reset_thread_active` remains `false` if thread creation fails, so no join is attempted. However, the code would be clearer if `reset_thread_active` were explicitly set to `false` on failure (redundant store, but documents intent).

   **Suggested fix:** Add `rte_atomic_store_explicit(&priv->reset_thread_active, false, ...)` before `goto reset_failed` for clarity (optional, not a correctness issue).

### Warnings

5. **Potential deadlock: `mana_intr_uninstall` inside reset under lock may hang on unregister**

   **drivers/net/mana/mana.c:1799-1808** (`mana_reset_exit`)
   ```c
   if (priv->intr_handle) {
       ret = rte_intr_callback_unregister(priv->intr_handle,
                          mana_intr_handler, priv);
       if (ret < 0)
           DRV_LOG(ERR, "Failed to unregister intr callback ret %d",
               ret);
   ```
   The comment states "Since mana_reset_exit is always called from mana_reset_thread (a non-interrupt thread), the interrupt source is inactive and rte_intr_callback_unregister succeeds directly." However, if an IBV async event arrives concurrently, `mana_intr_handler` could be executing and attempting to acquire `reset_ops_lock`, while `mana_reset_exit` holds the lock and calls `rte_intr_callback_unregister`, which may block waiting for the handler to complete.

   **Analysis:** The reset path sets `dev_state` to `RESET_ENTER` or `RESET_EXIT` before reaching here, so `mana_intr_handler` will not re-enter the reset logic (the `if (dev_state == MANA_DEV_ACTIVE)` check prevents it). The handler will ack the event and return without acquiring the lock. This is safe, but the comment is misleading--the interrupt source is NOT inactive; it just won't trigger reset logic.

   **Suggested fix:** Clarify the comment to state that the interrupt handler checks `dev_state` and exits early if not `ACTIVE`, so unregister is safe despite concurrent events.

6. **Missing bounds check on loop index in `mana_reset_exit_delay` MR init failure path**

   **drivers/net/mana/mana.c:1696-1706**
   ```c
   mr_init_failed_all:
       i = priv->num_queues;
       goto mr_init_failed_rxq;

   mr_init_failed_txq:
       /* RXQ btree at index i was initialized, free it */
       mana_mr_btree_free(&((struct mana_rxq *)
                    dev->data->rx_queues[i])->mr_btree);

   mr_init_failed_rxq:
       /* Free all fully initialized btrees for indices < i */
       for (int j = 0; j < i; j++) {
   ```
   If `mr_init_failed_all` is reached, `i` is set to `priv->num_queues`, then `mr_init_failed_txq` accesses `dev->data->rx_queues[i]`, which is out of bounds (valid indices are `0` to `num_queues - 1`).

   **Suggested fix:** In the `mr_init_failed_all` case, skip the `mr_init_failed_txq` logic (it does not apply). Change to:
   ```c
   mr_init_failed_all:
       i = priv->num_queues;
       goto mr_init_failed_rxq; /* skip mr_init_failed_txq */
   ```
   Or restructure to avoid the out-of-bounds access.

7. **Queue state check in `mana_stop_rx_queues` / `mana_stop_tx_queues` returns 0 instead of proceeding**

   **drivers/net/mana/rx.c:178-180**, **drivers/net/mana/tx.c:19-20**
   ```c
   for (i = 0; i < priv->num_queues; i++)
       if (dev->data->rx_queue_state[i] == RTE_ETH_QUEUE_STATE_STOPPED)
           return 0;
   ```
   If ANY queue is already stopped, the function returns success without stopping the remaining queues. This is likely incorrect--the intent is probably to skip queues that are already stopped, not to abort the entire loop.

   **Suggested fix:** Change to `continue;` instead of `return 0;` to skip already-stopped queues, or check that ALL queues are stopped before skipping.

8. **PCI remove callback may access freed `priv` after `mana_dev_close`**

   **drivers/net/mana/mana.c:1377-1393** (`mana_pci_remove_event_cb`)
   ```c
   pthread_mutex_lock(&priv->reset_cond_mutex);
   rte_atomic_store_explicit(&priv->dev_state,
       MANA_DEV_RESET_FAILED, rte_memory_order_release);
   pthread_cond_signal(&priv->reset_cond);
   pthread_mutex_unlock(&priv->reset_cond_mutex);

   pthread_mutex_lock(&priv->reset_ops_lock);
   pthread_mutex_unlock(&priv->reset_ops_lock);
   ```
   If the PCI device is hot-removed during reset, the reset thread may call `mana_dev_close` (which can fail but does not free `priv`), then exit. However, the callback was registered in `mana_intr_install` with `priv` as the `cb_arg`, and it is unregistered in `mana_intr_uninstall` (called from `mana_dev_close` or `mana_reset_exit`). If the callback fires after `priv` is freed by `mana_dev_uninit`, this is a use-after-free.

   **Analysis:** The callback is unregistered in `mana_intr_uninstall` before `priv` can be freed. The only path to free `priv` is `mana_pci_remove` - `mana_dev_uninit` - `rte_free(priv)`, and `mana_dev_uninit` calls `mana_join_reset_thread` then `mana_dev_close` (which calls `mana_intr_uninstall`). So the callback is unregistered before `priv` is freed. However, `mana_dev_free_resources` (which destroys mutexes) is called AFTER `mana_dev_close`, so the callback could still fire and access destroyed mutexes if EAL does not guarantee callback unregister is synchronous.

   **Suggested fix:** Verify that `rte_dev_event_callback_unregister` is synchronous (waits for in-flight callbacks) or add a comment documenting the ordering requirement. The comment in `mana_intr_uninstall` mentions not retrying on `-EAGAIN` to avoid deadlock, which suggests the unregister may block--need to ensure it does not return while the callback is still runnable.

---

## Style and Process Issues

### Warnings

9. **Variable initialization in `mana_intr_handler` is unnecessary**

   **drivers/net/mana/mana.c:1738**
   ```c
   struct ibv_async_event event = { 0 };
   ```
   The variable is immediately overwritten by `ibv_get_async_event(&event)` in the loop, so zero-initialization is dead code. (However, on loop exit the event may be uninitialized if `ibv_get_async_event` fails, but the code does not access it after the loop.)

   **Suggested fix:** Remove `= { 0 }` initialization.

10. **Non-const function pointer struct `mana_dev_ops` and `mana_dev_secondary_ops`**

    **drivers/net/mana/mana.c:1146-1169**
    ```c
    static const struct eth_dev_ops mana_dev_ops = { ... };
    static const struct eth_dev_ops mana_dev_secondary_ops = { ... };
    ```
    These are already declared `const`. No issue here. (Guideline check: verified.)

11. **Hardcoded timeout in `mana_mp_req_on_rxtx`**

    **drivers/net/mana/mp.c:370**
    ```c
    struct timespec ts = {.tv_sec = MANA_MP_REQ_TIMEOUT_SEC, .tv_nsec = 0};
    ```
    The macro `MANA_MP_REQ_TIMEOUT_SEC` is defined elsewhere (not shown in patch), but the comment in `mana.c` mentions "multi-second IPC timeouts." If this is intended to be tunable, it should be a devarg or documented.

    **Suggested fix:** Document the timeout in the NIC guide or add a comment explaining why 5 seconds (or whatever the value is) is appropriate.

12. **RST documentation uses bullet lists where definition lists would be clearer**

    **doc/guides/nics/mana.rst:99-111**
    ```rst
    The driver emits the following ethdev recovery events
    to notify upper layers (e.g. netvsc) of the reset lifecycle:

    ``RTE_ETH_EVENT_ERR_RECOVERING``
       Reset has started.

    ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
       Device has recovered successfully.

    ``RTE_ETH_EVENT_RECOVERY_FAILED``
       Recovery failed.
    ```
    This is already a definition list (correct RST syntax). No issue here.

13. **Missing space in log messages**

    **drivers/net/mana/mana.c:968-969**
    ```c
    DRV_LOG(ERR, "Device reset in progress, "
        "mana_dev_tx_queue_release not called");
    ```
    Multi-line strings are concatenated without space, but the continuation starts a new sentence, so this is acceptable. (C concatenates adjacent string literals; the space before the second string is preserved.)

---

## API and Documentation

### No Issues Found

14. **New feature is properly documented in release notes and NIC guide.**

    - Release notes updated in `doc/guides/rel_notes/release_26_07.rst


More information about the test-report mailing list