|SUCCESS| [v2,1/1] net/mana: add device reset support

dpdklab at iol.unh.edu dpdklab at iol.unh.edu
Sat May 9 13:09:08 CEST 2026


Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/163830

_AI Code Review_

=== Patch Review: series.patch (via Claude) ===
Review date: 2026-05-09

commit b8a7f9e123456789abcdef0123456789abcdef01
Author: Wei Hu <weh at linux.microsoft.com>

    net/mana: add device reset support

This patch adds support for handling hardware reset events in the MANA
driver. When the MANA kernel driver receives a hardware service event,
it initiates a device reset and notifies userspace via
IBV_EVENT_DEVICE_FATAL. The DPDK driver handles this by performing an
automatic teardown and recovery sequence.

The reset flow has two phases. In the enter phase, running on the EAL
interrupt thread, the driver transitions the device state, waits for
data path threads to reach a quiescent state using RCU, stops queues,
tears down IB resources, and frees per-queue MR caches. A control
thread is then spawned to handle the exit phase: it waits for the
hardware to recover, unregisters the interrupt handler, re-probes the
PCI device, reinitializes MR caches, and restarts queues.

A per-device spinlock serializes the reset path with ethdev operations.
Operations that cannot wait (configure, queue setup) return -EBUSY
during reset, while dev_stop and dev_close join the reset thread and
use a blocking lock to ensure proper sequencing.

Multi-process support is included: secondary processes unmap and remap
doorbell pages via IPC during the reset enter and exit phases. Data
path functions in both primary and secondary processes check the device
state atomically and return early when the device is not active.

The driver uses ethdev recovery events to notify upper layers (e.g.
netvsc) of the reset lifecycle: RTE_ETH_EVENT_ERR_RECOVERING on entry,
RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED on
completion. A PCI device removal event callback distinguishes
hot-remove from service reset.

> diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c

> @@ -103,6 +106,23 @@ mana_dev_configure(struct rte_eth_dev *dev)
>  			      RTE_ETH_RX_OFFLOAD_VLAN_STRIP);
>  
>  	priv->num_queues = dev->data->nb_rx_queues;
> +	DRV_LOG(DEBUG, "priv %p, port %u, dev port %u, num_queues: %u",
> +		priv, priv->port_id, priv->dev_port, priv->num_queues);
> +
> +	/*
> +	 * Register data path thread IDs (rx and tx) with the RCU
> +	 * quiescent state variable for device state synchronization.
> +	 */
> +	for (int i = 0; i < 2 * priv->num_queues; i++) {
> +		if (rte_rcu_qsbr_thread_register(priv->dev_state_qsv, i) != 0) {
> +			DRV_LOG(ERR, "Failed to register rcu qsv thread %d of total %d",
> +				i, 2 * priv->num_queues - 1);
> +			return -EINVAL;
> +		}

Does this code handle configure being called multiple times (for
example, after a reconfigure or when num_queues changes)? If
num_queues decreases, does it unregister the threads that are no
longer needed? If num_queues increases, is it safe to register the
same thread IDs again?

> @@ -242,26 +262,33 @@ mana_dev_stop(struct rte_eth_dev *dev)
>  {
>  	int ret;
>  	struct mana_priv *priv = dev->data->dev_private;
> -
> -	rxq_intr_disable(priv);
> +	enum mana_device_state state;
> +
> +	state = rte_atomic_load_explicit(&priv->dev_state,
> +					 rte_memory_order_acquire);
> +	if (state == MANA_DEV_ACTIVE ||
> +	    state == MANA_DEV_RESET_FAILED) {
> +		rxq_intr_disable(priv);
> +		DRV_LOG(DEBUG, "rxq_intr_disable called");
> +	}

Does this code skip rxq_intr_disable during reset states
MANA_DEV_RESET_ENTER and MANA_DEV_RESET_EXIT? Could this leave
interrupts enabled when they should be disabled?

> @@ -275,31 +302,58 @@ mana_dev_close(struct rte_eth_dev *dev)
>  {
>  	struct mana_priv *priv = dev->data->dev_private;
>  	int ret;
> +	enum mana_device_state state;
> +
> +	/* Wait for reset thread to finish before freeing resources */
> +	if (priv->reset_thread_active) {
> +		pthread_mutex_lock(&priv->reset_cond_mutex);
> +		rte_atomic_store_explicit(&priv->dev_state,
> +			MANA_DEV_ACTIVE, rte_memory_order_release);
> +		pthread_cond_signal(&priv->reset_cond);
> +		pthread_mutex_unlock(&priv->reset_cond_mutex);
> +		rte_thread_join(priv->reset_thread, NULL);
> +		priv->reset_thread_active = false;
> +	}

Does this code handle a race if mana_dev_close is called concurrently
from multiple threads? The check and modification of
reset_thread_active are not protected by a lock here.

> @@ -391,6 +445,27 @@ mana_dev_info_get(struct rte_eth_dev *dev,
>  	return 0;
>  }
>  
> +static int
> +mana_dev_info_get_lock(struct rte_eth_dev *dev,
> +		       struct rte_eth_dev_info *dev_info)
> +{
> +	struct mana_priv *priv = dev->data->dev_private;
> +	int ret;
> +
> +	if (rte_spinlock_trylock(&priv->reset_ops_lock)) {
> +		if (rte_atomic_load_explicit(&priv->dev_state,
> +		    rte_memory_order_acquire) != MANA_DEV_ACTIVE) {
> +			rte_spinlock_unlock(&priv->reset_ops_lock);
> +			return -EBUSY;
> +		}
> +		ret = mana_dev_info_get(dev, dev_info);
> +		rte_spinlock_unlock(&priv->reset_ops_lock);
> +	} else {
> +		ret = -EBUSY;
> +	}
> +	return ret;
> +}

Does this trylock approach cause the device info get operation to fail
unnecessarily during brief lock contention (not just during reset)?

> @@ -820,33 +942,217 @@ mana_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
>  	return mana_ifreq(priv, SIOCSIFMTU, &request);
>  }
>  
> +#define MANA_OPS_1_LOCK(_func)						\
> +static int								\
> +_func##_lock(struct rte_eth_dev *dev)					\
> +{									\
> +	struct mana_priv *priv = dev->data->dev_private;		\
> +	int ret;							\
> +	if (rte_spinlock_trylock(&priv->reset_ops_lock)) {		\
> +		if (rte_atomic_load_explicit(&priv->dev_state,		\
> +		    rte_memory_order_acquire) !=			\
> +		    MANA_DEV_ACTIVE) {					\
> +			rte_spinlock_unlock(&priv->reset_ops_lock);	\
> +			return -EBUSY;					\
> +		}							\
> +		ret = _func(dev);					\
> +		rte_spinlock_unlock(&priv->reset_ops_lock);		\
> +	} else {							\
> +		ret = -EBUSY;						\
> +	}								\
> +	return ret;							\
> +}

Does this code verify that priv is not NULL before dereferencing it?
If dev->data is NULL or dev->data->dev_private is NULL, does this
code crash?

> @@ -1063,28 +1337,398 @@ mana_ibv_device_to_pci_addr(const struct ibv_device *device,
>  	return 0;
>  }
>  
> +static void
> +mana_reset_enter(struct mana_priv *priv)
> +	__rte_no_thread_safety_analysis
> +{
> +	int ret;
> +	uint64_t ticket;
> +	struct rte_eth_dev *dev = &rte_eth_devices[priv->port_id];
> +
> +	rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_RESET_ENTER,
> +				     rte_memory_order_release);
> +
> +	DRV_LOG(DEBUG, "Entering into device reset state");
> +	DRV_LOG(DEBUG, "Resetting dev = %p, priv = %p", dev, priv);
> +
> +	ticket = rte_rcu_qsbr_start(priv->dev_state_qsv);
> +
> +	while (rte_rcu_qsbr_check(priv->dev_state_qsv, ticket, false) == 0)
> +		rte_pause();

Does this busy-wait loop have any timeout or progress checking? Can it
hang indefinitely if a thread is stuck in the data path?

> +	ret = mana_dev_stop(dev);
> +	if (ret) {
> +		DRV_LOG(ERR, "Failed to stop mana dev ret %d", ret);
> +		rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_RESET_FAILED,
> +				     rte_memory_order_release);
> +		goto reset_failed;
> +	}
> +
> +	/* Unmap secondary doorbell pages after data path is stopped */
> +	ret = mana_mp_req_on_rxtx(dev, MANA_MP_REQ_RESET_ENTER);
> +	if (ret) {
> +		DRV_LOG(ERR, "Failed to reset secondary processes ret = %d",
> +			ret);
> +		rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_RESET_FAILED,
> +				     rte_memory_order_release);
> +		goto reset_failed;
> +	}

Does this code ensure that secondary processes are not actively using
the doorbell pages before unmapping them? Are there race conditions if
secondary processes are in the middle of TX/RX operations?

> +	ret = mana_dev_close(dev);
> +	if (ret) {
> +		DRV_LOG(ERR, "Failed to close mana dev ret %d", ret);
> +		rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_RESET_FAILED,
> +				     rte_memory_order_release);
> +		goto reset_failed;
> +	}
> +
> +	for (int i = 0; i < priv->num_queues; i++) {
> +		struct mana_rxq *rxq = dev->data->rx_queues[i];
> +		struct mana_txq *txq = dev->data->tx_queues[i];
> +
> +		DRV_LOG(DEBUG, "Free MR for priv = %p, rxq %u, txq %u",
> +			priv, rxq->rxq_idx, txq->rxq_idx);
> +		mana_mr_btree_free(&rxq->mr_btree);
> +		mana_mr_btree_free(&txq->mr_btree);
> +	}

Does this code verify that rxq and txq are not NULL before
dereferencing them? Can dev->data->rx_queues[i] or
dev->data->tx_queues[i] be NULL?

> +	ret = rte_thread_create_internal_control(&priv->reset_thread,
> +						 "mana-reset",
> +						 mana_reset_thread, priv);
> +	if (ret) {
> +		DRV_LOG(ERR, "Failed to create reset thread ret %d", ret);
> +		rte_atomic_store_explicit(&priv->dev_state,
> +					  MANA_DEV_RESET_FAILED,
> +					  rte_memory_order_release);
> +		rte_spinlock_unlock(&priv->reset_ops_lock);
> +		return;
> +	}
> +	priv->reset_thread_active = true;

Does this code handle a race if a second reset event arrives before
the first reset thread completes? Does it check reset_thread_active
before creating a new thread?

> +static uint32_t
> +mana_reset_thread(void *arg)
> +	__rte_no_thread_safety_analysis
> +{
> +	struct mana_priv *priv = (struct mana_priv *)arg;
> +	struct timespec ts;
> +
> +	DRV_LOG(INFO, "Reset thread started, waiting %us",
> +		(unsigned int)(MANA_RESET_TIMER_US / 1000000));
> +
> +	/* Wait on condvar with timeout -- can be woken early by PCI remove */
> +	clock_gettime(CLOCK_REALTIME, &ts);
> +	ts.tv_sec += MANA_RESET_TIMER_US / 1000000;
> +
> +	pthread_mutex_lock(&priv->reset_cond_mutex);
> +	pthread_cond_timedwait(&priv->reset_cond, &priv->reset_cond_mutex, &ts);
> +	pthread_mutex_unlock(&priv->reset_cond_mutex);

Does this code check the return value of pthread_cond_timedwait to
distinguish timeout from signal? Is CLOCK_REALTIME appropriate here
(susceptible to system time changes) versus CLOCK_MONOTONIC?

> +static void
> +mana_reset_exit(struct mana_priv *priv)
> +	__rte_no_thread_safety_analysis
> +{
> +	int ret;
> +
> +	if (!priv) {
> +		DRV_LOG(ERR, "Private structure invalid");
> +		return;
> +	}

Does mana_reset_exit need to clean up or set error state if it returns
early due to NULL priv? Or does this indicate a serious internal
error?

> +	/* Unregister the interrupt handler. Since mana_reset_exit is always
> +	 * called from mana_reset_thread (a non-interrupt thread), the
> +	 * interrupt source is inactive and rte_intr_callback_unregister
> +	 * succeeds directly.
> +	 */
> +	if (priv->intr_handle) {
> +		ret = rte_intr_callback_unregister(priv->intr_handle,
> +						   mana_intr_handler, priv);
> +		if (ret < 0)
> +			DRV_LOG(ERR, "Failed to unregister intr callback ret %d",


More information about the test-report mailing list