|SUCCESS| net/iavf: fix VF reset race and stale ARQ message handling

dpdklab at iol.unh.edu dpdklab at iol.unh.edu
Mon May 18 09:23:43 CEST 2026


Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/164052

_AI Code Review_

=== Patch Review: series.patch (via Claude) ===
Review date: 2026-05-18

Reviewing this patch for DPDK coding standards and guideline compliance.

Commit: Not specified in patch
Author: Talluri Chaitanyababu <chaitanyababux.talluri at intel.com>
Subject: [PATCH] net/iavf: fix VF reset race and stale ARQ message handling

This patch addresses VF reset race conditions and stale ARQ message
handling issues in the iavf driver by introducing two-phase reset
detection and ARQ message draining.

> diff --git a/drivers/net/intel/iavf/iavf_ethdev.c b/drivers/net/intel/iavf/iavf_ethdev.c
> index 1eca20bc9a..692e4455dc 100644
> --- a/drivers/net/intel/iavf/iavf_ethdev.c
> +++ b/drivers/net/intel/iavf/iavf_ethdev.c
> @@ -2002,11 +2002,36 @@ iavf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id)
>  	return 0;
>  }
>  
> +/* Wait until PF acknowledges VF reset (RSTAT leaves VFACTIVE) */
> +static int
> +iavf_wait_for_reset_start(struct iavf_hw *hw)
> +{
> +	int i;
> +	uint32_t rstat;

Does this function need a timeout parameter? The hardcoded 100
iterations with 10ms delays equals 1 second timeout, but is this value
optimal for all hardware platforms? Consider adding a #define for the
iteration count for maintainability.

> +
> +	for (i = 0; i < 100; i++) {
> +		rte_delay_ms(10);
> +
> +		rstat = IAVF_READ_REG(hw, IAVF_VFGEN_RSTAT);
> +		rstat &= IAVF_VFGEN_RSTAT_VFR_STATE_MASK;
> +		rstat >>= IAVF_VFGEN_RSTAT_VFR_STATE_SHIFT;
> +
> +		if (rstat != VIRTCHNL_VFR_VFACTIVE)
> +			return 0;
> +	}
> +
> +	return -1;

This returns -1 on timeout, but is this checked properly by the caller?

> +}
> +
>  static int
>  iavf_check_vf_reset_done(struct iavf_hw *hw)
>  {
>  	int i, reset;
>  
> +	/* Phase 1: wait for reset to start (leave VFACTIVE) */
> +	if (iavf_wait_for_reset_start(hw) != 0)
> +		PMD_DRV_LOG(DEBUG, "VF reset did not start within timeout");

Does this code continue after the timeout with only a DEBUG message? If
iavf_wait_for_reset_start() fails, should iavf_check_vf_reset_done()
also return failure? The current implementation appears to proceed with
Phase 2 even if Phase 1 times out, which could mask initialization
problems.

> +
>  	for (i = 0; i < IAVF_RESET_WAIT_CNT; i++) {
>  		reset = IAVF_READ_REG(hw, IAVF_VFGEN_RSTAT) &
>  			IAVF_VFGEN_RSTAT_VFR_STATE_MASK;

[ ... ]

> @@ -2517,6 +2542,31 @@ iavf_init_proto_xtr(struct rte_eth_dev *dev)
>  	}
>  }
>  
> +/* Drain stale Admin Receive Queue messages after reset */
> +static void
> +iavf_drain_arq(struct iavf_hw *hw, struct iavf_info *vf)
> +{
> +	struct iavf_arq_event_info event;
> +	int drain_count = 0;
> +
> +	memset(&event, 0, sizeof(event));
> +	event.msg_buf = vf->aq_resp;
> +
> +	while (drain_count < IAVF_AQ_LEN) {
> +		event.buf_len = IAVF_AQ_BUF_SZ;
> +
> +		if (iavf_clean_arq_element(hw, &event, NULL) != IAVF_SUCCESS)
> +			break;
> +
> +		drain_count++;

Does this code properly handle potential infinite loops? If
iavf_clean_arq_element() keeps returning SUCCESS but the queue never
empties, could this spin for IAVF_AQ_LEN iterations unnecessarily? Also,
is reusing vf->aq_resp safe here? Could this interfere with other
operations that might be using the same buffer?

> +	}
> +
> +	if (drain_count > 0)
> +		PMD_INIT_LOG(DEBUG,
> +				"Drained %d stale ARQ messages",
> +				drain_count);

The indentation here appears inconsistent with DPDK style. The string
argument should start on the same line as PMD_INIT_LOG, or if it must
wrap, align with the opening parenthesis.

> +}
> +
>  static int
>  iavf_init_vf(struct rte_eth_dev *dev)
>  {
> @@ -2558,6 +2608,10 @@ iavf_init_vf(struct rte_eth_dev *dev)
>  		PMD_INIT_LOG(ERR, "unable to allocate vf_aq_resp memory");
>  		goto err_aq;
>  	}
> +
> +	/* Drain stale ARQ messages after VF reset */
> +	iavf_drain_arq(hw, vf);
> +

Is it safe to drain ARQ messages before iavf_check_api_version()? Could
the API version negotiation message itself be in the queue and get
inadvertently drained?

> [ ... ]
>  
> diff --git a/drivers/net/intel/iavf/iavf_vchnl.c b/drivers/net/intel/iavf/iavf_vchnl.c
> index 08dd6f2d7f..a014be8b57 100644
> --- a/drivers/net/intel/iavf/iavf_vchnl.c
> +++ b/drivers/net/intel/iavf/iavf_vchnl.c
> @@ -296,11 +296,21 @@ iavf_read_msg_from_pf(struct iavf_adapter *adapter, uint16_t buf_len,
>  					__func__, vpe->event);
>  		}
>  	}  else {

There is a double space before else here. Was this pre-existing or
introduced by this patch?

> -		/* async reply msg on command issued by vf previously */
> +		/* Async reply for previously issued VF command.
> +		 * Stale messages from before reset are ignored, and polling
> +		 * continues until the expected response is received.
> +		 */
>  		result = IAVF_MSG_CMD;
>  		if (opcode != vf->pend_cmd) {
> -			PMD_DRV_LOG(WARNING, "command mismatch, expect %u, get %u",
> -					vf->pend_cmd, opcode);
> +			if (opcode == VIRTCHNL_OP_UNKNOWN)
> +				PMD_DRV_LOG(DEBUG,
> +						"Ignoring stale msg (opcode 0), pending cmd %u",
> +						vf->pend_cmd);

Same indentation issue here. The string should align better per DPDK
style.

> +			else
> +				PMD_DRV_LOG(WARNING,
> +						"command mismatch, expect %u, get %u",
> +						vf->pend_cmd, opcode);
> +

And here as well.

>  			result = IAVF_MSG_ERR;
>  		}
>  	}

Does the logic change properly handle the case where opcode is
VIRTCHNL_OP_UNKNOWN? The code still sets result = IAVF_MSG_ERR in both
cases (opcode 0 and mismatch), but the comment says "polling continues
until the expected response is received". Does the caller actually retry
when it receives IAVF_MSG_ERR for opcode 0, or does it fail?


More information about the test-report mailing list