|SUCCESS| net/iavf: fix VF reset race and stale ARQ message handling
dpdklab at iol.unh.edu
dpdklab at iol.unh.edu
Mon May 18 09:23:43 CEST 2026
Test-Label: ai-code-review
Test-Status: SUCCESS
https://dpdk.org/patch/164052
_AI Code Review_
=== Patch Review: series.patch (via Claude) ===
Review date: 2026-05-18
Reviewing this patch for DPDK coding standards and guideline compliance.
Commit: Not specified in patch
Author: Talluri Chaitanyababu <chaitanyababux.talluri at intel.com>
Subject: [PATCH] net/iavf: fix VF reset race and stale ARQ message handling
This patch addresses VF reset race conditions and stale ARQ message
handling issues in the iavf driver by introducing two-phase reset
detection and ARQ message draining.
> diff --git a/drivers/net/intel/iavf/iavf_ethdev.c b/drivers/net/intel/iavf/iavf_ethdev.c
> index 1eca20bc9a..692e4455dc 100644
> --- a/drivers/net/intel/iavf/iavf_ethdev.c
> +++ b/drivers/net/intel/iavf/iavf_ethdev.c
> @@ -2002,11 +2002,36 @@ iavf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id)
> return 0;
> }
>
> +/* Wait until PF acknowledges VF reset (RSTAT leaves VFACTIVE) */
> +static int
> +iavf_wait_for_reset_start(struct iavf_hw *hw)
> +{
> + int i;
> + uint32_t rstat;
Does this function need a timeout parameter? The hardcoded 100
iterations with 10ms delays equals 1 second timeout, but is this value
optimal for all hardware platforms? Consider adding a #define for the
iteration count for maintainability.
> +
> + for (i = 0; i < 100; i++) {
> + rte_delay_ms(10);
> +
> + rstat = IAVF_READ_REG(hw, IAVF_VFGEN_RSTAT);
> + rstat &= IAVF_VFGEN_RSTAT_VFR_STATE_MASK;
> + rstat >>= IAVF_VFGEN_RSTAT_VFR_STATE_SHIFT;
> +
> + if (rstat != VIRTCHNL_VFR_VFACTIVE)
> + return 0;
> + }
> +
> + return -1;
This returns -1 on timeout, but is this checked properly by the caller?
> +}
> +
> static int
> iavf_check_vf_reset_done(struct iavf_hw *hw)
> {
> int i, reset;
>
> + /* Phase 1: wait for reset to start (leave VFACTIVE) */
> + if (iavf_wait_for_reset_start(hw) != 0)
> + PMD_DRV_LOG(DEBUG, "VF reset did not start within timeout");
Does this code continue after the timeout with only a DEBUG message? If
iavf_wait_for_reset_start() fails, should iavf_check_vf_reset_done()
also return failure? The current implementation appears to proceed with
Phase 2 even if Phase 1 times out, which could mask initialization
problems.
> +
> for (i = 0; i < IAVF_RESET_WAIT_CNT; i++) {
> reset = IAVF_READ_REG(hw, IAVF_VFGEN_RSTAT) &
> IAVF_VFGEN_RSTAT_VFR_STATE_MASK;
[ ... ]
> @@ -2517,6 +2542,31 @@ iavf_init_proto_xtr(struct rte_eth_dev *dev)
> }
> }
>
> +/* Drain stale Admin Receive Queue messages after reset */
> +static void
> +iavf_drain_arq(struct iavf_hw *hw, struct iavf_info *vf)
> +{
> + struct iavf_arq_event_info event;
> + int drain_count = 0;
> +
> + memset(&event, 0, sizeof(event));
> + event.msg_buf = vf->aq_resp;
> +
> + while (drain_count < IAVF_AQ_LEN) {
> + event.buf_len = IAVF_AQ_BUF_SZ;
> +
> + if (iavf_clean_arq_element(hw, &event, NULL) != IAVF_SUCCESS)
> + break;
> +
> + drain_count++;
Does this code properly handle potential infinite loops? If
iavf_clean_arq_element() keeps returning SUCCESS but the queue never
empties, could this spin for IAVF_AQ_LEN iterations unnecessarily? Also,
is reusing vf->aq_resp safe here? Could this interfere with other
operations that might be using the same buffer?
> + }
> +
> + if (drain_count > 0)
> + PMD_INIT_LOG(DEBUG,
> + "Drained %d stale ARQ messages",
> + drain_count);
The indentation here appears inconsistent with DPDK style. The string
argument should start on the same line as PMD_INIT_LOG, or if it must
wrap, align with the opening parenthesis.
> +}
> +
> static int
> iavf_init_vf(struct rte_eth_dev *dev)
> {
> @@ -2558,6 +2608,10 @@ iavf_init_vf(struct rte_eth_dev *dev)
> PMD_INIT_LOG(ERR, "unable to allocate vf_aq_resp memory");
> goto err_aq;
> }
> +
> + /* Drain stale ARQ messages after VF reset */
> + iavf_drain_arq(hw, vf);
> +
Is it safe to drain ARQ messages before iavf_check_api_version()? Could
the API version negotiation message itself be in the queue and get
inadvertently drained?
> [ ... ]
>
> diff --git a/drivers/net/intel/iavf/iavf_vchnl.c b/drivers/net/intel/iavf/iavf_vchnl.c
> index 08dd6f2d7f..a014be8b57 100644
> --- a/drivers/net/intel/iavf/iavf_vchnl.c
> +++ b/drivers/net/intel/iavf/iavf_vchnl.c
> @@ -296,11 +296,21 @@ iavf_read_msg_from_pf(struct iavf_adapter *adapter, uint16_t buf_len,
> __func__, vpe->event);
> }
> } else {
There is a double space before else here. Was this pre-existing or
introduced by this patch?
> - /* async reply msg on command issued by vf previously */
> + /* Async reply for previously issued VF command.
> + * Stale messages from before reset are ignored, and polling
> + * continues until the expected response is received.
> + */
> result = IAVF_MSG_CMD;
> if (opcode != vf->pend_cmd) {
> - PMD_DRV_LOG(WARNING, "command mismatch, expect %u, get %u",
> - vf->pend_cmd, opcode);
> + if (opcode == VIRTCHNL_OP_UNKNOWN)
> + PMD_DRV_LOG(DEBUG,
> + "Ignoring stale msg (opcode 0), pending cmd %u",
> + vf->pend_cmd);
Same indentation issue here. The string should align better per DPDK
style.
> + else
> + PMD_DRV_LOG(WARNING,
> + "command mismatch, expect %u, get %u",
> + vf->pend_cmd, opcode);
> +
And here as well.
> result = IAVF_MSG_ERR;
> }
> }
Does the logic change properly handle the case where opcode is
VIRTCHNL_OP_UNKNOWN? The code still sets result = IAVF_MSG_ERR in both
cases (opcode 0 and mismatch), but the comment says "polling continues
until the expected response is received". Does the caller actually retry
when it receives IAVF_MSG_ERR for opcode 0, or does it fail?
More information about the test-report
mailing list