[PATCH 1/2] net/iavf: support VF initiated resets
Loftus, Ciara
ciara.loftus at intel.com
Fri Oct 17 16:35:32 CEST 2025
> > > > > >
> > > > > > 03/10/2025 12:29, Loftus, Ciara:
> > > > > > > > > > Introduce a function that allows a VF to request the PF to reset
> > > > itself.
> > > > > > > > > > This is useful for example when the application detects that one
> > of
> > > > the
> > > > > > > > > > queues have hung or any event where a reset is required and
> the
> > PF
> > > > is
> > > > > > > > > > unlikely to trigger it.
> > > > > > > > > >
> > > > > > [...]
> > > > > > > > >
> > > > > > > > > In general, I'm not a huge fan of adding driver-specific functions
> > and I
> > > > > > > > > feel like this should fit under the existing reset APIs in some way.
> > That
> > > > > > > > > should avoid the need to update (or such a big update) to
> > testpmd,
> > > > for
> > > > > > > > > example.
> > > > > > > > > Some thoughts here:
> > > > > > > > >
> > > > > > > > > 1. we could add a devarg to the driver to adjust whether reset
> > does a
> > > > > > > > > "softer" reset of the VF just resetting itself, or a "hard" reset
> > where
> > > > the
> > > > > > > > > PF does a fuller reset of the VF.
> > > > > > > > > 2. rather than a devarg, would do use a driver-specific function
> to
> > > > adjust
> > > > > > > > > this behaviour. The difference here would be that the driver-
> > specific
> > > > > > > > > function would be an init-time one rather than runtime, so the
> > > > runtime
> > > > > > of
> > > > > > > > > the app, like testpmd, would be generic.
> > > > > > > > > 3. the most generic solution would be to add an additional
> > parameter
> > > > to
> > > > > > the
> > > > > > > > > reset() function itself to specify a hard or soft reset. This would
> > > > mean
> > > > > > > > > updating all drivers to handle the new parameter (shouldn't be
> > hard,
> > > > > > since
> > > > > > > > > it would be __rte_unused in all cases by default). This also
> opens
> > up
> > > > > > > > > the possibility of other drivers - especially VFs - using it in the
> > same
> > > > > > > > > way. We could actually document that the "hard" option "may
> be
> > > > used
> > > > > > by
> > > > > > > > VF
> > > > > > > > > drivers to request a full reset of the VF by the PF".
> > > > > > > >
> > > > > > > > Hi Bruce,
> > > > > > > >
> > > > > > > > I did not make it clear in my commit messages the full purpose of
> > this
> > > > new
> > > > > > API.
> > > > > > > > Along with resetting the VF, the VF is also reconfigured and
> restarted
> > > > > > > > transparently, using the existing iavf_handle_hw_reset
> > implementation.
> > > > > > > > While there is probably merit in extending the reset API to include
> a
> > > > > > soft/hard
> > > > > > > > reset flag, I would still need this new API to fulfil the purpose
> > described
> > > > > > above.
> > > > > > > > If we wanted to make this generic I see two options:
> > > > > > > > 1. extend the reset API to optionally reconfigure and restart
> > > > > > > > 2. introduce a new generic API that performs a reset followed by a
> > > > > > reconfigure
> > > > > > > > and restart.
> > > > > > >
> > > > > > > I've submitted a v2 and renamed the function to "restore" instead of
> > > > "reset"
> > > > > > to
> > > > > > > better describe the behaviour of the function - not just resetting,
> but
> > also
> > > > > > > restoring configuration afterwards and restarting the device.
> > > > > > >
> > > > > > > I agree that it would be best to avoid a driver-specific function like
> > this.
> > > > After
> > > > > > some
> > > > > > > thought I don't think we can extend the reset API to include this
> > > > behaviour,
> > > > > > the
> > > > > > > reset API should probably leave the device in a reset state.
> > > > > > >
> > > > > > > I'm wondering if the community would be in favour of creating a
> new
> > > > ethdev
> > > > > > API
> > > > > > > for restoring/reinitialising a device. It would essentially comprise of
> > reset,
> > > > > > > reconfigure and queue setup, and optionally device start?
> > > > > >
> > > > > > It is confusing to add an API if it is a perfect overlap
> > > > > > of what exists already.
> > > > > > If you want to add a new API, you need to explain how it is different
> > > > > > of a combination of reset/configure/start.
> > > > >
> > > > > Apologies for any confusion.
> > > > > The new API would perform only the necessary reconfiguration
> > > > > and setup steps under-the-hood and use the existing device
> > > > > configuration. This would differ from a combination of
> > > > > reset/reconfigure/start in two ways:
> > > > > 1. In some cases (at least in the case of the iavf driver), a
> > > > > full rte_eth_dev_configure is not required after an rte_eth_dev_reset.
> > > > > So the new API could just call the device callbacks for those functions,
> > > > > rather than forcing the user to call the ethdev functions and perform
> > > > > some unnecessary reconfiguration steps.
> > > > > 2. It takes the burden off the user to build up the device configuration
> > > > > again after the reset. Any pertinent configuration can be stored before
> > > > > the reset and used to reconfigure the device transparently. All the user
> > > > > needs to supply is the port id.
> > > > >
> > > > > Anyway, I've submitted a v3 of the non-generic version of this for the
> > > > > iavf driver, so if there is no appetite for the generic approach I'll
> > > > > continue to pursue the driver-specific one.
> > > >
> > > > A generic API is way better.
> > > >
> > > > I'm trying to understand what is missing in the current API.
> > > >
> > > > I think what you want to achieve is
> > > > RTE_ETH_ERROR_HANDLE_MODE_PROACTIVE.
> > > > What is missing exactly to implement it with the generic API?
> > > >
> > >
> > > Thanks for the recommendation, I think there is probably a place in the
> > > iavf driver for RTE_ETH_ERROR_HANDLE_MODE_PROACTIVE when we
> > > detect a PF-to-VF reset. However I don't think it lends itself to this
> > > problem.
> > >
> > > What is missing is the trigger for the sequence to start.
> > > RTE_ETH_ERROR_HANDLE_MODE_PROACTIVE seems to be concerned
> > > with recovering from hardware errors detected by the PMD. There
> > > doesn't seem to be a way to ask the PMD to enter the recovery mode,
> > > you must wait for a detected error - please correct me if I am wrong.
> > >
> > > What I am looking to implement is an API that will allow the user to
> > > trigger the PMD to perform a reset followed by recovery (configure,
> > > setup and device start). Yes this can be done using existing APIs:
> > > rte_eth_dev_reset
> > > rte_eth_dev_configure
> > > rte_eth_rx_queue_setup
> > > rte_eth_tx_queue_setup
> > > rte_eth_dev_start
> > >
> > > But that involves potentially unnecessary reconfiguration and more
> > > burden on the user in building up the device conf, versus an API that
> > > just accepts the port_id.
> > >
> > > The user may want to tear down the device and bring it straight back
> > > up again in cases where something has gone wrong eg. a queue
> > > appears to be hung, and resetting it may help resolve the issue.
> >
> > OK your description makes sense.
> >
> > Please draft a RFC of the API with all the needed explanations,
> > so we can discuss it in more details.
> > Or are you asking the techboard to discuss it before the draft?
>
> I'll submit a draft, I think it's the best way to proceed. Thanks
> for your input.
I submitted an RFC here:
https://patches.dpdk.org/project/dpdk/list/?series=36391
Ciara
More information about the dev
mailing list