<div dir="ltr"><div dir="ltr">Thank you Ferruh for the review. Please see inline.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Feb 1, 2022 at 5:41 PM Ferruh Yigit <<a href="mailto:ferruh.yigit@intel.com">ferruh.yigit@intel.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 1/28/2022 12:48 PM, Kalesh A P wrote:<br>
> From: Kalesh AP <<a href="mailto:kalesh-anakkur.purayil@broadcom.com" target="_blank">kalesh-anakkur.purayil@broadcom.com</a>><br>
> <br>
> Adding support for the device reset and recovery events in the<br>
> rte_eth_event framework. FW error and FW reset conditions would be<br>
> managed internally by the PMD without needing application intervention.<br>
> In such cases, PMD would need reset/recovery events to notify application<br>
> that PMD is undergoing a reset.<br>
> <br>
> While most of the recovery process is transparent to the application, since<br>
> the driver handles recovery from FW reset or FW error conditions,<br>
> the application will have to reprogram any flows which were offloaded to<br>
> the underlying hardware.<br>
> <br>
> Signed-off-by: Kalesh AP <<a href="mailto:kalesh-anakkur.purayil@broadcom.com" target="_blank">kalesh-anakkur.purayil@broadcom.com</a>><br>
> Signed-off-by: Somnath Kotur <<a href="mailto:somnath.kotur@broadcom.com" target="_blank">somnath.kotur@broadcom.com</a>><br>
> Reviewed-by: Ajit Khaparde <<a href="mailto:ajit.khaparde@broadcom.com" target="_blank">ajit.khaparde@broadcom.com</a>><br>
<br>
More developers cc'ed.<br>
<br>
> ---<br>
> doc/guides/prog_guide/poll_mode_drv.rst | 24 ++++++++++++++++++++++++<br>
> lib/ethdev/rte_ethdev.h | 18 ++++++++++++++++++<br>
> 2 files changed, 42 insertions(+)<br>
> <br>
> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst<br>
> index 6831289..9ecc0e4 100644<br>
> --- a/doc/guides/prog_guide/poll_mode_drv.rst<br>
> +++ b/doc/guides/prog_guide/poll_mode_drv.rst<br>
> @@ -623,3 +623,27 @@ by application.<br>
> The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger<br>
> the application to handle reset event. It is duty of application to<br>
> handle all synchronization before it calls rte_eth_dev_reset().<br>
> +<br>
> +Error recovery support<br>
> +~~~~~~~~~~~~~~~~~~~~~~<br>
> +<br>
> +When the PMD detects a FW reset or error condition, it may try to recover<br>
> +from the error without needing application intervention. In such cases,<br>
> +PMD would need events to notify the application that it is undergoing<br>
> +an error recovery.<br>
> +<br>
> +The PMD should trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the<br>
> +application that PMD detected a FW reset or FW error condition. PMD may<br>
> +try to recover from the error by itself. Data path may be quiesced and<br>
> +control path operations may fail during the recovery period. The application<br>
> +should stop polling till it receives RTE_ETH_EVENT_RECOVERED event from the PMD.<br>
> +<br>
<br>
Between the time the FW error occurs and the time the application receives the RECOVERING event,<br>
the datapath will continue to poll and the application may call control APIs, so the event<br>
does not really solve the issue, and the driver must somehow make sure this won't crash<br>
the application. In that case I am not sure about the benefit of this event.<br></blockquote><div>[Kalesh]: As soon as the driver detects a FW dead or FW reset condition, it sets the fastpath pointers to dummy functions. This prevents the crash. All control path operations fail with -EBUSY. This change is already in the bnxt PMD. The event notifies the application that the PMD is recovering from a FW error condition, so that it can stop polling and stop issuing control path operations until recovery completes.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
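</blockquote><div>A minimal sketch of that mechanism, for illustration only. Every name here (sketch_dev, sketch_dummy_rx_burst, sketch_set_mtu and so on) is hypothetical and none of it is taken from the bnxt code; it only shows the idea of swapping the burst pointer to a dummy function and failing control path operations with -EBUSY while recovery is in progress:</div><pre>
```c
/* Hypothetical sketch: none of these names come from DPDK or the bnxt PMD. */

#ifndef EBUSY
#define EBUSY 16 /* normally provided by errno.h; defined here to stay self-contained */
#endif

struct sketch_dev;
typedef unsigned short (*sketch_rx_burst_t)(struct sketch_dev *dev,
					    void **pkts, unsigned short n);

struct sketch_dev {
	sketch_rx_burst_t rx_burst; /* fast-path entry point the application polls */
	int recovering;             /* non-zero while FW recovery is in progress */
	unsigned short pending;     /* packets a healthy device would return */
};

/* Normal receive path: hands back up to n pending packets. */
static unsigned short
sketch_real_rx_burst(struct sketch_dev *dev, void **pkts, unsigned short n)
{
	unsigned short got = dev->pending < n ? dev->pending : n;

	(void)pkts; /* a real driver would fill this array */
	dev->pending -= got;
	return got;
}

/* Dummy receive path: touches no hardware and reports no packets. */
static unsigned short
sketch_dummy_rx_burst(struct sketch_dev *dev, void **pkts, unsigned short n)
{
	(void)dev;
	(void)pkts;
	(void)n;
	return 0;
}

/* Called when a FW reset or FW error is detected. */
static void
sketch_on_fw_error(struct sketch_dev *dev)
{
	dev->rx_burst = sketch_dummy_rx_burst;
	dev->recovering = 1;
	/* the RECOVERING event would be raised to the application here */
}

/* Called once the port has been re-configured after the reset. */
static void
sketch_on_recovered(struct sketch_dev *dev)
{
	dev->recovering = 0;
	dev->rx_burst = sketch_real_rx_burst;
	/* the RECOVERED event would be raised to the application here */
}

/* Example control path operation: refused while recovery is running. */
static int
sketch_set_mtu(struct sketch_dev *dev, unsigned short mtu)
{
	(void)mtu;
	if (dev->recovering)
		return -EBUSY;
	return 0;
}
```
</pre><div>In this sketch an application that keeps calling dev->rx_burst() between the FW error and the arrival of the event simply receives zero packets, and a control call such as sketch_set_mtu() returns -EBUSY instead of touching the device.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">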
<br>
> +The PMD should trigger RTE_ETH_EVENT_RECOVERED event to notify the application<br>
> +that it has recovered from the error condition. PMD re-configures the port<br>
> +to the state prior to the error condition. Control path and data path are up now.<br>
> +Since the device has undergone a reset, flow rules offloaded prior to reset<br>
> +may be lost and the application should recreate the rules again.<br>
> +<br>
<br>
I think the most difficult part here is to clarify, consistently for all devices,<br>
what the application should do when this event is received; "flow rules may be lost"<br>
looks very vague to me.<br>
Unless it is clear to the application what to do when this event is received,<br>
the event is not that useful, or it will be specific to some PMDs. I can see it is<br>
hard to clarify this, but perhaps we can define a set of common behavior.<br>
Not sure what others think.<br></blockquote><div>[Kalesh]: Sure, let's wait for others' opinions as well.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
> +The PMD should trigger RTE_ETH_EVENT_INTR_RMV event to notify the application<br>
> +that it has failed to recover from the error condition. The device may not be<br>
> +usable anymore.<br>
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h<br>
> index 147cc1c..a46819f 100644<br>
> --- a/lib/ethdev/rte_ethdev.h<br>
> +++ b/lib/ethdev/rte_ethdev.h<br>
> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {<br>
> RTE_ETH_EVENT_DESTROY, /**< port is released */<br>
> RTE_ETH_EVENT_IPSEC, /**< IPsec offload related event */<br>
> RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */<br>
> + RTE_ETH_EVENT_ERR_RECOVERING,<br>
> + /**< port recovering from an error<br>
> + *<br>
> + * PMD detected a FW reset or error condition.<br>
> + * PMD will try to recover from the error.<br>
> + * Data path may be quiesced and Control path operations<br>
> + * may fail at this time.<br>
> + */<br>
<br>
Please put multi-line comments before the enum value; Andrew did a set of cleanups for these.<br></blockquote><div>[Kalesh]: Sure, will do. </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
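</blockquote><div>A sketch of the requested layout, with the Doxygen block placed before each enumerator instead of trailing it. The enum name and the SKETCH_ prefix are hypothetical stand-ins so that the fragment compiles on its own; the comment text is taken from the patch above, and the final wording in the next revision may differ:</div><pre>
```c
/* Hypothetical stand-in for the tail of enum rte_eth_event_type;
 * the SKETCH_ prefix marks names that are not part of DPDK.
 */
enum sketch_eth_event_type {
	/**
	 * Port recovering from an error.
	 *
	 * PMD detected a FW reset or error condition.
	 * PMD will try to recover from the error.
	 * Data path may be quiesced and control path operations
	 * may fail at this time.
	 */
	SKETCH_ETH_EVENT_ERR_RECOVERING,
	/**
	 * Port recovered from an error.
	 *
	 * PMD has recovered from the error condition.
	 * Control path and data path are up now.
	 * PMD re-configures the port to the state prior to the error.
	 * Since the device has undergone a reset, flow rules
	 * offloaded prior to reset may be lost and
	 * the application should recreate the rules again.
	 */
	SKETCH_ETH_EVENT_RECOVERED,
	/** Max value of this enum. */
	SKETCH_ETH_EVENT_MAX
};
```
</pre><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">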
<br>
> + RTE_ETH_EVENT_RECOVERED,<br>
> + /**< port recovered from an error<br>
> + *<br>
> + * PMD has recovered from the error condition.<br>
> + * Control path and Data path are up now.<br>
> + * PMD re-configures the port to the state prior to the error.<br>
> + * Since the device has undergone a reset, flow rules<br>
> + * offloaded prior to reset may be lost and<br>
> + * the application should recreate the rules again.<br>
> + */<br>
> RTE_ETH_EVENT_MAX /**< max value of this enum */<br>
> };<br>
> <br>
<br>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr">Regards,<div>Kalesh A P</div></div></div></div>