[dpdk-dev] [PATCH v7 1/4] ethdev: support device reset and recovery events

Ferruh Yigit ferruh.yigit at intel.com
Tue Feb 1 13:52:18 CET 2022


On 1/28/2022 12:48 PM, Kalesh A P wrote:
> From: Kalesh AP <kalesh-anakkur.purayil at broadcom.com>
> 
> Adding support for the device reset and recovery events in the
> rte_eth_event framework. FW error and FW reset conditions would be
> managed internally by the PMD without needing application intervention.
> In such cases, PMD would need reset/recovery events to notify application
> that PMD is undergoing a reset.
> 
> While most of the recovery process is transparent to the application since
> most of the driver ensures recovery from FW reset or FW error conditions,
> the application will have to reprogram any flows which were offloaded to
> the underlying hardware.
> 
> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil at broadcom.com>
> Signed-off-by: Somnath Kotur <somnath.kotur at broadcom.com>
> Reviewed-by: Ajit Khaparde <ajit.khaparde at broadcom.com>
> ---
>   doc/guides/prog_guide/poll_mode_drv.rst | 24 ++++++++++++++++++++++++
>   lib/ethdev/rte_ethdev.h                 | 18 ++++++++++++++++++
>   2 files changed, 42 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
> index 6831289..9ecc0e4 100644
> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> @@ -623,3 +623,27 @@ by application.
>   The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger
>   the application to handle reset event. It is duty of application to
>   handle all synchronization before it calls rte_eth_dev_reset().
> +
> +Error recovery support
> +~~~~~~~~~~~~~~~~~~~~~~
> +
> +When the PMD detects a FW reset or error condition, it may try to recover
> +from the error without needing the application intervention. In such cases,
> +PMD would need events to notify the application that it is undergoing
> +an error recovery.
> +
> +The PMD should trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the
> +application that PMD detected a FW reset or FW error condition. PMD may
> +try to recover from the error by itself. Data path may be quiesced and
> +control path operations may fail during the recovery period. The application
> +should stop polling till it receives RTE_ETH_EVENT_RECOVERED event from the PMD.
> +
> +The PMD should trigger RTE_ETH_EVENT_RECOVERED event to notify the application
> +that the it has recovered from the error condition. PMD re-configures the port
> +to the state prior to the error condition. Control path and data path are up now.
> +Since the device has undergone a reset, flow rules offloaded prior to reset
> +may be lost and the application should recreate the rules again.
> +
> +The PMD should trigger RTE_ETH_EVENT_INTR_RMV event to notify the application
> +that it has failed to recover from the error condition. The device may not be
> +usable anymore.
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 147cc1c..a46819f 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
>   	RTE_ETH_EVENT_DESTROY,  /**< port is released */
>   	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>   	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
> +	RTE_ETH_EVENT_ERR_RECOVERING,
> +			/**< port recovering from an error
> +			 *
> +			 * PMD detected a FW reset or error condition.
> +			 * PMD will try to recover from the error.
> +			 * Data path may be quiesced and Control path operations
> +			 * may fail at this time.
> +			 */
> +	RTE_ETH_EVENT_RECOVERED,
> +			/**< port recovered from an error
> +			 *
> +			 * PMD has recovered from the error condition.
> +			 * Control path and Data path are up now.
> +			 * PMD re-configures the port to the state prior to the error.
> +			 * Since the device has undergone a reset, flow rules
> +			 * offloaded prior to reset may be lost and
> +			 * the application should recreate the rules again.
> +			 */
>   	RTE_ETH_EVENT_MAX       /**< max value of this enum */


Also ABI check complains about 'RTE_ETH_EVENT_MAX' value check, cc'ed more people
to evaluate if it is a false positive:


1 function with some indirect sub-type change:
   [C] 'function int rte_eth_dev_callback_register(uint16_t, rte_eth_event_type, rte_eth_dev_cb_fn, void*)' at rte_ethdev.c:4637:1 has some indirect sub-type changes:
     parameter 3 of type 'typedef rte_eth_dev_cb_fn' has sub-type changes:
       underlying type 'int (typedef uint16_t, enum rte_eth_event_type, void*, void*)*' changed:
         in pointed to type 'function type int (typedef uint16_t, enum rte_eth_event_type, void*, void*)':
           parameter 2 of type 'enum rte_eth_event_type' has sub-type changes:
             type size hasn't changed
             2 enumerator insertions:
               'rte_eth_event_type::RTE_ETH_EVENT_ERR_RECOVERING' value '11'
               'rte_eth_event_type::RTE_ETH_EVENT_RECOVERED' value '12'
             1 enumerator change:
               'rte_eth_event_type::RTE_ETH_EVENT_MAX' from value '11' to '13' at rte_ethdev.h:3807:1


More information about the dev mailing list