[PATCH 1/5] ethdev: fix race-condition of proactive error handling mode
    Konstantin Ananyev 
    konstantin.ananyev at huawei.com
       
    Tue Mar  7 10:56:57 CET 2023
    
    
  
> >
> >
> > >>>>
> > >>>> In the proactive error handling mode, the PMD will set the data
> > >>>> path pointers to dummy functions and then try recovery, in this
> > >>>> period the application may still invoking data path API. This will
> > >>>> introduce a race-condition with data path which may lead to crash [1].
> > >>>>
> > >>>> Although the PMD added delay after setting data path pointers to
> > >>>> cover the above race-condition, it reduces the probability, but it
> > >>>> doesn't solve the problem.
> > >>>>
> > >>>> To solve the race-condition problem fundamentally, the following
> > >>>> requirements are added:
> > >>>> 1. The PMD should set the data path pointers to dummy functions after
> > >>>>      report RTE_ETH_EVENT_ERR_RECOVERING event.
> > >>> Do you mean to say, PMD should set the data path pointers after
> > >>> calling the
> > >> call back function?
> > >>> The PMD is running in the context of multiple EAL threads. How do
> > >>> these
> > >> threads synchronize such that only one thread sets these data pointers?
> > >>
> > >> As I understand this event callback supposed to be called in the
> > >> context of EAL interrupt thread (whoever is more familiar with
> > >> original idea, feel free to correct me if I missed something).
> > > I could not figure this out. It looks to be called from the data plane thread
> > context.
> > > I also have a thought on alternate design at the end, appreciate if you can
> > take a look.
> > >
> > >> How it is going to signal data-path threads that they need to
> > >> stop/suspend calling data-path API - that's I suppose is left to application
> > to decide...
> > >> Same as right now it is application responsibility to stop data-path
> > >> threads before doing dev_stop()/dev/_config()/etc.
> > > Ok, good, this expectation is not new. The application must have a
> > mechanism already.
> > >
> > >>
> > >>
> > >>>
> > >>>> 2. The application should stop data path API invocation when process
> > >>>>      the RTE_ETH_EVENT_ERR_RECOVERING event.
> > >>> Any thoughts on how an application can do this?
> > > We can ignore this question as there is already similar expectation set for
> > earlier functionalities.
> > >
> > >>>
> > >>>> 3. The PMD should set the data path pointers to valid functions before
> > >>>>      report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>> 4. The application should enable data path API invocation when process
> > >>>>      the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>> Do you mean to say that the application should not call the datapath
> > >>> APIs
> > >> while the PMD is running the recovery process?
> > >>
> > >> Yes, I believe that's the intention.
> > > Ok, this is good and makes sense.
> > >
> > >>
> > >>>>
> > >>>> Also, this patch introduce a driver internal function
> > >>>> rte_eth_fp_ops_setup which used as an help function for PMD.
> > >>>>
> > >>>> [1]
> > >>>>
> > >>
> > http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2
> > >>>> -
> > >>>> ashok.k.kaladi at intel.com/
> > >>>>
> > >>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
> > >>>> Cc: stable at dpdk.org
> > >>>>
> > >>>> Signed-off-by: Chengwen Feng <fengchengwen at huawei.com>
> > >>>> ---
> > >>>>    doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
> > >>>>    lib/ethdev/ethdev_driver.c              |  8 +++++++
> > >>>>    lib/ethdev/ethdev_driver.h              | 10 ++++++++
> > >>>>    lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
> > >>>>    lib/ethdev/version.map                  |  1 +
> > >>>>    5 files changed, 46 insertions(+), 25 deletions(-)
> > >>>>
> > >>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>> b/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>> index c145a9066c..e380ff135a 100644
> > >>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>> @@ -638,14 +638,9 @@ different from the application invokes
> > >>>> recovery in PASSIVE mode,  the PMD automatically recovers from
> > >>>> error in PROACTIVE mode,  and only a small amount of work is
> > >>>> required for the
> > >> application.
> > >>>>
> > >>>> -During error detection and automatic recovery, -the PMD sets the
> > >>>> data path pointers to dummy functions -(which will prevent the
> > >>>> crash), -and also make sure the control path operations fail with a
> > >>>> return
> > >> code ``-EBUSY``.
> > >>>> -
> > >>>> -Because the PMD recovers automatically, -the application can only
> > >>>> sense that the data flow is disconnected for a while -and the
> > >>>> control API returns an error in this period.
> > >>>> +During error detection and automatic recovery, the PMD sets the
> > >>>> +data path pointers to dummy functions and also make sure the
> > >>>> +control path operations failed with a return code ``-EBUSY``.
> > >>>>
> > >>>>    In order to sense the error happening/recovering,  as well as to
> > >>>> restore some additional configuration, @@ -653,9 +648,9 @@ three
> > >>>> events
> > >> are available:
> > >>>>
> > >>>>    ``RTE_ETH_EVENT_ERR_RECOVERING``
> > >>>>       Notify the application that an error is detected
> > >>>> -   and the recovery is being started.
> > >>>> +   and the recovery is about to start.
> > >>>>       Upon receiving the event, the application should not invoke
> > >>>> -   any control path function until receiving
> > >>>> +   any control and data path API until receiving
> > >>>>       ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or
> > >>>> ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
> > >>>>
> > >>>>    .. note::
> > >>>> @@ -666,8 +661,9 @@ three events are available:
> > >>>>
> > >>>>    ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
> > >>>>       Notify the application that the recovery from error is successful,
> > >>>> -   the PMD already re-configures the port,
> > >>>> -   and the effect is the same as a restart operation.
> > >>>> +   the PMD already re-configures the port.
> > >>>> +   The application should restore some additional configuration,
> > >>>> + and then
> > >>> What is the additional configuration? Is this specific to each NIC/PMD?
> > >>> I thought, this is an auto recovery process and the application does
> > >>> not require
> > >> to reconfigure anything. If the application has to restore the
> > >> configuration, how does auto recovery differ from typical recovery
> > process?
> > >>>
> > >>>> +   enable data path API invocation.
> > >>>>
> > >>>>    ``RTE_ETH_EVENT_RECOVERY_FAILED``
> > >>>>       Notify the application that the recovery from error failed,
> > >>>> diff --git a/lib/ethdev/ethdev_driver.c
> > >>>> b/lib/ethdev/ethdev_driver.c index
> > >>>> 0be1e8ca04..f994653fe9 100644
> > >>>> --- a/lib/ethdev/ethdev_driver.c
> > >>>> +++ b/lib/ethdev/ethdev_driver.c
> > >>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct
> > rte_eth_dev
> > >>>> *dev, const char *ring_name,
> > >>>>    	return rc;
> > >>>>    }
> > >>>>
> > >>>> +void
> > >>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev) {
> > >>>> +	if (dev == NULL)
> > >>>> +		return;
> > >>>> +	eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev); }
> > >>>> +
> > >>>>    const struct rte_memzone *
> > >>>>    rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const
> > >>>> char *ring_name,
> > >>>>    			 uint16_t queue_id, size_t size, unsigned int align, diff
> > -
> > >> -git
> > >>>> a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h index
> > >>>> 2c9d615fb5..0d964d1f67 100644
> > >>>> --- a/lib/ethdev/ethdev_driver.h
> > >>>> +++ b/lib/ethdev/ethdev_driver.h
> > >>>> @@ -1621,6 +1621,16 @@ int
> > >>>>    rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const
> > >>>> char *name,
> > >>>>    		 uint16_t queue_id);
> > >>>>
> > >>>> +/**
> > >>>> + * @internal
> > >>>> + * Setup eth fast-path API to ethdev values.
> > >>>> + *
> > >>>> + * @param dev
> > >>>> + *  Pointer to struct rte_eth_dev.
> > >>>> + */
> > >>>> +__rte_internal
> > >>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
> > >>>> +
> > >>>>    /**
> > >>>>     * @internal
> > >>>>     * Atomically set the link status for the specific device.
> > >>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > >>>> index
> > >>>> 049641d57c..44ee7229c1 100644
> > >>>> --- a/lib/ethdev/rte_ethdev.h
> > >>>> +++ b/lib/ethdev/rte_ethdev.h
> > >>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
> > >>>>    	 */
> > >>>>    	RTE_ETH_EVENT_RX_AVAIL_THRESH,
> > >>>>    	/** Port recovering from a hardware or firmware error.
> > >>>> -	 * If PMD supports proactive error recovery,
> > >>>> -	 * it should trigger this event to notify application
> > >>>> -	 * that it detected an error and the recovery is being started.
> > >>>> -	 * Upon receiving the event, the application should not invoke any
> > >>>> control path API
> > >>>> -	 * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
> > >>>> -	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or
> > >>>> RTE_ETH_EVENT_RECOVERY_FAILED event.
> > >>>> -	 * The PMD will set the data path pointers to dummy functions,
> > >>>> -	 * and re-set the data path pointers to non-dummy functions
> > >>>> -	 * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>> -	 * It means that the application cannot send or receive any packets
> > >>>> -	 * during this period.
> > >>>> +	 *
> > >>>> +	 * If PMD supports proactive error recovery, it should trigger this
> > >>>> +	 * event to notify application that it detected an error and the
> > >>>> +	 * recovery is about to start.
> > >>>> +	 *
> > >>>> +	 * Upon receiving the event, the application should not invoke any
> > >>>> +	 * control and data path API until receiving
> > >>>> +	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or
> > >>>> RTE_ETH_EVENT_RECOVERY_FAILED
> > >>>> +	 * event.
> > >>>> +	 *
> > >>>> +	 * Once this event is reported, the PMD will set the data path pointers
> > >>>> +	 * to dummy functions, and re-set the data path pointers to valid
> > >>>> +	 * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS
> > >>>> event.
> > >>> Why do we need to set the data path pointers to dummy functions if
> > >>> the
> > >> application is restricted from invoking any control and data path
> > >> APIs till the recovery process is completed?
> > >>
> > >> You are right, in theory it is not mandatory.
> > >> Though it helps to flag a problem if user will still try to call them
> > >> while recovery is in progress.
> > > Ok, may be in debug mode.
> > > I mean, we have already set an expectation to the application that it should
> > not call and the application has implemented a method to do the same. Why
> > do we need to complicate this?
> > > If the application calls the APIs, it is a programming error.
> >
> >
> > My preference would be to keep it this way for both debug and non-debug
> > mode.
> > It doesn't cost anything to us in terms of perfomance, but helps to catch
> > problems with wrong behaving app.
> 
> This is also causing a synchronization problem. i.e. if this has to be done correctly, we need to use correct synchronization
> mechanisms.
> We cannot set the function pointers and assume that data will be visible to other threads/cores in the correct order.
> A possible mechanism (though I see some problems with this) could be to use a guard variable, which indicates when it is safe to use
> the function pointers on the data plane threads. This would require a load-acquire in the data plane threads.
I do realize that it doesn't provide any synchronization by itself. 
It is just best effort approach (no guarantee) to flag a possible problem to the app developer/maintainer, nothing more.
But it showed itself usefull already - as I remember we cached few bugs with it for dev_stop, etc.
Plus it costs us nothing in terms of performance, so why not to have it.
> >
> > >
> > >> Again, same as we doing in dev_stop().
> > >
> > >>
> > >>>
> > >>>> +	 *
> > >>>>    	 * @note Before the PMD reports the recovery result,
> > >>>>    	 * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event
> > >>>> again,
> > >>>>    	 * because a larger error may occur during the recovery.
> > >>>>    	 */
> > >>>>    	RTE_ETH_EVENT_ERR_RECOVERING,
> > >>> I understand this is not a change in this patch. But, just
> > >>> wondering, what is the
> > >> purpose of this? How is the application supposed to use this?
> > >>>
> > >>>>    	/** Port recovers successfully from the error.
> > >>>> -	 * The PMD already re-configured the port,
> > >>>> -	 * and the effect is the same as a restart operation.
> > >>>> +	 *
> > >>>> +	 * The PMD already re-configured the port:
> > >>>>    	 * a) The following operation will be retained: (alphabetically)
> > >>>>    	 *    - DCB configuration
> > >>>>    	 *    - FEC configuration
> > >>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
> > >>>>    	 *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
> > >>>>    	 * c) Any other configuration will not be stored
> > >>>>    	 *    and will need to be re-configured.
> > >>>> +	 *
> > >>>> +	 * The application should restore some additional configuration
> > >>>> +	 * (see above case b/c), and then enable data path API invocation.
> > >>>>    	 */
> > >>>>    	RTE_ETH_EVENT_RECOVERY_SUCCESS,
> > >>>>    	/** Port recovery failed.
> > >>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map index
> > >>>> 357d1a88c0..c273e0bdae 100644
> > >>>> --- a/lib/ethdev/version.map
> > >>>> +++ b/lib/ethdev/version.map
> > >>>> @@ -320,6 +320,7 @@ INTERNAL {
> > >>>>    	rte_eth_devices;
> > >>>>    	rte_eth_dma_zone_free;
> > >>>>    	rte_eth_dma_zone_reserve;
> > >>>> +	rte_eth_fp_ops_setup;
> > >>>>    	rte_eth_hairpin_queue_peer_bind;
> > >>>>    	rte_eth_hairpin_queue_peer_unbind;
> > >>>>    	rte_eth_hairpin_queue_peer_update;
> > >>>> --
> > >>>> 2.17.1
> > >>>
> > >
> > > Is there any reason not to design this in the same way as
> > 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?
> >
> > I suppose it is a question for the authors of original patch...
> Appreciate if the authors could comment on this.
> 
> >
> > > We could have a similar API 'rte_eth_dev_recover' to do the recovery
> > functionality.
> >
> > I suppose such approach is also possible.
> > Personally I am fine with both ways: either existing one or what you propose,
> > as long as we'll fix existing race-condition.
> > What is good with what you suggest - that way we probably don't need to
> > worry how to allow user to enable/disable auto-recovery inside PMD.
> >
> > Konstantin
> >
    
    
More information about the dev
mailing list