[PATCH 1/5] ethdev: fix race-condition of proactive error handling mode
    Ajit Khaparde 
    ajit.khaparde at broadcom.com
       
    Thu Mar  9 03:05:06 CET 2023
    
    
  
On Tue, Mar 7, 2023 at 4:40 AM Konstantin Ananyev
<konstantin.ananyev at huawei.com> wrote:
>
>
>
> > >>>>>>>>>>> In the proactive error handling mode, the PMD will set the data path
> > >>>>>>>>>>> pointers to dummy functions and then try recovery, in this period the
> > >>>>>>>>>>> application may still be invoking data path APIs. This will introduce a
> > >>>>>>>>>>> race-condition with data path which may lead to crash [1].
> > >>>>>>>>>>>
> > >>>>>>>>>>> Although the PMD added delay after setting data path pointers to cover
> > >>>>>>>>>>> the above race-condition, it reduces the probability, but it doesn't
> > >>>>>>>>>>> solve the problem.
> > >>>>>>>>>>>
> > >>>>>>>>>>> To solve the race-condition problem fundamentally, the following
> > >>>>>>>>>>> requirements are added:
> > >>>>>>>>>>> 1. The PMD should set the data path pointers to dummy functions after
> > >>>>>>>>>>>     reporting the RTE_ETH_EVENT_ERR_RECOVERING event.
> > >>>>>>>>>>> 2. The application should stop data path API invocation when processing
> > >>>>>>>>>>>     the RTE_ETH_EVENT_ERR_RECOVERING event.
> > >>>>>>>>>>> 3. The PMD should set the data path pointers to valid functions before
> > >>>>>>>>>>>     reporting the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>>>>>>>>> 4. The application should enable data path API invocation when processing
> > >>>>>>>>>>>     the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> How this is solving the race-condition, by pushing responsibility to
> > >>>>>>>>> stop data path to application?
> > >>>>>>>>
> > >>>>>>>> Exactly, it becomes application responsibility to make sure data-path is
> > >>>>>>>> stopped/suspended before recovery will continue.
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> From documentation of the feature:
> > >>>>>>>
> > >>>>>>> ``
> > >>>>>>> Because the PMD recovers automatically,
> > >>>>>>> the application can only sense that the data flow is disconnected for a
> > >>>>>>> while and the control API returns an error in this period.
> > >>>>>>>
> > >>>>>>> In order to sense the error happening/recovering, as well as to restore
> > >>>>>>> some additional configuration, three events are available:
> > >>>>>>> ``
> > >>>>>>>
> > >>>>>>> It looks like initial design is to use events mainly inform application
> > >>>>>>> about what happened and mainly for re-configuration.
> > >>>>>>>
> > >>>>>>> Although I don't disagree with involving the application, I am not sure
> > >>>>>>> that is part of the current design.
> > >>>>>>
> > >>>>>> I thought we all agreed that the initial design contains some fallacies that
> > >>>>>> need to be fixed, no?
> > >>>>>> Statement that with current rte_ethdev design error recovery can be done
> > >>>>>> without interaction with the app (to stop/suspend data/control path)
> > >>>>>> is the main one I think.
> > >>>>>> It needs some interaction with app layer, one way or another.
> > >>>>>>
> > >>>>>>>>>
> > >>>>>>>>> What if application is not interested in recovery modes at all and not
> > >>>>>>>>> registered any callback for the recovery?
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Are you saying there is no way for application to disable
> > >>>>>>>> automatic recovery in PMD if it is not interested
> > >>>>>>>> (or can't fulfill the prerequisites for it)?
> > >>>>>>>> If so, then yes it is a problem and we need to fix it.
> > >>>>>>>> I assumed that such mechanism to disable unwanted events already exists,
> > >>>>>>>> but I can't find anything.
> > >>>>>>>> Wonder what would be the easiest way here - can PMD make a decision
> > >>>>>>>> based on callback return value, or do we need a new API to
> > >>>>>>>> enable/disable callbacks, or ...?
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> As far as I can see automatic recovery is not configurable by app.
> > >>>>>>>
> > >>>>>>> But that is not all, PMD sends events to application but PMD can't know
> > >>>>>>> if application is handling them or not, so with current design PMD can't
> > >>>>>>> rely on the app.
> > >>>>>>
> > >>>>>> Well, PMD invokes user provided callback.
> > >>>>>> One way to fix that problem - if there is no callback provided,
> > >>>>>> or callback returns an error code - PMD can assume that recovery
> > >>>>>> should not be done.
> > >>>>>> That is probably not the best design choice, but at least it will allow
> > >>>>>> to fix the problem without too many changes and introducing new API.
> > >>>>>> That could be sort of a 'quick fix'.
> > >>>>>> In the meantime we can think about a new/better approach for that.
> > >>>>>>
> > >>>>>
> > >>>>> -rc2 for 23.03 is a few days away.
> > >>>>>
> > >>>>> What do you think to have 'quick fix' as modifying how driver updates
> > >>>>> burst ops to prevent the race condition, for this release?
> > >>
> > >> The 'quick fix', do you mean only update function pointer (without rxq setting) ?
> > >> Currently the PMDs which announced support "proactive error handling mode" already
> > >> do this.
> > >>
> > >
> > > Yes.
> > > I checked hns3, it does as you said, hns3_eth_dev_fp_ops_config()'
> > > updates all fields in 'rte_eth_fp_ops' but only function pointer seems
> > > changed in the driver, resulting only function pointers to be updated.
> > >
> > > The discussion about race condition started with patch [1], which
> > > mentions a crash because of a race condition. Later in discussions,
> > > recovery event given as a sample for where the race can occur, that is
> > > why we are here.
> > >
> > > But after above info, although there is race condition and a bigger
> > > update (that needs application involvement) is required for recovery
> > > mechanism, there is no crash and NO 'quick fix' is required for recovery.
> > >
> > > @Konstantin, @Chengwen, can you please confirm above understanding is
> > > correct?
> >
> > Yes, that's what.
>
> Yes, I think with Chengwen patch the race condition problem should be fixed.
> Though for that user has to provide a properly implemented callback.
> What is not currently addressed - the user cannot disable this auto-recovery procedure at will.
> So if the user does not provide a proper callback, the recovery can still proceed and the race can happen.

Ideally, the user or the application should participate in the recovery
to prevent more catastrophic outcomes, some of which may require a system reboot.
Not all scenarios are recoverable, but depending on the implementation,
the unrecoverable ones could be a very small percentage.
Either way, application awareness and participation is a good end goal.
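
To make the application side concrete, here is a rough sketch of what
"participating" could look like. It does not use the ethdev API: the
app_* names, the event enum and the port struct are all hypothetical
stand-ins, and it assumes a single worker thread per port. The idea is
only that the handler for the recovering event stops the data path and
waits for an in-flight burst to finish, and the success event turns it
back on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three ethdev recovery events. */
enum app_event {
	APP_EVENT_ERR_RECOVERING,
	APP_EVENT_RECOVERY_SUCCESS,
	APP_EVENT_RECOVERY_FAILED,
};

/* Hypothetical per-port application state (one worker thread assumed). */
struct app_port {
	atomic_bool datapath_enabled; /* false while recovery is in progress */
	atomic_bool in_burst;         /* true while the worker is in a burst */
	uint64_t bursts_done;         /* touched only by the worker */
};

static void
app_port_init(struct app_port *p)
{
	atomic_init(&p->datapath_enabled, true);
	atomic_init(&p->in_burst, false);
	p->bursts_done = 0;
}

/* Event handler; would be called from the application's event callback. */
static int
app_on_event(struct app_port *p, enum app_event ev)
{
	assert(p != NULL);
	switch (ev) {
	case APP_EVENT_ERR_RECOVERING:
		/* Stop new bursts, then wait for an in-flight one to finish. */
		atomic_store(&p->datapath_enabled, false);
		while (atomic_load(&p->in_burst))
			;
		break;
	case APP_EVENT_RECOVERY_SUCCESS:
		/* Restore any additional configuration here, then resume. */
		atomic_store(&p->datapath_enabled, true);
		break;
	case APP_EVENT_RECOVERY_FAILED:
		/* Port is unusable; keep the data path stopped. */
		atomic_store(&p->datapath_enabled, false);
		break;
	}
	return 0;
}

/* One worker iteration; returns true if a burst was actually issued. */
static bool
app_worker_poll_once(struct app_port *p)
{
	bool ran = false;

	/* Publish in_burst before checking the flag (seq_cst ordering). */
	atomic_store(&p->in_burst, true);
	if (atomic_load(&p->datapath_enabled)) {
		/* A real application would call its Rx/Tx burst here. */
		p->bursts_done++;
		ran = true;
	}
	atomic_store(&p->in_burst, false);
	return ran;
}
```

Because the worker sets in_burst before it reads datapath_enabled, and
the handler clears datapath_enabled before it reads in_burst, at least
one side always observes the other, so the handler does not return while
a burst is still running. With several workers per port the single
in_burst flag would need to become a per-lcore flag or a counter.
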
>
> >
> > >
> > >
> > >
> > > [1]
> > > https://patches.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
> > >
> > >>>>>
> > >>>>> And plan a design update for the next release?
> > >>>> +1 on the overall approach.
> > >>>
> > >>> Yep, agree.
> > >>
> > >> Hope for better solution.
> > >> And also, I notice that only openvswitch (of all the open-source software based on DPDK)
> > >> registers an RTE_ETH_EVENT_INTR_RESET callback.
> > >>
> > >> Therefore, hope we build a recovery framework at the DPDK SDK level and be compatible
> > >> with RTE_ETH_EVENT_INTR_RESET and RTE_ETH_EVENT_ERR_RECOVERING mechanism.
> > >>
> > >>>
> > >>>>
> > >>>>>
> > >>>>>
> > >>>>>>>
> > >>>>>>>>> I think driver should not rely on application for this, unless
> > >>>>>>>>> application explicitly says (to driver) that it is handling recovery,
> > >>>>>>>>> right now there is no way for driver to know this.
> > >>>>>>>>
> > >>>>>>>> I think it is vice versa:
> > >>>>>>>> application should not enable auto-recovery if it can't meet
> > >>>>>>>> the prerequisites for it (provide appropriate callback).
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> I agree on above, we are saying similar thing in different perspective.
> > >>>>>>
> > >>>>>> Ok, that's good we are on the same page.
> > >>>>>>
> > >>>>>>
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>>> Also, this patch introduces a driver internal function
> > >>>>>>>>>>> rte_eth_fp_ops_setup which is used as a helper function for PMD.
> > >>>>>>>>>>>
> > >>>>>>>>>>> [1]
> > >>>>>>>>>>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
> > >>>>>>>>>>>
> > >>>>>>>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
> > >>>>>>>>>>> Cc: stable at dpdk.org
> > >>>>>>>>>>>
> > >>>>>>>>>>> Signed-off-by: Chengwen Feng <fengchengwen at huawei.com>
> > >>>>>>>>>>> ---
> > >>>>>>>>>>>   doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
> > >>>>>>>>>>>   lib/ethdev/ethdev_driver.c              |  8 +++++++
> > >>>>>>>>>>>   lib/ethdev/ethdev_driver.h              | 10 ++++++++
> > >>>>>>>>>>>   lib/ethdev/rte_ethdev.h                 | 32
> > >>>>>>>>>>> +++++++++++++++----------
> > >>>>>>>>>>>   lib/ethdev/version.map                  |  1 +
> > >>>>>>>>>>>   5 files changed, 46 insertions(+), 25 deletions(-)
> > >>>>>>>>>>>
> > >>>>>>>>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>>>>>>>>> b/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>>>>>>>>> index c145a9066c..e380ff135a 100644
> > >>>>>>>>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>>>>>>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>>>>>>>>> @@ -638,14 +638,9 @@ different from the application invokes recovery
> > >>>>>>>>>>> in PASSIVE mode,
> > >>>>>>>>>>>   the PMD automatically recovers from error in PROACTIVE mode,
> > >>>>>>>>>>>   and only a small amount of work is required for the application.
> > >>>>>>>>>>>
> > >>>>>>>>>>> -During error detection and automatic recovery,
> > >>>>>>>>>>> -the PMD sets the data path pointers to dummy functions
> > >>>>>>>>>>> -(which will prevent the crash),
> > >>>>>>>>>>> -and also make sure the control path operations fail with a return
> > >>>>>>>>>>> code ``-EBUSY``.
> > >>>>>>>>>>> -
> > >>>>>>>>>>> -Because the PMD recovers automatically,
> > >>>>>>>>>>> -the application can only sense that the data flow is disconnected
> > >>>>>>>>>>> for a while
> > >>>>>>>>>>> -and the control API returns an error in this period.
> > >>>>>>>>>>> +During error detection and automatic recovery, the PMD sets the
> > >>>>>>>>>>> data path
> > >>>>>>>>>>> +pointers to dummy functions and also make sure the control path
> > >>>>>>>>>>> operations
> > >>>>>>>>>>> +failed with a return code ``-EBUSY``.
> > >>>>>>>>>>>
> > >>>>>>>>>>>   In order to sense the error happening/recovering,
> > >>>>>>>>>>>   as well as to restore some additional configuration,
> > >>>>>>>>>>> @@ -653,9 +648,9 @@ three events are available:
> > >>>>>>>>>>>
> > >>>>>>>>>>>   ``RTE_ETH_EVENT_ERR_RECOVERING``
> > >>>>>>>>>>>      Notify the application that an error is detected
> > >>>>>>>>>>> -   and the recovery is being started.
> > >>>>>>>>>>> +   and the recovery is about to start.
> > >>>>>>>>>>>      Upon receiving the event, the application should not invoke
> > >>>>>>>>>>> -   any control path function until receiving
> > >>>>>>>>>>> +   any control and data path API until receiving
> > >>>>>>>>>>>      ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or
> > >>>>>>>>>>> ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
> > >>>>>>>>>>>
> > >>>>>>>>>>>   .. note::
> > >>>>>>>>>>> @@ -666,8 +661,9 @@ three events are available:
> > >>>>>>>>>>>
> > >>>>>>>>>>>   ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
> > >>>>>>>>>>>      Notify the application that the recovery from error is successful,
> > >>>>>>>>>>> -   the PMD already re-configures the port,
> > >>>>>>>>>>> -   and the effect is the same as a restart operation.
> > >>>>>>>>>>> +   the PMD already re-configures the port.
> > >>>>>>>>>>> +   The application should restore some additional configuration,
> > >>>>>>>>>>> and then
> > >>>>>>>>>>> +   enable data path API invocation.
> > >>>>>>>>>>>
> > >>>>>>>>>>>   ``RTE_ETH_EVENT_RECOVERY_FAILED``
> > >>>>>>>>>>>      Notify the application that the recovery from error failed,
> > >>>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
> > >>>>>>>>>>> index 0be1e8ca04..f994653fe9 100644
> > >>>>>>>>>>> --- a/lib/ethdev/ethdev_driver.c
> > >>>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.c
> > >>>>>>>>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev
> > >>>>>>>>>>> *dev, const char *ring_name,
> > >>>>>>>>>>>       return rc;
> > >>>>>>>>>>>   }
> > >>>>>>>>>>>
> > >>>>>>>>>>> +void
> > >>>>>>>>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
> > >>>>>>>>>>> +{
> > >>>>>>>>>>> +    if (dev == NULL)
> > >>>>>>>>>>> +        return;
> > >>>>>>>>>>> +    eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
> > >>>>>>>>>>> +}
> > >>>>>>>>>>> +
> > >>>>>>>>>>>   const struct rte_memzone *
> > >>>>>>>>>>>   rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char
> > >>>>>>>>>>> *ring_name,
> > >>>>>>>>>>>                uint16_t queue_id, size_t size, unsigned int align,
> > >>>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> > >>>>>>>>>>> index 2c9d615fb5..0d964d1f67 100644
> > >>>>>>>>>>> --- a/lib/ethdev/ethdev_driver.h
> > >>>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.h
> > >>>>>>>>>>> @@ -1621,6 +1621,16 @@ int
> > >>>>>>>>>>>   rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const
> > >>>>>>>>>>> char *name,
> > >>>>>>>>>>>            uint16_t queue_id);
> > >>>>>>>>>>>
> > >>>>>>>>>>> +/**
> > >>>>>>>>>>> + * @internal
> > >>>>>>>>>>> + * Setup eth fast-path API to ethdev values.
> > >>>>>>>>>>> + *
> > >>>>>>>>>>> + * @param dev
> > >>>>>>>>>>> + *  Pointer to struct rte_eth_dev.
> > >>>>>>>>>>> + */
> > >>>>>>>>>>> +__rte_internal
> > >>>>>>>>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
> > >>>>>>>>>>> +
> > >>>>>>>>>>>   /**
> > >>>>>>>>>>>    * @internal
> > >>>>>>>>>>>    * Atomically set the link status for the specific device.
> > >>>>>>>>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > >>>>>>>>>>> index 049641d57c..44ee7229c1 100644
> > >>>>>>>>>>> --- a/lib/ethdev/rte_ethdev.h
> > >>>>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
> > >>>>>>>>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
> > >>>>>>>>>>>        */
> > >>>>>>>>>>>       RTE_ETH_EVENT_RX_AVAIL_THRESH,
> > >>>>>>>>>>>       /** Port recovering from a hardware or firmware error.
> > >>>>>>>>>>> -     * If PMD supports proactive error recovery,
> > >>>>>>>>>>> -     * it should trigger this event to notify application
> > >>>>>>>>>>> -     * that it detected an error and the recovery is being started.
> > >>>>>>>>>>> -     * Upon receiving the event, the application should not invoke
> > >>>>>>>>>>> any control path API
> > >>>>>>>>>>> -     * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until
> > >>>>>>>>>>> receiving
> > >>>>>>>>>>> -     * RTE_ETH_EVENT_RECOVERY_SUCCESS or
> > >>>>>>>>>>> RTE_ETH_EVENT_RECOVERY_FAILED event.
> > >>>>>>>>>>> -     * The PMD will set the data path pointers to dummy functions,
> > >>>>>>>>>>> -     * and re-set the data path pointers to non-dummy functions
> > >>>>>>>>>>> -     * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>>>>>>>>> -     * It means that the application cannot send or receive any
> > >>>>>>>>>>> packets
> > >>>>>>>>>>> -     * during this period.
> > >>>>>>>>>>> +     *
> > >>>>>>>>>>> +     * If PMD supports proactive error recovery, it should trigger
> > >>>>>>>>>>> this
> > >>>>>>>>>>> +     * event to notify application that it detected an error and the
> > >>>>>>>>>>> +     * recovery is about to start.
> > >>>>>>>>>>> +     *
> > >>>>>>>>>>> +     * Upon receiving the event, the application should not invoke any
> > >>>>>>>>>>> +     * control and data path API until receiving
> > >>>>>>>>>>> +     * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
> > >>>>>>>>>>> +     * event.
> > >>>>>>>>>>> +     *
> > >>>>>>>>>>> +     * Once this event is reported, the PMD will set the data path
> > >>>>>>>>>>> pointers
> > >>>>>>>>>>> +     * to dummy functions, and re-set the data path pointers to valid
> > >>>>>>>>>>> +     * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS
> > >>>>>>>>>>> event.
> > >>>>>>>>>>> +     *
> > >>>>>>>>>>>        * @note Before the PMD reports the recovery result,
> > >>>>>>>>>>>        * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event
> > >>>>>>>>>>> again,
> > >>>>>>>>>>>        * because a larger error may occur during the recovery.
> > >>>>>>>>>>>        */
> > >>>>>>>>>>>       RTE_ETH_EVENT_ERR_RECOVERING,
> > >>>>>>>>>>>       /** Port recovers successfully from the error.
> > >>>>>>>>>>> -     * The PMD already re-configured the port,
> > >>>>>>>>>>> -     * and the effect is the same as a restart operation.
> > >>>>>>>>>>> +     *
> > >>>>>>>>>>> +     * The PMD already re-configured the port:
> > >>>>>>>>>>>        * a) The following operation will be retained: (alphabetically)
> > >>>>>>>>>>>        *    - DCB configuration
> > >>>>>>>>>>>        *    - FEC configuration
> > >>>>>>>>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
> > >>>>>>>>>>>        *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
> > >>>>>>>>>>>        * c) Any other configuration will not be stored
> > >>>>>>>>>>>        *    and will need to be re-configured.
> > >>>>>>>>>>> +     *
> > >>>>>>>>>>> +     * The application should restore some additional configuration
> > >>>>>>>>>>> +     * (see above case b/c), and then enable data path API invocation.
> > >>>>>>>>>>>        */
> > >>>>>>>>>>>       RTE_ETH_EVENT_RECOVERY_SUCCESS,
> > >>>>>>>>>>>       /** Port recovery failed.
> > >>>>>>>>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> > >>>>>>>>>>> index 357d1a88c0..c273e0bdae 100644
> > >>>>>>>>>>> --- a/lib/ethdev/version.map
> > >>>>>>>>>>> +++ b/lib/ethdev/version.map
> > >>>>>>>>>>> @@ -320,6 +320,7 @@ INTERNAL {
> > >>>>>>>>>>>       rte_eth_devices;
> > >>>>>>>>>>>       rte_eth_dma_zone_free;
> > >>>>>>>>>>>       rte_eth_dma_zone_reserve;
> > >>>>>>>>>>> +    rte_eth_fp_ops_setup;
> > >>>>>>>>>>>       rte_eth_hairpin_queue_peer_bind;
> > >>>>>>>>>>>       rte_eth_hairpin_queue_peer_unbind;
> > >>>>>>>>>>>       rte_eth_hairpin_queue_peer_update;
> > >>>>>>>>>>> --
> > >>>>>>>>>>   Acked-by: Konstantin Ananyev <konstantin.ananyev at huawei.com>
> > >>>>>>>>>>
> > >>>>>>>>>>> 2.17.1
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>>
> > >
> > > .
> > >