[dpdk-dev] [RFC PATCH 0/3] librte_ethdev: error recovery support
Ferruh Yigit
ferruh.yigit at intel.com
Fri Jul  3 18:12:52 CEST 2020

On 3/12/2020 7:34 AM, Thomas Monjalon wrote:
> 12/03/2020 04:25, Kalesh Anakkur Purayil:
>> Hi Thomas,
>>
>> On Wed, Mar 11, 2020 at 6:49 PM Thomas Monjalon <thomas at monjalon.net> wrote:
>>
>>> 22/01/2020 11:16, Kalesh A P:
>>>> From: Kalesh AP <kalesh-anakkur.purayil at broadcom.com>
>>>>
>>>> This patch adds support for a recovery event in the rte_eth_event framework.
>>>> FW error and FW reset conditions would be managed by the PMD. Driver uses
>>>
>>> "Driver"? THE driver? :)
>>>
>>>> RTE_ETH_EVENT_INTR_RESET event to notify the applications about the
>>>> FW reset or error.
>>>
>>> Which drivers do that?
>>>
>> [Kalesh]: The second patch in this series implements this behavior in the bnxt PMD.
>> Error recovery is a new feature added to the bnxt PMD in 19.11. This change is
>> needed to support the error recovery functionality.
>>
>>>
>>>> In such cases, the PMD would need a recovery event to
>>>> notify the application that the PMD has recovered from the FW reset or FW error.
>>>
>>> Sorry, I don't understand. You said the application is notified of any error.
>>> But the PMD can recover from this error? So what is the error in the end?
>>> If the error is recovered, why notify the application?
>>>
>> [Kalesh]: Let me give you some insight into this.
>>
>> The error recovery solution is a protocol implemented between the firmware and
>> the bnxt PMD to recover from fatal errors without a system reboot. An alarm
>> thread constantly monitors the health of the firmware and initiates a
>> recovery when needed.
>>
>> There are two scenarios here:
>>
>> 1. Hardware or firmware encountered an error, which the firmware detected.
>> The firmware is still operational here. In this case, the firmware can reset
>> the chip and notify the driver about the reset.
>> 2. Hardware or firmware encountered an error, but the firmware is dead/hung.
>> The firmware is not operational. In this case, the only possible way
>> to recover the adapter is through the host driver (bnxt PMD).
>>
>> In both cases, the bnxt PMD reinitializes with the FW after the reset.
>> During the recovery process, the data path is halted and any control path
>> operation fails. So, the bnxt PMD has to notify the application about this
>> reset/error event to prevent any activity from the application during this
>> time.
> 
> I think you are changing the meaning of the reset event.
> It was described like this:
> RTE_ETH_EVENT_INTR_RESET,
>             /**< reset interrupt event, sent to VF on PF reset */
> 
> Please update this description as well.
> 
> Of course, we'll need approval from other PMD maintainers
> to accept the new recovery API.
> 
Hi Kalesh,
Is this RFC still relevant/valid?
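For reference, the application-side flow described above (the reset event arrives, the application stops using the port, the recovery event arrives, the application resumes) can be sketched as follows. In DPDK such a handler would be registered per port with rte_eth_dev_callback_register(); this minimal sketch is self-contained so it builds without DPDK. The enum, app_event_cb(), app_port_usable() and APP_MAX_PORTS are hypothetical names, and APP_EVENT_RECOVERED only stands in for whatever recovery event the RFC ends up defining.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ethdev event types in this thread:
 * APP_EVENT_RESET plays the role of RTE_ETH_EVENT_INTR_RESET, and
 * APP_EVENT_RECOVERED models the proposed "PMD has recovered" event
 * (its real name in the RFC is not assumed here). */
enum app_event_type {
    APP_EVENT_RESET,
    APP_EVENT_RECOVERED,
};

#define APP_MAX_PORTS 4

/* Per-port flag, true while the PMD is recovering. Static storage starts false. */
static atomic_bool port_recovering[APP_MAX_PORTS];

/* Same parameter shape as DPDK's rte_eth_dev_cb_fn, but with the local enum.
 * The callback may run in a different thread than the data/control paths,
 * hence the atomic flag. */
int app_event_cb(uint16_t port_id, enum app_event_type event,
                 void *cb_arg, void *ret_param)
{
    (void)cb_arg;
    (void)ret_param;

    if (port_id >= APP_MAX_PORTS)
        return -1;

    switch (event) {
    case APP_EVENT_RESET:
        /* Data path halted, control ops would fail: stop using the port. */
        atomic_store(&port_recovering[port_id], true);
        return 0;
    case APP_EVENT_RECOVERED:
        /* PMD reinitialized with the FW: the port may be used again. */
        atomic_store(&port_recovering[port_id], false);
        return 0;
    }
    return -1; /* unknown event */
}

/* Checked by the application before issuing control-path calls on a port. */
bool app_port_usable(uint16_t port_id)
{
    return port_id < APP_MAX_PORTS && !atomic_load(&port_recovering[port_id]);
}
```

The callback only flips a per-port flag; halting and restarting traffic stays in the application's own threads, which keeps the callback short.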
    
    