[dpdk-dev] Question about hardware error handling policy

Thomas Monjalon thomas at monjalon.net
Thu Jul 22 17:46:12 CEST 2021


22/07/2021 15:50, fengchengwen:
> Hi, all
> 
>     I notice that ethdev supports the dev_reset op, which can be used to
> recover from errors, but only 13+ drivers implement it.
>     There is also a reset event, RTE_ETH_EVENT_INTR_RESET, but only 6
> drivers report it (most of them VF drivers).
> 
>     This gives users two ways to handle hardware errors:
>     a. the driver reports RTE_ETH_EVENT_INTR_RESET, and the application
>     performs the reset.
>     b. the application detects errors (the detection method is unclear) and
>     calls the reset op to recover.
> 
>     According to the design of this API, error handling is assigned to the
> application, and the driver is only responsible for reporting events. This
> simplifies the driver design (for example, the driver does not need to maintain
> mutex locks).
> 
>     As we know, many modern NICs run firmware, have PCIe interfaces, and
> support SR-IOV. The hardware errors can include firmware reboot, PF reset,
> VF reset, and FLR, but these errors (particularly firmware/PF) are not
> addressed in most drivers.
> 
>     Question 1: what do we think of these errors (particularly firmware/PF)?
> Do we think that the probability is very low and that there is no need to
> deal with them?

Even rare errors must be managed.

>     Question 2: I prefer to put error handling in the application layer,
> because doing it in the driver can make the driver complex, but no app
> currently registers an INTR_RESET event handler. I think we can build a
> standard handler in testpmd. What do you think?

Absolutely. Like any ethdev API, it must be tested with testpmd.
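
For illustration only, option (a) above could be sketched roughly as below.
This is a standalone mock, not testpmd code: the sketch_* names, MAX_PORTS and
the stub bodies are hypothetical stand-ins so that it compiles without DPDK.
In a real application the callback would be registered with
rte_eth_dev_callback_register() for RTE_ETH_EVENT_INTR_RESET, and the stubs
would be rte_eth_dev_reset() followed by the usual configure, queue setup and
start sequence. The sketch assumes the callback may run outside the main loop
thread, so it only records the request and leaves the reset to the main loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 4 /* hypothetical limit for this sketch */

/* Stand-in for enum rte_eth_event_type; only the event discussed here. */
enum sketch_event { SKETCH_EVENT_INTR_RESET };

/* Per-port flag: set by the event callback, consumed by the main loop. */
static atomic_bool reset_pending[MAX_PORTS];
/* Bookkeeping so the stub's effect can be observed. */
static unsigned int reset_calls[MAX_PORTS];

/* Stub standing in for rte_eth_dev_reset(); returns 0 on success. */
static int sketch_dev_reset(uint16_t port_id)
{
    assert(port_id < MAX_PORTS);
    reset_calls[port_id]++;
    return 0;
}

/* Stub standing in for reconfiguring the port and starting it again. */
static int sketch_dev_reconfigure_and_start(uint16_t port_id)
{
    (void)port_id;
    return 0;
}

/* Event callback with the same parameter shape as rte_eth_dev_cb_fn.
 * It only marks the port; the reset itself is deferred. */
static int on_reset_event(uint16_t port_id, enum sketch_event event,
                          void *cb_arg, void *ret_param)
{
    (void)cb_arg;
    (void)ret_param;
    if (event != SKETCH_EVENT_INTR_RESET || port_id >= MAX_PORTS)
        return -1;
    atomic_store(&reset_pending[port_id], true);
    return 0;
}

/* Called from the main loop once Rx/Tx on the affected ports is stopped.
 * Returns the number of ports recovered, or -1 on the first failure. */
static int service_pending_resets(void)
{
    int recovered = 0;

    for (uint16_t p = 0; p < MAX_PORTS; p++) {
        /* Atomically read and clear the request for this port. */
        if (!atomic_exchange(&reset_pending[p], false))
            continue;
        if (sketch_dev_reset(p) != 0 ||
            sketch_dev_reconfigure_and_start(p) != 0)
            return -1;
        recovered++;
    }
    return recovered;
}
```

The main loop would call service_pending_resets() between bursts. Keeping the
callback down to a flag update means the driver only reports the event and
the application decides when it is safe to reset.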