[dpdk-dev] [PATCH v3 6/6] net/failsafe: fix removed device handling

Gaëtan Rivet gaetan.rivet at 6wind.com
Mon Jan 8 11:57:39 CET 2018


Hi Matan,

Sorry for the delay on this.

On Wed, Dec 20, 2017 at 10:58:29AM +0000, Matan Azrad wrote:
> Hi Gaetan
> 
> > -----Original Message-----
> > From: Gaëtan Rivet [mailto:gaetan.rivet at 6wind.com]
> > Sent: Wednesday, December 20, 2017 12:22 AM
> > To: Matan Azrad <matan at mellanox.com>
> > Cc: Adrien Mazarguil <adrien.mazarguil at 6wind.com>; Thomas Monjalon
> > <thomas at monjalon.net>; dev at dpdk.org
> > Subject: Re: [PATCH v3 6/6] net/failsafe: fix removed device handling
> > 
> > Hi Matan,
> > 
> > On Tue, Dec 19, 2017 at 05:10:15PM +0000, Matan Azrad wrote:
> > > There is time between the physical removal of the device until
> > > sub-device PMDs get a RMV interrupt. At this time DPDK PMDs and
> > > applications still don't know about the removal and may call
> > > sub-device control operation which should return an error.
> > >
> > > In previous code this error is reported to the application contrary to
> > > fail-safe principle that the app should not be aware of device removal.
> > >
> > > Add an removal check in each relevant control command error flow and
> > > prevent an error report to application when the sub-device is removed.
> > >
> > > Fixes: a46f8d5 ("net/failsafe: add fail-safe PMD")
> > > Fixes: b737a1e ("net/failsafe: support flow API")

As stated previously, please do not include those fixes lines.

> > >
> > > Signed-off-by: Matan Azrad <matan at mellanox.com>
> > > ---
> > 
> > <snip>
> > 
> > > +/*
> > > + * Check if error should be reported to the user.
> > > + */
> > > +static inline bool
> > > +fs_is_error(struct sub_device *sdev, int err) {
> > > +	/* A device removal shouldn't be reported as an error. */
> > > +	if (err == 0 || sdev->remove == 1 || err == -EIO)
> > > +		return false;
> > > +	return true;
> > > +}
> > 
> > This is better, thanks.
> > 
> > However is there a reason you did not follow the same pattern as ethdev
> > with eth_err? I see the two functions as similar in their intent, making them
> > close to each other would be clearer to a reader being familiar with the
> > ethdev API and that would be interested in fail-safe.
> > 
> > What do you think?
> > 
> 
> I think that there is a real different between eth_err function to fs_is_error:
> ethdev uses eth_err function to adjust removal return value to be -EIO.
> fail-safe uses fs_is_error function to check if an error should be reported to the user to save the fail-safe principle that the app should not be aware of device removal  -  this is the main idea that also causes me to change the name from fs_is_removed to fs_is_error.

I would have preferred if it followed the same pattern as ethdev (that
function be used to adjust the return value, not performing a flag check).

While better on its own, the pattern:

    if (fs_is_error(sdev, err)) {
            ERROR("xxxx");
            return err;
    }

is dangerous, as then the author is forbidden from returning err, assuming
err could be -EIO. He or she would be forced to return an explicit "0".
To be clear, here would be an easy mistake to do:

    if (fs_is_error(sdev, err)) {
            ERROR("xxxx");
    }
    return err;

And this kind of code-flow is not unusual, or even unwanted.
I dislike having this kind of implicit rule derived from using a helper
such as fs_is_error().

The alternative

    if ((err = fs_err(sdev, err))) {
            ERROR("xxxx");
            return err;
    }

Forces the value err to be set to the correct one.

This mistake can already be found in your patch:

> @@ -150,7 +150,7 @@
>                         continue;
>                 local_ret = rte_flow_destroy(PORT_ID(sdev),
>                                 flow->flows[i], error);
> -               if (local_ret) {
> +               if (fs_is_error(sdev, local_ret)) {
>                         ERROR("Failed to destroy flow on sub_device %d: %d",
>                                         i, local_ret);
>                         if (ret == 0)

Your environment does not include the function, but this is within
fs_flow_destroy (please update to include the context by the way
it helps a lot the review :). Afterward, line 162 ret is directly
used as return value.

Also, fs_err() would need to transform rte_errno when relevant (mostly
in failsafe_flow.c I think).

This is the kind of subtlety that needs to be avoided when designing
APIs, even internal ones. This will induce errors afterward and
complicate the maintenance of the codebase.

Best regards,

-- 
Gaëtan Rivet
6WIND


More information about the dev mailing list