[PATCH V2 3/7] net/mlx5: add new devargs to control probe optimization

Slava Ovsiienko viacheslavo at nvidia.com
Wed Oct 30 09:16:58 CET 2024


Hi,


> -----Original Message-----
> From: Stephen Hemminger <stephen at networkplumber.org>
> Sent: Tuesday, October 29, 2024 6:07 PM
> To: Minggang(Gavin) Li <gavinl at nvidia.com>
> Cc: Slava Ovsiienko <viacheslavo at nvidia.com>; Matan Azrad
> <matan at nvidia.com>; Ori Kam <orika at nvidia.com>; NBU-Contact-Thomas
> Monjalon (EXTERNAL) <thomas at monjalon.net>; Dariusz Sosnowski
> <dsosnowski at nvidia.com>; Bing Zhao <bingz at nvidia.com>; Suanming Mou
> <suanmingm at nvidia.com>; dev at dpdk.org; Raslan Darawsheh
> <rasland at nvidia.com>; rongwei liu <rongweil at nvidia.com>
> Subject: Re: [PATCH V2 3/7] net/mlx5: add new devargs to control probe
> optimization
> 
> On Tue, 29 Oct 2024 16:27:25 +0800
> "Minggang(Gavin) Li" <gavinl at nvidia.com> wrote:
> 
> > On 10/28/2024 11:47 PM, Stephen Hemminger wrote:
> > > On Mon, 28 Oct 2024 11:18:18 +0200
> > > "Minggang Li(Gavin)" <gavinl at nvidia.com> wrote:
> > >
> > >> +- ``probe_opt_en`` parameter [int]
> > >> +
> > >> +  A non-zero value optimizes the probe process, especially for large
> scale.
> > >> +  PMD will hold the IB device information internally and reuse it.
> > >> +
> > >> +  By default, the PMD will set this value to 0.
> > >> +
> > > Is there ever a case where this should not be used?
> > >
> > > It would be better to just detect and use it if available.
> > > This driver does not need more options...
> > The new mechanism, which is required by few users, so we would not
> > break production and with the option we encourage to use new way only
> > those who actually needs. Once we see the new way is reliable - we
> > will change the default value.
> 
> I understand that philosophy but it leads to a maze of technical debt.

This specific case is not about philosophy in general.

We have users with a huge number of SFs/VFs configured who are experiencing
extremely long probe times (literally tens of minutes). This story has been going on for a
long time: we tried different approaches, then admitted that we had to update the kernel,
etc. Eventually we got things done, and it resulted in this series.

The new approach is event-driven and based on handling the new kernel-generated events.
So, it relies on the system-wide environment and might be problematic on some hosts (we do not
expect many such cases, though).

At the same time, the existing probe approach provides acceptable performance and
satisfies the vast majority of users. So, our main objective is not to break anything
in production (most users), and the second objective is to resolve the issues of some users with
specific configurations (few users). That's why we would prefer to have the devarg
(with all its pros and cons) and set the devarg default value to false. Later, once the new kernel
API spreads and we have good production statistics, we can consider changing the default
value to true or removing the devarg altogether. Does this approach look reasonable?
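
To illustrate the intended semantics only: the sketch below is a standalone
example, not the code from this series (the PMD parses its devargs through the
DPDK kvargs library). The helper name probe_opt_en_from_devargs() is
hypothetical. It shows the behavior described above: the legacy probe path is
used unless the user explicitly passes a non-zero probe_opt_en.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not the mlx5 implementation.
 * Scan a comma-separated "key=value" devargs string and return 1 if
 * "probe_opt_en" is present with a non-zero value, otherwise 0.
 * Returning 0 when the key is absent keeps the legacy probe path
 * as the default.
 */
static int
probe_opt_en_from_devargs(const char *devargs)
{
	static const char key[] = "probe_opt_en";
	const size_t klen = sizeof(key) - 1;
	const char *p = devargs;

	while (p != NULL && *p != '\0') {
		/* Match the key only at the start of a comma-separated token. */
		if (strncmp(p, key, klen) == 0 && p[klen] == '=')
			return strtol(p + klen + 1, NULL, 0) != 0;
		/* Move to the next token, if any. */
		p = strchr(p, ',');
		if (p != NULL)
			p++;
	}
	return 0; /* key not given: optimization stays disabled */
}
```

With this sketch, a string such as "0000:08:00.0,probe_opt_en=1" (the PCI
address is a placeholder) selects the new path, while the same device without
the devarg keeps the existing behavior.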

> Has a full suite of tests been done with both settings of the option?
> Has both values been tested on all combinations of platforms and OS
> releases?

We cannot keep only the new approach - we have to maintain legacy kernel compatibility.
So there will always be 2 branches of tests until the legacy kernels are retired. And having the devarg
might even simplify the testing - a single host can be used for both runs, with different devarg values.

> My point is every option adds to the necessary test matrix geometrically!

Once we added the new probing mechanism, the test matrix was ALREADY extended, regardless of the devarg
implementation. The devarg just makes our users' lives in the field easier.

With best regards,
Slava


