[dpdk-dev] [PATCH v2 03/17] doc: add detail on using max SIMD bitwidth

Ananyev, Konstantin konstantin.ananyev at intel.com
Mon Sep 7 14:01:41 CEST 2020



> On Sun, Sep 06, 2020 at 10:20:30PM +0000, Ananyev, Konstantin wrote:
> > > This patch adds documentation on the usage of the max SIMD bitwidth EAL
> > > setting, and how to use it to enable AVX-512 at runtime.
> > >
> > > Cc: Anatoly Burakov <anatoly.burakov at intel.com>
> > > Cc: John McNamara <john.mcnamara at intel.com>
> > > Cc: Marko Kovacevic <marko.kovacevic at intel.com>
> > >
> > > Signed-off-by: Ciara Power <ciara.power at intel.com>
> > > ---
> > >  doc/guides/howto/avx512.rst                   | 36 +++++++++++++++++++
> > >  doc/guides/linux_gsg/eal_args.include.rst     | 12 +++++++
> > >  .../prog_guide/env_abstraction_layer.rst      | 31 ++++++++++++++++
> > >  3 files changed, 79 insertions(+)
> > >  create mode 100644 doc/guides/howto/avx512.rst
> > >
> > > diff --git a/doc/guides/howto/avx512.rst b/doc/guides/howto/avx512.rst
> > > new file mode 100644
> > > index 0000000000..ebae0f2b4f
> > > --- /dev/null
> > > +++ b/doc/guides/howto/avx512.rst
> > > @@ -0,0 +1,36 @@
> > > +..  SPDX-License-Identifier: BSD-3-Clause
> > > +    Copyright(c) 2020 Intel Corporation.
> > > +
> > > +
> > > +Using AVX-512 with DPDK
> > > +=======================
> > > +
> > > +AVX-512 is not used by default in DPDK, but it can be selected at runtime by apps through the use of EAL API,
> > > +and by the user with a commandline argument. DPDK has a setting for max SIMD bitwidth,
> > > +which can be modified and will then limit the vector path taken by the code.
> >
> > It is a good idea to have such an ability,
> > though just one global variable for all DPDK libs/drivers
> > seems a bit coarse to me.
> > Let's say we have 2 libs: libA and libB.
> > Both have an RTE_MAX_512_SIMD specific code-path,
> > though libA would cause a frequency level change, while libB wouldn't.
> > So the user (to avoid a frequency level change) would have to block
> > 512_SIMD for both libs.
> > I think it would be much better to follow the strategy we use for log-level:
> > there is a global simd_width, but each DPDK entity (lib/driver) also has
> > its own simd_width that overrules the global one (more fine-grained control).
> 
> That for me is a nightmare scenario. How is the user meant to know what
> libs could cause him a frequency change or not, or is he meant to determine that
> empirically by trial and error on each platform?

I suppose yes.
Let's say the user can try running the app with the global
--force-max-simd-bitwidth=256 and --force-max-simd-bitwidth=512
and check the difference.
If he is happy with the performance he gets, he can stick with one of the global values (256/512).
If not, he can go further and choose a different max-simd-width for different components.
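
To make the log-level analogy a bit more concrete, the lookup could work roughly
like the sketch below. This is only an illustration: none of these names
(simd_override, simd_width_for, the component strings) exist in DPDK or in this
patch, and which component gets which value is just my guess from the tests
mentioned further down.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; nothing here exists in DPDK or in this patch. */
#define SIMD_WIDTH_UNSET 0

struct simd_override {
	const char *component;   /* e.g. "lib.acl", "pmd.aesni_mb" */
	unsigned int max_width;  /* SIMD_WIDTH_UNSET means "follow the global value" */
};

/* Global limit, 256 by default as described in the patch. */
unsigned int global_simd_width = 256;

/* Per-component overrides, as the end-user might configure them. */
struct simd_override overrides[] = {
	{ "lib.acl", 512 },       /* assumed light-weight AVX-512 path, allow it */
	{ "pmd.aesni_mb", 256 },  /* assumed to trigger LVL2, keep it at 256 */
};

/* Effective limit: the component's own value if set, else the global one. */
unsigned int
simd_width_for(const char *component)
{
	size_t i;

	for (i = 0; i < sizeof(overrides) / sizeof(overrides[0]); i++) {
		if (strcmp(overrides[i].component, component) == 0 &&
		    overrides[i].max_width != SIMD_WIDTH_UNSET)
			return overrides[i].max_width;
	}
	return global_simd_width;
}
```

A component without an entry simply follows the global value, so users who
don't care see no difference from the single-switch approach.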

> This scenario is
> completely unlike logging in that it's non-obvious to the user, and so
> needs to be kept as consumable as possible to the app-developer and the
> user.

This feature is totally optional; if the user feels he doesn't need to care about it,
he can simply ignore it and use the default values.
Though for those who do care, one global value seems too restrictive.

> Unless we find a concrete scenario where having a single switch is
> causing real user problems, I'd much rather keep things simple.

As an example, I ran several perf tests with the acl avx512 code path and
so far didn't see any switches to CORE_POWER.LVL2_TURBO_LICENSE
(heavy AVX512 instructions).
I presume there might be other light-weight avx512 code paths (lpm, etc.).
Though for crypto cpu PMDs (aesni-mb, etc.) I think it would cause a switch
to LVL2.

> See also answer below, where I point out that the main target of this is developers,
> who can use this flag to indicate what vector bitwidth their app uses,
> and then allow DPDK to match that.

But in the majority of cases the developer doesn't know for sure on what platform his app will run
(except in the quite rare situation when the app is developed for one particular platform).
Also, for complex/multi-purpose applications (like VPP, DPDK-OVS) the developer can't even
always predict which modules will be used and which won't.
Also, the app can be configured in a way that different modules run on different cores
(let's say a module that does ACL lookup on core X, and a module that does the actual crypto on core Y).
All that depends on the particular deployment scenario.
So in many cases only the end-user has all the information to decide what max-simd width will be optimal.

> 
> >
> > > +
> > > +
> > > +Using the API in apps
> > > +---------------------
> > > +
> > > +Apps can request DPDK uses AVX-512 at runtime, if it provides improved application performance.
> > > +This can be done by modifying the EAL setting for max SIMD bitwidth to 512, as by default it is 256,
> > > +which does not allow for AVX-512.
> > > +
> > > +.. code-block:: c
> > > +
> > > +   rte_set_max_simd_bitwidth(RTE_MAX_512_SIMD);
> > > +
> > > +This API should only be called once at initialization, before EAL init.
> >
> > If the only possible usage scenario for that function is init time before  EAL init,
> > then do we really need it at all?
> > As we have cmd-line flag anyway?
> > User can achieve similar goal, by just:  rte_eal_init(,..."--force-max-simd-bitwidth=..."...);
> 
> Ideally, the user should never know or care about the cmdline flag, it's
> only for testing. The main criteria for allowing DPDK to use longer
> instruction sets is whether the application itself will similarly use them,
> and that's something for the programmer to do.

Unfortunately, I don't think the programmer has all the information to make such decisions either.
A lot depends on deployment scenarios, see above.
 
> Having the programmer muck
> about with cmdline arguments is less than ideal, so a proper API is
> warranted here.

Agree, a function call is more convenient for the developer.

> The reason for the note about EAL init, is that we don't
> want libraries to have to check the max bitwidth each time an API is
> called, so we want to have a way to prevent people changing things at
> runtime. This therefore seemed simplest.
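
So, if I read this right, the intended behaviour is roughly the following.
This is my own sketch with made-up names (set_max_simd_bitwidth,
lock_max_simd_bitwidth), not the code from the patch, and returning -EPERM
after init is just an assumption on my side:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not the code from the patch. */
unsigned int max_simd_bitwidth = 256;  /* default value per the doc */
bool simd_bitwidth_locked;             /* becomes true once init has run */

/* Accept a new limit only while init has not happened yet. */
int
set_max_simd_bitwidth(unsigned int bits)
{
	if (simd_bitwidth_locked)
		return -EPERM;
	max_simd_bitwidth = bits;
	return 0;
}

/* Called at the end of init: from here on the value is frozen. */
void
lock_max_simd_bitwidth(void)
{
	simd_bitwidth_locked = true;
}
```

With that, libs can read the value once when they select their code path
and don't have to re-check it on every API call.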

I understand that, but for that purpose just the cmd-line flag is enough,
that's why I asked whether we need an API call at all.
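
What I had in mind with the cmd-line flag is something along these lines.
build_eal_argv() is a hypothetical helper written just for this mail, not a
DPDK API; only the --force-max-simd-bitwidth flag itself comes from this
patch set.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Build a new argv with "--force-max-simd-bitwidth=<bits>" placed right
 * after argv[0]. Hypothetical helper, not part of DPDK.
 * The flag string is written into flag_buf, which the caller owns and must
 * keep alive as long as the new argv is in use.
 * Returns the new argc, or -1 on failure.
 */
int
build_eal_argv(int argc, char **argv, unsigned int bits,
	       char *flag_buf, size_t flag_len, char ***out_argv)
{
	char **new_argv;
	int i, n;

	if (argc < 1)
		return -1;

	n = snprintf(flag_buf, flag_len, "--force-max-simd-bitwidth=%u", bits);
	if (n < 0 || (size_t)n >= flag_len)
		return -1;

	/* one extra slot for the flag, one for the terminating NULL */
	new_argv = malloc(((size_t)argc + 2) * sizeof(*new_argv));
	if (new_argv == NULL)
		return -1;

	new_argv[0] = argv[0];
	new_argv[1] = flag_buf;
	for (i = 1; i < argc; i++)
		new_argv[i + 1] = argv[i];
	new_argv[argc + 1] = NULL;

	*out_argv = new_argv;
	return argc + 1;
}
```

The app would then pass the returned argc and the new argv to rte_eal_init()
as usual, and free the array once it is no longer needed.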
It seems a bit strange to me to introduce an API that is supposed to be called
only *before* eal_init(), but on the other hand I don't see much harm in it either.
So if you and the other guys still prefer to keep it - ok by me.
Konstantin
 





