[dpdk-dev] How to disable SVE auto vectorization while using GCC

Honnappa Nagarahalli Honnappa.Nagarahalli at arm.com
Tue May 11 16:10:22 CEST 2021


<snip>
> >
> >>
> >> Thanks for your suggestions, we found that the -fno-tree-vectorize
> >> option works.
> >> PS: This option was not applied correctly in our earliest test.
> >>
> >> Solution:
> >> 1. Use the -fno-tree-vectorize option to prevent the compiler from
> >>    generating auto-vectorized code, so that the slow path will work fine.
> >> 2. Add a '-march=armv8-a+sve+crc' line to implementer_generic in
> >>    arm/meson.build:
> >>         'part_number_config': {
> >>                 'generic': {'machine_args': ['-march=armv8-a+crc',
> >>                                              '-march=armv8-a+sve+crc',
> >>                                              '-moutline-atomics']}
> >>         }
> >>    If the compiler doesn't support '-march=armv8-a+sve+crc', it will
> >>    fall back to '-march=armv8-a+crc'.
> >>    If the compiler supports '-march=armv8-a+sve+crc', it will compile
> >>    the SVE-related code, so the IO-path can support SVE.
> >>
> >> Based on the above, we could achieve our initial target.
> > The 'generic' target is for generating a binary that would work on all
> > ArmV8 machines. If you are building with '-march=armv8-a+sve+crc', the
> > IO-Path would not work on non-SVE machines.
> >
> 
> The 'generic' target is only used in our local CI (note: the two platforms
> are both ARMv8 machines).
> 
> In the IO-path, we support NEON and SVE Rx/Tx. The code is written with
> ACLE intrinsics, so it is not affected by the -fno-tree-vectorize option.
> 
> If the compiler supports '-march=armv8-a+sve+crc', it will compile both
> the NEON and the SVE related code.
Using '-march=armv8-a+sve+crc' and '-fno-tree-vectorize' does not provide an absolute guarantee that the compiler will not use SVE elsewhere.

The safest way to ensure that only specific functions use SVE is to compile without +sve (e.g. using -march=armv8-a) and use pragmas around the functions that are allowed to use SVE. For example:

#pragma GCC push_options
#pragma GCC target ("+sve")
void f(int *x) {
	for (int i = 0; i < 100; ++i) x[i] = i;
}
#pragma GCC pop_options
void g(int *x) {
	for (int i = 0; i < 100; ++i) x[i] = i;
}

compiles f() using SVE and g() with standard options.

You can also follow the function multiversioning discussed in the other thread.
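
For illustration, here is a rough sketch of how the per-function pragma could be combined with a function pointer picked once at startup. This is not DPDK code: fill_fn, fill_baseline, fill_sve and select_fill are hypothetical names, and the caller is assumed to pass in the result of its own SVE probe. The file is meant to be built without +sve; the SVE variant is only compiled when building with GCC for AArch64, and it is the only function in which the compiler is allowed to emit SVE instructions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fast-path function type; not a DPDK API. */
typedef void (*fill_fn)(int *x, size_t n);

/* Built with the file's baseline options (no +sve). */
static void fill_baseline(int *x, size_t n)
{
	for (size_t i = 0; i < n; ++i)
		x[i] = (int)i;
}

#if defined(__aarch64__) && defined(__GNUC__) && !defined(__clang__)
/* Only this function is compiled with SVE enabled. */
#pragma GCC push_options
#pragma GCC target ("+sve")
static void fill_sve(int *x, size_t n)
{
	for (size_t i = 0; i < n; ++i)
		x[i] = (int)i;
}
#pragma GCC pop_options
#define HAVE_FILL_SVE 1
#else
#define HAVE_FILL_SVE 0
#endif

/*
 * Called once from slow-path init. cpu_has_sve is assumed to come from
 * a runtime probe done by the caller; it must be false on non-SVE CPUs.
 */
fill_fn select_fill(bool cpu_has_sve)
{
#if HAVE_FILL_SVE
	if (cpu_has_sve)
		return fill_sve;
#else
	(void)cpu_has_sve;
#endif
	return fill_baseline;
}
```

The slow path would call select_fill() once during init, store the returned pointer, and the fast path would then call through that pointer, so a non-SVE machine never executes fill_sve().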

> At runtime, the driver detects whether the platform supports SVE; if
> not, it selects the NEON path.
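
As an aside, on Linux such a runtime check can be done by reading the hwcaps from the auxiliary vector. A minimal sketch, assuming Linux on AArch64 with glibc; cpu_has_sve is a hypothetical helper name and not the driver's actual function. The fallback value for HWCAP_SVE is the bit defined in the kernel's arm64 uapi hwcap.h, used only if the libc headers do not provide it. On any other platform the helper simply reports no SVE.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>	/* getauxval(), AT_HWCAP */
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)	/* bit from the Linux arm64 uapi hwcap.h */
#endif
#endif

/*
 * Hypothetical helper: returns true only when the kernel reports SVE.
 * This function itself must be built without +sve, since it also runs
 * on machines that lack SVE.
 */
bool cpu_has_sve(void)
{
#if defined(__linux__) && defined(__aarch64__)
	return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
	return false;
#endif
}
```

The result would typically be read once at init and used to choose between the NEON and SVE Rx/Tx functions.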
> 
> Best regards.
> 
> >>
> >>
> >> On 2021/5/1 4:54, Honnappa Nagarahalli wrote:
> >>> <snip>
> >>>
> >>>>
> >>>> On Fri, Apr 30, 2021 at 5:27 PM fengchengwen
> >>>> <fengchengwen at huawei.com> wrote:
> >>>>>
> >>>>> Hi, ALL
> >>>>> We have a question for your help:
> >>>>>   1. We have two platforms, both of which are ARM64. One supports
> >>>>>      both NEON and SVE; the other supports only NEON.
> >>>>>   2. We want to run on both platforms with a single binary file, and
> >>>>>      use the highest vector capability of each platform whenever
> >>>>>      possible.
> >>>>
> >>>> I see VPP has a similar feature. IMO, it is not present in DPDK.
> >>>> Basically, in order to do this:
> >>>> - Compile slow-path code (90% of DPDK) with minimal CPU instruction
> >>>>   set support.
> >>>> - Have fast-path functions compiled with different CPU instruction
> >>>>   set levels.
> >>>> - In the slow path, attach the fast-path function pointer based on
> >>>>   CPU instruction-level support.
> >>> Agree.
> >>>
> >>>>
> >>>>
> >>>>>   3. So we build the DPDK program with -march=armv8-a+sve+crc
> >>>>>      (GCC 10.2).
> >>> This defines the minimum capabilities of the target machine.
> >>>
> >>>>>      However, we found that illegal instructions occur when the
> >>>>>      program runs on a machine that does not support SVE (please see
> >>>>>      below).
> >>>>>   4. The problem is caused by GCC's automatic vectorization
> >>>>>      introducing SVE instructions.
> >>>>>
> >>>>>   So is there a way to disable GCC automatic vectorization, or to
> >>>>>   use only NEON for automatic vectorization?
> >>> I do not think this is safe. Once SVE is enabled, the compiler is
> >>> allowed to use SVE instructions wherever it sees fit.
> >>>
> >>>>>
> >>>>>   BTW: we already tested -fno-tree-vectorize (see the link below) but
> >>>>>   found no effect.
> >>>>>
> >>>>> https://stackoverflow.com/questions/7778174/how-can-i-disable-vectorization-while-using-gcc
> >>>>>
> >>>>>
> >>>>> The GDB output:
> >>>>>      EAL: Detected 128 lcore(s)
> >>>>>      EAL: Detected 4 NUMA nodes
> >>>>>      Option -w, --pci-whitelist is deprecated, use -a, --allow option instead
> >>>>>
> >>>>>      Program received signal SIGILL, Illegal instruction.
> >>>>>      0x0000000000671b88 in eal_adjust_config ()
> >>>>>      (gdb)
> >>>>>      (gdb) where
> >>>>>      #0  0x0000000000671b88 in eal_adjust_config ()
> >>>>>      #1  0x0000000000682840 in rte_eal_init ()
> >>>>>      #2  0x000000000051c870 in main ()
> >>>>>      (gdb)
> >>>>>
> >>>>> The disassembly output of eal_adjust_config:
> >>>>>      671b7c:       f8237a81        str     x1, [x20, x3, lsl #3]
> >>>>>      671b80:       f110001f        cmp     x0, #0x400
> >>>>>      671b84:       54ffff21        b.ne    671b68 <eal_adjust_config+0x1f4>  // b.any
> >>>>>      671b88:       043357f5        addvl   x21, x19, #-1
> >>>>>      671b8c:       043457e1        addvl   x1, x20, #-1
> >>>>>      671b90:       910562b5        add     x21, x21, #0x158
> >>>>>      671b94:       04e0e3e0        cntd    x0
> >>>>>      671b98:       914012b5        add     x21, x21, #0x4, lsl #12
> >>>>>      671b9c:       52800218        mov     w24, #0x10                      // #16
> >>>>>      671ba0:       25d8e3e1        ptrue   p1.d
> >>>>>      671ba4:       25f80fe0        whilelo p0.d, wzr, w24
> >>>>>      671ba8:       a5e04020        ld1d    {z0.d}, p0/z, [x1, x0, lsl #3]
> >>>>>
> >>>>>
> >>>>> Best regards.
> >>>>>
> >


