[dpdk-dev] How to disable SVE auto vectorization while using GCC
    fengchengwen 
    fengchengwen at huawei.com
       
    Tue May 11 13:23:04 CEST 2021
    
    
  
On 2021/5/9 2:46, Honnappa Nagarahalli wrote:
> <snip>
> 
>>
>> Thanks for your suggestions. We found that the -fno-tree-vectorize option
>> works.
>> PS: This option was not applied successfully in our earlier test.
>>
>> Solution:
>> 1. Use the -fno-tree-vectorize option to prevent the compiler from generating
>>    auto-vectorized code, so that the slow path works fine.
>> 2. Add '-march=armv8-a+sve+crc' to the implementer_generic machine_args in
>>    arm/meson.build:
>>         'part_number_config': {
>>                 'generic': {'machine_args': ['-march=armv8-a+crc',
>>                                              '-march=armv8-a+sve+crc',
>>                                              '-moutline-atomics']}
>>         }
>>    If the compiler doesn't support '-march=armv8-a+sve+crc', the build falls
>>    back to '-march=armv8-a+crc'.
>>    If the compiler supports '-march=armv8-a+sve+crc', the SVE-related code
>>    is compiled as well, so the IO path can use SVE.
>>
>> Based on the above, we can achieve our initial target.
> The 'generic' target is for generating a binary that would work on all ArmV8 machines. If you are building with '-march=armv8-a+sve+crc', the IO-Path would not work on non-SVE machines.
> 
The 'generic' config is only used in our local CI (note: both platforms are ARMv8 machines).
In the IO path we support NEON and SVE Rx/Tx. That code is written with ACLE intrinsics, so
it is not affected by the -fno-tree-vectorize option.
If the compiler supports '-march=armv8-a+sve+crc', both the NEON and the SVE code are
compiled.
At runtime, the driver detects whether the platform supports SVE; if it does not, the
driver selects the NEON path.
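A minimal sketch of the build-time plus run-time selection described above, assuming Linux on AArch64 for the CPU check. All function names are hypothetical, the Rx bodies are scalar placeholders for the real ACLE code, and getauxval(AT_HWCAP) with HWCAP_SVE stands in for the EAL's rte_cpu_get_flag_enabled() so the example stays self-contained. The SVE variant only exists when the compiler defines __ARM_FEATURE_SVE, i.e. when the SVE -march was accepted.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22) /* value from the Linux arm64 <asm/hwcap.h> */
#endif
#endif

/* Hypothetical Rx burst signature, for illustration only. */
typedef uint16_t (*rx_burst_t)(uint64_t *pkts, const uint64_t *ring, uint16_t nb);

/* Placeholder for the NEON Rx path (real code would use <arm_neon.h>). */
static uint16_t rx_burst_neon(uint64_t *pkts, const uint64_t *ring, uint16_t nb)
{
    for (uint16_t i = 0; i < nb; i++)
        pkts[i] = ring[i];
    return nb;
}

#if defined(__ARM_FEATURE_SVE)
/* Built only when an SVE -march was accepted, which defines
 * __ARM_FEATURE_SVE (real code would use <arm_sve.h>). */
static uint16_t rx_burst_sve(uint64_t *pkts, const uint64_t *ring, uint16_t nb)
{
    for (uint16_t i = 0; i < nb; i++)
        pkts[i] = ring[i];
    return nb;
}
#endif

/* Runtime check: does the CPU we are running on implement SVE? */
int cpu_has_sve(void)
{
#if defined(__aarch64__) && defined(__linux__)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return 0; /* not AArch64 Linux: assume no SVE */
#endif
}

/* Pick the Rx function once, at device start-up (slow path). */
rx_burst_t select_rx_burst(void)
{
#if defined(__ARM_FEATURE_SVE)
    if (cpu_has_sve())
        return rx_burst_sve;
#endif
    return rx_burst_neon;
}
```

Note that with this layout the whole file, including rx_burst_neon(), is compiled with SVE enabled, so the compiler may still emit SVE instructions in the fallback path; that is the concern raised in the reply above.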
Best regards.
>>
>>
>> On 2021/5/1 4:54, Honnappa Nagarahalli wrote:
>>> <snip>
>>>
>>>>
>>>> On Fri, Apr 30, 2021 at 5:27 PM fengchengwen
>>>> <fengchengwen at huawei.com> wrote:
>>>>>
>>>>> Hi, all
>>>>> We have a question we would like your help with:
>>>>>   1. We have two platforms, both ARM64: one supports both NEON and SVE,
>>>>>      the other supports only NEON.
>>>>>   2. We want to run on both platforms with a single binary, using the
>>>>>      highest vector capability of each platform whenever possible.
>>>>
>>>> I see VPP has a similar feature. IMO, it is not present in DPDK.
>>>> Basically, in order to do this:
>>>> - Compile the slow-path code (90% of DPDK) with minimal CPU instruction
>>>>   set support.
>>>> - Compile the fastpath functions for different CPU instruction set levels.
>>>> - In the slow path, attach the fastpath function pointer based on the
>>>>   CPU's instruction-set support.
>>> Agree.
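The three steps above can be sketched in C with GCC's AArch64 target function attribute, which enables SVE for a single function while the rest of the file is built for the baseline -march. This is a sketch under assumptions: all names are hypothetical, the "SVE" variant is just a plain loop the compiler may vectorize, and the runtime check assumes Linux's getauxval(). On other targets or compilers only the baseline variant is built.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__aarch64__) && defined(__linux__) && defined(__GNUC__) && !defined(__clang__)
#include <sys/auxv.h>
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22) /* value from the Linux arm64 <asm/hwcap.h> */
#endif
#define HAVE_SVE_VARIANT 1 /* hypothetical local switch */
#endif

typedef uint64_t (*sum_fn)(const uint64_t *v, size_t n);

/* Baseline fastpath: compiled with the file's default flags
 * (e.g. -march=armv8-a+crc), so it runs on every Armv8 CPU. */
static uint64_t sum_baseline(const uint64_t *v, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#ifdef HAVE_SVE_VARIANT
/* Same source, but SVE is enabled for this one function only,
 * so the compiler may vectorize it with SVE instructions. */
__attribute__((target("+sve")))
static uint64_t sum_sve(const uint64_t *v, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
#endif

/* Fastpath pointer; defaults to the variant that is safe everywhere. */
static sum_fn fastpath_sum = sum_baseline;

/* Slow path: run once at init and attach the best variant. */
void fastpath_init(void)
{
#ifdef HAVE_SVE_VARIANT
    if (getauxval(AT_HWCAP) & HWCAP_SVE)
        fastpath_sum = sum_sve;
#endif
}

/* Fastpath entry point: an indirect call, no per-call CPU check. */
uint64_t fastpath_run(const uint64_t *v, size_t n)
{
    return fastpath_sum(v, n);
}
```

Because only sum_sve() carries the SVE attribute, the slow path and the baseline variant contain no SVE instructions, which is the property a single binary for both platforms needs.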
>>>
>>>>
>>>>
>>>>>   3. So we build the DPDK program with -march=armv8-a+sve+crc (GCC 10.2).
>>> This defines the minimum capabilities of the target machine.
>>>
>>>>>      However, an illegal instruction occurs when the program runs on a
>>>>>      machine that does not support SVE (please see below).
>>>>>   4. The problem is caused by GCC's automatic vectorization emitting SVE
>>>>>      instructions.
>>>>>
>>>>>   So is there a way to disable GCC automatic vectorization, or to restrict
>>>>>   it to NEON only?
>>> I do not think this is safe. Once SVE is enabled, the compiler is allowed to
>>> use SVE instructions wherever it sees fit.
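To illustrate why enabling SVE for the whole build is risky: the plain C loop below uses no intrinsics, yet when compiled with something like gcc -O3 -march=armv8-a+sve, GCC is permitted to vectorize it with SVE instructions (whilelo, ld1d and similar, as in the disassembly quoted below). Whether it actually does so depends on the GCC version and optimization flags; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Ordinary scalar source. With an SVE-enabled -march the compiler may
 * turn this loop into SVE code, so the resulting object can raise SIGILL
 * on a CPU without SVE even though the source never mentions SVE. */
uint64_t sum_table(const uint64_t *table, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += table[i];
    return total;
}
```

Running objdump -d on the object file and looking for SVE mnemonics is a quick way to check whether this happened.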
>>>
>>>>>
>>>>>   BTW: we already tested -fno-tree-vectorize (per the link below) but it
>>>>>   had no effect.
>>>>>
>>>>> https://stackoverflow.com/questions/7778174/how-can-i-disable-vectorization-while-using-gcc
>>>>>
>>>>>
>>>>> The GDB output:
>>>>>      EAL: Detected 128 lcore(s)
>>>>>      EAL: Detected 4 NUMA nodes
>>>>>      Option -w, --pci-whitelist is deprecated, use -a, --allow option instead
>>>>>
>>>>>      Program received signal SIGILL, Illegal instruction.
>>>>>      0x0000000000671b88 in eal_adjust_config ()
>>>>>      (gdb)
>>>>>      (gdb) where
>>>>>      #0  0x0000000000671b88 in eal_adjust_config ()
>>>>>      #1  0x0000000000682840 in rte_eal_init ()
>>>>>      #2  0x000000000051c870 in main ()
>>>>>      (gdb)
>>>>>
>>>>> The disassembly output of eal_adjust_config:
>>>>>      671b7c:       f8237a81        str     x1, [x20, x3, lsl #3]
>>>>>      671b80:       f110001f        cmp     x0, #0x400
>>>>>      671b84:       54ffff21        b.ne    671b68 <eal_adjust_config+0x1f4>  // b.any
>>>>>      671b88:       043357f5        addvl   x21, x19, #-1
>>>>>      671b8c:       043457e1        addvl   x1, x20, #-1
>>>>>      671b90:       910562b5        add     x21, x21, #0x158
>>>>>      671b94:       04e0e3e0        cntd    x0
>>>>>      671b98:       914012b5        add     x21, x21, #0x4, lsl #12
>>>>>      671b9c:       52800218        mov     w24, #0x10                      // #16
>>>>>      671ba0:       25d8e3e1        ptrue   p1.d
>>>>>      671ba4:       25f80fe0        whilelo p0.d, wzr, w24
>>>>>      671ba8:       a5e04020        ld1d    {z0.d}, p0/z, [x1, x0, lsl #3]
>>>>>
>>>>>
>>>>> Best regards.
>>>>>
> 
    
    