[dpdk-dev] AVX512 bug on SkyLake

Yongseok Koh yskoh at mellanox.com
Fri Nov 9 00:01:03 CET 2018


> On Nov 8, 2018, at 9:21 AM, Ferruh Yigit <ferruh.yigit at intel.com> wrote:
> 
> On 11/8/2018 3:59 PM, Thomas Monjalon wrote:
>> Hi,
>> 
>> We need to gather more information about this bug.
>> More below.
>> 
>> 07/11/2018 10:04, Wiles, Keith:
>>>> On Nov 6, 2018, at 9:30 PM, Yongseok Koh <yskoh at mellanox.com> wrote:
>>>>> On Nov 5, 2018, at 6:06 AM, Wiles, Keith <keith.wiles at intel.com> wrote:
>>>>>> On Nov 2, 2018, at 9:04 PM, Yongseok Koh <yskoh at mellanox.com> wrote:
>>>>>> 
>>>>>> This is a workaround to prevent a crash, which might be caused by
>>>>>> optimization of newer gcc (7.3.0) on Intel Skylake.
>>>>> 
>>>>> Should the code below not also test for the gcc version and
>>>>> the Skylake processor? Maybe I am wrong, but it seems it is
>>>>> turning off AVX512 for all GCC builds.
>>>> 
>>>> I didn't want to check the gcc version, as 7.3.0 is very new. Only gcc 8 has been released since then (gcc 8.2).
>>>> Also, I wasn't able to test every gcc version, and I wanted to be a bit conservative about this crash.
>>>> A performance drop (if any) from disabling a new (experimental) feature would be less risky than an unaccountable crash.
>>>> And it disables the feature only if CONFIG_RTE_ENABLE_AVX512=n. Please refer to v3.
>>> 
>>> Are you not turning off AVX512 for all of the GCC versions?
>>> You can test for a range, or for anything greater than a given GCC version, and
>>> it just seems like we are turning it off for every gcc version. Is that true?
>> 
>> Do we know exactly which GCC versions are affected?
>> 
>>>>> Also, bug 97 seems a bit of an obscure reference. Maybe you know
>>>>> the bug report, but more details would be good.
>>>> 
>>>> I sent the report to the dev list two months ago.
>>>> And I created Bug 97 in order to reference it
>>>> in the commit message.
>>>> I didn't want to repeat the same message here and there,
>>>> but it would've been better to have some sort of summary
>>>> of the bug, although v3 has a few more words.
>>>> However, v3 has been merged.
>>> 
>>> Still, this is too obscure. If nothing else, give a link to
>>> the specific bug, not just 97.
>> 
>> The URL is
>> 	https://bugs.dpdk.org/show_bug.cgi?id=97
>> The bug is also pointing to an email:
>> 	https://mails.dpdk.org/archives/dev/2018-September/111522.html
>> 
>> Summary:
>> 	- CPU: Intel Skylake
>> 	- Linux environment: Ubuntu 18.04
>> 	- Compiler: gcc-7.3 (Ubuntu 7.3.0-16ubuntu3)
> 
> Is it possible to test a few other gcc versions to check if the issue is
> specific to this compiler version?

Nothing's impossible, but even a quick search on gcc.gnu.org shows that
the documentation for each of the following releases mentions mavx512f support:

  GCC 4.9.0 (April 22, 2014)
  GCC 5.1   (April 22, 2015)
  GCC 6.4   (July 4, 2017)
  GCC 7.1   (May 2, 2017)
  GCC 8.1   (May 2, 2018)

Verifying all of these versions would take a considerable amount of resources from all of us.
 
I assumed versions older than gcc 7 would have the same issue. I know that was speculation,
but as I mentioned, I wanted to be conservative. I didn't mean this as a permanent fix.
For two months we had no tangible solution (actually, nobody cared, including myself),
so I submitted the patch to temporarily disable mavx512f.
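
For reference, the version-range test suggested earlier in the thread could
look roughly like the sketch below. It is only an illustration, not what the
merged patch does: the macro and function names are made up, and the bounds
are an assumption, since 7.3.0 is the only version confirmed to crash so far.
__GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__ are the standard gcc
predefined macros.

```c
/* Pack major.minor.patch into one comparable integer, e.g. 7.3.0 -> 70300. */
#define GCC_VERSION_CODE(maj, min, patch) \
	((maj) * 10000 + (min) * 100 + (patch))

/*
 * Assumed bounds of the affected range. Only 7.3.0 is confirmed,
 * so both ends are set to it until more versions are tested.
 */
#define SUSPECT_GCC_MIN GCC_VERSION_CODE(7, 3, 0)
#define SUSPECT_GCC_MAX GCC_VERSION_CODE(7, 3, 0)

/* Return 1 if the packed version lies inside the suspect range, else 0. */
static int gcc_version_is_suspect(int code)
{
	return code >= SUSPECT_GCC_MIN && code <= SUSPECT_GCC_MAX;
}

/* Packed version of the compiler building this file, or 0 if not gcc. */
static int this_gcc_version(void)
{
#if defined(__GNUC__) && !defined(__clang__)
	/* clang also defines __GNUC__, so it is excluded explicitly. */
	return GCC_VERSION_CODE(__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__);
#else
	return 0;
#endif
}
```

Calling gcc_version_is_suspect(this_gcc_version()) would then tell whether the
workaround applies to the current build. Widening the range only means changing
the two bounds, but each bound should be backed by an actual test run.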

I'm still not sure what the best option is...

Thanks,
Yongseok

> 
>> 	- Scenario: testpmd crashes when it starts forwarding
>> 	- Behaviour: AVX2 version of rte_memcpy() optimized with 512-bit instructions
>> 	- Fix: disable AVX512 optimization with -mno-avx512f
>> 
>> It seems to have been reproduced only when using mlx5 PMD so far.
>> Any other experience?
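
One way to check whether the -mno-avx512f fix actually applies to a given
source file: gcc predefines __AVX512F__ when AVX512F code generation is
enabled (by -mavx512f or by a -march value that implies it) and leaves it
undefined otherwise. A minimal sketch, with hypothetical helper names:

```c
/* Return 1 if this file was compiled with AVX512F code generation enabled. */
static int build_has_avx512f(void)
{
#ifdef __AVX512F__
	return 1;
#else
	return 0;
#endif
}

/* Human-readable form of the same check, e.g. for a startup log line. */
static const char *avx512f_status(void)
{
	return build_has_avx512f() ? "AVX512F enabled" : "AVX512F disabled";
}
```

This only reports what the compiler was allowed to emit for that file. It does
not prove that 512-bit instructions ended up in rte_memcpy(); disassembling the
object file is still needed for that.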


