[dpdk-dev] [dpdk-stable] AVX512 bug on SkyLake

Ananyev, Konstantin konstantin.ananyev at intel.com
Mon Nov 12 10:26:36 CET 2018


> 11/11/2018 15:15, Ananyev, Konstantin:
> > Hi Thomas,
> >
> > > Below is my conclusion for this bug.
> > > An x86 expert is needed to follow up.
> > >
> > > Summary:
> > > 	- CPU: Intel Skylake
> > > 	- Linux environment: Ubuntu 18.04
> > > 	- Compiler: GCC 7 or 8
> > > 	- Scenario: testpmd crashes when it starts forwarding
> > > 	- Behaviour: AVX2 version of rte_memcpy() fails if optimized for AVX512
> > > 	- Context: inline rte_memcpy() is called from
> > > 			inline rte_mempool_put_bulk(), called from
> > > 			mlx5_tx_complete() (inline or not)
> > > 	- Analysis: AVX512 optimization changes vmovdqu to vmovdqu8
> > >
> > > Latest status can be found in Bugzilla:
> > > 	https://bugs.dpdk.org/show_bug.cgi?id=97#c35
> >
> >
> > Looking at the disassembled output from the bug report, it seems that the
> > problem is not in the vmovdqu8 instruction itself, but in the wrong offsets
> > generated by the compiler:
> >
> >     vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x2]
> >    vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x30],0x1
> >     vmovups XMMWORD PTR [rsi+0x20],xmm0
> >     vextracti128 XMMWORD PTR [rsi+0x30],ymm0,0x1
> >     vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x4]
> >     vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x50],0x1
> >     vmovups XMMWORD PTR [rsi+0x40],xmm0
> >     vextracti128 XMMWORD PTR [rsi+0x50],ymm0,0x1
> >     vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x6]
> >
> > I think the first load should be:
> >     vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x20]
> >
> > The same goes for the next two offsets: 0x4 and 0x6 should be 0x40 and 0x60 respectively.
> 
> Yes, you're right, I missed it, thank you!
> 
> The full diff is below:
> 
> --- bad-avx512-enabled
> +++ good-avx512-disabled
> -    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x0]
> +    vmovdqu xmm0,XMMWORD PTR [rax*8+0x0]
>      vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x10],0x1
>      vmovups XMMWORD PTR [rsi],xmm0
>      vextracti128 XMMWORD PTR [rsi+0x10],ymm0,0x1
> -    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x2]
> +    vmovdqu xmm0,XMMWORD PTR [rax*8+0x20]
>      vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x30],0x1
>      vmovups XMMWORD PTR [rsi+0x20],xmm0
>      vextracti128 XMMWORD PTR [rsi+0x30],ymm0,0x1
> -    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x4]
> +    vmovdqu xmm0,XMMWORD PTR [rax*8+0x40]
>      vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x50],0x1
>      vmovups XMMWORD PTR [rsi+0x40],xmm0
>      vextracti128 XMMWORD PTR [rsi+0x50],ymm0,0x1
> -    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x6]
> +    vmovdqu xmm0,XMMWORD PTR [rax*8+0x60]
>      vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x70],0x1
>      vmovups XMMWORD PTR [rsi+0x60],xmm0
>      vextracti128 XMMWORD PTR [rsi+0x70],ymm0,0x1
> 
> > Not sure what causes the compiler to behave that way.
> > BTW, looking through the testpmd objdump output, it seems that only the mlx5 driver
> > exhibits this problem (I didn't actually enable mlx4; it probably has the same problem).
> > That looks a bit weird to me.
> 
> Yes it's weird. I don't see how the mlx5 code could influence
> the compiler to generate this bad code in AVX512 mode.

Same here: I looked through the mlx5_rxtx code, and it is unclear to me
what triggers the issue.
So far it looks like a GCC bug to me.
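
For anyone who wants to scan a larger objdump for the same pattern, here is a
minimal sketch in Python. It rests on one assumption taken from the listings
above: each 32-byte copy is a 16-byte load (vmovdqu or vmovdqu8) followed by a
vinserti128 whose source displacement should be exactly 0x10 higher. The names
find_bad_pairs and source_disp are made up for this illustration, and the regex
only handles Intel-syntax operands of the form shown in this thread.

```python
import re

# Matches the displacement in operands such as "XMMWORD PTR [rax*8+0x20]".
DISP = re.compile(r"\[[^\]]*\+0x([0-9a-fA-F]+)\]")


def source_disp(line):
    """Return the memory displacement found in a line, or None if there is none."""
    m = DISP.search(line)
    return int(m.group(1), 16) if m else None


def find_bad_pairs(lines):
    """Return (low, high) displacement pairs that are not 0x10 apart.

    Assumption: a vmovdqu/vmovdqu8 load supplies the low 16 bytes and the
    next vinserti128 supplies the high 16 bytes of the same 32-byte copy.
    """
    bad = []
    low = None
    for line in lines:
        text = line.strip()
        mnemonic = text.split()[0] if text else ""
        if mnemonic in ("vmovdqu", "vmovdqu8"):
            low = source_disp(text)
        elif mnemonic == "vinserti128" and low is not None:
            high = source_disp(text)
            if high is not None and high - low != 0x10:
                bad.append((low, high))
            low = None
    return bad
```

Fed the "bad" side of the diff, this reports the pairs (0x2, 0x30), (0x4, 0x50)
and (0x6, 0x70); the "good" side gives an empty list. The input lines must be
plain disassembly, with any quote or diff markers stripped first.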
Konstantin


