<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from rtf -->
<style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
</head>
<body>
<font face="Calibri" size="2"><span style="font-size:10pt;">
<div style="padding-right:5pt;padding-left:5pt;"><font color="blue">[AMD Official Use Only - AMD Internal Distribution Only]<br>
</font></div>
<div style="margin-top:5pt;"><font face="Times New Roman" size="3"><span style="font-size:12pt;"><br>
</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"><snipped></span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > >> --- a/app/test-pmd/macswap_sse.h</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > >> +++ b/app/test-pmd/macswap_sse.h</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > >> @@ -16,13 +16,13 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> nb,</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > >> uint64_t ol_flags;</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > >> int i;</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > >> int r;</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > >> - __m128i addr0, addr1, addr2, addr3;</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > >> + register __m128i addr0, addr1, addr2, addr3;</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > > Some compilers treat register as a no-op. Are you sure? Did you check</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> with godbolt.</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > ></span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > Thank you Stephen, I have tested the code changes on Linux using GCC</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > and Clang compiler.</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > ></span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > In both cases, in the Linux environment, we have seen the values</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > loaded into an `xmm` register.</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > ></span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > ```</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > register const __m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12, 5, 4,</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > 3, 2, 1, 0, 11, 10, 9, 8, 7, 6);</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > vmovdqa xmm0, xmmword ptr [rip + .LCPI0_0]</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> </span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> Yep, that's what I would probably expect: a one-time load before the loop starts,</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> right?</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> Curious what exactly it would generate then if 'register' keyword is missed?</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> BTW, on my box, gcc-11 with '-O3 -msse4.2 ...' I am seeing expected</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> behavior without 'register' keyword.</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> Is it some particular compiler version that misbehaves?</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">Thank you, Konstantin, for this pointer. I have been trying to understand this a bit more internally. Here are my observations:</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">1. The SIMD shuffle instruction operates on XMM registers only.</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">2. Any values held in variables have to be loaded into an `xmm` register before processing.</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">3. When compiled with `-march=native` by a compiler that is not aware of the SoC architecture (no GCC tuning weights for it), the code without the patch might be generated with `movzx eax, BYTE PTR [rbp-48]`.</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">4. When the register keyword is applied to both shfl_msk and addr, the compiler tries to get the variables directly into xmm registers using `vmovdqu (%rsi),%xmm1`.</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">So I think you are right: with GCC 12.3 and GCC 13.1, which support `-march=znver4`, this problem should not occur.</span></font></div>
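<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">For checking the observations above outside of testpmd, a minimal standalone sketch may help. It is not the do_macswap() code: `macswap16` is a hypothetical helper that applies the same shuffle mask to the first 16 bytes of a frame, and the scalar fallback is only an assumption added so that it also builds when SSSE3 is not enabled. Compiling it with and without `register` on the two locals (for example with `-O3 -msse4.2`) and comparing the generated assembly should show whether a given compiler version behaves differently.</span></font></div>
<pre>
```c
#include <stdint.h>
#include <string.h>
#ifdef __SSSE3__
#include <tmmintrin.h> /* _mm_shuffle_epi8 (SSSE3) */
#endif

/*
 * Hypothetical helper for illustration only (not part of DPDK).
 * Swaps bytes 0-5 (destination MAC) with bytes 6-11 (source MAC) in a
 * 16-byte block and leaves bytes 12-15 unchanged.
 */
static void macswap16(uint8_t *hdr)
{
#ifdef __SSSE3__
	/* Same mask as in the patch; arguments run from byte 15 down to byte 0. */
	const __m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12, 5, 4, 3, 2,
					      1, 0, 11, 10, 9, 8, 7, 6);
	/* Unaligned load, shuffle in an XMM register, unaligned store. */
	__m128i addr = _mm_loadu_si128((const __m128i *)hdr);

	addr = _mm_shuffle_epi8(addr, shfl_msk);
	_mm_storeu_si128((__m128i *)hdr, addr);
#else
	/* Scalar fallback (assumption): same permutation, one byte at a time. */
	static const uint8_t swap_idx[16] = {
		6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 12, 13, 14, 15
	};
	uint8_t tmp[16];
	int i;

	for (i = 0; i < 16; i++)
		tmp[i] = hdr[swap_idx[i]];
	memcpy(hdr, tmp, sizeof(tmp));
#endif
}
```
</pre>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">Calling `macswap16()` on a buffer holding the byte values 0 to 15 should leave 6..11 in the first six bytes and 0..5 in the next six, which makes it easy to confirm that both variants produce the same result before looking at the instructions emitted.</span></font></div>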
<div><font face="Calibri" size="2"><span style="font-size:11pt;"> </span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> </span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > ></span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > ```</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > ></span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > In both cases we see a performance improvement.</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > ></span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > ></span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > > Can you please help us understand if we have missed out something?</span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> ></span></font></div>
<div><font face="Calibri" size="2"><span style="font-size:11pt;">> > Ok, not sure why compiler would not decide to already use a register here?</span></font></div>
</span></font>
</body>
</html>