<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div style="margin: 0;">Hi Stephen,</div><div style="margin: 0;"><span style="font-size: 14px;">Thank you very much for your reply!</span></div><div style="margin: 0;"><span style="font-size: 14px;">></span><span style="font-family: arial; white-space: pre-wrap;">I would just replace all of the rte_memcpy with memcpy</span> </div><div style="margin: 0;"><span style="font-size: 14px;">I will </span><span style="font-family: arial; white-space: pre-wrap;">replace all of the rte_memcpy calls with memcpy</span><span style="font-size: 14px;">.</span></div><div style="margin: 0;"><pre style="width: 826.78px; word-break: break-word !important;">>I expect that rte_memcpy() is able to do better than memcpy() for larger copies because it is >likely to use bigger vector instructions and check for alignment. >For small copies just doing the mov's directly is going to be as fast or faster. >In fact, lots of places in DPDK should >replace rte_memcpy() with simple structure assignment to preserve type safety.</pre><pre style="width: 826.78px; word-break: break-word !important;">I don't know where the dividing line (in terms of data size) between rte_memcpy and memcpy lies.</pre><pre style="width: 826.78px; word-break: break-word !important;">We ran a simple test copying 1500 bytes, and memcpy seemed to be faster, though our test may not be accurate enough.</pre><pre style="width: 826.78px; word-break: break-word !important;">>This is somewhat historical data, it might be wrong. It would be worthwhile to have benchmarks >across different sizes (variable and fixed), different compilers, and different CPU's. 
>There might be surprising results.</pre><pre style="width: 826.78px; word-break: break-word !important;">So I hope this discussion can continue and lead to a more thorough rte_memcpy manual. Thanks!</pre><pre style="width: 826.78px; word-break: break-word !important;">Huichao,Cai</pre><pre style="width: 826.78px; word-break: break-word !important;"><br></pre></div></div>