[dpdk-dev] [PATCH] Clean up rte_memcpy.h file

Ravi Kerur rkerur at gmail.com
Wed Apr 15 23:00:51 CEST 2015


On Tue, Apr 14, 2015 at 11:32 PM, Pawel Wodkowski <pawelx.wodkowski at intel.com> wrote:

> On 2015-04-14 23:31, Ravi Kerur wrote:
>
>> +
>> +               for (i = 0; i < 8; i++) {
>> +                       ymm = _mm256_loadu_si256((const __m256i *)(src + i * 32));
>> +                       _mm256_storeu_si256((__m256i *)(dst + i * 32), ymm);
>> +               }
>> +
>>                 n -= 256;
>> -               ymm1 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 1 * 32));
>> -               ymm2 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 2 * 32));
>> -               ymm3 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 3 * 32));
>> -               ymm4 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 4 * 32));
>> -               ymm5 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 5 * 32));
>> -               ymm6 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 6 * 32));
>> -               ymm7 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 7 * 32));
>> -               src = (const uint8_t *)src + 256;
>> -               _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 0 * 32), ymm0);
>> -               _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 1 * 32), ymm1);
>> -               _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 2 * 32), ymm2);
>> -               _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 3 * 32), ymm3);
>> -               _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 4 * 32), ymm4);
>> -               _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 5 * 32), ymm5);
>> -               _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 6 * 32), ymm6);
>> -               _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 7 * 32), ymm7);
>> -               dst = (uint8_t *)dst + 256;
>> +               src = src + 256;
>> +               dst = dst + 256;
>>         }
>>
>
> Did you perform a performance test on that part?
>
>
I ran "make test", which runs "memcpy perf". The results were given in the
cover letter; I am pasting them here again.

/**********************With changes*************************************/
Start memcpy_perf:             Success                       [00m 00s]
Memcpy performance autotest:   Success               [09m 36s] [17m 45s]
/**********************Without changes**********************************/
Start memcpy_perf:             Success                       [00m 00s]
Memcpy performance autotest:   Success               [09m 35s] [13m 57s]
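
To look at just the 256-byte block in isolation, something like the
following could be used. This is only a rough sketch and not code from the
patch: the names copy32, copy256_loop, copy256_unrolled,
copy256_variants_agree and time_copy are made up for illustration, and the
memcpy fallback is an assumption so that it also builds without AVX enabled
(e.g. without -mavx). It puts the loop form and the hand-unrolled form side
by side, checks that they copy the same bytes, and times each with clock().

```c
#include <stdint.h>
#include <string.h>
#include <time.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Copy one 32-byte chunk. Uses unaligned AVX load/store when the compiler
 * has AVX enabled; otherwise falls back to memcpy (portability assumption). */
static inline void copy32(uint8_t *dst, const uint8_t *src)
{
#if defined(__AVX__)
	__m256i ymm = _mm256_loadu_si256((const __m256i *)src);
	_mm256_storeu_si256((__m256i *)dst, ymm);
#else
	memcpy(dst, src, 32);
#endif
}

/* Loop form: eight 32-byte load/store pairs, one per iteration. */
void copy256_loop(uint8_t *dst, const uint8_t *src)
{
	int i;

	for (i = 0; i < 8; i++)
		copy32(dst + i * 32, src + i * 32);
}

/* Unrolled form: all eight loads are issued before the eight stores. */
void copy256_unrolled(uint8_t *dst, const uint8_t *src)
{
#if defined(__AVX__)
	__m256i y0 = _mm256_loadu_si256((const __m256i *)(src + 0 * 32));
	__m256i y1 = _mm256_loadu_si256((const __m256i *)(src + 1 * 32));
	__m256i y2 = _mm256_loadu_si256((const __m256i *)(src + 2 * 32));
	__m256i y3 = _mm256_loadu_si256((const __m256i *)(src + 3 * 32));
	__m256i y4 = _mm256_loadu_si256((const __m256i *)(src + 4 * 32));
	__m256i y5 = _mm256_loadu_si256((const __m256i *)(src + 5 * 32));
	__m256i y6 = _mm256_loadu_si256((const __m256i *)(src + 6 * 32));
	__m256i y7 = _mm256_loadu_si256((const __m256i *)(src + 7 * 32));

	_mm256_storeu_si256((__m256i *)(dst + 0 * 32), y0);
	_mm256_storeu_si256((__m256i *)(dst + 1 * 32), y1);
	_mm256_storeu_si256((__m256i *)(dst + 2 * 32), y2);
	_mm256_storeu_si256((__m256i *)(dst + 3 * 32), y3);
	_mm256_storeu_si256((__m256i *)(dst + 4 * 32), y4);
	_mm256_storeu_si256((__m256i *)(dst + 5 * 32), y5);
	_mm256_storeu_si256((__m256i *)(dst + 6 * 32), y6);
	_mm256_storeu_si256((__m256i *)(dst + 7 * 32), y7);
#else
	/* Fallback: read everything into a temporary, then write it out. */
	uint8_t tmp[256];

	memcpy(tmp, src, sizeof(tmp));
	memcpy(dst, tmp, sizeof(tmp));
#endif
}

/* Returns 1 if both variants copy a 256-byte test pattern correctly. */
int copy256_variants_agree(void)
{
	uint8_t src[256], a[256], b[256];
	int i;

	for (i = 0; i < 256; i++)
		src[i] = (uint8_t)i;
	memset(a, 0x00, sizeof(a));
	memset(b, 0xff, sizeof(b));

	copy256_loop(a, src);
	copy256_unrolled(b, src);

	return memcmp(a, src, 256) == 0 && memcmp(b, src, 256) == 0;
}

/* Rough CPU-time measurement, in seconds, of iters calls to fn. */
double time_copy(void (*fn)(uint8_t *, const uint8_t *), long iters)
{
	static uint8_t src[256], dst[256];
	clock_t start = clock();
	long k;

	for (k = 0; k < iters; k++)
		fn(dst, src);

	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy(copy256_loop, n) and time_copy(copy256_unrolled, n) with
the same n gives a first impression only. The compiler may unroll the loop
on its own, so the generated assembly is worth checking before drawing
conclusions.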


-- 
> Pawel
>

