[dpdk-dev] [PATCH v2] eal/ppc64: improve rte_rdtsc with ppc_get_timebase

Thinh Tran thinhtr at linux.vnet.ibm.com
Mon Feb 10 18:53:32 CET 2020


Hi, sorry for the late response.
Yes, this is an enhancement for powerpc. On our POWER8/9 systems we observed
that __ppc_get_timebase() calls __builtin_ppc_get_timebase(), which
results in a single mftb instruction:
    __ppc_get_timebase():
       mftb     rA
On a 64-bit implementation this instruction copies the entire time base
(TBU||TBL) into rA, which significantly reduces the number of cycles
compared to the current code (the same as the last block of the glibc
helper quoted below).
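
For illustration, here is a minimal sketch of a single-read time base
helper. It is not the actual patch: the name read_timebase_sketch() is
hypothetical, and the clock() fallback is only an assumption so that the
snippet also builds on non-powerpc hosts (it returns processor time in a
different unit, not the time base).

```c
#include <stdint.h>
#include <time.h>

#if defined(__powerpc64__) && defined(__GLIBC__)
#include <sys/platform/ppc.h>   /* glibc header declaring __ppc_get_timebase() */
#endif

/* Hypothetical helper name; this is not the DPDK rte_rdtsc() itself. */
static inline uint64_t
read_timebase_sketch(void)
{
#if defined(__powerpc64__) && defined(__GLIBC__)
	/* One 64-bit read of the time base (TBU||TBL), no retry loop. */
	return __ppc_get_timebase();
#else
	/* Assumed fallback for other hosts: ISO C processor-time ticks. */
	return (uint64_t)clock();
#endif
}
```

On a 64-bit powerpc build the helper should reduce to the single mftb shown
above, instead of the mftbu/mftbl/mftbu compare-and-retry loop.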

To demonstrate, take the simple reciprocal division perf test on POWER9,
which calls rte_rdtsc() heavily:
- without this patch:
Validating unsigned 32bit division.
32bit Division results:
Total number of cycles normal division     : 73744549935
Total number of cycles reciprocal division : 76954877143
Cycles per division(normal) : 17.17
Cycles per division(reciprocal) : 17.92
Validating unsigned 64bit division.
64bit Division results:
Total number of cycles normal division     : 73932937051
Total number of cycles reciprocal division : 74598584339
Cycles per division(normal) : 17.21
Cycles per division(reciprocal) : 17.37
Validating unsigned 64bit division with 32bit divisor.
64bit Division results:
Total number of cycles normal division     : 78660556171
Total number of cycles reciprocal division : 74566630579
Cycles per division(normal) : 18.31
Cycles per division(reciprocal) : 17.36
Validating division by power of 2.
64bit Division results:
Total number of cycles normal division     : 1097
Total number of cycles reciprocal division : 1201
Cycles per division(normal) : 17.14
Cycles per division(reciprocal) : 18.77
Test OK
RTE>>
- with the patch:
Validating unsigned 32bit division.
32bit Division results:
Total number of cycles normal division     : 41690214596
Total number of cycles reciprocal division : 44446377795
Cycles per division(normal) : 9.71
Cycles per division(reciprocal) : 10.35
Validating unsigned 64bit division.
64bit Division results:
Total number of cycles normal division     : 41687737031
Total number of cycles reciprocal division : 41666358052
Cycles per division(normal) : 9.71
Cycles per division(reciprocal) : 9.70
Validating unsigned 64bit division with 32bit divisor.
64bit Division results:
Total number of cycles normal division     : 46386969228
Total number of cycles reciprocal division : 41663680498
Cycles per division(normal) : 10.80
Cycles per division(reciprocal) : 9.70
Validating division by power of 2.
64bit Division results:
Total number of cycles normal division     : 618
Total number of cycles reciprocal division : 618
Cycles per division(normal) : 9.66
Cycles per division(reciprocal) : 9.66
Test OK
RTE>>
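
To put the two runs side by side, the relative reduction can be worked out
from the reported figures. The sketch below uses the "Cycles per
division(normal)" values of the 32bit test from each run; the function and
variable names are made up for this example.

```c
/* Fractional reduction in measured cycles: (before - after) / before. */
static double
cycle_reduction(double before, double after)
{
	return (before - after) / before;
}

/* Copied from the 32bit "Cycles per division(normal)" lines above. */
static const double cycles_without_patch = 17.17;
static const double cycles_with_patch = 9.71;
```

cycle_reduction(cycles_without_patch, cycles_with_patch) comes out at
roughly 0.43, i.e. about 43% fewer measured cycles per division. Since the
test calls rte_rdtsc() heavily, the measured counts include the cost of
reading the timer, which is the part this patch changes.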

I hope this explains it.
Thanks,
Thinh Tran
On 2/5/2020 3:29 PM, David Marchand wrote:
> On Fri, Jan 31, 2020 at 11:04 PM Thinh Tran <thinhtr at linux.vnet.ibm.com> wrote:
>>
>>    __ppc_get_timebase() is GNU extension and is more efficient
> 
> The commit title and log are quite short and give little idea on what
> this is about.
> 
> 
> I had a look at this glibc helper:
> 
> /* Read the Time Base Register.   */
> static __inline__ uint64_t
> __ppc_get_timebase (void)
> {
> #if __GNUC_PREREQ (4, 8)
>    return __builtin_ppc_get_timebase ();
> #else
> # ifdef __powerpc64__
>    uint64_t __tb;
>    /* "volatile" is necessary here, because the user expects this assembly
>       isn't moved after an optimization.  */
>    __asm__ volatile ("mfspr %0, 268" : "=r" (__tb));
>    return __tb;
> # else  /* not __powerpc64__ */
>    uint32_t __tbu, __tbl, __tmp; \
>    __asm__ volatile ("0:\n\t"
>                      "mftbu %0\n\t"
>                      "mftbl %1\n\t"
>                      "mftbu %2\n\t"
>                      "cmpw %0, %2\n\t"
>                      "bne- 0b"
>                      : "=r" (__tbu), "=r" (__tbl), "=r" (__tmp));
>    return (((uint64_t) __tbu << 32) | __tbl);
> # endif  /* not __powerpc64__ */
> #endif
> }
> 
> The last block is exactly the code we had in dpdk.
> So I suppose we are trying to use mfspr for register 268 which seems
> linked to timebase (looking at the linux kernel sources).
> 
> Please, confirm this is an enhancement (and how this improves current
> ppc support).
> Thanks.
> 
> 

