[PATCH v9 1/8] eal: generic 64 bit counter

Stephen Hemminger stephen at networkplumber.org
Wed May 22 21:51:53 CEST 2024


On Wed, 22 May 2024 12:01:12 -0700
Tyler Retzlaff <roretzla at linux.microsoft.com> wrote:

> On Wed, May 22, 2024 at 07:57:01PM +0200, Morten Brørup wrote:
> > > From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> > > Sent: Wednesday, 22 May 2024 17.38
> > > 
> > > On Wed, 22 May 2024 10:31:39 +0200
> > > Morten Brørup <mb at smartsharesystems.com> wrote:
> > >   
> > > > > > +/* On 32 bit platform, need to use atomic to avoid load/store tearing */
> > > > > > +typedef RTE_ATOMIC(uint64_t) rte_counter64_t;
> > > >
> > > > > As shown by Godbolt experiments discussed in a previous thread [2],
> > > > > non-tearing 64 bit counters can be implemented without using atomic
> > > > > instructions on all 32 bit architectures supported by DPDK. So we should
> > > > > use the counter/offset design pattern for RTE_ARCH_32 too.
> > > > >
> > > > > [2]: https://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9F433@smartserver.smartshare.dk/
> > > 
> > > 
> > > > This code, built with -O3 and -m32 on godbolt, shows the split problem.
> > > 
> > > #include <stdint.h>
> > > 
> > > typedef uint64_t rte_counter64_t;
> > > 
> > > void
> > > rte_counter64_add(rte_counter64_t *counter, uint32_t val)
> > > {
> > > 	*counter += val;
> > > }
> > > …	*counter = val;
> > > }
> > > 
> > > rte_counter64_add:
> > >         push    ebx
> > >         mov     eax, DWORD PTR [esp+8]
> > >         xor     ebx, ebx
> > >         mov     ecx, DWORD PTR [esp+12]
> > >         add     DWORD PTR [eax], ecx
> > >         adc     DWORD PTR [eax+4], ebx
> > >         pop     ebx
> > >         ret
> > > 
> > > rte_counter64_read:
> > >         mov     eax, DWORD PTR [esp+4]
> > >         mov     edx, DWORD PTR [eax+4]
> > >         mov     eax, DWORD PTR [eax]
> > >         ret
> > > rte_counter64_set:
> > >         movq    xmm0, QWORD PTR [esp+8]
> > >         mov     eax, DWORD PTR [esp+4]
> > >         movq    QWORD PTR [eax], xmm0
> > >         ret  
> > 
> > Sure, atomic might be required on some 32 bit architectures and/or with some compilers.  
> 
> in theory i think you should be able to use generic atomics and
> depending on the target you get codegen that works. it might be
> something more expensive on 32-bit and nothing on 64-bit etc..
> 
> what's the damage if we just use atomic generic and relaxed ordering? is
> the codegen not optimal?

If we use atomic with relaxed memory order, the compiler for x86 still generates
a locked increment in the fast path. This costs about 100 extra cycles due
to cache and prefetch stall. This whole endeavor is an attempt to avoid that.
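
To make the difference concrete, here is a minimal C11 sketch (not the
patch itself). The names counter64_t, counter64_add_rmw,
counter64_add_single_writer and counter64_read are hypothetical and are
not DPDK APIs. It assumes a single writer per counter, which is what
allows the second variant to avoid the atomic read-modify-write.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical type for illustration only, not the DPDK definition. */
typedef _Atomic uint64_t counter64_t;

/*
 * Atomic read-modify-write. Even with relaxed ordering, x86 compilers
 * typically emit a lock-prefixed instruction for this on 64-bit targets.
 */
static inline void
counter64_add_rmw(counter64_t *counter, uint32_t val)
{
	atomic_fetch_add_explicit(counter, val, memory_order_relaxed);
}

/*
 * Separate relaxed load and relaxed store. Each access is still
 * non-tearing, and on 64-bit targets they usually compile to plain
 * moves. The update as a whole is NOT atomic, so this is only valid
 * under the assumption that one thread writes the counter.
 */
static inline void
counter64_add_single_writer(counter64_t *counter, uint32_t val)
{
	uint64_t cur = atomic_load_explicit(counter, memory_order_relaxed);

	atomic_store_explicit(counter, cur + val, memory_order_relaxed);
}

/* Readers on other threads use a relaxed load to avoid tearing. */
static inline uint64_t
counter64_read(counter64_t *counter)
{
	return atomic_load_explicit(counter, memory_order_relaxed);
}
```

Compiling both add variants for x86-64 at -O2 or higher and comparing
the output is one way to check the codegen on a given compiler.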

PS: the locked increment code for 32 bit involves a locked compare
exchange and potential retry. We probably don't care about performance on
that platform anymore.
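
For reference, the compare exchange with retry can be written out
explicitly in C11. This is only a sketch of the general shape, under the
assumption that the 32-bit target has no single 64-bit atomic add; it is
not the compiler's actual output, and counter64_add_cas is a
hypothetical name.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: 64-bit add built from a compare-exchange loop. */
static inline void
counter64_add_cas(_Atomic uint64_t *counter, uint32_t val)
{
	uint64_t old = atomic_load_explicit(counter, memory_order_relaxed);

	/*
	 * If another thread changed the counter (or the weak exchange
	 * failed spuriously), old is refreshed with the current value
	 * and the exchange is retried with a recomputed sum.
	 */
	while (!atomic_compare_exchange_weak_explicit(counter, &old,
			old + val, memory_order_relaxed,
			memory_order_relaxed))
		;
}
```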



