[PATCH v9 1/8] eal: generic 64 bit counter

Mattias Rönnblom hofors at lysator.liu.se
Sun May 26 16:39:34 CEST 2024


On 2024-05-22 21:01, Tyler Retzlaff wrote:
> On Wed, May 22, 2024 at 07:57:01PM +0200, Morten Brørup wrote:
>>> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
>>> Sent: Wednesday, 22 May 2024 17.38
>>>
>>> On Wed, 22 May 2024 10:31:39 +0200
>>> Morten Brørup <mb at smartsharesystems.com> wrote:
>>>
>>>>> +/* On 32 bit platform, need to use atomic to avoid load/store
>>> tearing */
>>>>> +typedef RTE_ATOMIC(uint64_t) rte_counter64_t;
>>>>
>>>> As shown by Godbolt experiments discussed in a previous thread [2],
>>> non-tearing 64 bit counters can be implemented without using atomic
>>> instructions on all 32 bit architectures supported by DPDK. So we should
>>> use the counter/offset design pattern for RTE_ARCH_32 too.
>>>>
>>>> [2]:
>>> https://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9F433@smarts
>>> erver.smartshare.dk/
>>>
>>>
>>> This code, built with -O3 and -m32 on godbolt, shows the split problem:
>>>
>>> #include <stdint.h>
>>>
>>> typedef uint64_t rte_counter64_t;
>>>
>>> void
>>> rte_counter64_add(rte_counter64_t *counter, uint32_t val)
>>> {
>>> 	*counter += val;
>>> }
>>> …	*counter = val;
>>> }
>>>
>>> rte_counter64_add:
>>>          push    ebx
>>>          mov     eax, DWORD PTR [esp+8]
>>>          xor     ebx, ebx
>>>          mov     ecx, DWORD PTR [esp+12]
>>>          add     DWORD PTR [eax], ecx
>>>          adc     DWORD PTR [eax+4], ebx
>>>          pop     ebx
>>>          ret
>>>
>>> rte_counter64_read:
>>>          mov     eax, DWORD PTR [esp+4]
>>>          mov     edx, DWORD PTR [eax+4]
>>>          mov     eax, DWORD PTR [eax]
>>>          ret
>>> rte_counter64_set:
>>>          movq    xmm0, QWORD PTR [esp+8]
>>>          mov     eax, DWORD PTR [esp+4]
>>>          movq    QWORD PTR [eax], xmm0
>>>          ret
>>
>> Sure, atomic might be required on some 32 bit architectures and/or with some compilers.
> 
> in theory i think you should be able to use generic atomics and
> depending on the target you get codegen that works. it might be
> something more expensive on 32-bit and nothing on 64-bit etc..
> 
> what's the damage if we just use atomic generic and relaxed ordering? is
> the codegen not optimal?
>   

Below is what I originally proposed in the "make stats reset reliable" 
thread.

struct counter
{
     uint64_t count;
     uint64_t offset;
};

/../
     struct counter rx_pkts;
     struct counter rx_bytes;
/../

static uint64_t
counter_value(const struct counter *counter)
{
     uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);
     uint64_t offset = __atomic_load_n(&counter->offset, __ATOMIC_RELAXED);

     return count - offset;
}

static void
counter_reset(struct counter *counter)
{
     uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);

     __atomic_store_n(&counter->offset, count, __ATOMIC_RELAXED);
}

static void
counter_add(struct counter *counter, uint64_t operand)
{
     __atomic_store_n(&counter->count, counter->count + operand,
                      __ATOMIC_RELAXED);
}

I think this solution generally compiles to something equivalent to
plain non-atomic loads/stores, hoping for the best.

Using a non-atomic load in counter_add() will generate better code, but
doesn't work if you use _Atomic (without casts).

Atomic loads/stores seem to have volatile semantics, so multiple updates
to the same counter cannot be merged. That is a drawback.

>> I envision a variety of 32 bit implementations, optimized for certain architectures/compilers.
>>
>> Some of them can provide non-tearing 64 bit load/store, so we should also use the counter/offset design pattern for those.
>>


More information about the dev mailing list