[PATCH v9 1/8] eal: generic 64 bit counter
Mattias Rönnblom
hofors at lysator.liu.se
Sun May 26 16:39:34 CEST 2024
On 2024-05-22 21:01, Tyler Retzlaff wrote:
> On Wed, May 22, 2024 at 07:57:01PM +0200, Morten Brørup wrote:
>>> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
>>> Sent: Wednesday, 22 May 2024 17.38
>>>
>>> On Wed, 22 May 2024 10:31:39 +0200
>>> Morten Brørup <mb at smartsharesystems.com> wrote:
>>>
>>>>> +/* On 32 bit platform, need to use atomic to avoid load/store
>>>>> tearing */
>>>>> +typedef RTE_ATOMIC(uint64_t) rte_counter64_t;
>>>>
>>>> As shown by Godbolt experiments discussed in a previous thread [2],
>>>> non-tearing 64 bit counters can be implemented without using atomic
>>>> instructions on all 32 bit architectures supported by DPDK. So we should
>>>> use the counter/offset design pattern for RTE_ARCH_32 too.
>>>>
>>>> [2]:
>>>> https://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9F433@smartserver.smartshare.dk/
>>>
>>>
>>> This code, built with -O3 and -m32 on godbolt, shows the split problem.
>>>
>>> #include <stdint.h>
>>>
>>> typedef uint64_t rte_counter64_t;
>>>
>>> void
>>> rte_counter64_add(rte_counter64_t *counter, uint32_t val)
>>> {
>>> *counter += val;
>>> }
>>> … *counter = val;
>>> }
>>>
>>> rte_counter64_add:
>>> push ebx
>>> mov eax, DWORD PTR [esp+8]
>>> xor ebx, ebx
>>> mov ecx, DWORD PTR [esp+12]
>>> add DWORD PTR [eax], ecx
>>> adc DWORD PTR [eax+4], ebx
>>> pop ebx
>>> ret
>>>
>>> rte_counter64_read:
>>> mov eax, DWORD PTR [esp+4]
>>> mov edx, DWORD PTR [eax+4]
>>> mov eax, DWORD PTR [eax]
>>> ret
>>> rte_counter64_set:
>>> movq xmm0, QWORD PTR [esp+8]
>>> mov eax, DWORD PTR [esp+4]
>>> movq QWORD PTR [eax], xmm0
>>> ret
>>
>> Sure, atomic might be required on some 32 bit architectures and/or with some compilers.
>
> in theory i think you should be able to use generic atomics and
> depending on the target you get codegen that works. it might be
> something more expensive on 32-bit and nothing on 64-bit etc..
>
> what's the damage if we just use atomic generic and relaxed ordering? is
> the codegen not optimal?
>
Below is what I originally proposed in the "make stats reset reliable"
thread.
struct counter
{
	uint64_t count;
	uint64_t offset;
};

/../

struct counter rx_pkts;
struct counter rx_bytes;

/../

static uint64_t
counter_value(const struct counter *counter)
{
	uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);
	uint64_t offset = __atomic_load_n(&counter->offset, __ATOMIC_RELAXED);

	return count - offset;
}

static void
counter_reset(struct counter *counter)
{
	uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);

	__atomic_store_n(&counter->offset, count, __ATOMIC_RELAXED);
}

static void
counter_add(struct counter *counter, uint64_t operand)
{
	__atomic_store_n(&counter->count, counter->count + operand,
			 __ATOMIC_RELAXED);
}
I think this solution generally compiles to something equivalent to just
using non-atomic loads/stores and hoping for the best.

Using a non-atomic load in counter_add() will generate better code, but
that doesn't work if you are using _Atomic (without casts).

Atomic loads/stores seem to have volatile semantics, so multiple updates
to the same counter cannot be merged. That is a drawback.
>> I envision a variety of 32 bit implementations, optimized for certain architectures/compilers.
>>
>> Some of them can provide non-tearing 64 bit load/store, so we should also use the counter/offset design pattern for those.
>>