[RFC v2] net/af_packet: make stats reset reliable
Mattias Rönnblom
hofors at lysator.liu.se
Wed May 8 09:23:48 CEST 2024
On 2024-05-08 09:10, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:hofors at lysator.liu.se]
>> Sent: Wednesday, 8 May 2024 08.35
>>
>> On 2024-05-07 21:19, Morten Brørup wrote:
>>>> From: Mattias Rönnblom [mailto:hofors at lysator.liu.se]
>>>> Sent: Tuesday, 7 May 2024 09.24
>>>>
>>>> On 2024-04-28 17:11, Mattias Rönnblom wrote:
>>>>> On 2024-04-26 16:38, Ferruh Yigit wrote:
>>>
>>> [...]
>>>
>>>>> static uint64_t
>>>>> counter_value(struct counter *counter)
>>>>> {
>>>>>         uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);
>>>>>         uint64_t offset = __atomic_load_n(&counter->offset, __ATOMIC_RELAXED);
>>>>>
>>>>
>>>> Since the count and the offset are written to independently, without any
>>>> ordering restrictions, an update and a reset in quick succession may
>>>> cause the offset store to be globally visible before the new count.
>>>
>>> Good catch.
>>> This may happen when a thread calls stats_add() and then the same
>>> thread calls stats_reset().
>>>
>>>> In such a scenario, a reader could see an offset > count.
>>>>
>>>> Thus, unless I'm missing something, one should add a
>>>>
>>>> if (unlikely(offset > count))
>>>> return 0;
>>>>
>>>> here. With the appropriate comment explaining why this might be.
>>>>
>>>> Another approach would be to think about what memory barriers may be
>>>> required to make sure one sees the count update before the offset
>>>> update, but, intuitively, that seems like both more complex and more
>>>> costly (performance-wise).
>>>
>>> I think it can be done without affecting stats_add(), by using "offset" with
>> Release-Consume ordering:
>>> - stats_reset() must write "offset" with memory_order_release, so
>> "counter" cannot be visible after it, and
>>> - stats_get() must read "offset" with memory_order_consume, so no reads or
>> writes in the current thread dependent on "offset" can be reordered before
>> this load, and writes to "counter" (a data-dependent variable) in other
>> threads that release "offset" are visible in the current thread.
>>>
>>
>> That was the kind of complexity I was thinking about. Those barriers
>> come with a non-zero cost, both with different instructions being used
>> and compiler optimizations being prevented.
>
> Yep, you mentioned that there might be a more complex alternative, so I decided to explore it. :-)
>
> This approach doesn't impose any new requirements on stats_add(), so the data plane performance is not affected.
>
OK, I get it now. That's a good point. In my thought experiment, I had a
thread both updating and resetting the counter, which should be allowed,
but you could have barriers sit only in the reset routine.
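
To make that concrete, here is a rough sketch of what I think you are
suggesting, expressed with the same __atomic builtins as the code quoted
below. I've used acquire on the reader side rather than consume, since
compilers promote memory_order_consume to acquire anyway. Note that the
reader must load "offset" before "count" for the pairing to do its job;
the add path is left exactly as it is today.

static void
counter_reset(struct counter *counter)
{
        uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);

        /*
         * Release store: any update of count sequenced before this
         * reset in the same thread is guaranteed to be visible to a
         * reader that observes the new offset.
         */
        __atomic_store_n(&counter->offset, count, __ATOMIC_RELEASE);
}

static uint64_t
counter_value(struct counter *counter)
{
        /* Load offset first; acquire pairs with the release in reset. */
        uint64_t offset = __atomic_load_n(&counter->offset, __ATOMIC_ACQUIRE);
        uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);

        return count - offset;
}
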
> For per-thread counters, stats_add() can store "counter" using memory_order_relaxed. Or, if the architecture prevents tearing of 64-bit variables, using volatile.
>
> Counters shared by multiple threads must be atomically incremented using rte_atomic_fetch_add_explicit() with memory_order_relaxed.
>
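
Agreed. For such a shared counter, I guess the add would end up looking
something like the below (assuming the count field is changed to
RTE_ATOMIC(uint64_t) so the rte_stdatomic.h wrappers apply):

static void
counter_add(struct counter *counter, uint64_t operand)
{
        /* Relaxed RMW; only atomicity is needed here, no ordering. */
        rte_atomic_fetch_add_explicit(&counter->count, operand,
                                      rte_memory_order_relaxed);
}
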
>>
>>>>
>>>>>         return count - offset;
>>>>> }
>>>>>
>>>>> static void
>>>>> counter_reset(struct counter *counter)
>>>>> {
>>>>>         uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);
>>>>>
>>>>>         __atomic_store_n(&counter->offset, count, __ATOMIC_RELAXED);
>>>>> }
>>>>>
>>>>> static void
>>>>> counter_add(struct counter *counter, uint64_t operand)
>>>>> {
>>>>>         __atomic_store_n(&counter->count, counter->count + operand,
>>>>>                          __ATOMIC_RELAXED);
>>>>> }
>>>
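
Putting the earlier remark together: if one sticks with relaxed accesses
everywhere, the reader-side guard I suggested further up would end up
looking something like this.

static uint64_t
counter_value(struct counter *counter)
{
        uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);
        uint64_t offset = __atomic_load_n(&counter->offset, __ATOMIC_RELAXED);

        /*
         * A reset may become visible before the count update it was
         * based on; report zero rather than a huge, wrapped-around
         * value.
         */
        if (unlikely(offset > count))
                return 0;

        return count - offset;
}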