[RFC] eal: add seqlock

Ola Liljedahl ola.liljedahl at arm.com
Mon Mar 28 16:06:25 CEST 2022



On 3/28/22 12:53, Ananyev, Konstantin wrote:
> 
>>>> diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
>>>> index 9700494816..48df5f1a21 100644
>>>> --- a/lib/eal/include/meson.build
>>>> +++ b/lib/eal/include/meson.build
>>>> @@ -36,6 +36,7 @@ headers += files(
>>>>            'rte_per_lcore.h',
>>>>            'rte_random.h',
>>>>            'rte_reciprocal.h',
>>>> +        'rte_seqlock.h',
>>>>            'rte_service.h',
>>>>            'rte_service_component.h',
>>>>            'rte_string_fns.h',
>>>> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
>>>> new file mode 100644
>>>> index 0000000000..b975ca848a
>>>> --- /dev/null
>>>> +++ b/lib/eal/include/rte_seqlock.h
>>>> @@ -0,0 +1,84 @@
>>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>>> + * Copyright(c) 2022 Ericsson AB
>>>> + */
>>>> +
>>>> +#ifndef _RTE_SEQLOCK_H_
>>>> +#define _RTE_SEQLOCK_H_
>>>> +
>>>> +#include <stdbool.h>
>>>> +#include <stdint.h>
>>>> +
>>>> +#include <rte_atomic.h>
>>>> +#include <rte_branch_prediction.h>
>>>> +#include <rte_spinlock.h>
>>>> +
>>>> +struct rte_seqlock {
>>>> +	uint64_t sn;
>>>> +	rte_spinlock_t lock;
>>>> +};
>>>> +
>>>> +typedef struct rte_seqlock rte_seqlock_t;
>>>> +
>>>> +__rte_experimental
>>>> +void
>>>> +rte_seqlock_init(rte_seqlock_t *seqlock);
>>> Probably worth to have static initializer too.
>>>
>>
>> I will add that in the next version, thanks.
>>
>>>> +
>>>> +__rte_experimental
>>>> +static inline uint64_t
>>>> +rte_seqlock_read_begin(const rte_seqlock_t *seqlock)
>>>> +{
>>>> +	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
>>>> +	 * from happening before the sn load. Synchronizes-with the
>>>> +	 * store release in rte_seqlock_end().
>>>> +	 */
>>>> +	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
>>>> +}
>>>> +
>>>> +__rte_experimental
>>>> +static inline bool
>>>> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint64_t begin_sn)
>>>> +{
>>>> +	uint64_t end_sn;
>>>> +
>>>> +	/* make sure the data loads happen before the sn load */
>>>> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>>> That's sort of 'read_end' correct?
>>> If so, shouldn't it be '__ATOMIC_RELEASE' instead here,
>>> and
>>> end_sn = __atomic_load_n(..., (__ATOMIC_ACQUIRE)
>>> on the line below?
>>
>> A release fence prevents reordering of stores. The reader doesn't do any
>> stores, so I don't understand why you would use a release fence here.
>> Could you elaborate?
> 
>  From my understanding:
> rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> serves as a hoist barrier here, so it only prevents later instructions
> from being executed before that point.
> But it wouldn't prevent earlier instructions from being executed after
> that point, while we do need to guarantee that the CPU will finish all
> previous reads before progressing further.
> 
> Suppose we have something like that:
> 
> struct {
> 	uint64_t shared;
> 	rte_seqlock_t lock;
> } data;
> 
> ...
> sn = ...
> uint64_t x = data.shared;
> /* inside rte_seqlock_read_retry(): */
> ...
> rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> end_sn = __atomic_load_n(&data.lock.sn, __ATOMIC_RELAXED);
> 
> Here we need to make sure that read of data.shared will always happen
> before reading of data.lock.sn.
> It is not a problem on IA (as reads are not reordered), but on machines with
> relaxed memory ordering (ARM, etc.)  it can happen.
> So to prevent it we do need a sink barrier here first (ATOMIC_RELEASE)
We can't use a store-release since there is no store on the reader side.
And a release fence orders earlier accesses against later stores, not 
later loads, so it wouldn't order the data loads before the sn load.
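To make it concrete, a reader in your example would look something like 
this (just a sketch, reusing your 'data' struct from above):

uint64_t sn;
uint64_t x;

do {
	sn = rte_seqlock_read_begin(&data.lock);
	x = data.shared; /* may observe a torn value while a writer is active */
	/* the acquire fence in rte_seqlock_read_retry() keeps the load
	 * above from being reordered after the end_sn load
	 */
} while (rte_seqlock_read_retry(&data.lock, sn));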

> 
> Honnappa and other ARM & atomics experts, please correct me if I am wrong here.
The C standard (chapter 7.17.4 in the C11 (draft)) isn't so easy to 
digest. If we trust Preshing, he has a more accessible description here: 
https://preshing.com/20130922/acquire-and-release-fences/
"An acquire fence prevents the memory reordering of any read which 
precedes it in program order with any read or write which follows it in 
program order."
and here: 
https://preshing.com/20131125/acquire-and-release-fences-dont-work-the-way-youd-expect/ 
(for C++ but the definition seems to be identical to that of C11).
Essentially, this is a LoadLoad+LoadStore barrier, which is what we want 
to achieve here.

GCC 10.3 for AArch64/A64 ISA generates a "DMB ISHLD" instruction here. 
This waits for all loads preceding the barrier (in program order) to be 
observed before any memory accesses following it (in program order) are 
performed.

I think the key to understanding atomic thread fences is that they are 
not associated with a specific memory access (unlike load-acquire and 
store-release) so they can't order earlier or later memory accesses 
against some specific memory access. Instead the fence orders any/all 
earlier loads and/or stores against any/all later loads or stores 
(depending on acquire or release).
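In code, the difference is roughly this (just a sketch; 'v' and 'x' are 
placeholders):

/* load-acquire: only this particular load is ordered against
 * later loads and stores
 */
v = __atomic_load_n(&x, __ATOMIC_ACQUIRE);

/* acquire fence: all loads before the fence are ordered against
 * all loads and stores after it
 */
v = __atomic_load_n(&x, __ATOMIC_RELAXED);
__atomic_thread_fence(__ATOMIC_ACQUIRE);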

> 
>>>> +
>>>> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
>>>> +
>>>> +	return unlikely(begin_sn & 1 || begin_sn != end_sn);
>>>> +}
>>>> +
>>>> +__rte_experimental
>>>> +static inline void
>>>> +rte_seqlock_write_begin(rte_seqlock_t *seqlock)
>>>> +{
>>>> +	uint64_t sn;
>>>> +
>>>> +	/* to synchronize with other writers */
>>>> +	rte_spinlock_lock(&seqlock->lock);
>>>> +
>>>> +	sn = seqlock->sn + 1;
>>>> +
>>>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
>>>> +
>>>> +	/* __ATOMIC_RELEASE to prevent stores after (in program order)
>>>> +	 * from happening before the sn store.
>>>> +	 */
>>>> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
>>> I think it needs to be '__ATOMIC_ACQUIRE' here instead of '__ATOMIC_RELEASE'.
>>
>> Please elaborate on why.
> 
> As you said in the comments above, we need to prevent later stores
> from being executed before that point. So we do need a hoist barrier here.
> AFAIK to guarantee a hoist barrier '__ATOMIC_ACQUIRE' is required.
An acquire fence wouldn't prevent an earlier store (the write to 
seqlock->sn) from being reordered with some later store (e.g. writes to 
the protected data), so it would allow readers to see updated (possibly 
torn) data together with a pre-update sequence number. We need a 
StoreStore barrier to order the sn store before the data stores => 
fence(release).
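So a writer in your example would be roughly (a sketch; 'new_value' is 
just a placeholder):

rte_seqlock_write_begin(&data.lock);
/* the release fence in write_begin() keeps this store from being
 * reordered before the sn store
 */
data.shared = new_value;
rte_seqlock_write_end(&data.lock);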

Acquire and release fences can (also) be used to create 
synchronize-with relationships (this is how the C standard defines 
them). Preshing has a good example of this. Basically (using the GCC 
built-ins):

Thread 1:
data = 242;
__atomic_thread_fence(__ATOMIC_RELEASE);
__atomic_store_n(&guard, 1, __ATOMIC_RELAXED);

Thread 2:
while (__atomic_load_n(&guard, __ATOMIC_RELAXED) != 1)
	;
__atomic_thread_fence(__ATOMIC_ACQUIRE);
do_something(data);

These are obvious analogues to store-release and load-acquire, hence 
the acquire & release names of the fences.

- Ola

> 
>>
>>>> +}
>>>> +
>>>> +__rte_experimental
>>>> +static inline void
>>>> +rte_seqlock_write_end(rte_seqlock_t *seqlock)
>>>> +{
>>>> +	uint64_t sn;
>>>> +
>>>> +	sn = seqlock->sn + 1;
>>>> +
>>>> +	/* synchronizes-with the load acquire in rte_seqlock_begin() */
>>>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
>>>> +
>>>> +	rte_spinlock_unlock(&seqlock->lock);
>>>> +}
>>>> +
> 

