[PATCH v3] eal: add seqlock

Ola Liljedahl Ola.Liljedahl at arm.com
Sun Apr 3 20:11:46 CEST 2022


(Now using macOS Mail program in plain text mode, hope this works)

> On 2 Apr 2022, at 21:31, Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com> wrote:
> 
> <snip>
> 
>>> +__rte_experimental
>>> +static inline bool
>>> +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t
>>> +*begin_sn) {
>>> +	uint32_t end_sn;
>>> +
>>> +	/* make sure the data loads happens before the sn load */
>>> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>>> +
>>> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
>> 
>> Since we are reading and potentially returning the sequence number here
>> (repeating the read of the protected data), we need to use load-acquire.
>> I assume it is not expected that the user will call
>> rte_seqlock_read_lock() again.
> Good point, we need a load-acquire (due to changes done in v3).
> 
>> 
>> Seeing this implementation, I might actually prefer the original
>> implementation, I think it is cleaner. But I would like for the begin function
>> also to wait for an even sequence number, the end function would only have
>> to check for same sequence number, this might improve performance a little
>> bit as readers won't perform one or several broken reads while a write is in
>> progress. The function names are a different thing though.
> I think we need to be optimizing for the case where there is no contention between readers and writers (as that happens most of the time). From this perspective, not checking for an even seq number in the begin function would reduce one 'if' statement.
The number of statements in C is not relevant, instead we need to look at the generated code. On x86, I would assume an if-statement like “if ((sn & 1) || (sn == sl->sn))” to generate two separate evaluations with their own conditional jump instructions. On AArch64, the two evaluations could probably be combined using a CCMP instruction and need only one conditional branch instruction. With branch prediction, it is doubtful we will see any difference in performance.

> 
> Going back to the earlier model is better as well, because of the load-acquire required in the 'rte_seqlock_read_tryunlock' function. The earlier model would not introduce the load-acquire for the no contention case.
The earlier model still had load-acquire in the read_begin function which would have to invoked again. There is no difference in the number or type of memory accesses. We just need to copy the implementation of read_begin into the read_tryunlock function if we decide that the user should not have to re-invoke read_begin on a failed read_tryunlock.

> 
>> 
>> The writer side behaves much more like a lock with mutual exclusion so
>> write_lock/write_unlock makes sense.
>> 
>>> +
>>> +	if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
>>> +		*begin_sn = end_sn;
>>> +		return false;
>>> +	}
>>> +
>>> +	return true;
>>> +}
>>> +



More information about the dev mailing list