[dpdk-dev] [PATCH] librte_eal: fix mcslock hang on weak memory

Honnappa Nagarahalli Honnappa.Nagarahalli at arm.com
Tue Oct 20 23:49:46 CEST 2020


<snip>

> 
> Honnappa?
> 
> 07/10/2020 11:55, Diogo Behrens:
> > Hi Thomas,
> >
> > we are still waiting for the comments from Honnappa. In our
> > understanding, the missing barrier is a bug according to the model. We
> > reproduced the scenario in herd7, which represents the authoritative
> > memory model:
> > https://developer.arm.com/architectures/cpu-architecture/a-profile/memory-model-tool
> >
> > Here is a litmus test showing that the XCHG (when compiled to LDAXR
> > and STLXR) is not atomic with respect to memory updates to other locations:
> > -----
> > AArch64 XCHG-nonatomic
> > {
> > 0:X1=locked; 0:X3=next;
> > 1:X1=locked; 1:X3=next; 1:X5=tail;
> > }
> >  P0		| P1;
> >  LDR W0, [X3]	| MOV W0, #1;
> >  CBZ W0, end	| STR W0, [X1]; (* init locked *)
> >  MOV W2, #2	| MOV W2, #0;
> >  STR W2, [X1]	| xchg:;
> >  end:		| LDAXR W6, [X5];
> >  NOP		| STLXR W4, W0, [X5];
> >  NOP		| CBNZ W4, xchg;
> >  NOP		| STR W0, [X3]; (* set next *)
> > exists
> > (0:X2=2 /\ locked=1)
> > -----
> > (web version of herd7: http://diy.inria.fr/www/?record=aarch64)
> >
> > P1 is trying to acquire the lock:
> > - initializes locked
> > - does the xchg on the tail of the mcslock
> > - sets the next
> >
> > P0 is releasing the lock:
> > - if next is not set, just terminates
> > - if next is set, stores 2 in locked
> >
> > The initialization of locked should never overwrite the store of 2 to
> > locked, but it does.
> > To prevent that reordering, the last store of P1 should have "release"
> > semantics, i.e., STLR.
> >
> > This is equivalent to the reordering occurring in the mcslock of librte_eal.
> >
> > Best regards,
> > -Diogo
> >
> > -----Original Message-----
> > From: Thomas Monjalon [mailto:thomas at monjalon.net]
> > Sent: Tuesday, October 6, 2020 11:50 PM
> > To: Phil Yang <Phil.Yang at arm.com>; Diogo Behrens
> > <diogo.behrens at huawei.com>; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli at arm.com>
> > Cc: dev at dpdk.org; nd <nd at arm.com>
> > Subject: Re: [dpdk-dev] [PATCH] librte_eal: fix mcslock hang on weak
> > memory
> >
> > 31/08/2020 20:45, Honnappa Nagarahalli:
> > >
> > > Hi Diogo,
> > >
> > > Thanks for your explanation.
> > >
> > > As documented in
> > > https://developer.arm.com/documentation/ddi0487/fc  B2.9.5
> > > Load-Exclusive and Store-Exclusive instruction usage restrictions:
> > > " Between the Load-Exclusive and the Store-Exclusive, there are no
> > > explicit memory accesses, preloads, direct or indirect System
> > > register writes, address translation instructions, cache or TLB
> > > maintenance instructions, exception generating instructions,
> > > exception returns, or indirect branches."
> > > [Honnappa] This is a requirement on the software, not on the
> > > micro-architecture.
> > > We are having a few discussions internally, will get back soon.
> > >
> > > So it is not allowed to insert (1) & (4) between (2, 3). The cmpxchg
> > > operation is atomic.
> >
> >
> > Please what is the conclusion?
Apologies for not updating on this sooner.

Unfortunately, memory ordering questions are hard. I have been discussing this internally with a few experts; the discussion is still ongoing, and I hope to conclude soon.

My focus has been on replacing __atomic_exchange_n(msl, me, __ATOMIC_ACQ_REL) with __atomic_exchange_n(msl, me, __ATOMIC_SEQ_CST). However, the generated code is the same in the second case as well (for load/store exclusives), and I am not sure whether that is correct.

I think we have 2 choices here:
1) Accept the patch - when my internal discussion concludes, I can make the change and backport according to the conclusion.
2) Wait till the discussion is over - it might take another couple of weeks.
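
For reference, the lock path under discussion can be sketched roughly as below, using the GCC/Clang __atomic builtins. This is only an illustration under assumptions: the names mcs_node_t, mcs_lock and mcs_unlock are hypothetical and not copied from rte_mcslock.h, the unlock path is simplified, and unlock stores 0 to locked where the litmus test stores 2. The release store on prev->next corresponds to the STLR that Diogo proposes; the exchange on the tail is the one I was considering changing from __ATOMIC_ACQ_REL to __ATOMIC_SEQ_CST.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for illustration; not the DPDK definition. */
typedef struct mcs_node {
    struct mcs_node *next;
    int locked;
} mcs_node_t;

static void mcs_lock(mcs_node_t **tail, mcs_node_t *me)
{
    mcs_node_t *prev;

    assert(tail != NULL && me != NULL);

    /* Initialize our node before it becomes reachable by others
     * ("init locked" in the litmus test). */
    __atomic_store_n(&me->locked, 1, __ATOMIC_RELAXED);
    __atomic_store_n(&me->next, NULL, __ATOMIC_RELAXED);

    /* Swap ourselves in as the new tail (the "xchg" in the litmus test). */
    prev = __atomic_exchange_n(tail, me, __ATOMIC_ACQ_REL);
    if (prev == NULL)
        return; /* queue was empty, lock acquired */

    /* "set next": a release store, so the initialization of me->locked
     * is ordered before the predecessor can observe me through
     * prev->next. A relaxed store here is what the patch replaces. */
    __atomic_store_n(&prev->next, me, __ATOMIC_RELEASE);

    /* Spin until the predecessor hands the lock over. */
    while (__atomic_load_n(&me->locked, __ATOMIC_ACQUIRE))
        ;
}

static void mcs_unlock(mcs_node_t **tail, mcs_node_t *me)
{
    mcs_node_t *next;

    assert(tail != NULL && me != NULL);

    next = __atomic_load_n(&me->next, __ATOMIC_ACQUIRE);
    if (next == NULL) {
        mcs_node_t *expected = me;

        /* No visible successor: try to reset the tail to empty. */
        if (__atomic_compare_exchange_n(tail, &expected, NULL, 0,
                                        __ATOMIC_RELEASE, __ATOMIC_RELAXED))
            return;

        /* A successor swapped the tail but has not set next yet. */
        while ((next = __atomic_load_n(&me->next, __ATOMIC_ACQUIRE)) == NULL)
            ;
    }

    /* Hand the lock to the successor. */
    __atomic_store_n(&next->locked, 0, __ATOMIC_RELEASE);
}
```

If the store to prev->next were relaxed, the releasing thread could see next set and write to locked before the acquirer's own initialization of locked becomes visible, which is the outcome the litmus test above reports.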
