[dpdk-dev] [PATCH] librte_eal: fix mcslock hang on weak memory

Honnappa Nagarahalli Honnappa.Nagarahalli at arm.com
Mon Nov 23 16:06:06 CET 2020


<snip>

> > >
> > > 07/10/2020 11:55, Diogo Behrens:
> > > > Hi Thomas,
> > > >
> > > > we are still waiting for the comments from Honnappa. In our
> > > > understanding, the missing barrier is a bug according to the
> > > > model. We reproduced the scenario in herd7, which represents the
> > > > authoritative memory model:
> > > > https://developer.arm.com/architectures/cpu-architecture/a-profile/memory-model-tool
> > > >
> > > > Here is a litmus test showing that the XCHG (when compiled to LDAXR
> > > > and STLXR) is not atomic with respect to memory updates to other
> > > > locations:
> > > > -----
> > > > AArch64 XCHG-nonatomic
> > > > {
> > > > 0:X1=locked; 0:X3=next;
> > > > 1:X1=locked; 1:X3=next; 1:X5=tail; }
> > > >  P0		| P1;
> > > >  LDR W0, [X3]	| MOV W0, #1;
> > > >  CBZ W0, end	| STR W0, [X1]; (* init locked *)
> > > >  MOV W2, #2	| MOV W2, #0;
> > > >  STR W2, [X1]	| xchg:;
> > > >  end:		| LDAXR W6, [X5];
> > > >  NOP		| STLXR W4, W0, [X5];
> > > >  NOP		| CBNZ W4, xchg;
> > > >  NOP		| STR W0, [X3]; (* set next *)
> > > > exists
> > > > (0:X2=2 /\ locked=1)
> > > > -----
> > > > (web version of herd7: http://diy.inria.fr/www/?record=aarch64)
> > > >
> > > > P1 is trying to acquire the lock:
> > > > - initializes locked
> > > > - does the xchg on the tail of the mcslock
> > > > - sets the next
> > > >
> > > > P0 is releasing the lock:
> > > > - if next is not set, just terminates
> > > > - if next is set, stores 2 in locked
> > > >
> > > > The initialization of locked should never overwrite the store of 2
> > > > to locked, but it does.
> > > > To prevent that reordering, the last store of P1 must have "release"
> > > > semantics, i.e., be an STLR.
> > > >
> > > > This is equivalent to the reordering occurring in the mcslock of librte_eal.
> > > >
> > > > Best regards,
> > > > -Diogo
> > > >
> > > > -----Original Message-----
> > > > From: Thomas Monjalon [mailto:thomas at monjalon.net]
> > > > Sent: Tuesday, October 6, 2020 11:50 PM
> > > > To: Phil Yang <Phil.Yang at arm.com>; Diogo Behrens
> > > > <diogo.behrens at huawei.com>; Honnappa Nagarahalli
> > > > <Honnappa.Nagarahalli at arm.com>
> > > > Cc: dev at dpdk.org; nd <nd at arm.com>
> > > > Subject: Re: [dpdk-dev] [PATCH] librte_eal: fix mcslock hang on
> > > > weak memory
> > > >
> > > > 31/08/2020 20:45, Honnappa Nagarahalli:
> > > > >
> > > > > Hi Diogo,
> > > > >
> > > > > Thanks for your explanation.
> > > > >
> > > > > > As documented in https://developer.arm.com/documentation/ddi0487/fc
> > > > > > B2.9.5 Load-Exclusive and Store-Exclusive instruction usage
> > > > > > restrictions:
> > > > > > " Between the Load-Exclusive and the Store-Exclusive, there are
> > > > > > no explicit memory accesses, preloads, direct or indirect System
> > > > > > register writes, address translation instructions, cache or TLB
> > > > > > maintenance instructions, exception generating instructions,
> > > > > > exception returns, or indirect branches."
> > > > > > [Honnappa] This is a requirement on the software, not on the
> > > > > > micro-architecture.
> > > > > > We are having a few discussions internally and will get back soon.
> > > > >
> > > > > > So it is not allowed to insert (1) & (4) between (2, 3). The
> > > > > > cmpxchg operation is atomic.
> > > >
> > > >
> > > > Please, what is the conclusion?
> > Apologies for not updating on this sooner.
> >
> > Unfortunately, memory ordering questions are hard topics. I have been
> > discussing this internally with a few experts and it is still ongoing;
> > I hope to conclude soon.
> >
> > My focus has been to replace __atomic_exchange_n(msl, me,
> > __ATOMIC_ACQ_REL) with __atomic_exchange_n(msl, me,
> > __ATOMIC_SEQ_CST). However, the generated code is the same in the second
> > case as well (for load-store exclusives), and I am not sure whether that
> > is correct.
> >
> > I think we have 2 choices here:
> > 1) Accept the patch - when my internal discussion concludes, I can make
> > the change and backport according to the conclusion.
> > 2) Wait till the discussion is over - it might take another couple of
> > weeks
> 
> One month passed since this last update.
> We are keeping this issue in DPDK 20.11.0 I guess.
> 
I can accept this patch and move forward for 20.11. It is a stronger barrier and I do not see any issues from the code perspective. I will run tests on a few platforms and provide my ACK.

My own work is still in progress: I have a few changes to make to ensure we have an optimal solution for all platforms. Those changes can go into 21.02.
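
For reference, the ordering discussed in this thread can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the librte_eal source: the names mcs_node, mcs_lock, mcs_unlock, demo_worker and mcs_demo are hypothetical, sched_yield() stands in for a pause hint, and the unlock path stores 0 to release the waiter where the litmus test stores 2. The point of interest is the "set next" store, which uses release semantics (an STLR on AArch64) so that the earlier "init locked" store cannot be ordered after it.

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical node type for illustration; not the librte_eal definition. */
struct mcs_node {
	struct mcs_node *next;
	int locked;
};

static struct mcs_node *demo_tail;	/* NULL means the lock is free */
static long demo_counter;		/* protected by the lock */

static void
mcs_lock(struct mcs_node **tail, struct mcs_node *me)
{
	struct mcs_node *prev;

	/* Initialise our node ("init locked" in the litmus test). */
	__atomic_store_n(&me->locked, 1, __ATOMIC_RELAXED);
	__atomic_store_n(&me->next, NULL, __ATOMIC_RELAXED);

	/* The xchg on the tail of the queue. */
	prev = __atomic_exchange_n(tail, me, __ATOMIC_ACQ_REL);
	if (prev == NULL)
		return;	/* the lock was free */

	/*
	 * "set next": a release store, so the initialisation above is
	 * visible before the predecessor can observe this node.
	 */
	__atomic_store_n(&prev->next, me, __ATOMIC_RELEASE);

	while (__atomic_load_n(&me->locked, __ATOMIC_ACQUIRE))
		sched_yield();
}

static void
mcs_unlock(struct mcs_node **tail, struct mcs_node *me)
{
	struct mcs_node *next = __atomic_load_n(&me->next, __ATOMIC_ACQUIRE);

	if (next == NULL) {
		struct mcs_node *expected = me;

		/* No successor seen: try to mark the lock as free. */
		if (__atomic_compare_exchange_n(tail, &expected, NULL, 0,
				__ATOMIC_RELEASE, __ATOMIC_RELAXED))
			return;
		/* A successor swapped the tail; wait until it sets next. */
		while ((next = __atomic_load_n(&me->next,
				__ATOMIC_ACQUIRE)) == NULL)
			sched_yield();
	}
	/* Hand the lock over to the successor. */
	__atomic_store_n(&next->locked, 0, __ATOMIC_RELEASE);
}

static void *
demo_worker(void *arg)
{
	struct mcs_node me;
	int i, iters = *(int *)arg;

	for (i = 0; i < iters; i++) {
		mcs_lock(&demo_tail, &me);
		demo_counter++;
		mcs_unlock(&demo_tail, &me);
	}
	return NULL;
}

/* Run up to 16 workers and return the final counter value. */
long
mcs_demo(int nthreads, int iters)
{
	pthread_t t[16];
	int i;

	if (nthreads > 16)
		nthreads = 16;
	demo_tail = NULL;
	demo_counter = 0;
	for (i = 0; i < nthreads; i++)
		pthread_create(&t[i], NULL, demo_worker, &iters);
	for (i = 0; i < nthreads; i++)
		pthread_join(t[i], NULL);
	return demo_counter;
}
```

Calling mcs_demo(4, 1000) is expected to return 4000 when mutual exclusion holds. A passing run does not demonstrate that the memory ordering is correct, in particular on strongly ordered machines; that question is what the herd7 litmus test quoted above addresses.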

