[dpdk-dev] [PATCH] lib: move rte_ring read barrier to correct location

Ananyev, Konstantin konstantin.ananyev at intel.com
Tue Jul 12 19:58:00 CEST 2016


> 
> 
> Hi Juhamatti,
> 
> >
> > Hello,
> >
> > > > > > -----Original Message-----
> > > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Juhamatti
> > > > > > Kuusisaari
> > > > > > Sent: Monday, July 11, 2016 11:21 AM
> > > > > > To: dev at dpdk.org
> > > > > > Subject: [dpdk-dev] [PATCH] lib: move rte_ring read barrier to
> > > > > > correct location
> > > > > >
> > > > > > Fix the location of the rte_ring data dependency read barrier.
> > > > > > It needs to be called before accessing the indexed data to
> > > > > > ensure that the data itself is updated correctly.
> > > > > >
> > > > > > See more details at kernel/Documentation/memory-barriers.txt
> > > > > > section 'Data dependency barriers'.
> > > > >
> > > > >
> > > > > Any explanation why?
> > > > > From my point of view the smp_rmb()s are in the proper places here :)
> > > > > Konstantin
> > > >
> > > > The problem here is that on a weak memory model system the CPU is
> > > > allowed to load the addressed data out-of-order, in advance.
> > > > If the read barrier is after the DEQUEUE, you might end up with the
> > > > old data there in a race situation when the buffer is continuously full.
> > > > Having it before the DEQUEUE guarantees that the load is not done in
> > > > advance.
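
To make the claimed hazard concrete before going further, here is a
minimal stand-alone C11 sketch of a simplified single-consumer dequeue
(an illustration only, with made-up names, not the actual rte_ring code):

#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 1024
#define MASK (RING_SIZE - 1)

extern void *ring[RING_SIZE];      /* slots written by the producer */
extern _Atomic uint32_t prod_tail; /* producer publish index */
extern uint32_t cons_head;         /* consumer-private index */

void *dequeue_one(void)
{
    uint32_t head = cons_head;

    /* Wait until the producer has published the entry at 'head'. */
    while (atomic_load_explicit(&prod_tail, memory_order_relaxed) == head)
        ;

    /*
     * The claim under discussion: without a load/load barrier here
     * (rte_smp_rmb() in DPDK, an acquire fence in C11 terms), a weakly
     * ordered CPU could in principle satisfy the ring[] load below
     * using a value it read before the prod_tail check above.
     */
    atomic_thread_fence(memory_order_acquire);

    void *obj = ring[head & MASK];
    cons_head = head + 1;
    return obj;
}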
> > >
> > > Sorry, I still don't see any race condition in the current code.
> > > Can you provide any particular example?
> > > On the other hand, moving smp_rmb() before dequeuing the objects could
> > > introduce a race condition on cpus where later writes can be reordered with
> > > earlier reads.
> >
> > Here is a simplified example sequence from a time perspective:
> > 1. Consumer CPU (CCPU) loads value y from r->ring[x] out-of-order
> > (the key of the problem)
> 
> To read the value of ring[x], the cpu has to calculate x first.
> And to calculate x it needs to read cons.head and prod.tail first.
> Are you saying that some modern cpu can:
> - 'speculate' the values of cons.head and prod.tail
>   (based on what?)
> - calculate x based on these speculated values
> - read ring[x]
> - read cons.head and prod.tail
> - if the read values are not equal to the speculated ones, then
>   re-calculate x and re-read ring[x]
> - else use the speculatively read ring[x]
> ?
> If such a thing is possible (is it really? and if yes, on which cpu?),

As far as I can see, neither ARM nor PPC supports such things.
Both of them do obey address dependencies.
(ARM & PPC guys, feel free to correct me if I am wrong here.)
So what cpu are we talking about?
Konstantin
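
For reference, the address dependency being relied on here looks like
this (a sketch with simplified field names, not the exact rte_ring
layout):

#include <stdint.h>

struct ring {
    volatile uint32_t cons_head;
    volatile uint32_t prod_tail;
    uint32_t mask;
    void *volatile slots[];
};

/* Peek at the next object; simplified, single consumer assumed. */
void *peek(struct ring *r)
{
    uint32_t head = r->cons_head;   /* load 1 */
    uint32_t idx  = head & r->mask; /* idx is computed from load 1 */
    return r->slots[idx];           /* load 2: address depends on load 1 */
}

On ARM and PPC such an address dependency orders load 2 after load 1
without any explicit barrier; only value speculation of 'head' would
break it, which is exactly the question raised above.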

> then yes, we might need an extra smp_rmb() before DEQUEUE_PTRS()
> for __rte_ring_sc_do_dequeue().
> For __rte_ring_mc_do_dequeue(), I think we are ok, as
> there is a CAS just before DEQUEUE_PTRS().
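
For reference, the mc path in question looks roughly like this
(abridged from the __rte_ring_mc_do_dequeue() of that era; compare
with the hunk quoted below):

uint32_t cons_head, cons_next, prod_tail;
unsigned n;
int success;

do {
        cons_head = r->cons.head;
        prod_tail = r->prod.tail;
        /* ... derive n from prod_tail - cons_head; bail out if empty ... */
        cons_next = cons_head + n;
        /* The CAS both claims the entries and serves as the ordering point. */
        success = rte_atomic32_cmpset(&r->cons.head, cons_head, cons_next);
} while (unlikely(success == 0));

/* copy in table */
DEQUEUE_PTRS();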
> 
> > 2. Producer CPU (PCPU) updates r->ring[x] to value z
> > 3. PCPU updates prod_tail to be x
> > 4. CCPU updates cons_head to be x
> > 5. CCPU uses the out-of-order loaded value y for r->ring[x] [which is z in reality]
> >
> > The problem here is that on a weak memory model, the CCPU is allowed to load
> > the r->ring[x] value in advance, if it decides to do so (the CCPU needs to be
> > able to see in advance that x will be an interesting index worth loading). The
> > index value x is updated atomically, but that does not matter here. Also, the
> > write barrier on the PCPU side guarantees that the CCPU cannot see the update
> > of x before the PCPU has really updated r->ring[x] to z and moved the tail, but
> > it still allows the out-of-order loads to happen without a proper read barrier.
> >
> > When the read barrier is moved between steps 4 and 5, it invalidates any
> > out-of-order loads done so far and forces the CCPU to drop the stale
> > r->ring[x] value y and load the current value z.
> >
> > The ring queue appears to work well because this is a rare corner case. Due to
> > the head/tail structure, the problem requires the queue to be full, and the CCPU
> > also needs to see the r->ring[x] update later than it does the out-of-order load.
> > In addition, the HW needs to be able to predict and choose the load to the future
> > index (which should be quite possible, considering modern CPUs). If you have
> > seen problems in the past and noticed that a larger ring queue works better as a
> > workaround, you may have encountered the problem already.
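
The five steps above, restated as a two-sided C11 sketch (simplified
stand-alone code with made-up names, not the actual rte_ring
implementation; x, y and z as in the steps):

#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 1024
#define MASK (RING_SIZE - 1)

extern void *ring[RING_SIZE];
extern _Atomic uint32_t prod_tail;
extern _Atomic uint32_t cons_head;

/* PCPU side: steps 2 and 3. */
void producer_publish(uint32_t x, void *z)
{
    ring[x & MASK] = z;                        /* step 2: slot now holds z */
    atomic_thread_fence(memory_order_release); /* the wmb on the PCPU side */
    atomic_store_explicit(&prod_tail, x + 1, memory_order_relaxed); /* step 3 */
}

/* CCPU side: steps 1, 4 and 5. */
void *consumer_take(uint32_t x)
{
    /* Wait until step 3 is visible. */
    while (atomic_load_explicit(&prod_tail, memory_order_relaxed) <= x)
        ;
    atomic_store_explicit(&cons_head, x + 1, memory_order_relaxed); /* step 4 */

    /*
     * The claim: a weakly ordered CPU may already have performed this
     * load before the loop above observed prod_tail (step 1), so
     * without a read barrier here, step 5 can return the stale value y
     * instead of z.
     */
    return ring[x & MASK];                     /* step 5 */
}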
> 
> I don't understand what 'larger rings work better' means here.
> What we are talking about is a race condition that, if hit, would
> cause data corruption and most likely a crash.
> 
> >
> > It is quite safe to move the barrier before DEQUEUE because after the DEQUEUE
> > there is nothing really that we would want to protect with a read barrier.
> 
> I don't think so.
> If you remove the barrier after DEQUEUE(), that means that on systems with
> relaxed memory ordering cons.tail could be updated before DEQUEUE() is
> finished, and the producer could overwrite queue entries that were not yet
> dequeued.
> So if the cpu can really do such speculative out-of-order loads,
> then for __rte_ring_sc_do_dequeue() we do need something like:
> 
> rte_smp_rmb();
> DEQUEUE_PTRS();
> rte_smp_rmb();
> 
> Konstantin
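
Spelled out against the hunk quoted below, that double-barrier variant
of the sc path would read (a sketch, not a tested change):

cons_next = cons_head + n;
r->cons.head = cons_next;

rte_smp_rmb();  /* proposed: keep the object loads after the head/tail reads */

/* copy in table */
DEQUEUE_PTRS();

rte_smp_rmb();  /* existing: keep cons.tail from updating before the loads finish */

r->cons.tail = cons_next;

The second barrier is the one the patch removes; the point above is
that without it, the cons.tail update could become visible before the
loads in DEQUEUE_PTRS() complete, letting the producer reuse the slots
too early.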
> 
> > The read
> > barrier is mapped to a compiler barrier on strong memory model systems, and
> > this works fine too, as the order of the head/tail updates is still guaranteed
> > at the new location. Even if the problem is only theoretical on most systems
> > and the risk is very low, it is worth fixing.
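
For reference, rte_smp_rmb() in the DPDK of this era expands roughly as
follows per architecture (simplified, from memory; the authoritative
definitions live in the per-arch rte_atomic.h headers):

/* x86 (strongly ordered): only a compiler barrier */
#define rte_smp_rmb() do { asm volatile ("" : : : "memory"); } while (0)

/* arm64 (weakly ordered): a real load/load hardware barrier */
#define rte_smp_rmb() do { asm volatile ("dmb ishld" : : : "memory"); } while (0)

So on x86 the move only constrains compiler scheduling, while on weakly
ordered CPUs it moves an actual hardware barrier, which is where any
behavioral difference would show up.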
> >
> > --
> >  Juhamatti
> >
> > > Konstantin
> >
> >
> >
> >
> > > > > >
> > > > > > Signed-off-by: Juhamatti Kuusisaari
> > > > > > <juhamatti.kuusisaari at coriant.com>
> > > > > > ---
> > > > > >  lib/librte_ring/rte_ring.h | 6 ++++--
> > > > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > > > b/lib/librte_ring/rte_ring.h index eb45e41..a923e49 100644
> > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > @@ -662,9 +662,10 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r,
> > > > > void **obj_table,
> > > > > >                                               cons_next);
> > > > > >         } while (unlikely(success == 0));
> > > > > >
> > > > > > +       rte_smp_rmb();
> > > > > > +
> > > > > >         /* copy in table */
> > > > > >         DEQUEUE_PTRS();
> > > > > > -       rte_smp_rmb();
> > > > > >
> > > > > >         /*
> > > > > >          * If there are other dequeues in progress that preceded
> > > > > > us, @@ -746,9 +747,10 @@ __rte_ring_sc_do_dequeue(struct rte_ring
> > > > > > *r,
> > > > > void **obj_table,
> > > > > >         cons_next = cons_head + n;
> > > > > >         r->cons.head = cons_next;
> > > > > >
> > > > > > +       rte_smp_rmb();
> > > > > > +
> > > > > >         /* copy in table */
> > > > > >         DEQUEUE_PTRS();
> > > > > > -       rte_smp_rmb();
> > > > > >
> > > > > >         __RING_STAT_ADD(r, deq_success, n);
> > > > > >         r->cons.tail = cons_next;
> > > > > > --
> > > > > > 2.9.0
> > > > > >
> > > > > >
> > > > > >
> > > > >

