[dpdk-dev] [PATCH v2] ring: check for zero objects mc dequeue / mp enqueue

Bruce Richardson bruce.richardson at intel.com
Tue Mar 29 10:54:43 CEST 2016


On Mon, Mar 28, 2016 at 06:48:07PM +0300, Lazaros Koromilas wrote:
> Hi Olivier,
> 
> We could have two threads (running on different cores in the general
> case) that both succeed the cmpset operation. In the dequeue path,
> when n == 0, then cons_next == cons_head, and cmpset will always
> succeed. Now, if they both see an old r->cons.tail value from a
> previous dequeue, they can get stuck in the while

Hi,

I don't see how threads reading an "old r->cons.tail" value is even possible.
The head and tail pointers on the ring are marked in the code as volatile, so
all reads and writes to those values are always done from memory and not cached
in registers. No deadlock should be possible on that while loop, unless a
process crashes in the middle of a ring operation. Each thread that updates
the head pointer from x to y is responsible for updating the tail pointer in
the same manner. The loop ensures that the tail updates happen in the same
order as the head updates.
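
The head/tail protocol described above can be sketched roughly as follows.
This is a simplified illustration using C11 atomics rather than
rte_atomic32_cmpset, not the actual rte_ring code; the names cons_state,
reserve, can_publish and publish are hypothetical and exist only for this
example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified consumer side of a ring; not the rte_ring layout. */
struct cons_state {
    _Atomic uint32_t head;  /* next position to be reserved */
    _Atomic uint32_t tail;  /* everything before this is fully dequeued */
};

/* Reserve n slots: move head from its current value x to y = x + n.
 * Returns x, which the caller later uses to take its turn on the tail. */
static uint32_t reserve(struct cons_state *c, uint32_t n)
{
    uint32_t old = atomic_load(&c->head);

    /* A failed CAS refreshes 'old' with the current head; just retry. */
    while (!atomic_compare_exchange_weak(&c->head, &old, old + n))
        ;
    return old;
}

/* A reservation may publish only once every earlier one has published. */
static bool can_publish(struct cons_state *c, uint32_t my_head)
{
    return atomic_load(&c->tail) == my_head;
}

/* Publish: move tail from x to y, mirroring the earlier head update. */
static void publish(struct cons_state *c, uint32_t my_head, uint32_t n)
{
    assert(atomic_load(&c->tail) == my_head);  /* must be our turn */
    atomic_store(&c->tail, my_head + n);
}
```

In this sketch, a consumer that reserved 32..64 cannot publish until the
consumer that reserved 0..32 has moved the tail to 32, which is the ordering
the while loop enforces.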

If you believe deadlock is possible, can you outline the sequence of operations
that would lead to such a state? I cannot see how it could occur without a
crash inside one of the threads.

/Bruce

> (unlikely(r->cons.tail != cons_head)) loop. I tried, however, to
> reproduce (without the patch) and it seems that there is still a
> window for deadlock.
> 
> I'm pasting some debug output below that shows two processes' state.
> It's two receivers doing interleaved mc_dequeue(32)/mc_dequeue(0), and
> one sender doing mp_enqueue(32) on the same ring.
> 
> gdb --args ./ring-test -l0 --proc-type=primary
> gdb --args ./ring-test -l1 --proc-type=secondary
> gdb --args ./ring-test -l2 --proc-type=secondary -- tx
> 
> This is what I would usually see, process 0 and 1 both stuck at the same state:
> 
> 663             while (unlikely(r->cons.tail != cons_head)) {
> (gdb) p n
> $1 = 0
> (gdb) p r->cons.tail
> $2 = 576416
> (gdb) p cons_head
> $3 = 576448
> (gdb) p cons_next
> $4 = 576448
> 
> But now I managed to get the two processes stuck at this state too.
> 
> process 0:
> 663             while (unlikely(r->cons.tail != cons_head)) {
> (gdb) p n
> $1 = 32
> (gdb) p r->cons.tail
> $2 = 254348832
> (gdb) p cons_head
> $3 = 254348864
> (gdb) p cons_next
> $4 = 254348896
> 
> process 1:
> 663             while (unlikely(r->cons.tail != cons_head)) {
> (gdb) p n
> $1 = 32
> (gdb) p r->cons.tail
> $2 = 254348832
> (gdb) p cons_head
> $3 = 254348896
> (gdb) p cons_next
> $4 = 254348928
> 

Where is the thread that updated the head pointer from ...832 to ...864? That
thread is the one that has to update the tail pointer to ...864 before your
process 0 can continue.
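
To make the arithmetic explicit, here is a small check using the values from
the gdb output above. The helper next_to_proceed is a hypothetical name used
only for this illustration; it returns the index of the waiter whose cons_head
equals the current tail, or -1 if none of the listed waiters is next in line.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the waiter allowed to leave the wait loop
 * (the one whose cons_head equals tail), or -1 if no listed waiter
 * qualifies, meaning some other reservation is still outstanding. */
static int next_to_proceed(uint32_t tail, const uint32_t *cons_heads, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (cons_heads[i] == tail)
            return (int)i;
    }
    return -1;
}
```

With tail = 254348832 and the two observed cons_head values 254348864 and
254348896, the result is -1: neither process owns the range starting at
...832, so whoever reserved ...832 to ...864 has not yet moved the tail.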

/Bruce

> I haven't been able to trigger this with the patch so far, but it
> should be possible.
> 
> Lazaros.
> 
> On Fri, Mar 25, 2016 at 1:15 PM, Olivier Matz <olivier.matz at 6wind.com> wrote:
> > Hi Lazaros,
> >
> > On 03/17/2016 04:49 PM, Lazaros Koromilas wrote:
> >> Issuing a zero objects dequeue with a single consumer has no effect.
> >> Doing so with multiple consumers, can get more than one thread to succeed
> >> the compare-and-set operation and observe starvation or even deadlock in
> >> the while loop that checks for preceding dequeues.  The problematic piece
> >> of code when n = 0:
> >>
> >>     cons_next = cons_head + n;
> >>     success = rte_atomic32_cmpset(&r->cons.head, cons_head, cons_next);
> >>
> >> The same is possible on the enqueue path.
> >
> > Just a question about this patch (that has been applied). Thomas
> > retitled the commit from your log message:
> >
> >   ring: fix deadlock in zero object multi enqueue or dequeue
> >   http://dpdk.org/browse/dpdk/commit/?id=d0979646166e
> >
> > I think this patch does not fix a deadlock, or did I miss something?
> >
> > As explained in the following links, the ring may not perform well
> > if several threads running on the same cpu use it:
> >
> >   http://dpdk.org/ml/archives/dev/2013-November/000714.html
> >   http://www.dpdk.org/ml/archives/dev/2014-January/001070.html
> >   http://www.dpdk.org/ml/archives/dev/2014-January/001162.html
> >   http://dpdk.org/ml/archives/dev/2015-July/020659.html
> >
> > A deadlock could occur if the threads running on the same core
> > have different priority.
> >
> > Regards,
> > Olivier
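
For reference, the n = 0 behaviour of the compare-and-set quoted above can be
sketched like this. It is an illustration with C11 atomics, not the DPDK
implementation, and try_move_head is a hypothetical name. It only shows that
two callers holding the same snapshot of the head both succeed when n is 0,
while only one succeeds when n is non-zero; it does not by itself demonstrate
a deadlock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* One CAS attempt from a caller-supplied snapshot of head, mirroring
 * cmpset(&head, cons_head, cons_head + n). Hypothetical helper. */
static bool try_move_head(_Atomic uint32_t *head, uint32_t snapshot, uint32_t n)
{
    uint32_t expected = snapshot;

    /* Succeeds only if *head still equals the snapshot. With n == 0 the
     * stored value equals the old one, so head never appears to change. */
    return atomic_compare_exchange_strong(head, &expected, snapshot + n);
}
```

With n = 32, the first successful caller changes the head, so a second caller
using the same snapshot fails and has to retry. With n = 0 the head is left
unchanged and both callers proceed to the wait loop with the same cons_head.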

