[dpdk-dev] [PATCH v4 16/17] ring: add sched_yield to avoid spin forever

Ananyev, Konstantin konstantin.ananyev at intel.com
Mon Feb 9 16:43:10 CET 2015


Hi Olivier,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier MATZ
> Sent: Friday, February 06, 2015 3:20 PM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 16/17] ring: add sched_yield to avoid spin forever
> 
> Hi,
> 
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > Add a sched_yield() syscall if the thread spins for too long, waiting for another thread to finish its operations on the ring.
> > That gives the pre-empted thread a chance to proceed and finish its ring enqueue/dequeue operation.
> > The purpose is to reduce contention on the ring.
> >
> > Signed-off-by: Cunming Liang <cunming.liang at intel.com>
> > ---
> >  lib/librte_ring/rte_ring.h | 35 +++++++++++++++++++++++++++++------
> >  1 file changed, 29 insertions(+), 6 deletions(-)
> >
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index 39bacdd..c402c73 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -126,6 +126,7 @@ struct rte_ring_debug_stats {
> >
> >  #define RTE_RING_NAMESIZE 32 /**< The maximum length of a ring name. */
> >  #define RTE_RING_MZ_PREFIX "RG_"
> > +#define RTE_RING_PAUSE_REP 0x100  /**< yield after this many pause repetitions. */
> >
> >  /**
> >   * An RTE ring structure.
> > @@ -410,7 +411,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
> >  	uint32_t cons_tail, free_entries;
> >  	const unsigned max = n;
> >  	int success;
> > -	unsigned i;
> > +	unsigned i, rep;
> >  	uint32_t mask = r->prod.mask;
> >  	int ret;
> >
> > @@ -468,8 +469,19 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
> >  	 * If there are other enqueues in progress that preceded us,
> >  	 * we need to wait for them to complete
> >  	 */
> > -	while (unlikely(r->prod.tail != prod_head))
> > -		rte_pause();
> > +	do {
> > +		/* avoid spinning too long waiting for the other thread to finish */
> > +		for (rep = RTE_RING_PAUSE_REP;
> > +		     rep != 0 && r->prod.tail != prod_head; rep--)
> > +			rte_pause();
> > +
> > +		/*
> > +		 * It gives the pre-empted thread a chance to proceed and
> > +		 * finish its ring enqueue operation.
> > +		 */
> > +		if (rep == 0)
> > +			sched_yield();
> > +	} while (rep == 0);
> >
> >  	r->prod.tail = prod_next;
> >  	return ret;
> > @@ -589,7 +601,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
> >  	uint32_t cons_next, entries;
> >  	const unsigned max = n;
> >  	int success;
> > -	unsigned i;
> > +	unsigned i, rep;
> >  	uint32_t mask = r->prod.mask;
> >
> >  	/* move cons.head atomically */
> > @@ -634,8 +646,19 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
> >  	 * If there are other dequeues in progress that preceded us,
> >  	 * we need to wait for them to complete
> >  	 */
> > -	while (unlikely(r->cons.tail != cons_head))
> > -		rte_pause();
> > +	do {
> > +		/* avoid spinning too long waiting for the other thread to finish */
> > +		for (rep = RTE_RING_PAUSE_REP;
> > +		     rep != 0 && r->cons.tail != cons_head; rep--)
> > +			rte_pause();
> > +
> > +		/*
> > +		 * It gives the pre-empted thread a chance to proceed and
> > +		 * finish its ring dequeue operation.
> > +		 */
> > +		if (rep == 0)
> > +			sched_yield();
> > +	} while (rep == 0);
> >
> >  	__RING_STAT_ADD(r, deq_success, n);
> >  	r->cons.tail = cons_next;
> >
> 
> The ring library was designed with the assumption that the code is not
> preemptable. The code is lock-free but not wait-free: if a thread is
> preempted at a bad moment, the other threads can spin for as long as it
> stays unscheduled.
> 
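Right. Just to spell out the scenario, here is a simplified model of the
MP enqueue protocol (illustration only, plain C11 atomics, the names are
made up - it is not the actual librte_ring code):

#include <stdatomic.h>

struct toy_ring {
	_Atomic unsigned head;	/* next slot to reserve (moved by CAS) */
	_Atomic unsigned tail;	/* last published slot (moved in order) */
};

static void
toy_mp_enqueue_one(struct toy_ring *r)
{
	unsigned old_head, new_head;

	/* 1) reserve a slot by moving head */
	do {
		old_head = atomic_load(&r->head);
		new_head = old_head + 1;
	} while (!atomic_compare_exchange_weak(&r->head, &old_head,
					       new_head));

	/* 2) copy the object into the reserved slot here */

	/* 3) wait until all earlier producers have published.  A producer
	 *    preempted between (1) and (4) keeps every later producer
	 *    spinning here until it gets CPU time again. */
	while (atomic_load(&r->tail) != old_head)
		;	/* rte_pause() in the real code */

	/* 4) publish our slot */
	atomic_store(&r->tail, new_head);
}
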
> I wonder whether adding a sched_yield() might penalize the current
> implementations that use only one pthread per core. Even if there
> is only one pthread in the scheduler's run queue for this CPU, calling
> into the scheduler may cost thousands of cycles.
> 
> Also, where does the value "RTE_RING_PAUSE_REP 0x100" come from?
> Why is 0x100 better than 42 or 10000?

The idea was to have something a few times bigger than the actual number of
active cores in the system, to minimise the chance of a sched_yield() being
called in the case where we have one thread per physical core.
My thought was that this many repeats would make that chance negligible.
Though, right now, I don't have any data to back it up.
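
Just to illustrate that sizing idea (not something the patch does - the
helper name and the factor of eight below are made up), the repeat budget
could be derived from the lcore count at init time instead of being
hard-coded:

#include <rte_lcore.h>

static unsigned ring_pause_rep;

static void
ring_pause_rep_init(void)
{
	/* e.g. 8x the number of active lcores: with 32 lcores this gives
	 * the same 256 (0x100) pauses before a sched_yield() is considered */
	ring_pause_rep = 8 * rte_lcore_count();
}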

> I think it could be good to check if there is a performance impact
> with this change, especially where there is a lot of contention on
> the ring. If it has an impact, what about adding a compile or runtime
> option?

Good idea. We should probably make RTE_RING_PAUSE_REP a configuration option
and, say, not emit the sched_yield() call at all if RTE_RING_PAUSE_REP == 0.
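
Something along these lines (a rough sketch of the enqueue side only,
assuming RTE_RING_PAUSE_REP comes from the build config):

#if RTE_RING_PAUSE_REP == 0
	/* non-preemptable setups: keep the plain busy wait,
	 * no sched_yield() call is emitted at all */
	while (unlikely(r->prod.tail != prod_head))
		rte_pause();
#else
	do {
		/* spin for a bounded number of pauses first */
		for (rep = RTE_RING_PAUSE_REP;
		     rep != 0 && r->prod.tail != prod_head; rep--)
			rte_pause();

		/* then give a pre-empted producer a chance to run */
		if (rep == 0)
			sched_yield();
	} while (rep == 0);
#endif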

Konstantin

> 
> 
> Regards,
> Olivier

