[dpdk-dev] [PATCH] eal: fix threads block on barrier

Stephen Hemminger stephen at networkplumber.org
Sat Apr 28 03:21:41 CEST 2018


On Fri, 27 Apr 2018 21:52:26 +0200
Thomas Monjalon <thomas at monjalon.net> wrote:

> 27/04/2018 19:45, Shreyansh Jain:
> > From: Stephen Hemminger [mailto:stephen at networkplumber.org]  
> > > Shreyansh Jain <shreyansh.jain at nxp.com> wrote:  
> > > > From: Jianfeng Tan  
> > > > > Below commit introduced pthread barrier for synchronization.
> > > > > But two IPC threads block on the barrier, and never wake up.
> > > > >
> > > > >   (gdb) bt
> > > > >   #0  futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
> > > > >       at ../sysdeps/unix/sysv/linux/futex-internal.h:61
> > > > > >   #1  futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
> > > > > >       at ../sysdeps/nptl/futex-internal.h:135
> > > > > >   #2  __pthread_barrier_wait (barrier=0x7fffffffcff0)
> > > > > >       at pthread_barrier_wait.c:184
> > > > >   #3  rte_thread_init (arg=0x7fffffffcfe0)
> > > > >       at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
> > > > >   #4  start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
> > > > >   #5  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> > > > >
> > > > > Through analysis, we find the barrier defined on the stack
> > > > > could be the root cause. This patch will change to use heap
> > > > > memory as the barrier.
> > > > >
> > > > > Fixes: d651ee4919cd ("eal: set affinity for control threads")
> > > > >
> > > > > Cc: Olivier Matz <olivier.matz at 6wind.com>
> > > > > Cc: Anatoly Burakov <anatoly.burakov at intel.com>
> > > > >
> > > > > Signed-off-by: Jianfeng Tan <jianfeng.tan at intel.com>  
> > > >
> > > > Though I have seen Stephen's comment on this (possibly a library
> > > > bug), this at least fixes an issue which was dogging dpaa and dpaa2 -
> > > > generating bus errors and futex errors with variation in core masks
> > > > provided to applications.
> > > >
> > > > Thanks a lot for this.
> > > >
> > > > Acked-by: Shreyansh Jain <shreyansh.jain at nxp.com>  
> 
> Applied, thanks Jianfeng.
> 
> > > Could you verify there is not a use after free by using valgrind or
> > > some library that poisons memory on free?
> > 
> > I will probably do that soon - but for the time being I don't want
> > this issue to block the dpaa/dpaa2 for RC1 - these drivers were
> > completely unusable without this patch.  
> 
> Please Shreyansh, continue the analysis of this bug.
> Thanks
> 
> 

I think the patch needs to change.
The attributes need to be either global, or leaked and never freed.

The glibc source for init keeps the pointer to the attributes.


static const struct pthread_barrierattr default_barrierattr =
  {
    .pshared = PTHREAD_PROCESS_PRIVATE
  };


int
__pthread_barrier_init (pthread_barrier_t *barrier,
			const pthread_barrierattr_t *attr, unsigned int count)
{
  struct pthread_barrier *ibarrier;

  /* XXX EINVAL is not specified by POSIX as a possible error code for COUNT
     being too large.  See pthread_barrier_wait for the reason for the
     comparison with BARRIER_IN_THRESHOLD.  */
  if (__glibc_unlikely (count == 0 || count >= BARRIER_IN_THRESHOLD))
    return EINVAL;

  const struct pthread_barrierattr *iattr
    = (attr != NULL
       ? (struct pthread_barrierattr *) attr
       : &default_barrierattr);

  ibarrier = (struct pthread_barrier *) barrier;

  /* Initialize the individual fields.  */
  ibarrier->in = 0;
  ibarrier->out = 0;
  ibarrier->count = count;
  ibarrier->current_round = 0;
  ibarrier->shared = (iattr->pshared == PTHREAD_PROCESS_PRIVATE
		      ? FUTEX_PRIVATE : FUTEX_SHARED);

  return 0;
}
weak_alias (__pthread_barrier_init, pthread_barrier_init)

