[dpdk-dev] [PATCH v1 1/2] app/test: remove unnecessary barriers for ring stress test

Stephen Hemminger stephen at networkplumber.org
Fri Jan 29 05:58:47 CET 2021


On Fri, 29 Jan 2021 03:17:50 +0000
Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com> wrote:

> <snip>
> 
> >   
> > > >
> > > > Hi Feifei,
> > > >  
> > > > >
> > > > > The variable "wrk_cmd" is a signal to control threads from running
> > > > > and stopping. When worker lcores load "wrk_cmd ==  
> > WRK_CMD_RUN",  
> > > > > they  
> > > > start  
> > > > > running and when worker lcores load "wrk_cmd == WRK_CMD_STOP",  
> > > > they  
> > > > > stop.
> > > > >
> > > > > For the wmb in test_mt1, no storing operations must keep the order
> > > > > after storing "wrk_cmd". Thus the wmb is unnecessary.  
> > > >
> > > > I think there is a bug in my original code, we should do smp_wmb()
> > > > *before* setting wrk_cmd, not after:
> > > >
> > > >         /* launch on all workers */
> > > >         RTE_LCORE_FOREACH_WORKER(lc) {
> > > >                 arg[lc].rng = r;
> > > >                 arg[lc].stats = init_stat;
> > > >                 rte_eal_remote_launch(test, &arg[lc], lc);
> > > >         }
> > > >
> > > >         /* signal worker to start test */
> > > > +      rte_smp_wmb();
> > > >         wrk_cmd = WRK_CMD_RUN;
> > > > -       rte_smp_wmb();
> > > >
> > > >         usleep(run_time * US_PER_S);
> > > >
> > > >
> > > > I still think we'd better have some synchronisation here.
> > > > Otherwise what would prevent compiler and/or cpu to update wrk_cmd
> > > > out of order (before _init_ phase is completed)?
> > > > We probably can safely assume no reordering from the compiler here,
> > > > as we have function calls straight before and after 'wrk_cmd =  
> > WRK_CMD_RUN;'  
> > > > But for consistency and easier maintenance, I still think it is
> > > > better to have something here, after all it is not performance critical pass.  
> > > Agree that this is not performance critical.
> > >
> > > This is more about correctness (as usually people refer to code to
> > > understand the concepts). You can refer to video [1]. Essentially, the
> > > pthread_create has 'happens-before' behavior. i.e. all the memory
> > > operations before the pthread_create are visible to the new thread.
> > > The
> > > rte_smp_rmb() barrier in the thread function is not required as it reads the  
> > data that was set before the thread was launched.
> > 
> > rte_eal_remote_launch() doesn't call pthread_create().
> > All it does -  updates global variable (lcore_config) and writes/reads to/from
> > the pipe.
> >   
> Thanks for the reminder ☹
> I think rte_eal_remote_launch and rte_eal_wait_lcore need to provide behavior similar to pthread_launch and pthread_join respectively.
> 
> There is use of rte_smp_*mb in those functions as well. Those need to be fixed first and then look at these.

Looks like you want __atomic_thread_fence() here.




More information about the dev mailing list