[dpdk-dev] [PATCH v3 2/2] net/i40e: replace SMP barrier with thread fence

Zhang, Qi Z qi.z.zhang at intel.com
Thu Jul 8 16:26:11 CEST 2021



> -----Original Message-----
> From: Lance Richardson <lance.richardson at broadcom.com>
> Sent: Thursday, July 8, 2021 9:51 PM
> To: Zhang, Qi Z <qi.z.zhang at intel.com>
> Cc: Joyce Kong <joyce.kong at arm.com>; Xing, Beilei <beilei.xing at intel.com>;
> ruifeng.wang at arm.com; honnappa.nagarahalli at arm.com; Richardson, Bruce
> <bruce.richardson at intel.com>; Zhang, Helin <helin.zhang at intel.com>;
> dev at dpdk.org; stable at dpdk.org; nd at arm.com
> Subject: Re: [dpdk-dev] [PATCH v3 2/2] net/i40e: replace SMP barrier with
> thread fence
> 
> On Thu, Jul 8, 2021 at 8:09 AM Zhang, Qi Z <qi.z.zhang at intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Joyce Kong <joyce.kong at arm.com>
> > > Sent: Tuesday, July 6, 2021 2:54 PM
> > > To: Xing, Beilei <beilei.xing at intel.com>; Zhang, Qi Z
> <qi.z.zhang at intel.com>;
> > > ruifeng.wang at arm.com; honnappa.nagarahalli at arm.com; Richardson,
> Bruce
> > > <bruce.richardson at intel.com>; Zhang, Helin <helin.zhang at intel.com>
> > > Cc: dev at dpdk.org; stable at dpdk.org; nd at arm.com
> > > Subject: [PATCH v3 2/2] net/i40e: replace SMP barrier with thread fence
> > >
> > > Simply replace the SMP barrier with atomic thread fence for i40e hw ring
> sacn,
> > > if there is no synchronization point.
> > >
> > > Signed-off-by: Joyce Kong <joyce.kong at arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang at arm.com>
> > > ---
> > >  drivers/net/i40e/i40e_rxtx.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index
> > > 9aaabfd92..86e2f083e 100644
> > > --- a/drivers/net/i40e/i40e_rxtx.c
> > > +++ b/drivers/net/i40e/i40e_rxtx.c
> > > @@ -482,7 +482,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> > >
> I40E_RXD_QW1_STATUS_SHIFT;
> > >               }
> > >
> > > -             rte_smp_rmb();
> > > +             /* This barrier is to order loads of different words in the
> descriptor */
> > > +             rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> >
> > Now for x86, you actually replace a compiler barrier with a memory fence,
> this may have potential performance impact which need additional resource to
> investigate
> 
> No memory fence instruction is generated for
> __ATOMIC_ACQUIRE on x86 for any version of gcc
> or clang that I've tried, based on experiments here:
> 
>     https://godbolt.org/z/Yxr1vGhKP

Nice tool!
I try to write some dummy code combined with or without __atomic_thread_fence(__ATOMIC_ACQUIRE)
but I didn't see any difference of the generated assembly code, does that means __atomic_thread_fence(__ATOMIC_ACQUIRE) just does nothing on x86?



More information about the dev mailing list