[dpdk-dev] [PATCH 1/2] net/i40e: desc loading is unnecessarily ordered for aarch64

Gavin Hu (Arm Technology China) Gavin.Hu at arm.com
Fri Aug 30 10:33:47 CEST 2019


Hi Honnappa,

> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>
> Sent: Thursday, August 29, 2019 6:10 AM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu at arm.com>;
> dev at dpdk.org
> Cc: nd <nd at arm.com>; thomas at monjalon.net; jerinj at marvell.com;
> pbhagavatula at marvell.com; qi.z.zhang at intel.com;
> bruce.richardson at intel.com; stable at dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli at arm.com>; nd <nd at arm.com>
> Subject: RE: [PATCH 1/2] net/i40e: desc loading is unnecessarily ordered for
> aarch64
> 
> Thanks Gavin, few comments are inline
> 
> > -----Original Message-----
> > From: Gavin Hu <gavin.hu at arm.com>
> > Sent: Tuesday, August 13, 2019 5:44 AM
> > To: dev at dpdk.org
> > Cc: nd <nd at arm.com>; thomas at monjalon.net; jerinj at marvell.com;
> > pbhagavatula at marvell.com; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli at arm.com>; qi.z.zhang at intel.com;
> > bruce.richardson at intel.com; stable at dpdk.org
> > Subject: [PATCH 1/2] net/i40e: desc loading is unnecessarily ordered for
> > aarch64
> >
> > For x86, the descriptors needs to be loaded in order, so in between two
> > descriptors loading, there is a compiler barrier in place.
> IMO, we can skip the above as this change applies to Arm platforms. Instead,
> capture this in the code in comments to explain why the ordering of the
> loads is not required. This will help others reading the code.

As the line of code was removed, there is no suitable place to add a comment.
Instead adding it in the commit log makes the story complete and easy to understand. 

> [1] For aarch64, a
> > patch [2] is in place to survive with discontinuous DD bits, the barriers can
> be
> > removed to take full advantage of out-of-order execution.
> >
> > 50% performance gain in the RFC2544 NDR test was measured on
> ThunderX2.
> > 12.50% performan gain in the RFC2544 NDR test was measured on
> Ampere
> > eMAG80 platform.
> >
> > [1]
> >
> http://inbox.dpdk.org/users/039ED4275CED7440929022BC67E7061153D71
> > 548@
> > SHSMSX105.ccr.corp.intel.com/
> > [2] https://mails.dpdk.org/archives/stable/2017-October/003324.html
> >
> > Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
> > Cc: stable at dpdk.org
> >
> > Signed-off-by: Gavin Hu <gavin.hu at arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang at arm.com>
> > Reviewed-by: Steve Capper <steve.capper at arm.com>
> > ---
> >  drivers/net/i40e/i40e_rxtx_vec_neon.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > index 83572ef..5555e9b 100644
> > --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > @@ -285,7 +285,6 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq,
> > struct rte_mbuf **rx_pkts,
> >  		/* Read desc statuses backwards to avoid race condition */
> >  		/* A.1 load 4 pkts desc */
> >  		descs[3] =  vld1q_u64((uint64_t *)(rxdp + 3));
> > -		rte_rmb();
> >
> >  		/* B.2 copy 2 mbuf point into rx_pkts  */
> >  		vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1);
> > --
> > 2.7.4


More information about the dev mailing list