[dpdk-dev] [PATCH v1] net/i40e: remove the SMP barrier in HW scanning func

Zhang, Qi Z qi.z.zhang at intel.com
Mon Jun 7 16:55:45 CEST 2021



> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>
> Sent: Monday, June 7, 2021 2:33 AM
> To: Zhang, Qi Z <qi.z.zhang at intel.com>; Joyce Kong <Joyce.Kong at arm.com>;
> Xing, Beilei <beilei.xing at intel.com>; Ruifeng Wang <Ruifeng.Wang at arm.com>
> Cc: dev at dpdk.org; nd <nd at arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli at arm.com>; nd <nd at arm.com>
> Subject: RE: [PATCH v1] net/i40e: remove the SMP barrier in HW scanning
> func
> 
> <snip>
> 
> > >
> > > Add the logic to determine how many DD bits have been set for
> > > contiguous packets, for removing the SMP barrier while reading descs.
> >
> > I didn't understand this.
> > The current logic already guarantee the read out DD bits are from
> > continue packets, as it read Rx descriptor in a reversed order from the ring.
> Qi, the comments in the code mention that there is a race condition if the
> descriptors are not read in the reverse order. But, they do not mention what
> the race condition is and how it can occur. Appreciate if you could explain
> that.

The Race condition happens between the NIC and CPU, if write and read DD bit in the same order, there might be a hole (e.g. 1011)  with the reverse read order, we make sure no more "1" after the first "0"
as the read address are declared as volatile, compiler will not re-ordered them.

> 
> On x86, the reads are not re-ordered (though the compiler can re-order). On
> ARM, the reads can get re-ordered and hence the barriers are required. In
> order to avoid the barriers, we are trying to process only those descriptors
> whose DD bits are set such that they are contiguous. i.e. if the DD bits are
> 1011, we process only the first descriptor.

Ok, I see. thanks for the explanation.
At this moment, I may prefer not change the behavior of x86, so compile option for arm can be added, in future when we observe no performance impact for x86 as well, we can consider to remove it, what do you think?

> 
> > So I didn't see the a new logic be added, would you describe more
> > clear about the purpose of this patch?
> >
> > >
> > > Signed-off-by: Joyce Kong <joyce.kong at arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang at arm.com>
> > > ---
> > >  drivers/net/i40e/i40e_rxtx.c | 13 ++++++++-----
> > >  1 file changed, 8 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/net/i40e/i40e_rxtx.c
> > > b/drivers/net/i40e/i40e_rxtx.c index
> > > 6c58decec..410a81f30 100644
> > > --- a/drivers/net/i40e/i40e_rxtx.c
> > > +++ b/drivers/net/i40e/i40e_rxtx.c
> > > @@ -452,7 +452,7 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> > >  	uint16_t pkt_len;
> > >  	uint64_t qword1;
> > >  	uint32_t rx_status;
> > > -	int32_t s[I40E_LOOK_AHEAD], nb_dd;
> > > +	int32_t s[I40E_LOOK_AHEAD], var, nb_dd;
> > >  	int32_t i, j, nb_rx = 0;
> > >  	uint64_t pkt_flags;
> > >  	uint32_t *ptype_tbl = rxq->vsi->adapter->ptype_tbl; @@ -482,11
> > > +482,14 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> > >  					I40E_RXD_QW1_STATUS_SHIFT;
> > >  		}
> > >
> > > -		rte_smp_rmb();
> >
> > Any performance gain by removing this? and it is not necessary to be
> > combined with below change, right?
> >
> > > -
> > >  		/* Compute how many status bits were set */
> > > -		for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++)
> > > -			nb_dd += s[j] & (1 <<
> > I40E_RX_DESC_STATUS_DD_SHIFT);
> > > +		for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++) {
> > > +			var = s[j] & (1 << I40E_RX_DESC_STATUS_DD_SHIFT);
> > > +			if (var)
> > > +				nb_dd += 1;
> > > +			else
> > > +				break;
> > > +		}
> > >
> > >  		nb_rx += nb_dd;
> > >
> > > --
> > > 2.17.1



More information about the dev mailing list