[dpdk-dev] [PATCH v3 6/7] net/mlx4: mitigate Tx path memory barriers

Adrien Mazarguil adrien.mazarguil at 6wind.com
Tue Oct 31 14:21:45 CET 2017


Hi Matan,

On Tue, Oct 31, 2017 at 11:35:29AM +0000, Matan Azrad wrote:
<snip>
> > -----Original Message-----
> > From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> > Sent: Tuesday, October 31, 2017 12:17 PM
> > To: Matan Azrad <matan at mellanox.com>
> > Cc: dev at dpdk.org; Ophir Munk <ophirmu at mellanox.com>
> > Subject: Re: [PATCH v3 6/7] net/mlx4: mitigate Tx path memory barriers
> > 
> > Hi Matan,
> > 
> > On Mon, Oct 30, 2017 at 07:47:20PM +0000, Matan Azrad wrote:
> > > Hi Adrien
> > >
> > > > -----Original Message-----
> > > > From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> > > > Sent: Monday, October 30, 2017 4:24 PM
> > > > To: Matan Azrad <matan at mellanox.com>
> > > > Cc: dev at dpdk.org; Ophir Munk <ophirmu at mellanox.com>
> > > > Subject: Re: [PATCH v3 6/7] net/mlx4: mitigate Tx path memory
> > > > barriers
> > > >
> > > > On Mon, Oct 30, 2017 at 10:07:28AM +0000, Matan Azrad wrote:
> > > > > Replace most of the memory barriers by compiler barriers since
> > > > > they are all targeted to the DRAM; This improves code efficiency
> > > > > for systems which force store order between different addresses.
> > > > >
> > > > > Only the doorbell record store should be protected by memory
> > > > > barrier since it is targeted to the PCI memory domain.
> > > > >
> > > > > Limit pre byte count store compiler barrier for systems with cache
> > > > > line size smaller than 64B (TXBB size).
> > > > >
> > > > > Signed-off-by: Matan Azrad <matan at mellanox.com>
<snip>
> > > > >  	cq->cons_index = cons_index;
> > > > >  	*cq->set_ci_db = rte_cpu_to_be_32(cq->cons_index &
> > > > MLX4_CQ_DB_CI_MASK);
> > > > > -	rte_wmb();
> > > > > +	rte_io_wmb();
> > > >
> > > > This one could be removed entirely as well, which is more or less
> > > > what the move to a compiler barrier does. Nothing in subsequent code
> > > > depends on this doorbell being written, so this can piggy back on
> > > > any subsequent rte_wmb().
> > >
> > > Yes, you right, probably this code was taken from multi thread
> > implementation.
> > > >
> > > > On the other hand in my opinion a barrier (compiler or otherwise)
> > > > might be needed before the doorbell write, to make clear it cannot
> > > > somehow be done earlier in case something attempts to optimize it
> > away.
> > > >
> > > I think we can remove it entirely (compiler can't optimize the ci_db store
> > since in depends in previous code (cons_index).
> > 
> > Right, however you may still run into issues if the compiler determines the
> > final cons_index value by looking at the loop and decides to store it before
> > entering/leaving it. That's the kind of problematic optimization I was thinking
> > of.
> > 
> > The barrier in that sense is just to assert the order of seemingly unrelated
> > load/stores.
> 
> I think that If I left the rte_io_rmb after CQE owner check we can earn both:
> 1. The concern of read ordering while reading the owner before using CQE.
> 2. The ci_db concern: the compiler must read the last CQE(which is not valid and we have no more stamp to do) before it knows the last value of cons_index. 
> So we can remove entirely this rte_io_wmb in completion function.
> What do you think? 

That's right, this means there's a barrier before the doorbell write in any
case, OK then.

Just make sure cq->set_ci_db is volatile in a prior "fix" commit as
described in my previous suggestion, otherwise the remaining barriers won't
guarantee much. Thanks.

-- 
Adrien Mazarguil
6WIND


More information about the dev mailing list