[PATCH v2] net/mlx5: use just sufficient barrier for Arm platforms
    Honnappa Nagarahalli 
    Honnappa.Nagarahalli at arm.com
       
    Thu Mar  9 03:42:29 CET 2023
    
    
  
> -----Original Message-----
> From: Slava Ovsiienko <viacheslavo at nvidia.com>
> Sent: Tuesday, March 7, 2023 10:08 AM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>; dev at dpdk.org;
> Ruifeng Wang <Ruifeng.Wang at arm.com>; Matan Azrad <matan at nvidia.com>;
> Shahaf Shuler <shahafs at nvidia.com>
> Cc: nd <nd at arm.com>; Matan Azrad <matan at nvidia.com>; stable at dpdk.org;
> nd <nd at arm.com>
> Subject: RE: [PATCH v2] net/mlx5: use just sufficient barrier for Arm platforms
> 
> Hi, Honnappa
> 
> I'm sorry for delay - I had to play with patch, to compile this one with assembly
> listings and check what code is actually generated on x86/ARM.
> 
> On x86 there is no difference at all (with and w/o patch), so no objection from
> my side.
> On ARM we have:
> w/o patch: dmb oshld
> with patch: dmb ishld
> 
> What is the purpose of the barrier - to not allow the CQE read access reordering.
> On x86, "compiler barrier" is quite enough (due to strong load/store ordering).
> On ARM load/store might be reordered, AFAIU.
> 
> CQE resides in the host memory and can be directly written by the NIC via PCIe.
> (In my understanding it can cause core cache(s) invalidations).
> I have a question - should we consider this as outer sharable domain?
> Is it safe to have a barrier for inner domain only in our case?
Inner domain is enough, hence the reason for this patch.
> 
> We have updated the cqe_check() routine, sorry for this. Could you, please,
> update the patch and send v3?
Sent V3
> 
> With best regards,
> Slava
> 
> > -----Original Message-----
> > From: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>
> > Sent: вторник, 15 ноября 2022 г. 03:46
> > To: Slava Ovsiienko <viacheslavo at nvidia.com>; dev at dpdk.org; Ruifeng
> > Wang <Ruifeng.Wang at arm.com>; Matan Azrad <matan at nvidia.com>;
> Shahaf
> > Shuler <shahafs at nvidia.com>
> > Cc: nd <nd at arm.com>; Matan Azrad <matan at nvidia.com>;
> stable at dpdk.org;
> > nd <nd at arm.com>
> > Subject: RE: [PATCH v2] net/mlx5: use just sufficient barrier for Arm
> > platforms
> >
> > <snip>
> >
> > >
> > > >
> > > > Hi, Honnappa
> > > Hi Slava, thanks for the feedback.
> > >
> > > >
> > > > We discussed the barrier here:
> > > > http://patches.dpdk.org/project/dpdk/patch/20210606164948.35997-1-
> > > > honnappa.nagarahalli at arm.com/
> > > Yes, I have changed the patch according to the discussion. i.e.
> > > barrier is needed, but different (inner sharable domain) barrier is required.
> > >
> > > >
> > > > (BTW, it is good practice to keep the reference to previous patch
> > > > versions below Commit Message of the next ones).
> > > >
> > > > This barrier is not about compiler ordering, it is about external
> > > > HW agent memory action completions.
> > > > So, I'm not sure the rte_atomic_thread_fence() is safe for x86 -
> > > > patch impacts
> > > > x86 as well.
> > > The earlier barrier 'rte_io_rmb()', resolves to a compiler barrier on x86 [1].
> > > The rte_atomic_thread_fence(__ATOMIC_ACQUIRE) on x86 also acts as a
> > > compiler barrier. So, there is no change for x86.
> > >
> > >
> > > [1]
> > > https://github.com/DPDK/dpdk/blob/main/lib/eal/x86/include/rte_atomic.
> > > h#
> > > L80
> > Hi Slava, any more comments on this?
> >
> > >
> > > >
> > > > With best regards,
> > > > Slava
> > > >
> > > > > -----Original Message-----
> > > > > From: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
> > > > > Sent: Tuesday, August 30, 2022 23:01
> > > > > To: dev at dpdk.org; honnappa.nagarahalli at arm.com;
> > > > ruifeng.wang at arm.com;
> > > > > Matan Azrad <matan at nvidia.com>; Shahaf Shuler
> > > <shahafs at nvidia.com>;
> > > > > Slava Ovsiienko <viacheslavo at nvidia.com>
> > > > > Cc: nd at arm.com; Matan Azrad <matan at nvidia.com>; stable at dpdk.org
> > > > > Subject: [PATCH v2] net/mlx5: use just sufficient barrier for
> > > > > Arm platforms
> > > > >
> > > > > cqe->op_own indicates if the CQE is owned by the NIC. The rest
> > > > > cqe->of
> > > > > the fields in CQE should be read only after op_own is read. On
> > > > > Arm platforms using "dmb ishld" is sufficient to enforce this.
> > > > >
> > > > > Fixes: 88c0733535d6 ("net/mlx5: extend Rx completion with error
> > > > > handling")
> > > > > Cc: matan at mellanox.com
> > > > > Cc: stable at dpdk.org
> > > > >
> > > > > Signed-off-by: Honnappa Nagarahalli
> > > > > <honnappa.nagarahalli at arm.com>
> > > > > Reviewed-by: Ruifeng Wang <ruifeng.wang at arm.com>
> > > > > ---
> > > > >  drivers/common/mlx5/mlx5_common.h | 6 +++++-
> > > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/common/mlx5/mlx5_common.h
> > > > > b/drivers/common/mlx5/mlx5_common.h
> > > > > index 5028a05b49..ac2e85b15f 100644
> > > > > --- a/drivers/common/mlx5/mlx5_common.h
> > > > > +++ b/drivers/common/mlx5/mlx5_common.h
> > > > > @@ -195,7 +195,11 @@ check_cqe(volatile struct mlx5_cqe *cqe,
> > > > > const uint16_t cqes_n,
> > > > >
> > > > >  	if (unlikely((op_owner != (!!(idx))) || (op_code ==
> > > > > MLX5_CQE_INVALID)))
> > > > >  		return MLX5_CQE_STATUS_HW_OWN;
> > > > > -	rte_io_rmb();
> > > > > +	/* Prevent speculative reading of other fields in CQE until
> > > > > +	 * CQE is valid.
> > > > > +	 */
> > > > > +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> > > > > +
> > > > >  	if (unlikely(op_code == MLX5_CQE_RESP_ERR ||
> > > > >  		     op_code == MLX5_CQE_REQ_ERR))
> > > > >  		return MLX5_CQE_STATUS_ERR;
> > > > > --
> > > > > 2.17.1
    
    
More information about the dev
mailing list