[dpdk-dev] [PATCH v2 2/7] net/mlx4: inline more Tx functions

Ophir Munk ophirmu at mellanox.com
Wed Oct 25 23:42:46 CEST 2017


Hi Adrien,

On Wednesday, October 25, 2017 7:50 PM, Adrien Mazarguil wrote:
> 
> Hi Ophir,
> 
> On Mon, Oct 23, 2017 at 02:21:55PM +0000, Ophir Munk wrote:
> > Change functions to inline on Tx fast path to improve performance
> >
> > Inside the inline function call other functions to handle "unlikely"
> > cases such that the inline function code footprint is small.
> >
> > Signed-off-by: Ophir Munk <ophirmu at mellanox.com>
> 
> Reading this, it's like adding __rte_always_inline improves performance at
> all, which I doubt unless you can show proof through performance results.
> 
> When in doubt, leave it to the compiler, the static keyword is usually enough
> of a hint. Too much forced inlining may actually be harmful.
> 
> What this patch really does is splitting the heavy lookup/registration function
> in two halves with one small static inline function for the lookup part that
> calls the separate registration part in the unlikely event MR is not already
> registered.
> 
> Thankfully the compiler doesn't inline the large registration function back,
> which results in the perceived performance improvement for the time being,
> however there is no guarantee it won't happen in the future (you didn't use
> the noinline keyword on the registration function for that).
> 
> Therefore I have a bunch of comments and suggestions, see below.
> 
> > ---
> >  drivers/net/mlx4/mlx4_rxtx.c | 43
> > ++++++------------------------------
> >  drivers/net/mlx4/mlx4_rxtx.h | 52
> > +++++++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 58 insertions(+), 37 deletions(-)
> >
> > diff --git a/drivers/net/mlx4/mlx4_rxtx.c
> > b/drivers/net/mlx4/mlx4_rxtx.c index 011ea79..ae37f9b 100644
> > --- a/drivers/net/mlx4/mlx4_rxtx.c
> > +++ b/drivers/net/mlx4/mlx4_rxtx.c
> > @@ -220,54 +220,25 @@ mlx4_txq_complete(struct txq *txq)
> >  	return 0;
> >  }
> >
> > -/**
> > - * Get memory pool (MP) from mbuf. If mbuf is indirect, the pool from
> > which
> > - * the cloned mbuf is allocated is returned instead.
> > - *
> > - * @param buf
> > - *   Pointer to mbuf.
> > - *
> > - * @return
> > - *   Memory pool where data is located for given mbuf.
> > - */
> > -static struct rte_mempool *
> > -mlx4_txq_mb2mp(struct rte_mbuf *buf)
> > -{
> > -	if (unlikely(RTE_MBUF_INDIRECT(buf)))
> > -		return rte_mbuf_from_indirect(buf)->pool;
> > -	return buf->pool;
> > -}
> >
> >  /**
> > - * Get memory region (MR) <-> memory pool (MP) association from txq-
> >mp2mr[].
> > - * Add MP to txq->mp2mr[] if it's not registered yet. If mp2mr[] is
> > full,
> > - * remove an entry first.
> > + * Add memory region (MR) <-> memory pool (MP) association to txq-
> >mp2mr[].
> > + * If mp2mr[] is full, remove an entry first.
> >   *
> >   * @param txq
> >   *   Pointer to Tx queue structure.
> >   * @param[in] mp
> > - *   Memory pool for which a memory region lkey must be returned.
> > + *   Memory pool for which a memory region lkey must be added
> > + * @param[in] i
> > + *   Index in memory pool (MP) where to add memory region (MR)
> >   *
> >   * @return
> > - *   mr->lkey on success, (uint32_t)-1 on failure.
> > + *   Added mr->lkey on success, (uint32_t)-1 on failure.
> >   */
> > -uint32_t
> > -mlx4_txq_mp2mr(struct txq *txq, struct rte_mempool *mp)
> > +uint32_t mlx4_txq_add_mr(struct txq *txq, struct rte_mempool *mp,
> > +uint32_t i)
> >  {
> > -	unsigned int i;
> >  	struct ibv_mr *mr;
> >
> > -	for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
> > -		if (unlikely(txq->mp2mr[i].mp == NULL)) {
> > -			/* Unknown MP, add a new MR for it. */
> > -			break;
> > -		}
> > -		if (txq->mp2mr[i].mp == mp) {
> > -			assert(txq->mp2mr[i].lkey != (uint32_t)-1);
> > -			assert(txq->mp2mr[i].mr->lkey == txq-
> >mp2mr[i].lkey);
> > -			return txq->mp2mr[i].lkey;
> > -		}
> > -	}
> >  	/* Add a new entry, register MR first. */
> >  	DEBUG("%p: discovered new memory pool \"%s\" (%p)",
> >  	      (void *)txq, mp->name, (void *)mp); diff --git
> > a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h index
> > e10bbca..719ef45 100644
> > --- a/drivers/net/mlx4/mlx4_rxtx.h
> > +++ b/drivers/net/mlx4/mlx4_rxtx.h
> > @@ -53,6 +53,7 @@
> >
> >  #include "mlx4.h"
> >  #include "mlx4_prm.h"
> > +#include "mlx4_utils.h"
> 
> Why?
> 
> >
> >  /** Rx queue counters. */
> >  struct mlx4_rxq_stats {
> > @@ -160,7 +161,6 @@ void mlx4_rx_queue_release(void *dpdk_rxq);
> >
> >  /* mlx4_rxtx.c */
> >
> > -uint32_t mlx4_txq_mp2mr(struct txq *txq, struct rte_mempool *mp);
> > uint16_t mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts,
> >  		       uint16_t pkts_n);
> >  uint16_t mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, @@
> > -169,6 +169,8 @@ uint16_t mlx4_tx_burst_removed(void *dpdk_txq,
> struct rte_mbuf **pkts,
> >  			       uint16_t pkts_n);
> >  uint16_t mlx4_rx_burst_removed(void *dpdk_rxq, struct rte_mbuf **pkts,
> >  			       uint16_t pkts_n);
> > +uint32_t mlx4_txq_add_mr(struct txq *txq, struct rte_mempool *mp,
> > +				unsigned int i);
> >
> >  /* mlx4_txq.c */
> >
> > @@ -177,4 +179,52 @@ int mlx4_tx_queue_setup(struct rte_eth_dev
> *dev, uint16_t idx,
> >  			const struct rte_eth_txconf *conf);  void
> > mlx4_tx_queue_release(void *dpdk_txq);
> >
> > +/**
> > + * Get memory pool (MP) from mbuf. If mbuf is indirect, the pool from
> > +which
> > + * the cloned mbuf is allocated is returned instead.
> > + *
> > + * @param buf
> > + *   Pointer to mbuf.
> > + *
> > + * @return
> > + *   Memory pool where data is located for given mbuf.
> > + */
> > +static __rte_always_inline struct rte_mempool * mlx4_txq_mb2mp(struct
> > +rte_mbuf *buf) {
> > +	if (unlikely(RTE_MBUF_INDIRECT(buf)))
> > +		return rte_mbuf_from_indirect(buf)->pool;
> > +	return buf->pool;
> > +}
> > +
> > +/**
> > + * Get memory region (MR) <-> memory pool (MP) association from txq-
> >mp2mr[].
> > + * Call mlx4_txq_add_mr() if MP is not registered yet.
> > + *
> > + * @param txq
> > + *   Pointer to Tx queue structure.
> > + * @param[in] mp
> > + *   Memory pool for which a memory region lkey must be returned.
> > + *
> > + * @return
> > + *   mr->lkey on success, (uint32_t)-1 on failure.
> > + */
> > +static __rte_always_inline uint32_t
> 
> Note __rte_always_inline is defined in rte_common.h and should be
> explicitly included (however don't do that, see below).
> 
> > +mlx4_txq_mp2mr(struct txq *txq, struct rte_mempool *mp) {
> > +	unsigned int i;
> > +
> > +	for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
> > +		if (unlikely(txq->mp2mr[i].mp == NULL)) {
> > +			/* Unknown MP, add a new MR for it. */
> > +			break;
> > +		}
> > +		if (txq->mp2mr[i].mp == mp) {
> > +			assert(txq->mp2mr[i].lkey != (uint32_t)-1);
> > +			assert(txq->mp2mr[i].mr->lkey == txq-
> >mp2mr[i].lkey);
> 
> assert() requires assert.h (but don't include it, see subsequent suggestion).
> 
> > +			return txq->mp2mr[i].lkey;
> > +		}
> > +	}
> > +	return mlx4_txq_add_mr(txq, mp, i);
> > +}
> >  #endif /* MLX4_RXTX_H_ */
> 
> So as described above, these functions do not need the __rte_always_inline,
> please remove it. They also do not need to be located in a header file; the
> reason it's the case for their mlx5 counterparts is that they have to be shared
> between vectorized/non-vectorized code. No such requirement here, you
> should move them back to their original spot.
> 

Static function mlx4_txq_mp2mr() must be in a header file because it is shared by 2 files: mlx4_txq.c and mlx4_rxtx.c.
It is not related to vectorized/non-vectorized code in mlx5.
Having said that -__rte_always_inline is required as well otherwise compilation fails with 
drivers/net/mlx4/mlx4_rxtx.h:200:1: error: 'mlx4_txq_mp2mr' defined but not used [-Werror=unused-function]
for files which include mlx4_rxtx.h

> My suggestion for this performance improvement is to move
> mlx4_txq_add_mr() to a different file, mlx4_mr.c looks like a good
> candidate. This fact will ensure it's never inlined and far away from the data
> path.
> 

Function mlx4_txq_add_mr() is relatively small. 
What do you say about preceding it with __attribute((noinline)) instead of creating a new file?

> --
> Adrien Mazarguil
> 6WIND


More information about the dev mailing list