[dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path functions
    Nélio Laranjeiro 
    nelio.laranjeiro at 6wind.com
       
    Wed Oct 25 09:50:06 CEST 2017
    
    
  
On Tue, Oct 24, 2017 at 08:36:52PM +0000, Ophir Munk wrote:
> Hi,
> 
> On Tuesday, October 24, 2017 4:52 PM, Nélio Laranjeiro wrote:
> > 
> > On Mon, Oct 23, 2017 at 02:21:57PM +0000, Ophir Munk wrote:
> > > From: Matan Azrad <matan at mellanox.com>
> > >
> > > Merge tx_burst and mlx4_post_send functions to prevent double asking
> > > about WQ remain space.
> > >
> > > This should improve performance.
> > >
> > > Signed-off-by: Matan Azrad <matan at mellanox.com>
> > > ---
> > >  drivers/net/mlx4/mlx4_rxtx.c | 353
> > > +++++++++++++++++++++----------------------
> > >  1 file changed, 170 insertions(+), 183 deletions(-)
> > 
> > What are the real expectation you have on the remaining patches of the
> > series?
> > 
> > According to the comment of this commit log "This should improve
> > performance" there are too many barriers at each packet/segment level to
> > improve something.
> > 
> > The point is, mlx4_burst_tx() should write all the WQE without any barrier as
> > it is processing a burst of packets (whereas Verbs functions which may only
> > process a single packet).  
> 
> > The lonely barrier which should be present is the
> > one to ensure that all the host memory is flushed before triggering the Tx
> > doorbell.
> > 
> 
> There is a known ConnectX-3 HW limitation: the first 4 bytes of every
> TXWBB (64 bytes chunks) should be 
> written in a reversed order (from last TXWBB to first TXWBB).
This means the first WQE filled by the burst function is the doorbell.
In that situation, its first four bytes can be written just before
leaving the burst function, after a write memory barrier.
Until this first WQE is complete, the NIC won't start processing the
packets, so per-packet memory barriers become useless.

It gives something like:

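 uint32_t tx_bb_db = 0;
 void *first_wqe = NULL;

 /*
  * Prepare all packets by writing the WQEs without the first 4 bytes of
  * the first WQE.
  */
 for (/* each packet of the burst */) {
 	if (!first_wqe) {
 		/* Remember the first WQE and the value to write to it later. */
 		first_wqe = wqe;
 		tx_bb_db = foo;
 	}
 }
 /* Leaving: flush everything, then complete the first WQE. */
 rte_wmb();
 *(uint32_t *)first_wqe = tx_bb_db;
 return n;
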
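
To make the idea above concrete, here is a minimal, compilable sketch of
it. It is only an illustration under stated assumptions, not the mlx4
code: the toy_* names, the 8-entry ring, the one-TXBB-per-packet layout
and the C11 release fence standing in for rte_wmb() are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define TXBB_SIZE 64u  /* size of one TXWBB, as stated in this thread */
#define RING_TXBBS 8u  /* toy ring size (assumption) */

/* Toy send queue: one TXBB per packet to keep the sketch short. */
struct toy_txq {
	uint8_t ring[RING_TXBBS][TXBB_SIZE];
};

/* Stand-in for rte_wmb(); a C11 release fence only approximates it. */
static inline void
toy_wmb(void)
{
	atomic_thread_fence(memory_order_release);
}

/*
 * Write n WQEs with a single barrier for the whole burst. The first
 * 4 bytes of the first WQE are held back and written last.
 */
static unsigned int
toy_tx_burst(struct toy_txq *txq, unsigned int n, uint32_t ctrl_word)
{
	uint32_t tx_bb_db = 0;
	uint8_t *first_wqe = NULL;
	unsigned int i;

	assert(n <= RING_TXBBS);
	for (i = 0; i != n; ++i) {
		uint8_t *wqe = txq->ring[i];

		/* The last 60 bytes may be written in any order. */
		memset(wqe + 4, (int)(i + 1), TXBB_SIZE - 4);
		if (first_wqe == NULL) {
			/* Defer the first 4 bytes of the first WQE. */
			first_wqe = wqe;
			tx_bb_db = ctrl_word;
		} else {
			memcpy(wqe, &ctrl_word, sizeof(ctrl_word));
		}
	}
	if (first_wqe == NULL)
		return 0;
	/* Single barrier: all WQE contents are written before this point. */
	toy_wmb();
	memcpy(first_wqe, &tx_bb_db, sizeof(tx_bb_db));
	return n;
}
```

Calling toy_tx_burst(&q, 3, word) on a zeroed queue fills three TXBBs and
writes the first 4 bytes of q.ring[0] only after the barrier. A real
driver would also have to handle ring wrap-around and multi-TXBB WQEs,
which this sketch leaves out.
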
> The last 60 bytes of any TXWBB can be written in any order (before
> writing the first 4 bytes).
> Is your last statement (using lonely barrier) is in accordance with
> this limitation? Please explain.
> 
> > There is also too many cases handled which are useless in bursts situation,
> > this function needs to be re-written to its minimal use case i.e.  processing a
> > valid burst of packets/segments and triggering at the end of the burst the Tx
> > doorbell.
> > 
Regards,
-- 
Nélio Laranjeiro
6WIND
    
    