[dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path functions
Matan Azrad
matan at mellanox.com
Thu Oct 26 12:31:06 CEST 2017
Hi Nelio
I think the memory barrier discussion is not relevant for this patch (if it will be relevant I will create new one).
Please see my comments inline.
Regarding this specific patch, I didn't see any comment from you, Are you agree with it?
> -----Original Message-----
> From: Nélio Laranjeiro [mailto:nelio.laranjeiro at 6wind.com]
> Sent: Wednesday, October 25, 2017 10:50 AM
> To: Ophir Munk <ophirmu at mellanox.com>
> Cc: Adrien Mazarguil <adrien.mazarguil at 6wind.com>; dev at dpdk.org;
> Thomas Monjalon <thomas at monjalon.net>; Olga Shern
> <olgas at mellanox.com>; Matan Azrad <matan at mellanox.com>
> Subject: Re: [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path functions
>
> On Tue, Oct 24, 2017 at 08:36:52PM +0000, Ophir Munk wrote:
> > Hi,
> >
> > On Tuesday, October 24, 2017 4:52 PM, Nélio Laranjeiro wrote:
> > >
> > > On Mon, Oct 23, 2017 at 02:21:57PM +0000, Ophir Munk wrote:
> > > > From: Matan Azrad <matan at mellanox.com>
> > > >
> > > > Merge tx_burst and mlx4_post_send functions to prevent double
> > > > asking about WQ remain space.
> > > >
> > > > This should improve performance.
> > > >
> > > > Signed-off-by: Matan Azrad <matan at mellanox.com>
> > > > ---
> > > > drivers/net/mlx4/mlx4_rxtx.c | 353
> > > > +++++++++++++++++++++----------------------
> > > > 1 file changed, 170 insertions(+), 183 deletions(-)
> > >
> > > What are the real expectation you have on the remaining patches of
> > > the series?
> > >
> > > According to the comment of this commit log "This should improve
> > > performance" there are too many barriers at each packet/segment
> > > level to improve something.
> > >
> > > The point is, mlx4_burst_tx() should write all the WQE without any
> > > barrier as it is processing a burst of packets (whereas Verbs
> > > functions which may only process a single packet).
> >
> > > The lonely barrier which should be present is the one to ensure that
> > > all the host memory is flushed before triggering the Tx doorbell.
> > >
> >
> > There is a known ConnectX-3 HW limitation: the first 4 bytes of every
> > TXWBB (64 bytes chunks) should be
> > written in a reversed order (from last TXWBB to first TXWBB).
>
> This means the first WQE filled by the burst function is the doorbell.
> In such situation, the first four bytes of it can be written before
> leaving the burst function and after a write memory barrier.
>
> Until this first WQE is not complete, the NIC won't start processing the
> packets. Memory barriers per packets becomes useless.
I think this is not true, Since mlx4 HW can prefetch advanced TXbbs if their first 4
bytes are valid in spite of the first WQE is still not valid (please read the spec).
>
> It gives something like:
>
> uint32_t tx_bb_db = 0;
> void *first_wqe = NULL;
>
> /*
> * Prepare all Packets by writing the WQEs without the 4 first bytes of
> * the first WQE.
> */
> for () {
> if (!wqe) {
> first_wqe = wqe;
> tx_bb_db = foo;
> }
> }
> /* Leaving. */
> rte_wmb();
> *(uin32_t*)wqe = tx_bb_db;
> return n;
>
I will take care to check if we can do 2 loops:
Write all last 60B per TXbb.
Memory barrier.
Write all first 4B per TXbbs.
> > The last 60 bytes of any TXWBB can be written in any order (before
> > writing the first 4 bytes).
> > Is your last statement (using lonely barrier) is in accordance with
> > this limitation? Please explain.
> >
> > > There is also too many cases handled which are useless in bursts
> situation,
> > > this function needs to be re-written to its minimal use case i.e.
> processing a
> > > valid burst of packets/segments and triggering at the end of the burst the
> Tx
> > > doorbell.
> > >
>
> Regards,
>
> --
> Nélio Laranjeiro
> 6WIND
More information about the dev
mailing list