[dpdk-dev] [dpdk-techboard] [PATCH 1/1] mbuf: move pool pointer in first half
Stephen Hemminger
stephen at networkplumber.org
Mon Nov 9 19:04:30 CET 2020
On Mon, 9 Nov 2020 11:21:02 +0100
Morten Brørup <mb at smartsharesystems.com> wrote:
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Monday, November 9, 2020 11:06 AM
> >
> > On Mon, Nov 09, 2020 at 09:16:27AM +0100, Morten Brørup wrote:
> > > +CC techboard
> > >
> > > > From: Jerin Jacob [mailto:jerinjacobk at gmail.com]
> > > > Sent: Monday, November 9, 2020 6:18 AM
> > > >
> > > > On Sun, Nov 8, 2020 at 2:03 AM Thomas Monjalon
> > > > <thomas at monjalon.net> wrote:
> > > > >
> > > > > 07/11/2020 20:05, Jerin Jacob:
> > > > > > On Sun, Nov 8, 2020 at 12:09 AM Thomas Monjalon
> > > > > > > <thomas at monjalon.net> wrote:
> > > > > > > 07/11/2020 18:12, Jerin Jacob:
> > > > > > > > On Sat, Nov 7, 2020 at 10:04 PM Thomas Monjalon
> > > > > > > > <thomas at monjalon.net> wrote:
> > > > > > > > >
> > > > > > > > > The mempool pointer in the mbuf struct is moved
> > > > > > > > > from the second to the first half.
> > > > > > > > > It should increase performance on most systems having
> > > > > > > > > a 64-byte cache line, i.e. where the mbuf is split in
> > > > > > > > > two cache lines.
> > > > > > > >
> > > > > > > > But in any event, Tx needs to touch the pool to free the
> > > > > > > > mbuf back to it upon Tx completion, right?
> > > > > > > > Not able to understand the motivation for moving it to
> > > > > > > > the first 64B cache line.
> > > > > > > > The gain varies from driver to driver. For example, a
> > > > > > > > typical ARM-based NPU does not need to touch the pool in
> > > > > > > > Rx, since it has been filled by HW, whereas it needs to
> > > > > > > > touch it in Tx if the reference count is implemented.
> > > > > >
> > > > > > See below.
> > > > > >
> > > > > > > >
> > > > > > > > > Due to this change, tx_offload is moved, so some
> > > > > > > > > vector data paths may need to be adjusted.
> > > > > > > > > Note: the OCTEON TX2 check is removed temporarily!
> > > > > > > >
> > > > > > > > This will break the Tx path. Please don't remove the
> > > > > > > > static assert without adjusting the code.
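[Editor's note: the static assert referred to above pins the mbuf layout at compile time. Below is a minimal sketch of that technique, assuming a simplified 64-byte cache-line split; the struct, field names, and offsets are illustrative, not the real struct rte_mbuf layout.]

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct rte_mbuf; the field names and the
 * 64-byte split are assumptions for illustration only. */
struct mbuf_sketch {
	void *buf_addr;       /* first cache line starts here   */
	uint64_t rearm_data;
	uint64_t rx_descriptor;
	void *pool;           /* moved into the first 64 bytes  */
	char pad[32];
	uint64_t tx_offload;  /* second cache line              */
};

/* A driver's vector path can pin the offsets it relies on at compile
 * time, so any future layout change fails the build instead of
 * silently corrupting the fast path. */
_Static_assert(offsetof(struct mbuf_sketch, pool) < 64,
	       "pool pointer expected in the first cache line");
_Static_assert(offsetof(struct mbuf_sketch, tx_offload) >= 64,
	       "tx_offload expected in the second cache line");
```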
> > > > > > >
> > > > > > > Of course not.
> > > > > > > I looked at the vector Tx path of OCTEON TX2,
> > > > > > > it's close to impossible to understand :)
> > > > > > > Please help!
> > > > > >
> > > > > > Of course. Could you check the above section and share the
> > > > > > rationale for this change, and where and how much it helps?
> > > > >
> > > > > It was concluded at the techboard meeting you were part of.
> > > > > I don't understand why we restart this discussion again.
> > > > > I won't have the energy to restart this process myself.
> > > > > If you don't want to apply the techboard decision, then please
> > > > > do the necessary to request another quick decision.
> > > >
> > > > Yes. Initially, I thought it was OK as we have a 128B cache
> > > > line. After looking into Thomas's change, I realized it is not
> > > > good for ARM64 NPUs based on 64B cache lines, as:
> > > > - A typical ARM-based NPU does not need to touch the pool in
> > > > Rx, since it has been filled by HW, whereas it needs to touch
> > > > it in Tx if the reference count is implemented.
> > >
> > > Jerin, I don't understand what the problem is here...
> > >
> > > Since RX doesn't touch m->pool, it shouldn't matter for RX which
> > > cache line m->pool resides in. I get that.
> > >
> > > You are saying that TX needs to touch m->pool if the reference count
> > > is implemented. I get that. But I don't understand why it is worse
> > > having m->pool in the first cache line than in the second cache
> > > line; can you please clarify?
> > >
> > > > - Also, it will be affecting existing vector routines
> > >
> > > That is unavoidable if we move something from the second to the first
> > > cache line.
> > >
> > > It may require some rework on the vector routines, but it shouldn't
> > > be too difficult for whoever wrote these vector routines.
> > >
> > > >
> > > > I request to reconsider the tech board decision.
> > >
> > > I was at the techboard meeting as an observer (or whatever the
> > > correct term would be for non-members), and this is my impression
> > > of the decision made at the meeting:
> > >
> > > The techboard clearly decided not to move any dynamic fields into
> > > the first cache line, on the grounds that if we move them away
> > > again in a later version, DPDK users utilizing a dynamic field in
> > > the first cache line might experience a performance drop at that
> > > later time. And this would be a very bad user experience, causing
> > > grief and complaints. To me, this seemed like a firm decision,
> > > based on solid arguments.
> > >
> > > Then the techboard discussed which other field to move to the
> > > freed-up space in the first cache line. There were no performance
> > > reports showing any improvement from moving any of the suggested
> > > fields (m->pool, m->next, m->tx_offload), and there was a
> > > performance report showing no improvement from moving m->next in
> > > a test case with large segmented packets. The techboard decided
> > > to move m->pool as originally suggested. To me, this seemed like
> > > a somewhat random choice between A, B and C, on the grounds that
> > > moving one of them is probably better than moving none of them.
> > >
> >
> > This largely tallies with what I remember of the discussion too.
> >
> > I'd also add, though, that the choice between the next pointer and
> > the pool pointer came down to the fact that the next pointer is
> > only used for chained, multi-segment packets - which also tend to
> > be larger packets - while the pool pointer is of relevance to all
> > packets, big and small, single and multi-segment.
>
> I wish that was the case, but I am not so sure...
>
> It is true that m->next is NULL for non-segmented packets.
Yes, m->next is NULL for non-segmented packets.
Do we need to modify rte_mbuf_check() to enforce these checks?
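[Editor's note: a minimal sketch of the invariant being asked about, i.e. that a non-segmented mbuf must have a NULL next pointer. The struct and function names here are hypothetical; this is not the real rte_mbuf_check() signature or implementation.]

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the two fields involved in the check. */
struct seg_mbuf {
	struct seg_mbuf *next;  /* next segment, NULL if unsegmented */
	uint16_t nb_segs;       /* number of segments in the chain   */
};

/* Returns 0 if the segmentation invariant holds, -1 with a reason
 * string otherwise: nb_segs and the next pointer must agree. */
static int
mbuf_seg_check(const struct seg_mbuf *m, const char **reason)
{
	if (m->nb_segs == 1 && m->next != NULL) {
		*reason = "unsegmented mbuf has non-NULL next";
		return -1;
	}
	if (m->nb_segs > 1 && m->next == NULL) {
		*reason = "multi-segment mbuf has NULL next";
		return -1;
	}
	return 0;
}
```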