[dpdk-dev] [RFC 0/8] mbuf: structure reorganization

Morten Brørup mb at smartsharesystems.com
Tue Feb 21 16:18:50 CET 2017


Comments inline.


> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Tuesday, February 21, 2017 4:05 PM
> To: Morten Brørup
> Cc: Olivier Matz; Bruce Richardson; Ananyev, Konstantin; dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
> 
> Hi Morten,
> 
> On Tue, 21 Feb 2017 15:20:23 +0100, Morten Brørup
> <mb at smartsharesystems.com> wrote:
> > Comments at the end.
> >
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz
> > > Sent: Thursday, February 16, 2017 5:14 PM
> > > To: Bruce Richardson
> > > Cc: Ananyev, Konstantin; dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
> > >
> > > On Thu, 16 Feb 2017 15:46:19 +0000, Bruce Richardson
> > > <bruce.richardson at intel.com> wrote:
> > > > On Thu, Feb 16, 2017 at 02:48:07PM +0100, Olivier Matz wrote:
> > > > > Hi Konstantin,
> > > > >
> > > > > Thanks for the feedback.
> > > > > Comments inline.
> > > > >
> > > > >
> > > > > On Mon, 6 Feb 2017 18:41:27 +0000, "Ananyev, Konstantin"
> > > > > <konstantin.ananyev at intel.com> wrote:
> > > > > > Hi Olivier,
> > > > > > Looks good in general, some comments from me below.
> > > > > > Thanks
> > > > > > Konstantin
> > > > > >
> > > > > > >
> > > > > > > The main changes are:
> > > > > > > - reorder structure to increase vector performance on some
> > > > > > > non-ia platforms.
> > > > > > > - add a 64bits timestamp field in the 1st cache line
> > > > > >
> > > > > > Wonder why it deserves to be in first cache line?
> > > > > > How it differs from seqn below (pure SW stuff right now).
> > > > >
> > > > > In case the timestamp is set from a NIC value, it is set in the
> > > > > Rx path. So that's why I think it deserve to be located in the
> > > > > 1st cache line.
> > > > >
> > > > > As you said, the seqn is a pure sw stuff right: it is set in a
> > > > > lib, not in a PMD rx path.
> > > > >
> > > > > > > - m->next, m->nb_segs, and m->refcnt are always initialized
> > > > > > > for mbufs in the pool, avoiding the need of setting
> > > > > > > m->next
> > > (located
> > > > > > > in the 2nd cache line) in the Rx path for mono-segment
> > > > > > > packets.
> > > > > > > - change port and nb_segs to 16 bits
> > > > > >
> > > > > > Not that I am completely against it, but changing nb_segs to
> > > > > > 16 bits seems like an overkill to me.
> > > > > > I think we can keep and extra 8bits for something more useful
> > > > > > in future.
> > > > >
> > > > > In my case, I use the m->next field to chain more than 256
> > > > > segments for L4 socket buffers. It also updates nb_seg that can
> > > > > overflow. It's not a big issue since at the end, nb_seg is
> > > > > decremented for each segment. On the other hand, if I enable
> > > > > some sanity checks on mbufs, it complains because the number of
> > > > > segments is not equal to nb_seg.
> > > > >
> > > > > There is also another use case with fragmentation as discussed
> > > > > recently: http://dpdk.org/dev/patchwork/patch/19819/
> > > > >
> > > > > Of course, dealing with a long mbuf list is not that efficient,
> > > > > but the application can maintain another structure to
> accelerate
> > > > > the access to the middle/end of the list.
> > > > >
> > > > > Finally, we have other ideas to get additional 8 bits if
> > > > > required
> > > in
> > > > > the future, so I don't think it's really a problem.
> > > > >
> > > > >
> > > > > >
> > > > > > > - move seqn in the 2nd cache line
> > > > > > >
> > > > > > > Things discussed but not done in the patchset:
> > > > > > > - move refcnt and nb_segs to the 2nd cache line: many
> > > > > > > drivers sets them in the Rx path, so it could introduce a
> > > > > > > performance regression, or
> > > > > >
> > > > > > I wonder can refcnt only be moved into the 2-nd cacheline?
> > > > > > As I understand thanks to other change (from above)
> > > > > > m->refcnt
> > > will
> > > > > > already be initialized, so RX code don't need to touch it.
> > > > > > Though yes, it still would require changes in all PMDs.
> > > > >
> > > > > Yes, I agree, some fields could be moved in the 2nd cache line
> > > > > once all PMDs stop to write them in RX path. I propose to issue
> > > > > some guidelines to PMD maintainers at the same time the
> patchset
> > > > > is pushed. Then we can consider changing it in a future
> version,
> > > > > in case we need more room in the 1st mbuf cache line.
> > > >
> > > > If we are changing things, we should really do all that now,
> > > > rather than storing up future breaks to mbuf. Worst case, we
> > > > should plan for it immediately after the release where we make
> > > > these changes. Have
> > > two
> > > > releases that break mbuf immediately after each other - and
> > > > flagged
> > > as
> > > > such, but keep it stable thereafter. I don't like having
> technical
> > > > debt on mbuf just after we supposedly "fix" it.
> > >
> > > I think there is no need to do this change now. And I don't feel
> > > good with the idea of having a patchset that updates all the PMDs
> to
> > > remove the access to a field because it moved to the 2nd cache line
> > > (especially thinking about vector PMDs).
> > >
> > > That's why I think the plan could be:
> > > - push an updated version of this patchset quickly
> > > - advertise to PMD maintainers "you don't need to set the m->next,
> > >   m->refcnt, and m->nb_segs in the RX path, please update your
> > > drivers"
> > > - later, if we need more room in the 1st cache line of the mbuf, we
> > >   can move refcnt and nb_seg, probably without impacting the
> > >   performance.
> > >
> > >
> > > Olivier
> >
> > I suppose you mean that PMDs don't need to /initialize/ m->next,
> > m->refcnt and m->nb_segs.
> >
> > Forgive my ignorance, and this is wild speculation, but: Would a PMD
> > not need to set m->next and m->nb_segs if it receives a jumbogram
> > larger than an mbuf packet buffer? And if this is a realistic use
> > case, these fields actually do belong in the 1st cache line. PMD
> > developers please chime in.
> 
> Nothing to add to Bruce's answer :)

ACK that!

> 
> >
> > And I tend to agree with Bruce about making all these mbuf changes in
> > one go, rather than postponing some of them to later. Especially
> > because the postponement also closes and reopens the whole discussion
> > and decision process! (Not initializing a few fields in a PMD cannot
> > require a lot of work by the PMD developers. Moving the fields to the
> > 2nd cache line will in the worst case degrade the performance of the
> > non-updated PMDs.)
> >
> > A two step process makes good sense for the developers of DPDK, but
> > both steps should be taken within the same release, so they are
> > transparent to the users of DPDK.
> 
> I don't think this is doable, knowing the submission dead line is in
> less than 2 weeks. On my side, honestly, I don't want to dive in the
> code of into all PMDs. I feel this would be more risky than letting the
> PMD maintainers update their own PMD code.
> 
> Olivier

I was assuming that the work of updating the PMD code according to the new mbuf code would be left to the PMD developers. I.e. you take the first step (of updating the mbuf completely), and the PMD developers/maintainers take the second step.

Although I am not doing any of the actual work here, it seems like a relatively small task for each PMD maintainer to update his PMD accordingly, so perhaps you should ask them directly about the deadline issue. From a project release perspective, completing the mbuf structure reorganization in one release seems better than postponing some of the intended modifications for later.

Med venlig hilsen / kind regards
- Morten Brørup


More information about the dev mailing list