[dpdk-dev] [PATCH] mbuf: fix reset on mbuf free

Morten Brørup mb at smartsharesystems.com
Fri Nov 6 13:23:42 CET 2020


> From: Ananyev, Konstantin [mailto:konstantin.ananyev at intel.com]
> Sent: Friday, November 6, 2020 12:54 PM
> 
> > > > > > > > > > > > > > > >> Hi Olivier,
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >>> m->nb_seg must be reset on mbuf free
> > > whatever
> > > > > the
> > > > > > > value
> > > > > > > > > of m->next,
> > > > > > > > > > > > > > > >>> because it can happen that m->nb_seg is
> !=
> > > 1.
> > > > > For
> > > > > > > > > instance in this
> > > > > > > > > > > > > > > >>> case:
> > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > >>>   m1 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_append(m1, 500);
> > > > > > > > > > > > > > > >>>   m2 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_append(m2, 500);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_chain(m1, m2);
> > > > > > > > > > > > > > > >>>   m0 = rte_pktmbuf_alloc(mp);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_append(m0, 500);
> > > > > > > > > > > > > > > >>>   rte_pktmbuf_chain(m0, m1);
> > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > >>> As rte_pktmbuf_chain() does not reset
> > > nb_seg in
> > > > > the
> > > > > > > > > initial m1
> > > > > > > > > > > > > > > >>> segment (this is not required), after
> this
> > > code
> > > > > the
> > > > > > > > > mbuf chain
> > > > > > > > > > > > > > > >>> have 3 segments:
> > > > > > > > > > > > > > > >>>   - m0: next=m1, nb_seg=3
> > > > > > > > > > > > > > > >>>   - m1: next=m2, nb_seg=2
> > > > > > > > > > > > > > > >>>   - m2: next=NULL, nb_seg=1
> > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > >>> Freeing this mbuf chain will not
> restore
> > > > > nb_seg=1
> > > > > > > in
> > > > > > > > > the second
> > > > > > > > > > > > > > > >>> segment.
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> Hmm, not sure why is that?
> > > > > > > > > > > > > > > >> You are talking about freeing m1, right?
> > > > > > > > > > > > > > > >> rte_pktmbuf_prefree_seg(struct rte_mbuf
> *m)
> > > > > > > > > > > > > > > >> {
> > > > > > > > > > > > > > > >> 	...
> > > > > > > > > > > > > > > >> 	if (m->next != NULL) {
> > > > > > > > > > > > > > > >>                         m->next = NULL;
> > > > > > > > > > > > > > > >>                         m->nb_segs = 1;
> > > > > > > > > > > > > > > >>                 }
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> m1->next != NULL, so it will enter the
> if()
> > > > > block,
> > > > > > > > > > > > > > > >> and will reset both next and nb_segs.
> > > > > > > > > > > > > > > >> What I am missing here?
> > > > > > > > > > > > > > > >> Thinking in more generic way, that
> change:
> > > > > > > > > > > > > > > >>  -		if (m->next != NULL) {
> > > > > > > > > > > > > > > >>  -			m->next = NULL;
> > > > > > > > > > > > > > > >>  -			m->nb_segs = 1;
> > > > > > > > > > > > > > > >>  -		}
> > > > > > > > > > > > > > > >>  +		m->next = NULL;
> > > > > > > > > > > > > > > >>  +		m->nb_segs = 1;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Ah, sorry. I oversimplified the example
> and
> > > now
> > > > > it
> > > > > > > does
> > > > > > > > > not
> > > > > > > > > > > > > > > > show the issue...
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The full example also adds a split() to
> break
> > > the
> > > > > > > mbuf
> > > > > > > > > chain
> > > > > > > > > > > > > > > > between m1 and m2. The kind of thing that
> > > would
> > > > > be
> > > > > > > done
> > > > > > > > > for
> > > > > > > > > > > > > > > > software TCP segmentation.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If so, may be the right solution is to care
> > > about
> > > > > > > nb_segs
> > > > > > > > > > > > > > > when next is set to NULL on split? Any
> place
> > > when
> > > > > next
> > > > > > > is
> > > > > > > > > set
> > > > > > > > > > > > > > > to NULL. Just to keep the optimization in a
> > > more
> > > > > > > generic
> > > > > > > > > place.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > The problem with that approach is that there
> are
> > > > > already
> > > > > > > > > several
> > > > > > > > > > > > > > existing split() or trim() implementations in
> > > > > different
> > > > > > > DPDK-
> > > > > > > > > based
> > > > > > > > > > > > > > applications. For instance, we have some in
> > > > > 6WINDGate. If
> > > > > > > we
> > > > > > > > > force
> > > > > > > > > > > > > > applications to set nb_seg to 1 when
> resetting
> > > next,
> > > > > it
> > > > > > > has
> > > > > > > > > to be
> > > > > > > > > > > > > > documented because it is not straightforward.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think it is better to go that way.
> > > > > > > > > > > > > From my perspective it seems natural to reset
> > > nb_seg at
> > > > > > > same
> > > > > > > > > time
> > > > > > > > > > > > > we reset next, otherwise inconsistency will
> occur.
> > > > > > > > > > > >
> > > > > > > > > > > > While it is not explicitly stated for nb_segs, to
> me
> > > it
> > > > > was
> > > > > > > clear
> > > > > > > > > that
> > > > > > > > > > > > nb_segs is only valid in the first segment, like
> for
> > > many
> > > > > > > fields
> > > > > > > > > (port,
> > > > > > > > > > > > ol_flags, vlan, rss, ...).
> > > > > > > > > > > >
> > > > > > > > > > > > If we say that nb_segs has to be valid in any
> > > segments,
> > > > > it
> > > > > > > means
> > > > > > > > > that
> > > > > > > > > > > > chain() or split() will have to update it in all
> > > > > segments,
> > > > > > > which
> > > > > > > > > is not
> > > > > > > > > > > > efficient.
> > > > > > > > > > >
> > > > > > > > > > > Why in all?
> > > > > > > > > > > We can state that nb_segs on non-first segment
> should
> > > > > always
> > > > > > > equal
> > > > > > > > > 1.
> > > > > > > > > > > As I understand in that case, both split() and
> chain()
> > > have
> > > > > to
> > > > > > > > > update nb_segs
> > > > > > > > > > > only for head mbufs, rest ones will remain
> untouched.
> > > > > > > > > >
> > > > > > > > > > Well, anyway, I think it's strange to have a
> constraint
> > > on m-
> > > > > > > >nb_segs
> > > > > > > > > for
> > > > > > > > > > non-first segment. We don't have that kind of
> constraints
> > > for
> > > > > > > other
> > > > > > > > > fields.
> > > > > > > > >
> > > > > > > > > True, we don't. But this is one of the fields we
> consider
> > > > > critical
> > > > > > > > > for proper work of mbuf alloc/free mechanism.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I am not sure that requiring m->nb_segs == 1 on non-first
> > > > > segments
> > > > > > > will provide any benefits.
> > > > > > >
> > > > > > > It would make this patch unneeded.
> > > > > > > So, for direct, non-segmented mbufs  pktmbuf_free() will
> remain
> > > > > write-
> > > > > > > free.
> > > > > >
> > > > > > I see. Then I agree with Konstantin that alternative
> solutions
> > > should
> > > > > be considered.
> > > > > >
> > > > > > The benefit regarding free()'ing non-segmented mbufs - which
> is a
> > > > > very common operation - certainly outweighs the cost of
> requiring
> > > > > split()/chain() operations to set the new head mbuf's nb_segs =
> 1.
> > > > > >
> > > > > > Nonetheless, the bug needs to be fixed somehow.
> > > > > >
> > > > > > If we can't come up with a better solution that doesn't break
> the
> > > > > ABI, we are forced to accept the patch.
> > > > > >
> > > > > > Unless the techboard accepts to break the ABI in order to
> avoid
> > > the
> > > > > performance cost of this patch.
> > > > >
> > > > > Did someone notice a performance drop with this patch?
> > > > > On my side, I don't see any regression on a L3 use case.
> > > >
> > > > I am afraid that the DPDK performance regression tests are based
> on
> > > TX immediately following RX, so cache misses in TX may go by
> unnoticed
> > > because RX warmed up the cache for TX already. And similarly for RX
> > > reusing mbufs that have been warmed up by the preceding free() at
> TX.
> > > >
> > > > Please consider testing the performance difference with the mbuf
> > > being completely cold at TX, and going completely cold again before
> > > being reused for RX.
> > > >
> > > > >
> > > > > Let's sumarize: splitting a mbuf chain and freeing it causes
> > > subsequent
> > > > > mbuf
> > > > > allocation to return a mbuf which is not correctly initialized.
> > > There
> > > > > are 2
> > > > > options to fix it:
> > > > >
> > > > > 1/ change the mbuf free function (this patch)
> > > > >
> > > > >    - m->nb_segs would behave like many other field: valid in
> the
> > > first
> > > > >      segment, ignored in other segments
> > > > >    - may impact performance (suspected)
> > > > >
> > > > > 2/ change all places where a mbuf chain is split, or trimmed
> > > > >
> > > > >    - m->nb_segs would have a specific behavior: count the
> number of
> > > > >      segments in the first mbuf, should be 1 in the last
> segment,
> > > > >      ignored in other ones.
> > > > >    - no code change in mbuf library, so no performance impact
> > > > >    - need to patch all places where we do a mbuf split or trim.
> > > From
> > > > > afar,
> > > > >      I see at least mbuf_cut_seg_ofs() in DPDK. Some external
> > > > > applications
> > > > >      may have to be patched (for instance, I already found 3
> places
> > > in
> > > > >      6WIND code base without a deep search).
> > > > >
> > > > > In my opinion, 1/ is better, except we notice a significant
> > > > > performance,
> > > > > because the (implicit) behavior is unchanged.
> > > > >
> > > > > Whatever the solution, some documentation has to be added.
> > > > >
> > > > > Olivier
> > > > >
> > > >
> > > > Unfortunately, I don't think that anything but the first option
> will
> > > go into 20.11 and stable releases of older versions, so I stand by
> my
> > > acknowledgment of the patch.
> > >
> > > If we are affraid about 20.11 performance (it is legitimate, few
> days
> > > before the release), we can target 21.02. After all, everybody
> lives
> > > with this bug since 2017, so there is no urgency. If accepted and
> well
> > > tested, it can be backported in stable branches.
> >
> > +1
> >
> > Good thinking, Olivier!
> 
> Looking at the changes once again, it probably can be reworked a bit:
> 
> -	if (m->next != NULL) {
> -		m->next = NULL;
> -		m->nb_segs = 1;
> -	}
> 
> +	if (m->next != NULL)
> +		m->next = NULL;
> +	if (m->nb_segs != 1)
> +		m->nb_segs = 1;
> 
> That way we add one more condition checking, but I suppose it
> shouldn't be that perf critical.
> That way for direct,non-segmented mbuf it still should be write-free.
> Except cases as you described above: chain(), then split().
> 
> Of-course we still need to do perf testing for that approach too.
> So if your preference it to postpone it till 21.02 - that's ok for me.
> Konstantin

With this suggestion, I cannot imagine any performance drop for direct, non-segmented mbufs: It now reads m->nb_segs, residing in the mbuf's first cache line, but the function already reads m->refcnt in the first cache line; so no cache misses are introduced.




More information about the dev mailing list