[dpdk-dev] [PATCH 15/15] mbuf: move pool pointer in hotter first half

Morten Brørup mb at smartsharesystems.com
Thu Nov 5 10:04:29 CET 2020


> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev,
> Konstantin
> Sent: Thursday, November 5, 2020 1:26 AM
> 
> >
> > Hi,
> >
> > On Tue, Nov 03, 2020 at 04:03:46PM +0100, Morten Brørup wrote:
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Slava Ovsiienko
> > > > Sent: Tuesday, November 3, 2020 3:03 PM
> > > >
> > > > Hi, Morten
> > > >
> > > > > From: Morten Brørup <mb at smartsharesystems.com>
> > > > > Sent: Tuesday, November 3, 2020 14:10
> > > > >
> > > > > > From: Thomas Monjalon [mailto:thomas at monjalon.net]
> > > > > > Sent: Monday, November 2, 2020 4:58 PM
> > > > > >
> > > > > > +Cc techboard
> > > > > >
> > > > > > We need benchmark numbers in order to take a decision.
> > > > > > Please all, prepare some arguments and numbers so we can discuss
> > > > > > the mbuf layout in the next techboard meeting.
> >
> > I did some quick tests, and it appears to me that just moving the pool
> > pointer to the first cache line does not have a significant impact.
> 
> Hmm, as I remember, Thomas mentioned about a 5%+ improvement
> with that change. Though I suppose a lot depends on the actual test case.
> It would be good to know when it does help and when it doesn't.
> 
> >
> > However, I agree with Morten that there is some room for optimization
> > around m->pool: I did a hack in the ixgbe driver to assume there is only
> > one mbuf pool. This greatly simplifies the freeing of mbufs in Tx, because
> > we don't have to group them in bulks that share the same pool (see
> > ixgbe_tx_free_bufs()). The impact of this hack is quite good: +~5% on a
> > real-life forwarding use case.
> 
> I think we already have such optimization ability within DPDK:
> #define DEV_TX_OFFLOAD_MBUF_FAST_FREE   0x00010000
> /**< Device supports optimization for fast release of mbufs.
>  *   When set application must guarantee that per-queue all mbufs comes from
>  *   the same mempool and has refcnt = 1.
>  */
> 
> Seems over-optimistic to me, but many PMDs do support it.

Looking at a few drivers using this flag, the Intel drivers use rte_mempool_put(m->pool), and thus still read the second cache line. Only ThunderX seems to exploit the optimization and uses rte_mempool_put_bulk(q->pool).
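
To illustrate the difference, here is a minimal sketch of a Tx cleanup routine that actually exploits the flag by bulk-freeing to a per-queue mempool. The function and field names (txq_fast_free_bufs, sw_ring, fast_free_mp) are hypothetical and not taken from any existing PMD:

static inline void
txq_fast_free_bufs(struct rte_mbuf **sw_ring, uint16_t nb_free,
        struct rte_mempool *fast_free_mp)
{
    /* DEV_TX_OFFLOAD_MBUF_FAST_FREE guarantees that all mbufs on the
     * queue come from the same mempool and have refcnt == 1, so they
     * can be returned in one bulk without reading m->pool (second
     * cache line) for each mbuf.
     */
    rte_mempool_put_bulk(fast_free_mp, (void **)sw_ring, nb_free);
}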

I would rather see a generic optimization for free()'ing non-segmented packets in the mbuf library itself, where free() and free_seg() take advantage of knowing whether they are working on the first segment or not - like the is_header indicator in many of the mbuf check functions - and, when working on the first segment, gate access to m->next by m->nb_segs > 1.

Concept:

static inline void
rte_pktmbuf_free(struct rte_mbuf *m)
{
    struct rte_mbuf *m_next;

    /* NOTE: The sanity check of the header has been moved to
     * __rte_pktmbuf_prefree_seg().
     */

    if (m != NULL) {
        if (m->nb_segs == 1) {
            /* Single-segment packet: m->next is never read. */
            __rte_pktmbuf_free_seg(m, 1);
        } else {
            /* Multi-segment packet: free the header segment first,
             * then walk the chain of non-header segments.
             */
            m_next = m->next;
            __rte_pktmbuf_free_seg(m, 1);
            m = m_next;
            while (m != NULL) {
                m_next = m->next;
                __rte_pktmbuf_free_seg(m, 0);
                m = m_next;
            }
        }
    }
}

static __rte_always_inline void
__rte_pktmbuf_free_seg(struct rte_mbuf *m, int is_header)
{
    m = __rte_pktmbuf_prefree_seg(m, is_header);
    if (likely(m != NULL))
        rte_mbuf_raw_free(m);
}

static __rte_always_inline struct rte_mbuf *
__rte_pktmbuf_prefree_seg(struct rte_mbuf *m, int is_header)
{
    __rte_mbuf_sanity_check(m, is_header);

    if (likely(rte_mbuf_refcnt_read(m) == 1)) {

        if (!RTE_MBUF_DIRECT(m)) {
            rte_pktmbuf_detach(m);
            if (RTE_MBUF_HAS_EXTBUF(m) &&
                RTE_MBUF_HAS_PINNED_EXTBUF(m) &&
                __rte_pktmbuf_pinned_extbuf_decref(m))
                return NULL;
        }

        if (is_header && m->nb_segs == 1)
            return m; /* NOTE: Avoid touching (writing to) the second cache line. */

        if (m->next != NULL) {
            m->next = NULL;
            m->nb_segs = 1;
        }

        return m;

    } else if (__rte_mbuf_refcnt_update(m, -1) == 0) {

        if (!RTE_MBUF_DIRECT(m)) {
            rte_pktmbuf_detach(m);
            if (RTE_MBUF_HAS_EXTBUF(m) &&
                RTE_MBUF_HAS_PINNED_EXTBUF(m) &&
                __rte_pktmbuf_pinned_extbuf_decref(m))
                return NULL;
        }

        if (is_header && m->nb_segs == 1) {
            /* NOTE: Avoid touching (writing to) the second cache line. */
            rte_mbuf_refcnt_set(m, 1);
            return m;
        }

        if (m->next != NULL) {
            m->next = NULL;
            m->nb_segs = 1;
        }
        rte_mbuf_refcnt_set(m, 1);

        return m;
    }
    return NULL;
}
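
For illustration, a hypothetical caller (e.g. a Tx completion loop in a PMD; the names below are made up) would then only touch the first cache line when freeing the common single-segment, directly owned mbufs:

static inline void
txq_free_done_mbufs(struct rte_mbuf **sw_ring, uint16_t nb_done)
{
    uint16_t i;

    /* With the concept above, each call only dereferences m->next when
     * m->nb_segs > 1, i.e. the second cache line stays untouched for
     * non-segmented packets.
     */
    for (i = 0; i < nb_done; i++)
        rte_pktmbuf_free(sw_ring[i]);
}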

Furthermore, this concept might provide an additional performance improvement in combination with moving m->pool to the first cache line: rte_mempool_put() in rte_mbuf_raw_free() then wouldn't have to touch the second cache line either.
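
For reference, rte_mbuf_raw_free() essentially boils down to a single mempool put via the mbuf's pool pointer (a simplified sketch; the debug assertions and sanity checks of the real function are omitted):

static __rte_always_inline void
rte_mbuf_raw_free(struct rte_mbuf *m)
{
    /* The only field read here is m->pool; if it were located in the
     * first cache line, the whole free path for a non-segmented,
     * directly owned mbuf would never touch the second cache line.
     */
    rte_mempool_put(m->pool, m);
}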


