[dpdk-dev] [PATCH v4 1/2] mbuf: support attaching external buffer to mbuf

Yongseok Koh yskoh at mellanox.com
Tue Apr 24 13:47:08 CEST 2018


On Mon, Apr 23, 2018 at 10:01:07PM -0700, Stephen Hemminger wrote:
> On Mon, 23 Apr 2018 18:38:53 -0700
> Yongseok Koh <yskoh at mellanox.com> wrote:
> 
> > This patch introduces a new way of attaching an external buffer to a mbuf.
> > 
> > Attaching an external buffer is quite similar to mbuf indirection in that it
> > replaces the buffer address and length of an mbuf, but with a few differences:
> >   - When an indirect mbuf is attached, the refcnt of the direct mbuf would be
> >     2 as long as the direct mbuf itself isn't freed after the attachment. In
> >     such cases, the buffer area of the direct mbuf must be read-only. An
> >     external buffer, however, has its own refcnt, which starts from 1. Unless
> >     multiple mbufs are attached to an mbuf having an external buffer, the
> >     external buffer is writable.
> >   - There's no need to allocate the buffer from a mempool. Any buffer can be
> >     attached with an appropriate free callback.
> >   - Smaller metadata is required to maintain shared data such as the refcnt.
> > 
> > Signed-off-by: Yongseok Koh <yskoh at mellanox.com>
> 
> I think this is a good idea. It looks more useful than indirect mbufs for
> the use case where received data needs to come from a non-mempool area.

Actually, it was Olivier's idea and I just implemented it for my need. :-)
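
For reference, on the application side an attach would look roughly like the
sketch below. The names follow the extbuf API as it is shaping up in rte_mbuf.h
(rte_pktmbuf_attach_extbuf() and struct rte_mbuf_ext_shared_info); the shinfo
init helper and exact signatures may differ from this revision, so take it as a
sketch rather than the final API.

#include <rte_common.h>
#include <rte_malloc.h>
#include <rte_mbuf.h>

/* Called once the last mbuf referencing the external buffer is freed. */
static void
ext_buf_free_cb(void *addr, void *opaque __rte_unused)
{
        rte_free(addr);
}

/* Attach a buffer allocated outside of any mempool to an mbuf. */
static int
attach_ext_buf(struct rte_mbuf *m, void *buf, rte_iova_t iova, uint16_t len)
{
        uint16_t buf_len = len;
        struct rte_mbuf_ext_shared_info *shinfo;

        /* Carve the shared info (refcnt, free callback) out of the buffer
         * tail; returns NULL if the buffer is too small for it.
         */
        shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len,
                                                    ext_buf_free_cb, NULL);
        if (shinfo == NULL)
                return -1;

        /* The mbuf now points at the external buffer, not its own data room. */
        rte_pktmbuf_attach_extbuf(m, buf, iova, buf_len, shinfo);
        return 0;
}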

> Does it have any performance impact? I would hope it doesn't impact
> applications not using external buffers.

It should have very little impact. The only change that can affect regular cases
is in rte_pktmbuf_prefree_seg(). This critical path inlines rte_pktmbuf_detach()
and it becomes a little longer - a few more instructions to update the refcnt and
branch to the user-provided callback. In io fwd of testpmd with a single core,
I'm not seeing any noticeable drop.
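
To give a rough idea, the extra work on the free path is conceptually just the
following (a simplified sketch of the idea only - the actual code in the patch
is organized differently):

#include <rte_atomic.h>
#include <rte_mbuf.h>

/* Simplified sketch of what freeing an EXTBUF-attached mbuf adds. */
static void
free_extbuf_sketch(struct rte_mbuf *m)
{
        struct rte_mbuf_ext_shared_info *shinfo = m->shinfo;

        /* Drop one reference on the external buffer; on the last one,
         * hand the buffer back to its owner via the user callback.
         */
        if (rte_atomic16_add_return(&shinfo->refcnt_atomic, -1) == 0)
                shinfo->free_cb(m->buf_addr, shinfo->fcb_opaque);

        /* The mbuf itself then goes back to its mempool as usual. */
}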

> Is it possible to start with a refcnt > 1 for the mbuf?  I am thinking
> of the case in netvsc where data is received into an area returned
> from the host. The area is an RNDIS buffer and may contain several
> packets.  A useful optimization would be for the driver to return
> mbufs which point to that buffer, where the starting refcnt value
> is the number of packets in the buffer.  When the refcnt goes to
> 0, the buffer would be returned to the host.

That's actually my use-case for the mlx5 PMD. The mlx5 device supports "Multi-Packet
Rx Queue": it can pack multiple packets into a single Rx buffer to reduce the PCIe
overhead of control transactions, and this is also quite common for FPGA-based NICs.
What I've done is allocate a big buffer (from a PMD-private mempool) and reserve
space at its head for metadata holding another refcnt, which gets decremented by the
registered callback func. The callback func frees the whole chunk once that refcnt
reaches zero (rough sketch after the diagram below).

+--+----+--------------+---+----+--------------+---+---+- - -
|  |head|mbuf1 data    |sh |head|mbuf2 data    |sh |   |
|  |room|              |inf|room|              |inf|   |
+--+----+--------------+---+----+--------------+---+---+- - -
 ^
 |
 Metadata for the whole chunk, holding another refcnt managed by the PMD.
 fcb_opaque will carry this pointer so that the callback func can find it.
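
In pseudo-C, the callback side of this scheme could look like the following.
The names are made up for illustration and the real mlx5 Multi-Packet RQ code
is different; it only shows how the chunk-level refcnt and the per-packet
shinfo from the diagram fit together.

#include <rte_atomic.h>
#include <rte_common.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Per-chunk metadata stored at the head of the big Rx buffer. */
struct chunk_md {
        struct rte_mempool *mp; /* PMD-private mempool the chunk came from */
        rte_atomic16_t refcnt;  /* one reference per packet in the chunk */
};

/* Registered as shinfo->free_cb for every packet carved out of the chunk. */
static void
chunk_free_cb(void *addr __rte_unused, void *opaque)
{
        struct chunk_md *md = opaque;

        /* Return the whole chunk only when its last packet is freed;
         * md sits at the start of the chunk, so it doubles as the
         * mempool object pointer.
         */
        if (rte_atomic16_add_return(&md->refcnt, -1) == 0)
                rte_mempool_put(md->mp, md);
}

/* Attach one packet packed into the chunk to a freshly allocated mbuf. */
static void
attach_pkt_from_chunk(struct rte_mbuf *m, struct chunk_md *md,
                      void *pkt_addr, rte_iova_t pkt_iova, uint16_t pkt_len,
                      struct rte_mbuf_ext_shared_info *shinfo)
{
        /* shinfo lives in the per-packet "sh inf" area of the chunk. */
        shinfo->free_cb = chunk_free_cb;
        shinfo->fcb_opaque = md;
        rte_atomic16_set(&shinfo->refcnt_atomic, 1);

        rte_pktmbuf_attach_extbuf(m, pkt_addr, pkt_iova, pkt_len, shinfo);
        m->pkt_len = m->data_len = pkt_len;
}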

> One other problem with this is that it adds an additional buffer
> management constraint on the application. If for example the
> mbuf's are going into a TCP stack and TCP can have very slow
> readers; then the receive buffer might have a long lifetime.
> Since the receive buffers are limited, eventually the receive
> area runs out and no more packets are received. Much fingerpointing
> and angry users ensue..

In such a case (buffer depletion), I memcpy the Rx packet into the mbuf instead of
attaching the buffer to it, until buffers become available again. It seems an
unavoidable penalty, but it's better than dropping packets.
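
Roughly (again, illustrative names only, assuming the packet fits in the
mbuf's own data room):

#include <stdbool.h>

#include <rte_mbuf.h>
#include <rte_memcpy.h>

/* Sketch of the Rx fallback: zero-copy attach while chunk references are
 * available, plain copy otherwise so the chunk can be reused right away.
 */
static void
fill_rx_mbuf(struct rte_mbuf *m, void *pkt_addr, rte_iova_t pkt_iova,
             uint16_t pkt_len, struct rte_mbuf_ext_shared_info *shinfo,
             bool chunks_depleted)
{
        if (!chunks_depleted)
                rte_pktmbuf_attach_extbuf(m, pkt_addr, pkt_iova, pkt_len,
                                          shinfo);
        else
                rte_memcpy(rte_pktmbuf_mtod(m, void *), pkt_addr, pkt_len);

        m->pkt_len = m->data_len = pkt_len;
}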


Thanks,
Yongseok

