[dpdk-dev] [PATCH v5 1/2] mbuf: support attaching external buffer to mbuf

Yongseok Koh yskoh at mellanox.com
Wed Apr 25 20:02:36 CEST 2018


On Wed, Apr 25, 2018 at 05:23:20PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: Yongseok Koh [mailto:yskoh at mellanox.com]
> > Sent: Wednesday, April 25, 2018 6:07 PM
> > To: Ananyev, Konstantin <konstantin.ananyev at intel.com>
> > Cc: Lu, Wenzhuo <wenzhuo.lu at intel.com>; Wu, Jingjing <jingjing.wu at intel.com>; olivier.matz at 6wind.com; dev at dpdk.org;
> > arybchenko at solarflare.com; stephen at networkplumber.org; thomas at monjalon.net; adrien.mazarguil at 6wind.com;
> > nelio.laranjeiro at 6wind.com
> > Subject: Re: [PATCH v5 1/2] mbuf: support attaching external buffer to mbuf
> > 
> > On Wed, Apr 25, 2018 at 01:31:42PM +0000, Ananyev, Konstantin wrote:
> > [...]
> > > >  /** Mbuf prefetch */
> > > >  #define RTE_MBUF_PREFETCH_TO_FREE(m) do {       \
> > > >  	if ((m) != NULL)                        \
> > > > @@ -1213,11 +1306,127 @@ static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
> > > >  }
> > > >
> > > >  /**
> > > > + * Attach an external buffer to a mbuf.
> > > > + *
> > > > + * User-managed anonymous buffer can be attached to an mbuf. When attaching
> > > > + * it, corresponding free callback function and its argument should be
> > > > + * provided. This callback function will be called once all the mbufs are
> > > > + * detached from the buffer.
> > > > + *
> > > > + * The headroom for the attaching mbuf will be set to zero and this can be
> > > > + * properly adjusted after attachment. For example, ``rte_pktmbuf_adj()``
> > > > + * or ``rte_pktmbuf_reset_headroom()`` can be used.
> > > > + *
> > > > + * More mbufs can be attached to the same external buffer by
> > > > + * ``rte_pktmbuf_attach()`` once the external buffer has been attached by
> > > > + * this API.
> > > > + *
> > > > + * Detachment can be done by either ``rte_pktmbuf_detach_extbuf()`` or
> > > > + * ``rte_pktmbuf_detach()``.
> > > > + *
> > > > + * Attaching an external buffer is quite similar to mbuf indirection in
> > > > + * replacing buffer addresses and length of a mbuf, but a few differences:
> > > > + * - When an indirect mbuf is attached, refcnt of the direct mbuf would be
> > > > + *   2 as long as the direct mbuf itself isn't freed after the attachment.
> > > > + *   In such cases, the buffer area of a direct mbuf must be read-only. But
> > > > + *   external buffer has its own refcnt and it starts from 1. Unless
> > > > + *   multiple mbufs are attached to a mbuf having an external buffer, the
> > > > + *   external buffer is writable.
> > > > + * - There's no need to allocate buffer from a mempool. Any buffer can be
> > > > + *   attached with appropriate free callback and its IO address.
> > > > + * - Smaller metadata is required to maintain shared data such as refcnt.
> > > > + *
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change without prior notice.
> > > > + * Once external buffer is enabled by allowing experimental API,
> > > > + * ``RTE_MBUF_DIRECT()`` and ``RTE_MBUF_INDIRECT()`` are no longer
> > > > + * exclusive. A mbuf can be considered direct if it is neither indirect nor
> > > > + * having external buffer.
> > > > + *
> > > > + * @param m
> > > > + *   The pointer to the mbuf.
> > > > + * @param buf_addr
> > > > + *   The pointer to the external buffer we're attaching to.
> > > > + * @param buf_iova
> > > > + *   IO address of the external buffer we're attaching to.
> > > > + * @param buf_len
> > > > + *   The size of the external buffer we're attaching to. If memory for
> > > > + *   shared data is not provided, buf_len must be larger than the size of
> > > > + *   ``struct rte_mbuf_ext_shared_info`` and padding for alignment. If not
> > > > + *   enough, this function will return NULL.
> > > > + * @param shinfo
> > > > + *   User-provided memory for shared data. If NULL, a few bytes in the
> > > > + *   trailer of the provided buffer will be dedicated for shared data and
> > > > + *   the shared data will be properly initialized. Otherwise, user must
> > > > + *   initialize the content except for free callback and its argument. The
> > > > + *   pointer of shared data will be stored in m->shinfo.
> > > > + * @param free_cb
> > > > + *   Free callback function to call when the external buffer needs to be
> > > > + *   freed.
> > > > + * @param fcb_opaque
> > > > + *   Argument for the free callback function.
> > > > + *
> > > > + * @return
> > > > + *   A pointer to the new start of the data on success, return NULL
> > > > + *   otherwise.
> > > > + */
> > > > +static inline char * __rte_experimental
> > > > +rte_pktmbuf_attach_extbuf(struct rte_mbuf *m, void *buf_addr,
> > > > +	rte_iova_t buf_iova, uint16_t buf_len,
> > > > +	struct rte_mbuf_ext_shared_info *shinfo,
> > > > +	rte_mbuf_extbuf_free_callback_t free_cb, void *fcb_opaque)
> > > > +{
> > > > +	/* Additional attachment should be done by rte_pktmbuf_attach() */
> > > > +	RTE_ASSERT(!RTE_MBUF_HAS_EXTBUF(m));
> > >
> > > Shouldn't we have here something like:
> > > RTE_ASSERT(RTE_MBUF_DIRECT(m) && rte_mbuf_refcnt_read(m) == 1);
> > > ?
> > 
> > Right. That's better. Attaching mbuf should be direct and writable.
> > 
> > > > +
> > > > +	m->buf_addr = buf_addr;
> > > > +	m->buf_iova = buf_iova;
> > > > +
> > > > +	if (shinfo == NULL) {
> > >
> > > Instead of allocating shinfo ourselves - wound's it be better to rely
> > > on caller always allocating afeeling it for us (he can do that at the end/start of buffer,
> > > or whenever he likes to.
> > 
> > It is just for convenience. For some users, external attachment could be
> > occasional and casual, e.g. punt control traffic from kernel/hv. For such
> > non-serious cases, it is good to provide this small utility.
> 
> For such users that small utility could be a separate function then:
> shinfo_inside_buf() or so.

I like this idea! As this is an inline function and can be called in a datapath,
shorter code is better if it isn't expected to be used frequently.

Will take this idea for the new version. Thanks.

> > > Again in that case - caller can provide one shinfo to several mbufs (with different buf_addrs)
> > > and would know for sure that free_cb wouldn't be overwritten by mistake.
> > > I.E. mbuf code will only update refcnt inside shinfo.
> > 
> > I think you missed the discussion with other people yesterday. This change is
> > exactly for that purpose. Like I documented above, if this API is called with
> > shinfo being provided, it will use the user-provided shinfo instead of sparing a
> > few byte in the trailer and won't touch the shinfo.
> 
> As I can see your current code always update free_cb and fcb_opaque.
> Which is kind of strange these fields shold be the same for all instances of the shinfo.
> 
> > This code block happens only
> > if user doesn't provide memory for shared data (shinfo is NULL).
> > 
> > > > +		void *buf_end = RTE_PTR_ADD(buf_addr, buf_len);
> > > > +
> > > > +		shinfo = RTE_PTR_ALIGN_FLOOR(RTE_PTR_SUB(buf_end,
> > > > +					sizeof(*shinfo)), sizeof(uintptr_t));
> > > > +		if ((void *)shinfo <= buf_addr)
> > > > +			return NULL;
> > > > +
> > > > +		m->buf_len = RTE_PTR_DIFF(shinfo, buf_addr);
> > > > +		rte_mbuf_ext_refcnt_set(shinfo, 1);
> > > > +	} else {
> > > > +		m->buf_len = buf_len;
> > >
> > > I think you need to update shinfo>refcnt here too.
> > 
> > Like explained above, if shinfo is provided, it doesn't alter anything except
> > for callbacks and its arg.
> 
> Hm, but I have 2mbufs attached to the same external buffer via  same shinfo,
> shouldn't  shinfo.refcnt == 2?

If the shinfo is provided by user, user is responsible for managing the content.
Please refer to my reply to Andrew. I drew a few diagrams.


Thanks,
Yongseok


More information about the dev mailing list