[dpdk-dev] Re: [PATCH v4 1/2] mbuf: support attaching external buffer to mbuf

Andrew Rybchenko arybchenko at solarflare.com
Tue Apr 24 20:21:00 CEST 2018


On 04/24/2018 07:02 PM, Olivier Matz wrote:
> Hi Andrew, Yongseok,
>
> On Tue, Apr 24, 2018 at 03:28:33PM +0300, Andrew Rybchenko wrote:
>> On 04/24/2018 04:38 AM, Yongseok Koh wrote:
>>> This patch introduces a new way of attaching an external buffer to a mbuf.
>>>
>>> Attaching an external buffer is quite similar to mbuf indirection in
>>> replacing buffer addresses and length of a mbuf, but a few differences:
>>>     - When an indirect mbuf is attached, refcnt of the direct mbuf would be
>>>       2 as long as the direct mbuf itself isn't freed after the attachment.
>>>       In such cases, the buffer area of a direct mbuf must be read-only. But
>>>       external buffer has its own refcnt and it starts from 1. Unless
>>>       multiple mbufs are attached to a mbuf having an external buffer, the
>>>       external buffer is writable.
>>>     - There's no need to allocate buffer from a mempool. Any buffer can be
>>>       attached with appropriate free callback.
>>>     - Smaller metadata is required to maintain shared data such as refcnt.
>> Really useful. Many thanks. See my notes below.
>>
>> It worries me that detach is more expensive than it really needs to be,
>> since it has to restore the mbuf as direct. If an mbuf mempool is used only
>> for mbufs that serve as headers for external buffers, all these actions are
>> absolutely useless.
> I agree on the principle. And we have the same issue with indirect mbuf.
> Currently, the assumption is that a free mbuf (inside a mempool) is
> initialized as a direct mbuf. We can think about optimizations here,
> but I'm not sure it should be in this patchset.

I agree that it should be addressed separately.

> [...]
>
>>> @@ -688,14 +704,33 @@ rte_mbuf_to_baddr(struct rte_mbuf *md)
>>>    }
>>>    /**
>>> + * Returns TRUE if given mbuf is cloned by mbuf indirection, or FALSE
>>> + * otherwise.
>>> + *
>>> + * If a mbuf has its data in another mbuf and references it by mbuf
>>> + * indirection, this mbuf can be defined as a cloned mbuf.
>>> + */
>>> +#define RTE_MBUF_CLONED(mb)     ((mb)->ol_flags & IND_ATTACHED_MBUF)
>>> +
>>> +/**
>>>     * Returns TRUE if given mbuf is indirect, or FALSE otherwise.
>>>     */
>>> -#define RTE_MBUF_INDIRECT(mb)   ((mb)->ol_flags & IND_ATTACHED_MBUF)
>>> +#define RTE_MBUF_INDIRECT(mb)   RTE_MBUF_CLONED(mb)
>> It is still confusing that INDIRECT != !DIRECT.
>> Maybe we have no good options right now, but I'd suggest at least
>> deprecating RTE_MBUF_INDIRECT() and removing it completely in the next
>> release.
> Agree. I may have missed something, but is my previous suggestion
> not doable?
>
> - direct = embeds its own data      (and indirect = !direct)
> - clone (or another name) = data is another mbuf
> - extbuf = data is in an external buffer

I guess the problem is that it changes the semantics of INDIRECT, since
EXTBUF would then be covered by it as well. Strictly speaking, I think it
is an API change. Is it OK to make it without an announcement?
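
To illustrate the concern, here is a minimal sketch of the two possible
meanings of "indirect". It does not use the real DPDK headers: the struct,
the DEMO_* names and the flag bit values are placeholders made up for this
example, and only the relationship between the macros matters.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder flag bits and mbuf type, invented for illustration. */
#define DEMO_IND_ATTACHED (1ULL << 0)	/* data lives in another mbuf */
#define DEMO_EXT_ATTACHED (1ULL << 1)	/* data lives in an external buffer */

struct demo_mbuf {
	uint64_t ol_flags;
};

#define DEMO_CLONED(mb)     (!!((mb)->ol_flags & DEMO_IND_ATTACHED))
#define DEMO_HAS_EXTBUF(mb) (!!((mb)->ol_flags & DEMO_EXT_ATTACHED))
/* Direct: the mbuf embeds its own data. */
#define DEMO_DIRECT(mb) \
	(!((mb)->ol_flags & (DEMO_IND_ATTACHED | DEMO_EXT_ATTACHED)))

/* Meaning kept by the patch: indirect is the same as cloned. */
#define DEMO_INDIRECT_OLD(mb) DEMO_CLONED(mb)
/* Meaning under the "indirect = !direct" suggestion. */
#define DEMO_INDIRECT_NEW(mb) (!DEMO_DIRECT(mb))

/* Returns 1 if the two definitions disagree for an extbuf mbuf. */
static int demo_indirect_differs_for_extbuf(void)
{
	struct demo_mbuf direct = { 0 };
	struct demo_mbuf clone = { DEMO_IND_ATTACHED };
	struct demo_mbuf ext = { DEMO_EXT_ATTACHED };

	/* Direct and cloned mbufs are classified the same way by both. */
	assert(DEMO_INDIRECT_OLD(&direct) == DEMO_INDIRECT_NEW(&direct));
	assert(DEMO_INDIRECT_OLD(&clone) == DEMO_INDIRECT_NEW(&clone));

	return DEMO_INDIRECT_OLD(&ext) != DEMO_INDIRECT_NEW(&ext);
}
```

Only the extbuf case changes its answer, which is why I would treat the
redefinition as an API change for existing users of the macro.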

> Deprecating the macro is a good idea.
>
>>> +	m->buf_addr = buf_addr;
>>> +	m->buf_iova = buf_iova;
>>> +
>>> +	if (shinfo == NULL) {
>>> +		shinfo = RTE_PTR_ALIGN_FLOOR(RTE_PTR_SUB(buf_end,
>>> +					sizeof(*shinfo)), sizeof(uintptr_t));
>>> +		if ((void *)shinfo <= buf_addr)
>>> +			return NULL;
>>> +
>>> +		m->buf_len = RTE_PTR_DIFF(shinfo, buf_addr);
>>> +	} else {
>>> +		m->buf_len = buf_len;
>>> +	}
>>> +
>>> +	m->data_len = 0;
>>> +
>>> +	rte_pktmbuf_reset_headroom(m);
>> I would suggest to make data_off one more parameter.
>> If I have a buffer with data which I'd like to attach to an mbuf, I'd like
>> to control data_off.
> Another option is to set the headroom to 0.
> Because after attaching the mbuf to an external buffer, we will
> still need to set the length.
>
> A user can do something like this:
>
> 	rte_pktmbuf_attach_extbuf(m, buf_va, buf_iova, buf_len, shinfo,
> 		free_cb, free_cb_arg);
> 	rte_pktmbuf_append(m, data_len + headroom);
> 	rte_pktmbuf_adj(m, headroom);
>
>>> +	m->ol_flags |= EXT_ATTACHED_MBUF;
>>> +	m->shinfo = shinfo;
>>> +
>>> +	rte_mbuf_ext_refcnt_set(shinfo, 1);
>> Why is assignment used here? Can't we attach an extbuf that is already
>> attached to another mbuf?
> In rte_pktmbuf_attach(), this is true. That's not illogical to
> keep the same approach here. Maybe an assert could be added?
>
>> Maybe shinfo should be initialized only if it is not provided (shinfo ==
>> NULL on input)?
> I don't get why, can you explain please?

Maybe I misunderstand how it should look when one huge buffer is
partitioned. I thought there should be only one shinfo per huge buffer,
to track when it is no longer used by any mbuf with extbuf.
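
A rough sketch of what I had in mind, with one shared counter per huge
buffer. This is not the API from the patch: all demo_* names are
hypothetical, the counter is non-atomic for brevity, and starting it at 0
and incrementing on every attach (instead of setting it to 1) is my
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One shared info block per huge buffer (hypothetical type). */
struct demo_shinfo {
	uint16_t refcnt;	/* mbufs currently attached to any slice */
	int freed;		/* set when the huge buffer would be released */
};

struct demo_mbuf {
	uint8_t *buf_addr;
	uint16_t buf_len;
	struct demo_shinfo *shinfo;
};

/* Attach a slice of the huge buffer: increment, do not reset to 1. */
static void demo_attach_shared(struct demo_mbuf *m, uint8_t *addr,
			       uint16_t len, struct demo_shinfo *sh)
{
	m->buf_addr = addr;
	m->buf_len = len;
	m->shinfo = sh;
	sh->refcnt++;
}

/* Detach: the last user triggers the (simulated) free of the huge buffer. */
static void demo_detach(struct demo_mbuf *m)
{
	struct demo_shinfo *sh = m->shinfo;

	m->buf_addr = NULL;
	m->buf_len = 0;
	m->shinfo = NULL;
	if (--sh->refcnt == 0)
		sh->freed = 1;
}

/* Two mbufs attached to two halves of one huge buffer. */
static int demo_one_shinfo_per_huge_buf(void)
{
	static uint8_t huge[4096];
	struct demo_shinfo sh = { 0, 0 };
	struct demo_mbuf m0, m1;

	demo_attach_shared(&m0, huge, 2048, &sh);
	demo_attach_shared(&m1, huge + 2048, 2048, &sh);

	demo_detach(&m0);
	assert(sh.refcnt == 1 && !sh.freed);	/* still in use by m1 */

	demo_detach(&m1);
	return sh.freed;
}
```

With this layout an unconditional "set refcnt to 1" on attach would lose
the references held by the other slices, which is the reason for my
question.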

Another option is to have a shinfo per small buf plus a reference counter
per huge buf (which is decremented when the small buf reference counter
drops to zero and the free callback is executed). I guess that is what is
assumed above. My fear is that this means too many reference counters:
  1. mbuf reference counter
  2. small buf reference counter
  3. huge buf reference counter
Maybe it is possible to use (1) for (2) as well?
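
The two-level scheme can be sketched roughly as follows. Again, this is
only a model under my own assumptions: the demo_* types and functions are
hypothetical, the counters are non-atomic, and counter (1) is left out to
keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counter 3: one per huge buffer. */
struct demo_huge {
	uint16_t refcnt;	/* small bufs still in use */
	int freed;		/* set when the huge buffer would be released */
};

/* Counter 2: one per small buf, playing the role of shinfo. */
struct demo_small {
	uint16_t refcnt;	/* mbufs attached to this small buf */
	struct demo_huge *parent;
};

/* Mbuf header; its own reference counter (1) is omitted here. */
struct demo_mbuf {
	struct demo_small *shinfo;
};

/* Free callback of a small buf: drop one reference on the huge buffer. */
static void demo_small_free_cb(struct demo_small *s)
{
	if (--s->parent->refcnt == 0)
		s->parent->freed = 1;
}

/* First attach: the small buf counter starts from 1. */
static void demo_attach(struct demo_mbuf *m, struct demo_small *s)
{
	s->refcnt = 1;
	m->shinfo = s;
}

/* One more mbuf sharing an already attached small buf. */
static void demo_share(struct demo_mbuf *m, const struct demo_mbuf *from)
{
	m->shinfo = from->shinfo;
	m->shinfo->refcnt++;
}

static void demo_detach(struct demo_mbuf *m)
{
	struct demo_small *s = m->shinfo;

	m->shinfo = NULL;
	if (--s->refcnt == 0)
		demo_small_free_cb(s);
}

/* Two small bufs carved from one huge buffer, one of them shared. */
static int demo_two_level(void)
{
	struct demo_huge huge = { .refcnt = 2, .freed = 0 };
	struct demo_small s0 = { 0, &huge }, s1 = { 0, &huge };
	struct demo_mbuf m0, m1, m2;

	demo_attach(&m0, &s0);
	demo_attach(&m1, &s1);
	demo_share(&m2, &m1);

	demo_detach(&m1);	/* s1 is still used by m2 */
	assert(!huge.freed && huge.refcnt == 2);
	demo_detach(&m2);	/* s1 is released, huge buf loses one user */
	assert(!huge.freed && huge.refcnt == 1);
	demo_detach(&m0);	/* last small buf gone, huge buf is freed */
	return huge.freed;
}
```

Every detach touches the small buf counter, and the last one also touches
the huge buf counter, on top of the mbuf's own counter.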

Andrew.

