[dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when application uses private mbuf data

Olivier MATZ olivier.matz at 6wind.com
Fri Mar 27 16:17:11 CET 2015


Hi Konstantin,

On 03/27/2015 03:25 PM, Ananyev, Konstantin wrote:
> Hi Olivier,
> 
>> -----Original Message-----
>> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
>> Sent: Friday, March 27, 2015 1:56 PM
>> To: Ananyev, Konstantin; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when application uses private mbuf data
>>
>> Hi Konstantin,
>>
>> On 03/27/2015 10:07 AM, Olivier MATZ wrote:
>>>> I think that to support ability to setup priv_size on a mempool basis,
>>>> and reserve private space between struct rte_mbuf and rte_mbuf. buf_addr,
>>>> we need to:
>>>>
>>>> 1. Store priv_size both inside the mempool and inside the mbuf.
>>>>
>>>> 2. rte_pktmbuf_attach() should change the value of priv_size to the priv_size of the direct mbuf we are going to attach to:
>>>> rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md) {... mi->priv_size = md->priv_size; ...}
>>>>
>>>> 3. rte_pktmbuf_detach() should restore original value of mbuf's priv_size:
>>>> rte_pktmbuf_detach(struct rte_mbuf *m)
>>>> {
>>>>   ...
>>>>    m->priv_size = rte_mempool_get_privsize(m->pool);
>>>>    m->buf _addr= rte_mbuf_to_baddr(m);
>>>>    ...
>>>> }
>>>>
>>>> Also I think we need to provide a way to specify priv_size for all mbufs of the mepool at init time:
>>>> - either force people to specify it at rte_mempool_create() time (probably use init_arg for that),
>>>> - or provide separate function that could be called straight after rte_mempool_create() , that
>>>> would setup priv_size for the  pool and for all its mbufs.
>>>> - or some sort of combination of these 2 approaches - introduce a wrapper function
>>>> (rte_mbuf_pool_create() or something) that would take priv_size as parameter,
>>>> create a new mempool and then setup priv_size.
>>
>> I though a bit more about this solution, and I realized that doing
>> mi->priv_size = md->priv_size in rte_pktmbuf_attach() is probably not
>> a good idea, as there is no garantee that the size of mi is large enough
>> to store the priv of md.
>>
>> Having the same priv_size for mi and md is maybe a good constraint.
>> I can add this in the API comments and assertions in the code to
>> check this condition, what do you think?
> 
> Probably we have a different concepts of what is mbuf's  private space in mind.
> From my point, even indirect buffer should use it's own private space and
> leave contents of direct mbuf it attached to unmodified.  
> After attach() operation, only contents of the buffer are shared between mbufs,
> but not the mbuf's metadata. 

Sorry if it was not clear in my previous messages, but I agree
with your description. When attaching a mbuf, only data, not
metadata should be shared.

In the solution you are suggesting (quoted above), you say we need
to set mi->priv_size to md->priv_size in rte_pktmbuf_attach(). I felt
this was not possible, but it depends on the meaning we give to
priv_size:

1. If the meaning is "the size of the private data embedded in this
   mbuf", which is the most logical meaning, we cannot do this
   affectation

2. If the meaning is "the size of the private data embedded in the
   mbuf the buf_addr is pointing to" (which is harder to get), the
   affectation makes sense.

>From what I understand, you feel we should use 2/ as priv_size
definition. Is it correct?

In my previous message, the definition of m->priv_size was 1/,
so that's why I felt assigning mi->priv_size to md->priv_size was
not possible.

I agree 2/ is probably a good choice, as it would allow to attach
to a mbuf with a different priv_size. It may require some additional
comments above the field in the structure to explain that.


> Otherwise on detach(), you'll have to copy contents of private space back, from direct to indirect mbuf? 
> Again how to deal with the case, when 2 or more mbufs will attach to the same direct one?
> 
> So let say, if we'll have a macro:
> 
> #define RTE_MBUF_PRIV_PTR(mb)	((void *)((struct rte_mbuf *)(mb)) + 1))
> 
> No matter is mb  a direct or indirect mbuf.
> Do you have something else in mind here?

I completely agree with this macro. We should consider the private data
as an extension of the mbuf structure.


>>> Introducing rte_mbuf_pool_create() seems a good idea to me, it
>>> would hide 'rte_pktmbuf_pool_private' from the user and force
>>> to initialize all the required fields (mbuf_data_room_size only
>>> today, and maybe mbuf_priv_size).
>>>
>>> The API would be:
>>>
>>> struct rte_mempool *
>>> rte_mbuf_pool_create(const char *name, unsigned n, unsigned elt_size,
>>> 	unsigned cache_size, size_t mbuf_priv_size,
>>> 	rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
>>> 	int socket_id, unsigned flags)
>>>
>>> I can give it a try and send a patch for this.
>>
>> About this, it is not required anymore for this patch series if we
>> agree with my comment above.
> 
> I still think we need some way to setup priv_size on a per-mempool basis.
> Doing that in rte_mbuf_pool_create() seems like a good thing to me.
> Not sure, why you decided to drop it?

I think we can already do it without changing the API by providing
our own rte_pktmbuf_init and rte_pktmbuf_pool_init.

rte_pktmbuf_init() has to set:
  m->buf_len = mp->elt_size - sizeof(struct mbuf);
  m->priv_size = sizeof(struct mbuf) - sizeof(struct rte_mbuf);

rte_pktmbuf_pool_init() has to set:
  /* we can use the default function */
  mbp_priv->mbuf_data_room_size = MBUF_RXDATA_SIZE +
  	RTE_PKTMBUF_HEADROOM;

In this case, it is possible to calculate the mbuf_priv_size only
from the pool object:

  mbuf_priv_size = pool->elt_size - RTE_PKTMBUF_HEADROOM -
	pool_private->mbuf_data_room_size -
	sizeof(rte_mbuf)


I agree it's not ideal, but I think the mbuf pool initialization
is another problem. That's why I suggested to change this in a
separate series that will add rte_mbuf_pool_create() with the
API described above. Thoughts?


Thanks,
Olivier


> 
> Konstantin
> 
>>
>> I'll send a separate patch for that. It's probably a good occasion
>> to get rid of the pointer casted into an integer for
>> mbuf_data_room_size.
>>
>> Regards,
>> Olivier


More information about the dev mailing list