[dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when application uses private mbuf data

Olivier MATZ olivier.matz at 6wind.com
Mon Mar 30 21:55:50 CEST 2015

Hi Konstantin,

On 03/30/2015 02:34 PM, Ananyev, Konstantin wrote:
> Hi Olivier,
>> -----Original Message-----
>> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
>> Sent: Friday, March 27, 2015 3:17 PM
>> To: Ananyev, Konstantin; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when application uses private mbuf data
>> Hi Konstantin,
>> On 03/27/2015 03:25 PM, Ananyev, Konstantin wrote:
>>> Hi Olivier,
>>>> -----Original Message-----
>>>> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
>>>> Sent: Friday, March 27, 2015 1:56 PM
>>>> To: Ananyev, Konstantin; dev at dpdk.org
>>>> Subject: Re: [dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when application uses private mbuf data
>>>> Hi Konstantin,
>>>> On 03/27/2015 10:07 AM, Olivier MATZ wrote:
>>>>>> I think that to support ability to setup priv_size on a mempool basis,
>>>>>> and reserve private space between struct rte_mbuf and rte_mbuf. buf_addr,
>>>>>> we need to:
>>>>>> 1. Store priv_size both inside the mempool and inside the mbuf.
>>>>>> 2. rte_pktmbuf_attach() should change the value of priv_size to the priv_size of the direct mbuf we are going to attach to:
>>>>>> rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md) {... mi->priv_size = md->priv_size; ...}
>>>>>> 3. rte_pktmbuf_detach() should restore original value of mbuf's priv_size:
>>>>>> rte_pktmbuf_detach(struct rte_mbuf *m)
>>>>>> {
>>>>>>   ...
>>>>>>    m->priv_size = rte_mempool_get_privsize(m->pool);
>>>>>>    m->buf _addr= rte_mbuf_to_baddr(m);
>>>>>>    ...
>>>>>> }
>>>>>> Also I think we need to provide a way to specify priv_size for all mbufs of the mepool at init time:
>>>>>> - either force people to specify it at rte_mempool_create() time (probably use init_arg for that),
>>>>>> - or provide separate function that could be called straight after rte_mempool_create() , that
>>>>>> would setup priv_size for the  pool and for all its mbufs.
>>>>>> - or some sort of combination of these 2 approaches - introduce a wrapper function
>>>>>> (rte_mbuf_pool_create() or something) that would take priv_size as parameter,
>>>>>> create a new mempool and then setup priv_size.
>>>> I though a bit more about this solution, and I realized that doing
>>>> mi->priv_size = md->priv_size in rte_pktmbuf_attach() is probably not
>>>> a good idea, as there is no garantee that the size of mi is large enough
>>>> to store the priv of md.
>>>> Having the same priv_size for mi and md is maybe a good constraint.
>>>> I can add this in the API comments and assertions in the code to
>>>> check this condition, what do you think?
>>> Probably we have a different concepts of what is mbuf's  private space in mind.
>>> From my point, even indirect buffer should use it's own private space and
>>> leave contents of direct mbuf it attached to unmodified.
>>> After attach() operation, only contents of the buffer are shared between mbufs,
>>> but not the mbuf's metadata.
>> Sorry if it was not clear in my previous messages, but I agree
>> with your description. When attaching a mbuf, only data, not
>> metadata should be shared.
>> In the solution you are suggesting (quoted above), you say we need
>> to set mi->priv_size to md->priv_size in rte_pktmbuf_attach(). I felt
>> this was not possible, but it depends on the meaning we give to
>> priv_size:
>> 1. If the meaning is "the size of the private data embedded in this
>>    mbuf", which is the most logical meaning, we cannot do this
>>    affectation
>> 2. If the meaning is "the size of the private data embedded in the
>>    mbuf the buf_addr is pointing to" (which is harder to get), the
>>    affectation makes sense.
>> From what I understand, you feel we should use 2/ as priv_size
>> definition. Is it correct?
> Yes, I meant 2.
> From my point priv_size inside mbuf is more like 'buf_ofs'.
> It's main purpose is for internal use - to help our mbuf manipulation routinies
> (attach/detach/free) to work correctly.
> If the user wants to query size of private space for the mbuf, he probably should
> use the value from mempool.


>> In my previous message, the definition of m->priv_size was 1/,
>> so that's why I felt assigning mi->priv_size to md->priv_size was
>> not possible.
>> I agree 2/ is probably a good choice, as it would allow to attach
>> to a mbuf with a different priv_size. It may require some additional
>> comments above the field in the structure to explain that.
>>> Otherwise on detach(), you'll have to copy contents of private space back, from direct to indirect mbuf?
>>> Again how to deal with the case, when 2 or more mbufs will attach to the same direct one?
>>> So let say, if we'll have a macro:
>>> #define RTE_MBUF_PRIV_PTR(mb)	((void *)((struct rte_mbuf *)(mb)) + 1))
>>> No matter is mb  a direct or indirect mbuf.
>>> Do you have something else in mind here?
>> I completely agree with this macro. We should consider the private data
>> as an extension of the mbuf structure.
>>>>> Introducing rte_mbuf_pool_create() seems a good idea to me, it
>>>>> would hide 'rte_pktmbuf_pool_private' from the user and force
>>>>> to initialize all the required fields (mbuf_data_room_size only
>>>>> today, and maybe mbuf_priv_size).
>>>>> The API would be:
>>>>> struct rte_mempool *
>>>>> rte_mbuf_pool_create(const char *name, unsigned n, unsigned elt_size,
>>>>> 	unsigned cache_size, size_t mbuf_priv_size,
>>>>> 	rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
>>>>> 	int socket_id, unsigned flags)
>>>>> I can give it a try and send a patch for this.
>>>> About this, it is not required anymore for this patch series if we
>>>> agree with my comment above.
>>> I still think we need some way to setup priv_size on a per-mempool basis.
>>> Doing that in rte_mbuf_pool_create() seems like a good thing to me.
>>> Not sure, why you decided to drop it?
>> I think we can already do it without changing the API by providing
>> our own rte_pktmbuf_init and rte_pktmbuf_pool_init.
>> rte_pktmbuf_init() has to set:
>>   m->buf_len = mp->elt_size - sizeof(struct mbuf);
>>   m->priv_size = sizeof(struct mbuf) - sizeof(struct rte_mbuf);
>> rte_pktmbuf_pool_init() has to set:
>>   /* we can use the default function */
>>   mbp_priv->mbuf_data_room_size = MBUF_RXDATA_SIZE +
> Yeh, when  arg==NULL for rte_pktmbuf_pool_init() we always set up
> mbuf_data_room_size to the hardcoded value.
> Which always looked strange to me.
> I think it should be set to:
> mp->elt_size - sizeof(struct rte_mbuf) - RTE_PKTMBUF_HEADROOM;
> for that case.

Yes, that would make more sense instead of the hardcoded value.
But I'm not sure it should be part of this series as the clone
patches won't change this behavior. I would prefer to have it in
another series that reworks mbuf pool initialization. I can also
work on it.

On the other hand if you really feel this patch is needed in
this series, it's not a problem as it's a one-liner.

>> In this case, it is possible to calculate the mbuf_priv_size only
>> from the pool object:
>>   mbuf_priv_size = pool->elt_size - RTE_PKTMBUF_HEADROOM -
>> 	pool_private->mbuf_data_room_size -
>> 	sizeof(rte_mbuf)
> So if I understand your idea correctly:
> If  second parameter for rte_pktmbuf_pool_init() is NULL, then 
> we setup
> mbp_priv->mbuf_data_room_size = mp->elt_size - sizeof(struct rte_mbuf) - RTE_PKTMBUF_HEADROOM;
> Which means that  priv_size ==0  for all  mbufs in the pool 
> Otherwise we setup:
>  mbp_priv->mbuf_data_room_size = opaque_arg;
> As we are doing now, and priv_size for all mbufs in the pool will be:
> pool->elt_size - pool_private->mbuf_data_room_size - sizeof(rte_mbuf);
> And in that case, all users have to do to specify priv_size for mempool is to pass a proper argument
> for rte_pktmbuf_pool_init() at  mempool_create().
> Correct? 


>> I agree it's not ideal, but I think the mbuf pool initialization
>> is another problem. That's why I suggested to change this in a
>> separate series that will add rte_mbuf_pool_create() with the
>> API described above. Thoughts?
> I also put answers to another mail below.
> Just to keep all discussion in one place.
>>> Though, I still think that the better approach is to reserve private space before struct rte_mbuf, not after.
>>> In that case, user got his private space, and we keep buf_addr straight after  rte_mbuf, without any whole.
>>> So we don't need steps 2 and 3, above,
>>> plus we don't need rte_mbuf_to_baddr() and rte_mbuf_from_indirect() -
>>> RTE_MBUF_TO_BADDR/ RTE_MBUF_FROM_BADDR would keep working correctly.
>>> In fact, with this scheme - we don't even need priv_size for mbuf management (attach/detach/free).
>>> Wonder did you try that approach?
>> Yes, I though about this approach too. But I think it would require
>> more changes. When an application or a driver allocate a mbuf with
>> mempool_get(), it expects that the returned pointer is the rte_mbuf *.
> Yep, sure it will still get the pointer to the rte_mbuf *.
> Though later, if upper layer would like to convert from  rte_mbuf* to app_specific_mbuf *,
> it would have to use a macro:
> #define RTE_MBUF_TO_PRIV(m)	((void *)((uintptr_t)(m) - (m)->priv_size)) 
>> With this solution, there are 2 options:
>> - no mempool modification, so each application/driver has to add
>>   priv_size bytes to the object to get the mbuf pointer. This does not
>>   look feasible.
>> - change the mempool to support private area before each object. I
>>   think mempool API is already quite complex, and I think it's not
>>   the role of the mempool library to provide such features.
> In fact, I think the changes would be minimal.
> All we have to do, is to make changes in rte_pktmbuf_init():
> void
> rte_pktmbuf_init(struct rte_mempool *mp,
>                  __attribute__((unused)) void *opaque_arg,
>                  void *_m,
>                  __attribute__((unused)) unsigned i)
> {
>      //extract priv_size from mempool   (discussed above).
>       uint16_t priv_size = rte_mbufpool_get_priv_size(mp); 
>       struct rte_mbuf *m = _m + priv_size;
>       uint32_t buf_len = mp->elt_size - sizeof(struct rte_mbuf) - priv_size;
> ....
> With that way priv_size inside mbuf would always be the size its own private space,
> for both direct and indirect mbus., so we don't require to set priv_size at attach/detach.
> Again RTE_MBUF_TO_BADDR() and RTE_MBUF_FROM_BADDR() would just work without any modifications.

I'm not sure I'm getting it. The argument '_m' of your
rte_pktmbuf_init() is the pointer to the priv data, right?
So it means that the mbuf is located priv_size bytes after.

The rte_pktmbuf_init() function is called by mempool_create(),
and the _m parameter is a pointer to a mempool object. So
in my understanding, mempool_get() would not return a rte_mbuf
but a pointer to the application private data.


> Konstantin

More information about the dev mailing list