[dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when application uses private mbuf data

Olivier MATZ olivier.matz at 6wind.com
Sat Mar 28 22:19:12 CET 2015


Hi Zoltan,

On 03/27/2015 07:11 PM, Zoltan Kiss wrote:
>> Sorry if it was not clear in my previous messages, but I agree
>> with your description. When attaching a mbuf, only data, not
>> metadata should be shared.
>>
>> In the solution you are suggesting (quoted above), you say we need
>> to set mi->priv_size to md->priv_size in rte_pktmbuf_attach(). I felt
>> this was not possible, but it depends on the meaning we give to
>> priv_size:
>>
>> 1. If the meaning is "the size of the private data embedded in this
>>     mbuf", which is the most logical meaning, we cannot do this
>>     affectation
>>
>> 2. If the meaning is "the size of the private data embedded in the
>>     mbuf the buf_addr is pointing to" (which is harder to get), the
>>     affectation makes sense.
>>
>>  From what I understand, you feel we should use 2/ as priv_size
>> definition. Is it correct?
>>
>> In my previous message, the definition of m->priv_size was 1/,
>> so that's why I felt assigning mi->priv_size to md->priv_size was
>> not possible.
>>
>> I agree 2/ is probably a good choice, as it would allow to attach
>> to a mbuf with a different priv_size. It may require some additional
>> comments above the field in the structure to explain that.
> I think we need to document it in the comments very well that you can
> attach mbuf's to each other with different private area sizes, and
> applications should care about this difference. And we should give a
> macro to get the private area size, which will get rte_mbuf.mp->priv_size.
> Actually we should give some better name to rte_mbuf.priv_size, it's a
> bit misleading now. Maybe direct_priv_size?

I agree it should be well documented in the API comments, but I'm
not sure it's worth changing the name of the field. After all,
m->buf_addr and m->buf_len are also related to the buffer of the
direct mbuf without beeing named "direct_*".


>>>>> Introducing rte_mbuf_pool_create() seems a good idea to me, it
>>>>> would hide 'rte_pktmbuf_pool_private' from the user and force
>>>>> to initialize all the required fields (mbuf_data_room_size only
>>>>> today, and maybe mbuf_priv_size).
>>>>>
>>>>> The API would be:
>>>>>
>>>>> struct rte_mempool *
>>>>> rte_mbuf_pool_create(const char *name, unsigned n, unsigned elt_size,
>>>>>     unsigned cache_size, size_t mbuf_priv_size,
>>>>>     rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
>>>>>     int socket_id, unsigned flags)
>>>>>
>>>>> I can give it a try and send a patch for this.
>>>>
>>>> About this, it is not required anymore for this patch series if we
>>>> agree with my comment above.
>>>
>>> I still think we need some way to setup priv_size on a per-mempool
>>> basis.
>>> Doing that in rte_mbuf_pool_create() seems like a good thing to me.
>>> Not sure, why you decided to drop it?
>>
>> I think we can already do it without changing the API by providing
>> our own rte_pktmbuf_init and rte_pktmbuf_pool_init.
>>
>> rte_pktmbuf_init() has to set:
>>    m->buf_len = mp->elt_size - sizeof(struct mbuf);
>>    m->priv_size = sizeof(struct mbuf) - sizeof(struct rte_mbuf);
> What's struct mbuf? If we take my assumption above, direct_priv_size
> could go uninitalized, and we can set it when attaching.

In this example, "struct mbuf" is the mbuf used by the application:

struct mbuf {
	struct rte_mbuf rte_mb;
	struct app_private priv;
};

Therefore, we have:

sizeof(struct mbuf) = sizeof(struct rte_mbuf) +
	sizeof(struct app_private);


About keeping the m->priv_size field not initialized, I'm not really
convinced because we would always have to use the pool->mbuf_priv_size
when we want to get the address of data buffer embedded in a mbuf. This
implies several indirections, and we have only one if the info is
already in the mbuf.


>> rte_pktmbuf_pool_init() has to set:
>>    /* we can use the default function */
>>    mbp_priv->mbuf_data_room_size = MBUF_RXDATA_SIZE +
>>        RTE_PKTMBUF_HEADROOM;
>>
>> In this case, it is possible to calculate the mbuf_priv_size only
>> from the pool object:
>>
>>    mbuf_priv_size = pool->elt_size - RTE_PKTMBUF_HEADROOM -
>>     pool_private->mbuf_data_room_size -
>>     sizeof(rte_mbuf)
> My understanding is that the pool private date is something completely
> different than the private data of the mbufs. I think
> rte_mempool.priv_size should be initialized in *mp_init.

As I said in my previous mail, I think we don't need to have
pool_private->mbuf_priv_size for this series, as it can be
calculated from what we already have in pool_private. I'll send
a v3 patch that implement this solution, and it can be a base for
discussions.

However, I think the way mbuf pools are initialized may need some
rework, maybe using rte_mbuf_pool_create() as Konstantin suggested.
I'll try to do some reworks in this area in another series.

Regards,
Olivier



> 
>>
>>
>> I agree it's not ideal, but I think the mbuf pool initialization
>> is another problem. That's why I suggested to change this in a
>> separate series that will add rte_mbuf_pool_create() with the
>> API described above. Thoughts?
> 
> 
>>
>>
>> Thanks,
>> Olivier
>>
>>
>>>
>>> Konstantin
>>>
>>>>
>>>> I'll send a separate patch for that. It's probably a good occasion
>>>> to get rid of the pointer casted into an integer for
>>>> mbuf_data_room_size.
>>>>
>>>> Regards,
>>>> Olivier



More information about the dev mailing list