[dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when application uses private mbuf data

Ananyev, Konstantin konstantin.ananyev at intel.com
Tue Mar 31 01:17:23 CEST 2015


Hi Olivier,

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Monday, March 30, 2015 8:56 PM
> To: Ananyev, Konstantin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when application uses private mbuf data
> 
> Hi Konstantin,
> 
> On 03/30/2015 02:34 PM, Ananyev, Konstantin wrote:
> > Hi Olivier,
> >
> >> -----Original Message-----
> >> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> >> Sent: Friday, March 27, 2015 3:17 PM
> >> To: Ananyev, Konstantin; dev at dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when application uses private mbuf data
> >>
> >> Hi Konstantin,
> >>
> >> On 03/27/2015 03:25 PM, Ananyev, Konstantin wrote:
> >>> Hi Olivier,
> >>>
> >>>> -----Original Message-----
> >>>> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> >>>> Sent: Friday, March 27, 2015 1:56 PM
> >>>> To: Ananyev, Konstantin; dev at dpdk.org
> >>>> Subject: Re: [dpdk-dev] [PATCH v2 1/5] mbuf: fix clone support when application uses private mbuf data
> >>>>
> >>>> Hi Konstantin,
> >>>>
> >>>> On 03/27/2015 10:07 AM, Olivier MATZ wrote:
> >>>>>> I think that to support ability to setup priv_size on a mempool basis,
> >>>>>> and reserve private space between struct rte_mbuf and rte_mbuf. buf_addr,
> >>>>>> we need to:
> >>>>>>
> >>>>>> 1. Store priv_size both inside the mempool and inside the mbuf.
> >>>>>>
> >>>>>> 2. rte_pktmbuf_attach() should change the value of priv_size to the priv_size of the direct mbuf we are going to attach to:
> >>>>>> rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md) {... mi->priv_size = md->priv_size; ...}
> >>>>>>
> >>>>>> 3. rte_pktmbuf_detach() should restore original value of mbuf's priv_size:
> >>>>>> rte_pktmbuf_detach(struct rte_mbuf *m)
> >>>>>> {
> >>>>>>   ...
> >>>>>>    m->priv_size = rte_mempool_get_privsize(m->pool);
> >>>>>>    m->buf _addr= rte_mbuf_to_baddr(m);
> >>>>>>    ...
> >>>>>> }
> >>>>>>
> >>>>>> Also I think we need to provide a way to specify priv_size for all mbufs of the mepool at init time:
> >>>>>> - either force people to specify it at rte_mempool_create() time (probably use init_arg for that),
> >>>>>> - or provide separate function that could be called straight after rte_mempool_create() , that
> >>>>>> would setup priv_size for the  pool and for all its mbufs.
> >>>>>> - or some sort of combination of these 2 approaches - introduce a wrapper function
> >>>>>> (rte_mbuf_pool_create() or something) that would take priv_size as parameter,
> >>>>>> create a new mempool and then setup priv_size.
> >>>>
> >>>> I though a bit more about this solution, and I realized that doing
> >>>> mi->priv_size = md->priv_size in rte_pktmbuf_attach() is probably not
> >>>> a good idea, as there is no garantee that the size of mi is large enough
> >>>> to store the priv of md.
> >>>>
> >>>> Having the same priv_size for mi and md is maybe a good constraint.
> >>>> I can add this in the API comments and assertions in the code to
> >>>> check this condition, what do you think?
> >>>
> >>> Probably we have a different concepts of what is mbuf's  private space in mind.
> >>> From my point, even indirect buffer should use it's own private space and
> >>> leave contents of direct mbuf it attached to unmodified.
> >>> After attach() operation, only contents of the buffer are shared between mbufs,
> >>> but not the mbuf's metadata.
> >>
> >> Sorry if it was not clear in my previous messages, but I agree
> >> with your description. When attaching a mbuf, only data, not
> >> metadata should be shared.
> >>
> >> In the solution you are suggesting (quoted above), you say we need
> >> to set mi->priv_size to md->priv_size in rte_pktmbuf_attach(). I felt
> >> this was not possible, but it depends on the meaning we give to
> >> priv_size:
> >>
> >> 1. If the meaning is "the size of the private data embedded in this
> >>    mbuf", which is the most logical meaning, we cannot do this
> >>    affectation
> >>
> >> 2. If the meaning is "the size of the private data embedded in the
> >>    mbuf the buf_addr is pointing to" (which is harder to get), the
> >>    affectation makes sense.
> >>
> >> From what I understand, you feel we should use 2/ as priv_size
> >> definition. Is it correct?
> >
> > Yes, I meant 2.
> > From my point priv_size inside mbuf is more like 'buf_ofs'.
> > It's main purpose is for internal use - to help our mbuf manipulation routinies
> > (attach/detach/free) to work correctly.
> > If the user wants to query size of private space for the mbuf, he probably should
> > use the value from mempool.
> 
> Agree.
> 
> 
> >> In my previous message, the definition of m->priv_size was 1/,
> >> so that's why I felt assigning mi->priv_size to md->priv_size was
> >> not possible.
> >>
> >> I agree 2/ is probably a good choice, as it would allow to attach
> >> to a mbuf with a different priv_size. It may require some additional
> >> comments above the field in the structure to explain that.
> >>
> >>
> >>> Otherwise on detach(), you'll have to copy contents of private space back, from direct to indirect mbuf?
> >>> Again how to deal with the case, when 2 or more mbufs will attach to the same direct one?
> >>>
> >>> So let say, if we'll have a macro:
> >>>
> >>> #define RTE_MBUF_PRIV_PTR(mb)	((void *)((struct rte_mbuf *)(mb)) + 1))
> >>>
> >>> No matter is mb  a direct or indirect mbuf.
> >>> Do you have something else in mind here?
> >>
> >> I completely agree with this macro. We should consider the private data
> >> as an extension of the mbuf structure.
> >>
> >>
> >>>>> Introducing rte_mbuf_pool_create() seems a good idea to me, it
> >>>>> would hide 'rte_pktmbuf_pool_private' from the user and force
> >>>>> to initialize all the required fields (mbuf_data_room_size only
> >>>>> today, and maybe mbuf_priv_size).
> >>>>>
> >>>>> The API would be:
> >>>>>
> >>>>> struct rte_mempool *
> >>>>> rte_mbuf_pool_create(const char *name, unsigned n, unsigned elt_size,
> >>>>> 	unsigned cache_size, size_t mbuf_priv_size,
> >>>>> 	rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
> >>>>> 	int socket_id, unsigned flags)
> >>>>>
> >>>>> I can give it a try and send a patch for this.
> >>>>
> >>>> About this, it is not required anymore for this patch series if we
> >>>> agree with my comment above.
> >>>
> >>> I still think we need some way to setup priv_size on a per-mempool basis.
> >>> Doing that in rte_mbuf_pool_create() seems like a good thing to me.
> >>> Not sure, why you decided to drop it?
> >>
> >> I think we can already do it without changing the API by providing
> >> our own rte_pktmbuf_init and rte_pktmbuf_pool_init.
> >>
> >> rte_pktmbuf_init() has to set:
> >>   m->buf_len = mp->elt_size - sizeof(struct mbuf);
> >>   m->priv_size = sizeof(struct mbuf) - sizeof(struct rte_mbuf);
> >>
> >> rte_pktmbuf_pool_init() has to set:
> >>   /* we can use the default function */
> >>   mbp_priv->mbuf_data_room_size = MBUF_RXDATA_SIZE +
> >>   	RTE_PKTMBUF_HEADROOM;
> >
> > Yeh, when  arg==NULL for rte_pktmbuf_pool_init() we always set up
> > mbuf_data_room_size to the hardcoded value.
> > Which always looked strange to me.
> > I think it should be set to:
> > mp->elt_size - sizeof(struct rte_mbuf) - RTE_PKTMBUF_HEADROOM;
> > for that case.
> 
> Yes, that would make more sense instead of the hardcoded value.
> But I'm not sure it should be part of this series as the clone
> patches won't change this behavior. I would prefer to have it in
> another series that reworks mbuf pool initialization. I can also
> work on it.
> 
> On the other hand if you really feel this patch is needed in
> this series, it's not a problem as it's a one-liner.
> 
> >
> >>
> >> In this case, it is possible to calculate the mbuf_priv_size only
> >> from the pool object:
> >>
> >>   mbuf_priv_size = pool->elt_size - RTE_PKTMBUF_HEADROOM -
> >> 	pool_private->mbuf_data_room_size -
> >> 	sizeof(rte_mbuf)
> >>
> >
> > So if I understand your idea correctly:
> > If  second parameter for rte_pktmbuf_pool_init() is NULL, then
> > we setup
> >
> > mbp_priv->mbuf_data_room_size = mp->elt_size - sizeof(struct rte_mbuf) - RTE_PKTMBUF_HEADROOM;
> >
> > Which means that  priv_size ==0  for all  mbufs in the pool
> > Otherwise we setup:
> >
> >  mbp_priv->mbuf_data_room_size = opaque_arg;
> >
> > As we are doing now, and priv_size for all mbufs in the pool will be:
> > pool->elt_size - pool_private->mbuf_data_room_size - sizeof(rte_mbuf);
> >
> > And in that case, all users have to do to specify priv_size for mempool is to pass a proper argument
> > for rte_pktmbuf_pool_init() at  mempool_create().
> > Correct?
> 
> Correct.
> 
> 
> 
> >
> >>
> >> I agree it's not ideal, but I think the mbuf pool initialization
> >> is another problem. That's why I suggested to change this in a
> >> separate series that will add rte_mbuf_pool_create() with the
> >> API described above. Thoughts?
> >>
> >
> > I also put answers to another mail below.
> > Just to keep all discussion in one place.
> >
> >>> Though, I still think that the better approach is to reserve private space before struct rte_mbuf, not after.
> >>> In that case, user got his private space, and we keep buf_addr straight after  rte_mbuf, without any whole.
> >>> So we don't need steps 2 and 3, above,
> >>> plus we don't need rte_mbuf_to_baddr() and rte_mbuf_from_indirect() -
> >>> RTE_MBUF_TO_BADDR/ RTE_MBUF_FROM_BADDR would keep working correctly.
> >>> In fact, with this scheme - we don't even need priv_size for mbuf management (attach/detach/free).
> >>>
> >>> Wonder did you try that approach?
> >>
> >> Yes, I though about this approach too. But I think it would require
> >> more changes. When an application or a driver allocate a mbuf with
> >> mempool_get(), it expects that the returned pointer is the rte_mbuf *.
> >
> > Yep, sure it will still get the pointer to the rte_mbuf *.
> > Though later, if upper layer would like to convert from  rte_mbuf* to app_specific_mbuf *,
> > it would have to use a macro:
> >
> > #define RTE_MBUF_TO_PRIV(m)	((void *)((uintptr_t)(m) - (m)->priv_size))
> >
> >>
> >> With this solution, there are 2 options:
> >> - no mempool modification, so each application/driver has to add
> >>   priv_size bytes to the object to get the mbuf pointer. This does not
> >>   look feasible.
> >> - change the mempool to support private area before each object. I
> >>   think mempool API is already quite complex, and I think it's not
> >>   the role of the mempool library to provide such features.
> >
> >
> > In fact, I think the changes would be minimal.
> > All we have to do, is to make changes in rte_pktmbuf_init():
> >
> > void
> > rte_pktmbuf_init(struct rte_mempool *mp,
> >                  __attribute__((unused)) void *opaque_arg,
> >                  void *_m,
> >                  __attribute__((unused)) unsigned i)
> > {
> >
> >      //extract priv_size from mempool   (discussed above).
> >       uint16_t priv_size = rte_mbufpool_get_priv_size(mp);
> >
> >       struct rte_mbuf *m = _m + priv_size;
> >       uint32_t buf_len = mp->elt_size - sizeof(struct rte_mbuf) - priv_size;
> >
> > ....
> >
> >
> > With that way priv_size inside mbuf would always be the size its own private space,
> > for both direct and indirect mbus., so we don't require to set priv_size at attach/detach.
> > Again RTE_MBUF_TO_BADDR() and RTE_MBUF_FROM_BADDR() would just work without any modifications.
> 
> I'm not sure I'm getting it. The argument '_m' of your
> rte_pktmbuf_init() is the pointer to the priv data, right?
> So it means that the mbuf is located priv_size bytes after.
> 
> The rte_pktmbuf_init() function is called by mempool_create(),
> and the _m parameter is a pointer to a mempool object. So
> in my understanding, mempool_get() would not return a rte_mbuf
> but a pointer to the application private data.

Ah my bad, forgot that mempool's obj_init() returns void now :(
To make this approach work also need to change it, so it return a pointer to the new object. 
So, rte_pktmbuf_init() would return m and then in mempool_add_elem():

if (obj_init)
                obj = obj_init(mp, obj_init_arg, obj, obj_idx);

rte_ring_sp_enqueue(mp->ring, obj); 

Konstantin

> 
> 
> Regards,
> Olivier
> 
> 
> 
> >
> > Konstantin
> >
> >



More information about the dev mailing list