[dpdk-dev] [PATCH v3 1/5] mbuf: fix clone support when application uses private mbuf data

Ananyev, Konstantin konstantin.ananyev at intel.com
Tue Apr 7 19:17:01 CEST 2015


Hi Olivier,

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Tuesday, April 07, 2015 4:46 PM
> To: Ananyev, Konstantin; dev at dpdk.org
> Cc: zoltan.kiss at linaro.org; Richardson, Bruce
> Subject: Re: [PATCH v3 1/5] mbuf: fix clone support when application uses private mbuf data
> 
> Hi Konstantin,
> 
> On 04/07/2015 02:40 PM, Ananyev, Konstantin wrote:
> > Hi Olivier,
> >
> >> -----Original Message-----
> >> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> >> Sent: Monday, April 06, 2015 10:50 PM
> >> To: Ananyev, Konstantin; dev at dpdk.org
> >> Cc: zoltan.kiss at linaro.org; Richardson, Bruce
> >> Subject: Re: [PATCH v3 1/5] mbuf: fix clone support when application uses private mbuf data
> >>
> >> Hi Konstantin,
> >>
> >> Thanks for your comments.
> >>
> >> On 04/02/2015 07:21 PM, Ananyev, Konstantin wrote:
> >>> Hi Olivier,
> >>>
> >>>> -----Original Message-----
> >>>> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> >>>> Sent: Tuesday, March 31, 2015 8:23 PM
> >>>> To: dev at dpdk.org
> >>>> Cc: Ananyev, Konstantin; zoltan.kiss at linaro.org; Richardson, Bruce; Olivier Matz
> >>>> Subject: [PATCH v3 1/5] mbuf: fix clone support when application uses private mbuf data
> >>>>
> >>>> From: Olivier Matz <olivier.matz at 6wind.com>
> >>>>
> >>>> Add a new private_size field in mbuf structure that should
> >>>> be initialized at mbuf pool creation. This field contains the
> >>>> size of the application private data in mbufs.
> >>>>
> >>>> Introduce new static inline functions rte_mbuf_from_indirect()
> >>>> and rte_mbuf_to_baddr() to replace the existing macros, which
> >>>> take the private size into account when attaching and detaching
> >>>> mbufs.
> >>>>
> >>>> Signed-off-by: Olivier Matz <olivier.matz at 6wind.com>
> >>>> ---
> >>>>   app/test-pmd/testpmd.c     |  1 +
> >>>>   examples/vhost/main.c      |  4 +--
> >>>>   lib/librte_mbuf/rte_mbuf.c |  1 +
> >>>>   lib/librte_mbuf/rte_mbuf.h | 77 +++++++++++++++++++++++++++++++++++-----------
> >>>>   4 files changed, 63 insertions(+), 20 deletions(-)
> >>>>
> >>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> >>>> index 3057791..c5a195a 100644
> >>>> --- a/app/test-pmd/testpmd.c
> >>>> +++ b/app/test-pmd/testpmd.c
> >>>> @@ -425,6 +425,7 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
> >>>>   	mb->tx_offload   = 0;
> >>>>   	mb->vlan_tci     = 0;
> >>>>   	mb->hash.rss     = 0;
> >>>> +	mb->priv_size    = 0;
> >>>>   }
> >>>>
> >>>>   static void
> >>>> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> >>>> index c3fcb80..e44e82f 100644
> >>>> --- a/examples/vhost/main.c
> >>>> +++ b/examples/vhost/main.c
> >>>> @@ -139,7 +139,7 @@
> >>>>   /* Number of descriptors per cacheline. */
> >>>>   #define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_desc))
> >>>>
> >>>> -#define MBUF_EXT_MEM(mb)   (RTE_MBUF_FROM_BADDR((mb)->buf_addr) != (mb))
> >>>> +#define MBUF_EXT_MEM(mb)   (rte_mbuf_from_indirect(mb) != (mb))
> >>>>
> >>>>   /* mask of enabled ports */
> >>>>   static uint32_t enabled_port_mask = 0;
> >>>> @@ -1550,7 +1550,7 @@ attach_rxmbuf_zcp(struct virtio_net *dev)
> >>>>   static inline void pktmbuf_detach_zcp(struct rte_mbuf *m)
> >>>>   {
> >>>>   	const struct rte_mempool *mp = m->pool;
> >>>> -	void *buf = RTE_MBUF_TO_BADDR(m);
> >>>> +	void *buf = rte_mbuf_to_baddr(m);
> >>>>   	uint32_t buf_ofs;
> >>>>   	uint32_t buf_len = mp->elt_size - sizeof(*m);
> >>>>   	m->buf_physaddr = rte_mempool_virt2phy(mp, m) + sizeof(*m);
> >>>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> >>>> index 526b18d..e095999 100644
> >>>> --- a/lib/librte_mbuf/rte_mbuf.c
> >>>> +++ b/lib/librte_mbuf/rte_mbuf.c
> >>>> @@ -125,6 +125,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
> >>>>   	m->pool = mp;
> >>>>   	m->nb_segs = 1;
> >>>>   	m->port = 0xff;
> >>>> +	m->priv_size = 0;
> >>>
> >>> Why is it 0?
> >>> Shouldn't it be the same calculation as in detach() below:
> >>> m->priv_size = /*get private size from mempool private*/;
> >>> m->buf_addr = (char *)m + sizeof(struct rte_mbuf) + m->priv_size;
> >>> m->buf_len = mp->elt_size - sizeof(struct rte_mbuf) - m->priv_size;
> >>> ?
> >>
> >> It's 0 because we also have in the function (not visible in the
> >> patch):
> >>
> >>    m->buf_addr = (char *)m + sizeof(struct rte_mbuf);
> >
> > Yep, that's why, as I wrote above, I think we need to set up all 3 fields here:
> > priv_size, buf_addr, buf_len exactly in the same way as in detach().
> >
> >>
> >> It means that an application that wants to use a private area has
> >> to provide another init function derived from this default function.
> >
> > After your changes, attach/free and other functions from public mbuf API
> > rely on priv_size being set properly.
> > So I suppose 'official' pktmbuf_init() should also set it in a proper manner.
> >
> >> This was already the case before the patch series.
> >
> > Before this patch series, we didn't have priv_size, so there was nothing to set up.
> >
> >>
> >> As we discussed in previous mail, I plan to propose a rework of
> >> mbuf pool initialization in another series, and my initial idea was to
> >> change this at the same time. But on the other hand it does not hurt
> >> to do this change now. I'll include it in next version.
> >
> > Ok.
> 
> Just to be sure we're on the same page:
> 
> - before the patch series
> 
>    - private area was working if clones were not used. To use a
>      private area, the user had to provide another function derived
>      from pktmbuf_init() to change m->buf_addr and m->buf_len.
>    - using both private area + clones was broken
> 
> - after the patch series
> 
>    - private area is working with or without clones. But to use it,
>      the user still has to provide another function to change
>      m->buf_addr, m->buf_len *and m->priv_size*.
> 
> The series just fixes the fact that "clones + priv" was not working.
> It does not address the problem that providing a new pktmbuf_init()
> function is required to use a private area. To fix this, I think it
> would require an API evolution that should be part of another series.

I don't think we need a new pktmbuf_init().
We just need to update the existing one, so that both pktmbuf_init() and
detach() set buf_addr, buf_len (and priv_size) to exactly the same values.
If they don't, attach/detach can no longer be used with
mempools created with pktmbuf_init().
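
As a rough sketch of what I mean (not the real code: struct sketch_mbuf is a
simplified stand-in for struct rte_mbuf, all sketch_* names are hypothetical,
and the private size is passed in directly instead of being read from the
mempool private area), both paths could go through one shared helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct rte_mbuf, for illustration only. */
struct sketch_mbuf {
	void *buf_addr;
	uint16_t buf_len;
	uint16_t priv_size;
};

/* Hypothetical helper: the single place that derives buf_addr, buf_len
 * and priv_size from the pool element size and the private area size. */
static inline void
sketch_mbuf_reset_buf(struct sketch_mbuf *m, uint32_t elt_size,
		      uint16_t priv_size)
{
	/* the element must at least hold the header and the private area */
	assert(elt_size >= sizeof(*m) + priv_size);

	m->priv_size = priv_size;
	m->buf_addr = (char *)m + sizeof(*m) + priv_size;
	m->buf_len = (uint16_t)(elt_size - sizeof(*m) - priv_size);
}

/* init path: called once per element at pool creation */
static inline void
sketch_pktmbuf_init(struct sketch_mbuf *m, uint32_t elt_size,
		    uint16_t priv_size)
{
	sketch_mbuf_reset_buf(m, elt_size, priv_size);
}

/* detach path: restores exactly what the init path set up */
static inline void
sketch_pktmbuf_detach(struct sketch_mbuf *m, uint32_t elt_size,
		      uint16_t priv_size)
{
	sketch_mbuf_reset_buf(m, elt_size, priv_size);
}
```

With a shared helper, the values written at init time and at detach time
cannot drift apart.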

BTW, another thing that I just realised:
examples/ipv4_multicast and examples/ip_fragmentation/ -
both create a pool of mbufs with elem_size < 2K and don't populate mempool's private area -
so mbp_priv->mbuf_data_room_size == 0 for them.

So that code in detach():

 +	mbp_priv = rte_mempool_get_priv(mp);
 +	m->priv_size = mp->elt_size - RTE_PKTMBUF_HEADROOM -
 +		mbp_priv->mbuf_data_room_size -
 +		sizeof(struct rte_mbuf);


would break both of these samples.
I suppose we need to handle the situation when mp->elt_size < RTE_PKTMBUF_HEADROOM + sizeof(struct rte_mbuf)
(and probably also when mbuf_data_room_size == 0) correctly.
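
One possible shape for such a check, as a sketch only: the constants below
are stand-ins for RTE_PKTMBUF_HEADROOM and sizeof(struct rte_mbuf), the
function name is hypothetical, and treating an unpopulated pool private area
(mbuf_data_room_size == 0) as "no private data" is just an assumed policy.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HEADROOM  128u  /* stand-in for RTE_PKTMBUF_HEADROOM */
#define SKETCH_MBUF_SIZE 128u  /* stand-in for sizeof(struct rte_mbuf) */

/* Hypothetical helper: returns the private size for a pool element, or -1
 * if the element is too small for the layout the formula assumes. */
static inline int32_t
sketch_get_priv_size(uint32_t elt_size, uint32_t data_room_size)
{
	const uint32_t fixed = SKETCH_HEADROOM + SKETCH_MBUF_SIZE;
	uint32_t priv;

	/* assumed policy: pool private area not populated, so report
	 * no application private data instead of underflowing */
	if (data_room_size == 0)
		return 0;

	/* the unsigned subtraction below would wrap around otherwise */
	if (elt_size < fixed || elt_size - fixed < data_room_size)
		return -1;

	priv = elt_size - fixed - data_room_size;

	/* priv_size is a uint16_t field in the patch */
	assert(priv <= UINT16_MAX);
	return (int32_t)priv;
}
```

The point is only that the subtraction has to be guarded before its result
is stored into m->priv_size.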

Konstantin


> 
> I'll send a v4 addressing the comments soon, thanks.
> 
> Regards,
> Olivier
> 
> 
> 
> >
> >>
> >>
> >>> BTW, don't see changes in rte_pktmbuf_pool_init() to setup
> >>> mbp_priv->mbuf_data_room_size properly.
> >>> Without those changes, how can people start using that feature?
> >>> It seems that the only way now is to set up priv_size and buf_len for each mbuf manually.
> >>
> >> It's the same reason as above. To use a private area, the user has
> >> to provide their own function that sets up data_room_size, derived from
> >> this pool_init default function. This was also the case before the
> >> patch series.
> >>
> >>
> >>>
> >>>>   }
> >>>>
> >>>>   /* do some sanity checks on a mbuf: panic if it fails */
> >>>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> >>>> index 17ba791..932fe58 100644
> >>>> --- a/lib/librte_mbuf/rte_mbuf.h
> >>>> +++ b/lib/librte_mbuf/rte_mbuf.h
> >>>> @@ -317,18 +317,51 @@ struct rte_mbuf {
> >>>>   			/* uint64_t unused:8; */
> >>>>   		};
> >>>>   	};
> >>>> +
> >>>> +	/** Size of the application private data. In case of an indirect
> >>>> +	 * mbuf, it stores the direct mbuf private data size. */
> >>>> +	uint16_t priv_size;
> >>>>   } __rte_cache_aligned;
> >>>>
> >>>>   /**
> >>>> - * Given the buf_addr returns the pointer to corresponding mbuf.
> >>>> + * Return the mbuf owning the data buffer address of an indirect mbuf.
> >>>> + *
> >>>> + * @param mi
> >>>> + *   The pointer to the indirect mbuf.
> >>>> + * @return
> >>>> + *   The address of the direct mbuf corresponding to buffer_addr.
> >>>>    */
> >>>> -#define RTE_MBUF_FROM_BADDR(ba)     (((struct rte_mbuf *)(ba)) - 1)
> >>>> +static inline struct rte_mbuf *
> >>>> +rte_mbuf_from_indirect(struct rte_mbuf *mi)
> >>>> +{
> >>>> +       struct rte_mbuf *md;
> >>>> +
> >>>> +       /* mi->buf_addr and mi->priv_size correspond to buffer and
> >>>> +	* private size of the direct mbuf */
> >>>> +       md = (struct rte_mbuf *)((char *)mi->buf_addr - sizeof(*mi) -
> >>>> +	       mi->priv_size);
> >>>
> >>> (uintptr_t)mi->buf_addr?
> >>
> >> Any clue why (uintptr_t) would be better than (char *) ?
> >
> > No big difference really, just looks a bit better to me :)
> >
> >> By the way, I added this cast because it would not compile with
> >> g++ (and probably with icc too).
> >>
> >>>
> >>>> +       return md;
> >>>> +}
> >>>>
> >>>>   /**
> >>>> - * Given the pointer to mbuf returns an address where it's  buf_addr
> >>>> - * should point to.
> >>>> + * Return the buffer address embedded in the given mbuf.
> >>>> + *
> >>>> + * The user must ensure that m->priv_size corresponds to the
> >>>> + * private size of this mbuf, which is not the case for indirect
> >>>> + * mbufs.
> >>>> + *
> >>>> + * @param md
> >>>> + *   The pointer to the mbuf.
> >>>> + * @return
> >>>> + *   The address of the data buffer owned by the mbuf.
> >>>>    */
> >>>> -#define RTE_MBUF_TO_BADDR(mb)       (((struct rte_mbuf *)(mb)) + 1)
> >>>> +static inline char *
> >>>
> >>> Might be better to return 'void *' here.
> >>
> >> Ok, as m->buf_addr is a (void *).
> >>
> >>>
> >>>> +rte_mbuf_to_baddr(struct rte_mbuf *md)
> >>>> +{
> >>>> +       char *buffer_addr;
> >>>
> >>> uintptr_t buffer_addr?
> >>
> >> Same question as above, I don't really see why it's better than
> >> (char *).
> >>
> >>>
> >>>> +       buffer_addr = (char *)md + sizeof(*md) + md->priv_size;
> >>>> +       return buffer_addr;
> >>>> +}
> >>>>
> >>>>   /**
> >>>>    * Returns TRUE if given mbuf is indirect, or FALSE otherwise.
> >>>> @@ -688,6 +721,7 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct rte_mempool *mp)
> >>>>
> >>>>   /**
> >>>>    * Attach packet mbuf to another packet mbuf.
> >>>> + *
> >>>>    * After attachment we refer the mbuf we attached as 'indirect',
> >>>>    * while mbuf we attached to as 'direct'.
> >>>>    * Right now, not supported:
> >>>> @@ -701,7 +735,6 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct rte_mempool *mp)
> >>>>    * @param md
> >>>>    *   The direct packet mbuf.
> >>>>    */
> >>>> -
> >>>>   static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
> >>>>   {
> >>>>   	RTE_MBUF_ASSERT(RTE_MBUF_DIRECT(md) &&
> >>>> @@ -712,6 +745,7 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
> >>>>   	mi->buf_physaddr = md->buf_physaddr;
> >>>>   	mi->buf_addr = md->buf_addr;
> >>>>   	mi->buf_len = md->buf_len;
> >>>> +	mi->priv_size = md->priv_size;
> >>>>
> >>>>   	mi->next = md->next;
> >>>>   	mi->data_off = md->data_off;
> >>>> @@ -732,7 +766,8 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
> >>>>   }
> >>>>
> >>>>   /**
> >>>> - * Detach an indirect packet mbuf -
> >>>> + * Detach an indirect packet mbuf.
> >>>> + *
> >>>>    *  - restore original mbuf address and length values.
> >>>>    *  - reset pktmbuf data and data_len to their default values.
> >>>>    *  All other fields of the given packet mbuf will be left intact.
> >>>> @@ -740,22 +775,28 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
> >>>>    * @param m
> >>>>    *   The indirect attached packet mbuf.
> >>>>    */
> >>>> -
> >>>>   static inline void rte_pktmbuf_detach(struct rte_mbuf *m)
> >>>>   {
> >>>> -	const struct rte_mempool *mp = m->pool;
> >>>> -	void *buf = RTE_MBUF_TO_BADDR(m);
> >>>> -	uint32_t buf_len = mp->elt_size - sizeof(*m);
> >>>> -	m->buf_physaddr = rte_mempool_virt2phy(mp, m) + sizeof (*m);
> >>>> -
> >>>> +	struct rte_pktmbuf_pool_private *mbp_priv;
> >>>> +	struct rte_mempool *mp = m->pool;
> >>>> +	void *buf;
> >>>> +	unsigned mhdr_size;
> >>>> +
> >>>> +	/* first, restore the priv_size, this is needed before calling
> >>>> +	 * rte_mbuf_to_baddr() */
> >>>> +	mbp_priv = rte_mempool_get_priv(mp);
> >>>> +	m->priv_size = mp->elt_size - RTE_PKTMBUF_HEADROOM -
> >>>> +		mbp_priv->mbuf_data_room_size -
> >>>> +		sizeof(struct rte_mbuf);
> >>>
> >>> I think it is better to put the priv_size calculation above into a separate function -
> >>> rte_mbuf_get_priv_size(m) or something.
> >>> We need it in a few places, and users would probably need it anyway.
> >>
> >> yep, good idea
> >>
> >>>
> >>>> +
> >>>> +	buf = rte_mbuf_to_baddr(m);
> >>>> +	mhdr_size = (char *)buf - (char *)m;
> >>>
> >>> Why do you need to recalculate mhdr_size here?
> >>> As I understand, it is just m->priv_size, which you retrieved 2 lines above.
> >>>
> >>
> >> It's not m->priv_size but (sizeof(rte_mbuf) + m->priv_size).
> >
> > Ah yes, sorry for confusion.
> >
> >> In both cases, it requires an operation, but maybe
> >>    mhdr_size = (sizeof(rte_mbuf) + m->priv_size)
> >> is clearer than
> >>    mhdr_size = (char *)buf - (char *)m
> >>
> >>
> >>>> +	m->buf_physaddr = rte_mempool_virt2phy(mp, m) + mhdr_size;
> >>>
> >>> Actually I think it could just be:
> >>> m->buf_physaddr = rte_mempool_virt2phy(mp, buf);
> >>
> >> Even if it would work, the API of rte_mempool_virt2phy()
> >> says that the second argument should be "A pointer (virtual address)
> >> to the element of the pool."
> >> I think we should keep the initial code.
> >
> > Ok.
> > Konstantin
> >
> >>
> >> Regards,
> >> Olivier
> >>
> >


