[dpdk-dev] [PATCH] rte_mbuf: scattered pktmbufs freeing optimization

Olivier MATZ olivier.matz at 6wind.com
Fri Feb 27 14:20:58 CET 2015


Hi Vadim, Hi Konstantin,

On 02/27/2015 01:18 PM, Vadim Suraev wrote:
> Hi, Konstantin,
> 
>> Seems really useful.

Indeed, this function looks useful, and I also have a work in progress
on this topic, but currently it is not well tested.

As we are on the mbuf subject, for 2.1, I would like to discuss some
points of mbuf API:

- support mbuf and segment bulk allocation/free
- clarify the difference between raw_alloc/raw_free and
  mempool_get/mempool_put: For instance, I think that the reference
  counter initialization should be managed by rte_pktmbuf_reset() like
  the rest of the fields, therefore raw_alloc/raw_free could be replaced
  by mempool functions (this would imply some modifications in drivers).
- clarify which functions are internal and which are not. Some could be
  made public and hence should be documented: from what I see, the
  vhost example currently uses internal APIs and I suspect it may
  not work when MBUF_DEBUG=y.
- factorize rte_rxmbuf_alloc() which is duplicated in almost any driver
- discuss if driver should be calling pktmbuf_reset().
  It would avoid some bugs where fields are not properly initialized.
- check if rte_pktmbuf_free() is called on mbufs with m->next or
  refcount not initialized
- support clones of clones
- support attach/detach of mbufs embedding a private part
- move m->next in the first cache line as it's initialized in rx
- rename "__rte_pktmbuf_prefree_seg" in something clearer like "decref".

To avoid doing the work twice, please note that I already have an
implementation for:
- rte_mbuf_raw_free_bulk(): each mbuf return in its own pool
- rte_mbuf_free_seg_bulk(): similar to call rte_pktmbuf_free_seg()
  on each mbuf
but they are not tested at all now.

I will submit a RFC series in the coming weeks that addresses these
issues. It could be used as a basis for a discussion.

Sorry Vadim for polluting the topic, but I think the subjects are
quite linked together. Please find some comments on your patch below.

>> One thought - why to introduce the limitation that all mbufs have to be
> from the same mempool?
>> I think you can reorder it a bit, so it can handle situation when chained
> mbufs belong to different mempools.
> 
> I had a doubt, my concern was how practical is that (multiple mempools)
> case?

I think having several mempool is common on multi-socket machines.

> Do you think there should be two versions: lightweight (with the
> restriction) and generic?

I think having the version that supports different mempools is more
important. Checking if the mempool is the same may not cost too much
compared to the rest of the work. But if we find a good use case
where having the lightweight version is preferable, we could have
2 versions (ex: if performance is really different).

>>> +/* This macro defines the size of max bulk of mbufs to free for
>> rte_pktmbuf_free_bulk */
>>> +#define MAX_MBUF_FREE_SIZE 32

I wonder if it's required to have this macro.

In my implementation, I use a dynamic-sized table:

static inline void
rte_mbuf_free_seg_bulk(struct rte_mbuf **m_table, unsigned n)
{
	void *obj_table[n];
[...]


>>> + *
>>> + * @param head
>>> + *   The head of mbufs to be freed chain
>>> + */
>>> +
>>> +static inline void __attribute__((always_inline))
>>> +rte_pktmbuf_free_bulk(struct rte_mbuf *head)

About the inlining, I have no objection now, although Stephen may be
right. I think we can consider un-inline some functions, based on
performance measurements.

>>> +{
>>> +    void *mbufs[MAX_MBUF_FREE_SIZE];
>>> +    unsigned mbufs_count = 0;
>>> +    struct rte_mbuf *next;
>>> +
>>> +    RTE_MBUF_MEMPOOL_CHECK1(head);
>>> +
>>> +    while(head) {
>>> +        next = head->next;
>>> +        head->next = NULL;
>>> +        if(__rte_pktmbuf_prefree_seg(head)) {
>>> +            RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(head) == 0);
>>> +            RTE_MBUF_MEMPOOL_CHECK2(head);
>>> +            mbufs[mbufs_count++] = head;
>>> +        }
>>> +        head = next;
>>> +        if(mbufs_count == MAX_MBUF_FREE_SIZE) {
>>> +            rte_mempool_put_bulk(((struct rte_mbuf
>> *)mbufs[0])->pool,mbufs,mbufs_count);
>>> +            mbufs_count = 0;
>>> +        }
>>> +    }
>>> +    if(mbufs_count > 0) {
>>> +        rte_mempool_put_bulk(((struct rte_mbuf
>> *)mbufs[0])->pool,mbufs,mbufs_count);
>>> +    }
>>> +}
>>> +

Also, I think a unit test should be added.

Thanks,
Olivier


More information about the dev mailing list