[dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL thread
    Ananyev, Konstantin 
    konstantin.ananyev at intel.com
       
    Thu Jan 22 15:04:39 CET 2015
    
    
  
Hi Miroslaw,
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Walukiewicz, Miroslaw
> Sent: Thursday, January 22, 2015 12:45 PM
> To: Liang, Cunming; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL thread
> 
> 
> 
> > -----Original Message-----
> > From: Liang, Cunming
> > Sent: Thursday, January 22, 2015 1:20 PM
> > To: Walukiewicz, Miroslaw; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL
> > thread
> >
> >
> >
> > > -----Original Message-----
> > > From: Walukiewicz, Miroslaw
> > > Sent: Thursday, January 22, 2015 5:53 PM
> > > To: Liang, Cunming; dev at dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-
> > EAL
> > > thread
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Cunming Liang
> > > > Sent: Thursday, January 22, 2015 9:17 AM
> > > > To: dev at dpdk.org
> > > > Subject: [dpdk-dev] [PATCH v1 13/15] mempool: add support to non-EAL
> > > > thread
> > > >
> > > > For non-EAL thread, bypass per lcore cache, directly use ring pool.
> > > > It allows using rte_mempool in either EAL thread or any user pthread.
> > > > As in non-EAL thread, it directly rely on rte_ring and it's none preemptive.
> > > > It doesn't suggest to run multi-pthread/cpu which compete the
> > > > rte_mempool.
> > > > It will get bad performance and has critical risk if scheduling policy is RT.
> > > >
> > > > Signed-off-by: Cunming Liang <cunming.liang at intel.com>
> > > > ---
> > > >  lib/librte_mempool/rte_mempool.h | 18 +++++++++++-------
> > > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/lib/librte_mempool/rte_mempool.h
> > > > b/lib/librte_mempool/rte_mempool.h
> > > > index 3314651..4845f27 100644
> > > > --- a/lib/librte_mempool/rte_mempool.h
> > > > +++ b/lib/librte_mempool/rte_mempool.h
> > > > @@ -198,10 +198,12 @@ struct rte_mempool {
> > > >   *   Number to add to the object-oriented statistics.
> > > >   */
> > > >  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
> > > > -#define __MEMPOOL_STAT_ADD(mp, name, n) do {
> > 	\
> > > > -		unsigned __lcore_id = rte_lcore_id();		\
> > > > -		mp->stats[__lcore_id].name##_objs += n;		\
> > > > -		mp->stats[__lcore_id].name##_bulk += 1;		\
> > > > +#define __MEMPOOL_STAT_ADD(mp, name, n) do {                    \
> > > > +		unsigned __lcore_id = rte_lcore_id();           \
> > > > +		if (__lcore_id < RTE_MAX_LCORE) {               \
> > > > +			mp->stats[__lcore_id].name##_objs += n;	\
> > > > +			mp->stats[__lcore_id].name##_bulk += 1;	\
> > > > +		}                                               \
> > > >  	} while(0)
> > > >  #else
> > > >  #define __MEMPOOL_STAT_ADD(mp, name, n) do {} while(0)
> > > > @@ -767,8 +769,9 @@ __mempool_put_bulk(struct rte_mempool *mp,
> > > > void * const *obj_table,
> > > >  	__MEMPOOL_STAT_ADD(mp, put, n);
> > > >
> > > >  #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> > > > -	/* cache is not enabled or single producer */
> > > > -	if (unlikely(cache_size == 0 || is_mp == 0))
> > > > +	/* cache is not enabled or single producer or none EAL thread */
> > >
> > > I don't understand this limitation.
> > >
> > > I see that the rte_membuf.h defines table per RTE_MAX_LCORE like below
> > > #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> > >         /** Per-lcore local cache. */
> > >         struct rte_mempool_cache local_cache[RTE_MAX_LCORE];
> > > #endif
> > >
> > > But why we cannot extent the size of the local cache table to something
> > like
> > > RTE_MAX_THREADS that does not exceed max value of rte_lcore_id()
> > >
> > > Keeping this condition here is a  real performance killer!!!!!!.
> > > I saw in my test application spending more 95% of CPU time reading the
> > atomic
> > > in M C/MP ring utilizing access to mempool.
> > [Liang, Cunming] This is the first step to make it work.
> > By Konstantin's comments, shall prevent to allocate unique id by ourselves.
> > And the return value from gettid() is too large as an index.
> > For non-EAL thread performance gap, will think about additional fix patch
> > here.
> > If care about performance, still prefer to choose EAL thread now.
> 
> In previous patch you had allocation of the thread id on base of unique gettid() as number
> not a potential pointer as we can expect from implementation getid() from Linux or BSD.
I am really puzzled with your sentence above.
What ' potential pointer' you are talking about?
rte_lcore_id() - returns unsigned 32bit integer (as it always did).
_lcore_id for each EAL thread is assigned at rte_eal_init().
For the EAL thread  _lcore_id value is in interval [0, RTE_MAX_LCORE) and
it is up to the user to make sure that each _lcore_id is unique inside DPDK MultiProcess group.
That's all as it was before. 
What's new with Steve's patch:
1) At startup user can select a set of physical cpu(s) on which each EAL thread (lcore) can run.
>From explanation at  [PATCH v1 02/15]:
--lcores='1,2@(5-7),(3-5)@(0,2),(0,6)' means starting 7 EAL thread as below
lcore 0 runs on cpuset 0x41 (cpu 0,6)
lcore 1 runs on cpuset 0x2 (cpu 1)
lcore 2 runs on cpuset 0xe0 (cpu 5,6,7)
lcore 3,4,5 runs on cpuset 0x5 (cpu 0,2)
lcore 6 runs on cpuset 0x41 (cpu 0,6)
2) Non-EAL (dynamically created threads) will have their  _lcore_id set to LCORE_ID_ANY (==UINT32_MAX).
That allow us to distinguish them from EAL threads (lcores).
Make changes in DPDK functions, so that they will safely work within such threads too, but there could be some performance 
(rte_mempool get/put)  degradation comaring to EAL thread.  
> 
> The another problem is that we compare here int with some unique thread identifier.
We are not doing that here.
_lcore_id is unsigned int.
> How can you prevent that when implementation of gettid will change and unique thread identifier will be
> Less than RTE_MAX_LCORE and will be still unique.
We are not using gettid() to assign _lcore_id values.
And never did.
Please read the patch set carefully.
> 
> I think that your assumption will work for well-known operating systems but will be very unportable.
> 
> Regarding performance the DPDK can work efficiently in different environments including pthreads.
> You can imagine running DPDK from pthread application where affinity will be made by application.
> Effectiveness depends on application thread implementation comparable to EAL threads.
That was one of the goals of the patch: make all DPDK API to work with dynamically created threads.
So now it is safe to do:
static void *thread_func1(void *arg)
{
    rte_pktmbuf_free((struct rte_mbuf *)arg);
   return NULL;
}
pthread_t tid;
struct rte_mbuf *m = rte_pktmbuf_alloc(mp);
 
pthread_create(&tid, NULL, thread_func1, m);
As Steve pointed - if you are dependent on rte_mempool get/put performance inside your threads -
better to use EAL threads.
With current patch you can have up to RTE_MAX_LCORE such threads and can assign any cpu affinity for each of them.
Though, if you have an idea how to extend mempool cache to any number of dynamically created threads
(make cache for each mempool sort of TLS?), then sure we can discuss it.
But I suppose it should be a metter of separate patch.    
Konstantin
> 
> I think that this is a goal for this change.
> 
> > >
> > > Same comment for get operation below
> > >
> > > > +	if (unlikely(cache_size == 0 || is_mp == 0 ||
> > > > +		     lcore_id >= RTE_MAX_LCORE))
> > > >  		goto ring_enqueue;
> > > >
> > > >  	/* Go straight to ring if put would overflow mem allocated for cache
> > > > */
> > > > @@ -952,7 +955,8 @@ __mempool_get_bulk(struct rte_mempool *mp,
> > void
> > > > **obj_table,
> > > >  	uint32_t cache_size = mp->cache_size;
> > > >
> > > >  	/* cache is not enabled or single consumer */
> > > > -	if (unlikely(cache_size == 0 || is_mc == 0 || n >= cache_size))
> > > > +	if (unlikely(cache_size == 0 || is_mc == 0 ||
> > > > +		     n >= cache_size || lcore_id >= RTE_MAX_LCORE))
> > > >  		goto ring_dequeue;
> > > >
> > > >  	cache = &mp->local_cache[lcore_id];
> > > > --
> > > > 1.8.1.4
    
    
More information about the dev
mailing list