[dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore

Ananyev, Konstantin konstantin.ananyev at intel.com
Fri Jan 9 12:52:53 CET 2015



> -----Original Message-----
> From: Liang, Cunming
> Sent: Friday, January 09, 2015 9:41 AM
> To: Ananyev, Konstantin; Stephen Hemminger; Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> 
> 
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Friday, January 09, 2015 1:06 AM
> > To: Liang, Cunming; Stephen Hemminger; Richardson, Bruce
> > Cc: dev at dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> >
> > Hi Steve,
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Liang, Cunming
> > > Sent: Tuesday, December 23, 2014 9:52 AM
> > > To: Stephen Hemminger; Richardson, Bruce
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> > > > Sent: Tuesday, December 23, 2014 2:29 AM
> > > > To: Richardson, Bruce
> > > > Cc: Liang, Cunming; dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > > >
> > > > On Mon, 22 Dec 2014 09:46:03 +0000
> > > > Bruce Richardson <bruce.richardson at intel.com> wrote:
> > > >
> > > > > On Mon, Dec 22, 2014 at 01:51:27AM +0000, Liang, Cunming wrote:
> > > > > > ...
> > > > > > > I'm conflicted on this one. However, I think far more applications would
> > be
> > > > > > > broken
> > > > > > > to start having to use thread_id in place of an lcore_id than would be
> > > > broken
> > > > > > > by having the lcore_id no longer actually correspond to a core.
> > > > > > > I'm actually struggling to come up with a large number of scenarios
> > where
> > > > it's
> > > > > > > important to an app to determine the cpu it's running on, compared to
> > the
> > > > large
> > > > > > > number of cases where you need to have a data-structure per thread.
> > In
> > > > DPDK
> > > > > > > libs
> > > > > > > alone, you see this assumption that lcore_id == thread_id a large
> > number
> > > > of
> > > > > > > times.
> > > > > > >
> > > > > > > Despite the slight logical inconsistency, I think it's better to avoid
> > > > introducing
> > > > > > > a thread-id and continue having lcore_id representing a unique thread.
> > > > > > >
> > > > > > > /Bruce
> > > > > >
> > > > > > Ok, I understand it.
> > > > > > I list the implicit meaning if using lcore_id representing the unique thread.
> > > > > > 1). When lcore_id less than RTE_MAX_LCORE, it still represents the logical
> > > > core id.
> > > > > > 2). When lcore_id large equal than RTE_MAX_LCORE, it represents an
> > unique
> > > > id for thread.
> > > > > > 3). Most of APIs(except rte_lcore_id()) in rte_lcore.h suggest to be used
> > only
> > > > in CASE 1)
> > > > > > 4). rte_lcore_id() can be used in CASE 2), but the return value no matter
> > > > represent a logical core id.
> > > > > >
> > > > > > If most of us feel it's acceptable, I'll prepare for the RFC v2 base on this
> > > > conclusion.
> > > > > >
> > > > > > /Cunming
> > > > >
> > > > > Sorry, I don't like that suggestion either, as having lcore_id values greater
> > > > > than RTE_MAX_LCORE is terrible, as how will people know how to
> > dimension
> > > > arrays
> > > > > to be indexes by lcore id? Given the choice, if we are not going to just use
> > > > > lcore_id as a generic thread id, which is always between 0 and
> > > > RTE_MAX_LCORE
> > > > > we can look to define a new thread_id variable to hold that. However, it
> > should
> > > > > have a bounded range.
> > > > > From an ease-of-porting perspective, I still think that the simplest option is
> > to
> > > > > use the existing lcore_id and accept the fact that it's now a thread id rather
> > > > > than an actual physical lcore. Question is, is would that cause us lots of
> > issues
> > > > > in the future?
> > > > >
> > > > > /Bruce
> > > >
> > > > The current rte_lcore_id() has different meaning the thread. Your proposal
> > will
> > > > break code that uses lcore_id to do per-cpu statistics and the lcore_config
> > > > code in the samples.
> > > > q
> > > [Liang, Cunming] +1.
> >
> > Few more thoughts on that subject:
> >
> > Actually one more place in the lib, where lcore_id is used (and it should be
> > unique):
> > rte_spinlock_recursive_lock() / rte_spinlock_recursive_trylock().
> > So if we going to replace lcore_id with thread_id as uniques thread index, then
> > these functions
> > have to be updated too.
> [Liang, Cunming] You're right, if deciding to use thread_id, we have to check and replace
> rte_lcore_id()/RTE_PER_LCORE(_lcore_id) on all the impact place.
> Now I'm buying the proposal to keep using rte_lcore_id() to return the
> unique id. Meanwhile I think it's necessary to have real cpu id.
> It's helpful in NUMA socket checking.
> I will provide new API rte_curr_cpu() to return the runtime cpu no matter
> the thread running in coremasked or non-coremasked cpu.
> So the socket info stored in lcore_config still useful  to choose the local socket.
> >
> > About maintaining our own unique thread_id inside shared memory
> > (_get_linear_tid()/_put_linear_tid()).
> > There is one thing that worries me with that approach:
> > In case of abnormal process termination, TIDs used by that process will remain
> > 'reserved'
> > and there is no way to know which TIDs were used by terminated process.
> > So there could be a situation with DPDK multi-process model,
> > when after secondary process abnormal termination, It wouldn't be possible to
> > restart it -
> > we just run out of 'free' TIDs.
> [Liang, Cunming] That's a good point I think. I think it's not only for thread id but
> for all the dynamic allocated resource (e.g. memzone, mempool).
> we haven't a garbage collection or heartbeat to process the secondary abnormal exit.

Of course some dynamically allocated meory could be unclaimed in that case.
But right now, at least you can restart the child process.
What I am saying - we probably better avoid managing our own TIDs dynamically at all.  

> 
> >
> > Which makes me think probably there is no need to introduce new globally
> > unique 'thread_id'?
> > Might be just lcore_id is enough?
> > As Mirek and Bruce suggested we can treat it a sort of 'unique thread id' inside
> > EAL.
> [Liang, Cunming] I think we'd better have two, one for 'unique thread id', one for real cpu id.
> No matter which of them are named lcore_id/thread_id/cpu_id and etc.

As I understand, the goal is to be a be to run multiple EAL threads on multiple physical cpus.
So each thread could run on multiple cpus, i.e - there would be no one to one match
between lcore_id(thread_id) and cpu_id.
That's why I think we  need to:  
Introduce rte_lcore_get_affinity(lcore_id) - that would return cpuset for given lcore.   
Update rte_lcore_to_socket_id(lcore_id) - it would check if all cpus that lcore is allegeable
to run belong to the same socket.
If yes that socket_id will be returned, if no SOCKET_ID_ANY.

> For cpu id, we need to check/get the NUMA info.
> Pthread may migrate from one core to another, the thread 'socket id' may change,
> The per cpu socket info we have them in lcore_config.
> 
> > Or as 'virtual' core id that can run on set of physical cpus, and these subsets for
> > different 'virtual' cores can intersect.
> > Then basically we can keep legacy behaviour with '-c <lcores_mask>,' where each
> > lcore_id matches one to one  with physical cpu, and introduce new one,
> > something like:
> > --
> > lcores='(<lcore_set1>)=(<phys_cpu_set1>),..(<lcore_setN)=(<phys_cpu_setN>)'.
> > So let say: --lcores=(0-7)=(0,2-4),(10)=(7),(8)=(all)' would mean:
> > Create 10 EAL threads, bind threads with clore_id=[0-7] to cpuset: <0,2,3,4>,
> > thread  with lcore_id=10 is binded to  cpu 7, and allow to run lcore_id=8 on any
> > cpu in the system.
> > Of course '-c' and '-lcores' would be mutually exclusive, and we will need to
> > update  rte_lcore_to_socket_id()
> > and introduce: rte_lcore_(set|get)_affinity().
> >
> > Does it make sense to you?
> [Liang, Cunming] If assign lcore_id during the command line, user have to handle
> the conflict for '-c' and '--lcores'.
> In this cases, if lcore_id 0~10 is occupied, the coremasked thread start from 11 ?

As I said above: " Of course '-c' and '-lcores' would be mutually exclusive".

> In case, application create a new pthread during the runtime.
> As there's no lcore id belongs to the new thread mentioned in the command line, it then still back to dynamic allocate.
> I means on the startup, user may have no idea of how much pthread they will run.

I think you are mixing 2 different tasks here:
1. Allow EAL threads (lcores) to be run run on set of physical cpus (not just one), and these subsets for
different lcores can be intersectable.
2. Allow dynamically created threads to call EAL functions (rte_mempool, rte_recursive_lock, rte_timer, etc).

My understanding was that our goal here is task #1.
For #1 - I think what I proposed above is enough.

Though, if our goal is #2 - it is a different story.
In that case, I think we shouldn't manage unique TID ourselves.
We are not OS, and on the app level it would quite complicated to implement it
in a robust way with current DPDK multi-process model.   
Another thing - with proposed implementation we still limiting number of allowed threads.
Instead of RTE_MAX_LCORES we just introduce RTE_MAX_THREADS.
So all problems with rte_mempool caches and rte_timers will remain. 

If we really need #2, then what we probably can do instead:

1. Rely on OS unique TID (linux gettid()).
2. Assign by default __lcore_id = -1, and set it up to the proper value only for EAL (lcore) threads.
3. Revise all usages of __lcore_id inside the lib and for each case:
  A) either change it  to use system wide unique TID (rte_recusive_spinlock)
  B) or update the code, so it can handle situation with __lcore_id == -1 

As I can see, right now the following code inside RTE libs use rte_clore_id():

1.	lib/librte_eal/common/eal_common_log.c
Uses rte_lcore_id() return value as index in static log_cur_msg[].

2.	lib/librte_eal/common/include/generic/rte_spinlock.h
Uses rte_lcore_id() return value as rte_spinlock_recursive.user.
Value -1 (LCORE_ID_ANY) is reserved to mark lock as unused.

3.	lib/librte_mempool/rte_mempool.h
Uses rte_lcore_id() return value as index in rte_mempool.local_cache[] and  inside rte_mempool.stats[].

4.	lib/librte_timer/rte_timer.c
Uses rte_lcore_id() return value as index in static struct priv_timer priv_timer[].
Also uses it as 16 bit owner filed inside union rte_timer_status. 
Again -1 is reserved value for RTE_TIMER_NO_OWNER.

5.	lib/librte_eal/common/include/rte_lcore.h
Inside rte_socket_id() uses rte_clore_id() return value as index in lcore_config[].

6.	lib/librte_ring/rte_ring.h
Uses rte_clore_id() return value as index in rte_ring.stats[].

case 2  is A), so I think we can use gettid() returned value instead of __lcore_id value here.  
All other cases looks like B) to me.

The easiest thing (at least as the first step) is just not add a check that __lcore_id < MAX_LCORE_ID.
case 3:  avoid mempool caching if __lcore_id >= MAX_LCORE_ID  
case 4: Allow to setup timers only for EAL (lcore) threads (__lcore_id < MAX_LCORE_ID).
E.g. - dynamically created thread will be able to start/stop timer for lcore thread,
but it will be not allowed to setup timer for itself or another non-lcore thread. 
rte_timer_manage() for non-lcore thread would simply do nothing and return straightway.
case 5:  just return SOCKET_ID_ANY if __lcore_id >= MAX_LCORE_ID.
case 6:  avoid stats[] update if __lcore_id >= MAX_LCORE_ID.

That way user can create as many threads as he wants dynamically and still should be able to use EAL functions inside them.
Of course for that, the problem that Olivier mentioned with thread pre-emption in the middle of ring enqueue/dequeue
(http://dpdk.org/ml/archives/dev/2014-December/010342.html) 
need to be fixed somehow.
Otherwise performance might be really poor.  
Though I suppose that need to be done for task #1 anyway.

Konstantin

> 
> 'rte_pthread_assign_lcore' do the things as 'rte_lcore_(set|get)_affinity()'
> If we keeping using lcore_id, I like the name you proposed.
> 
> I'll send my code update on next Monday.
> 
> >
> > BTW, one more thing: while we are on it  - it is probably a good time to do
> > something with our interrupt thread?
> > It is a bit strange that we can't use rte_pktmbuf_free() or
> > rte_spinlock_recursive_lock() from our own interrupt/alarm handlers
> >
> > Konstantin



More information about the dev mailing list