[dpdk-dev] [PATCH 1/3] rcu: add RCU library supporting QSBR mechanism

Honnappa Nagarahalli Honnappa.Nagarahalli at arm.com
Fri Mar 29 06:54:09 CET 2019


> 
> > >
> > > > +#define RTE_QSBR_CNT_THR_OFFLINE 0
> > > > +#define RTE_QSBR_CNT_INIT 1
> > > > +
> > > > +/**
> > > > + * RTE thread Quiescent State structure.
> > > > + * Quiescent state counter array (array of 'struct rte_rcu_qsbr_cnt'),
> > > > + * whose size is dependent on the maximum number of reader threads
> > > > + * (m_threads) using this variable is stored immediately following
> > > > + * this structure.
> > > > + */
> > > > +struct rte_rcu_qsbr {
> > > > +	uint64_t token __rte_cache_aligned;
> > > > +	/**< Counter to allow for multiple simultaneous QS queries */
> > > > +
> > > > +	uint32_t num_elems __rte_cache_aligned;
> > > > +	/**< Number of elements in the thread ID array */
> > > > +	uint32_t m_threads;
> > > > +	/**< Maximum number of threads this RCU variable will use */
> > > > +
> > > > +	uint64_t reg_thread_id[RTE_QSBR_THRID_ARRAY_ELEMS] __rte_cache_aligned;
> > > > +	/**< Registered thread IDs are stored in a bitmap array */
> > >
> > >
> > > As I understand, you ended up with a fixed size array to avoid 2
> > > variable size arrays in this struct?
> > Yes
> >
> > > Is it a big penalty for register/unregister() to either store a
> > > pointer to the bitmap, or calculate it based on the num_elems value?
> > In the last RFC I sent out [1], I tested the impact of having a
> > non-fixed size array. There 'was' a performance degradation in most of
> > the performance tests. The issue was with calculating the address of the
> > per-thread QSBR counters (not with the address calculation of the bitmap).
> > With the current patch, I do not see the performance difference (the
> > difference between the RFC and this patch is the memory orderings;
> > they are masking any perf gain from having a fixed array). However, I have
> > kept the fixed size array as the generated code does not have additional
> > calculations to get the address of the qsbr counter array elements.
> >
> > [1] http://mails.dpdk.org/archives/dev/2019-February/125029.html
> 
> Ok I see, but can we then arrange them in a different way:
> qsbr_cnt[] will start at the end of struct rte_rcu_qsbr (same as you have it
> right now), while the bitmap will be placed after qsbr_cnt[].
Yes, that is an option. Though it would mean we have to calculate the address, similar to the macro 'RTE_QSBR_CNT_ARRAY_ELM'.
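To make that concrete, here is a rough sketch of the arrangement being
discussed. The macro names and expressions below are illustrative only (they
are not from the patch), and the sketch assumes the fixed-size reg_thread_id[]
array is dropped from the struct: the per-thread counter array starts right
after 'struct rte_rcu_qsbr' and the bitmap base is derived from m_threads.

#include <stdint.h>
#include <rte_common.h>	/* __rte_cache_aligned */

/* Illustrative sketch only - the macro names are not from the patch.
 * Per-thread counter array placed immediately after struct rte_rcu_qsbr,
 * registered thread ID bitmap placed immediately after the counter array.
 */
struct rte_rcu_qsbr_cnt {
	uint64_t cnt; /* Quiescent state counter for one reader thread */
} __rte_cache_aligned;

/* Address of the per-thread counter for thread 'i' */
#define QSBR_CNT_ARRAY_ELM(v, i) \
	((struct rte_rcu_qsbr_cnt *)((v) + 1) + (i))

/* Address of the i-th 64-bit word of the registered thread ID bitmap,
 * which starts right after the m_threads counter entries.
 */
#define QSBR_THRID_ARRAY_ELM(v, i) \
	((uint64_t *)((struct rte_rcu_qsbr_cnt *)((v) + 1) + (v)->m_threads) + (i))

register/unregister would carry the extra address calculation, which should be
fine since they are off the critical path.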

> As I understand it, register/unregister is not considered to be on the
> critical path, so some perf degradation here doesn't matter.
Yes

> Also, check() would need an extra address calculation for the bitmap, but
> considering that we have to go through the whole bitmap (and in the worst
> case qsbr_cnt[]) anyway, that is probably not a big deal?
I think the address calculation can be made simpler than what I had tried before. I can give it a shot.
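For reference, a simplified sketch of what check() could look like with that
layout, reusing the illustrative macros from the sketch above (the real patch
also has a blocking/wait variant and more careful handling, which is omitted
here):

/* Simplified, illustrative-only check(): scan the registered thread ID
 * bitmap word by word and, for each registered thread, compare its
 * counter against the token 't'. Offline threads are skipped.
 */
static __rte_always_inline int
qsbr_check_sketch(struct rte_rcu_qsbr *v, uint64_t t)
{
	uint32_t i, id, j;
	uint64_t bmap, c;

	for (i = 0; i < v->num_elems; i++) {
		/* One extra address calculation per 64 thread IDs */
		bmap = __atomic_load_n(QSBR_THRID_ARRAY_ELM(v, i),
					__ATOMIC_ACQUIRE);
		id = i << 6;	/* 64 thread IDs per bitmap word */

		while (bmap) {
			j = __builtin_ctzl(bmap);
			c = __atomic_load_n(
				&QSBR_CNT_ARRAY_ELM(v, id + j)->cnt,
				__ATOMIC_ACQUIRE);
			if (c != RTE_QSBR_CNT_THR_OFFLINE && c < t)
				return 0; /* reader not yet quiescent */
			bmap &= ~(1UL << j);
		}
	}

	return 1; /* all registered readers reported a quiescent state */
}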

> 
> >
> > > As another thought - do we really need bitmap at all?
> > The bitmap helps avoid accessing all the elements in the
> > rte_rcu_qsbr_cnt array (as you have mentioned below). This provides
> > the ability to scale the number of threads dynamically. For example, an
> > application can create a qsbr variable with 48 max threads, but currently
> > only 2 threads are active (due to traffic conditions).
> 
> I understand that the bitmap is supposed to speed up check() for situations
> when most threads are unregistered.
> My thought was that maybe the check() speedup for such a situation is not
> that critical.
IMO, there is a need to address both cases, considering the future direction of DPDK. It is possible to introduce a counter for the current number of threads registered. If that is the same as the maximum number of threads, then scanning the registered thread ID array can be skipped.
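For illustration only, the fast path could look roughly like the following;
'num_threads', qsbr_check_all() and qsbr_check_registered() are hypothetical
names, not part of this patch:

/* Hypothetical helpers - implementations omitted in this sketch */
static int qsbr_check_all(struct rte_rcu_qsbr *v, uint64_t t);
static int qsbr_check_registered(struct rte_rcu_qsbr *v, uint64_t t);

/* Hypothetical sketch: 'num_threads' would be a new field maintained by
 * the register/unregister APIs. When every thread slot is registered,
 * the bitmap scan adds no information, so check() can walk the counter
 * array directly.
 */
static __rte_always_inline int
qsbr_check_dispatch(struct rte_rcu_qsbr *v, uint64_t t)
{
	if (__atomic_load_n(&v->num_threads, __ATOMIC_RELAXED) ==
			v->m_threads)
		return qsbr_check_all(v, t);	/* skip the bitmap scan */

	return qsbr_check_registered(v, t);	/* consult the bitmap first */
}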

> 
> >
> > > Might it be possible to store the register value for each thread inside
> > > its rte_rcu_qsbr_cnt:
> > > struct rte_rcu_qsbr_cnt {uint64_t cnt; uint32_t register;}
> > > __rte_cache_aligned; ?
> > > That would cause check() to walk through all elems in the
> > > rte_rcu_qsbr_cnt array, but on the other hand it would help to avoid
> > > cache conflicts for register/unregister.
> > With the addition of the rte_rcu_qsbr_thread_online/offline APIs, the
> > register/unregister APIs are not in the critical path anymore. Hence, the
> > cache conflicts are fine. The online/offline APIs work on thread-specific
> > cache lines and these are in the critical path.
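As a side note, the reason online/offline stay cheap is that each
rte_rcu_qsbr_cnt entry is cache-line aligned, so a thread only writes to its
own line. A rough sketch of the idea (simplified memory ordering, not the
exact code from the patch, reusing the illustrative QSBR_CNT_ARRAY_ELM macro
from above):

static __rte_always_inline void
qsbr_thread_online_sketch(struct rte_rcu_qsbr *v, unsigned int thread_id)
{
	uint64_t t;

	/* Pick up the current token and publish it in this thread's slot */
	t = __atomic_load_n(&v->token, __ATOMIC_RELAXED);
	__atomic_store_n(&QSBR_CNT_ARRAY_ELM(v, thread_id)->cnt, t,
			 __ATOMIC_RELAXED);

	/* Make the counter store visible before the reader touches any
	 * shared data structures in its critical section.
	 */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
}

static __rte_always_inline void
qsbr_thread_offline_sketch(struct rte_rcu_qsbr *v, unsigned int thread_id)
{
	/* Mark the thread offline so check() can skip its counter */
	__atomic_store_n(&QSBR_CNT_ARRAY_ELM(v, thread_id)->cnt,
			 RTE_QSBR_CNT_THR_OFFLINE, __ATOMIC_RELEASE);
}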
> >
> > >
> > > > +} __rte_cache_aligned;
> > > > +

