[dpdk-dev] [PATCH v5 1/3] rcu: add RCU library supporting QSBR mechanism

Honnappa Nagarahalli Honnappa.Nagarahalli at arm.com
Wed Apr 17 03:45:11 CEST 2019


> > > > > > > > > > >
> > > > > > > > > > > After evaluating long term API/ABI issues, I think
> > > > > > > > > > > you need to get rid of almost all use of inline and
> > > > > > > > > > > visible structures. Yes it might be marginally
> > > > > > > > > > > slower, but you thank me
> > > > > the first time you have to fix something.
> > > > > > > > > > >
> > > > > > > > > > Agree, I was planning on another version to address
> > > > > > > > > > this (I am yet
> > > > > to take a look at your patch addressing the ABI).
> > > > > > > > > > The structure visibility definitely needs to be addressed.
> > > > > > > > > > For the inline functions, is the plan to convert all
> > > > > > > > > > the inline functions in DPDK? If yes, I think we need
> > > > > > > > > > to consider the performance
> > > > > > > > > difference. May be consider L3-fwd application, change
> > > > > > > > > all the
> > > > > inline functions in its path and run a test?
> > > > > > > > >
> > > > > > > > > Every function that is not in the direct datapath should
> > > > > > > > > not be
> > > > > inline.
> > > > > > > > > Exceptions or things like rx/tx burst, ring
> > > > > > > > > enqueue/dequeue, and packet alloc/free
> > > > I do not understand how DPDK can claim ABI compatibility if we
> > > > have
> > > inline functions (unless we freeze any development in these inline
> > > functions forever).
> > > >
> > > > > > > >
> > > > > > > > Plus synchronization routines: spin/rwlock/barrier, etc.
> > > > > > > > I think rcu should be one of such exceptions - it is just
> > > > > > > > another synchronization mechanism after all (just a bit
> > > > > > > > more
> > > sophisticated).
> > > > > > > > Konstantin
> > > > > > >
> > > > > > > If you look at the other userspace RCU, you wil see that the
> > > > > > > only inlines are the rcu_read_lock,rcu_read_unlock and
> > > > > rcu_reference/rcu_assign_pointer.
> > > > > > >
> > > > > > > The synchronization logic is all real functions.
> > > > > >
> > > > > > In fact, I think urcu provides both flavors:
> > > > > > https://github.com/urcu/userspace-
> > > > > rcu/blob/master/include/urcu/static/
> > > > > > urcu-qsbr.h I still don't understand why we have to treat it
> > > > > > differently then let say spin-lock/ticket-lock or rwlock.
> > > > > > If we gone all the way to create our own version of rcu, we
> > > > > > probably want it to be as fast as possible (I know that main
> > > > > > speedup should come from the fact that readers don't have to
> > > > > > wait for writer to finish, but still...)
> > > > > >
> > > > > > Konstantin
> > > > > >
> > > > >
> > > > > Having locking functions inline is already a problem in current
> releases.
> > > > > The implementation can not be improved without breaking ABI (or
> > > > > doing special workarounds like lock v2)
> > > > I think ABI and inline function discussion needs to be taken up in
> > > > a
> > > different thread.
> > > >
> > > > Currently, I am looking to hide the structure visibility. I looked
> > > > at your
> > > patch [1], it is a different case than what I have in this patch. It
> > > is a pretty generic use case as well (similar situation exists in
> > > other libraries). I think a generic solution should be agreed upon.
> > > >
> > > > If we have to hide the structure content, the handle to QS
> > > > variable
> > > returned to the application needs to be opaque. I suggest using 'void *'
> > > behind which any structure can be used.
> > > >
> > > > typedef void * rte_rcu_qsbr_t;
> > > > typedef void * rte_hash_t;
> > > >
> > > > But it requires typecasting.
> > > >
> > > > [1] http://patchwork.dpdk.org/cover/52609/
> > >
> > > C allows structure to be defined without knowing what is in it
> therefore.
> > >
> > > typedef struct rte_rcu_qsbr rte_rcu_qsbr_t;
> > >
> > > is preferred (or do it without typedef)
> > >
> > > struct rte_rcu_qsbr;
> >
> > I see that rte_hash library uses the same approach (struct rte_hash in
> rte_hash.h, though it is marking as internal). But the ABI Laboratory tool
> [1] seems to be reporting incorrect numbers for this library even though
> the internal structure is changed.
> >
> > [1]
> > https://abi-
> laboratory.pro/index.php?view=compat_report&l=dpdk&v1=19.0
> > 2&v2=current&obj=66794&kind=abi
> 
> The problem is rte_hash structure is exposed as part of ABI in
> rte_cuckoo_hash.h This was a mistake.
Do you mean, due to the use of structure with the same name? I am wondering if it is just a tools issue. The application is not supposed to include rte_cuckoo_hash.h.

For the RCU library, we either need to go all functions or leave it the way it is. I do not see a point in trying to hide the internal structure while having inline functions.

I converted the inline functions to function calls.

Testing on Arm platform (results *are* repeatable) shows very minimal drop (0.1% to 0.2%) in performance while using lock-free rte_hash data structure. But one of the test cases which is just spinning shows good amount of drop (41%).

Testing on x86 (Xeon Gold 6132 CPU @ 2.60GHz, results *are* pretty repeatable) shows performance improvements (7% to 8%) while using lock-free rte_hash data structure. The test cases which is just spinning show significant drop (14%, 155%, 231%).

Konstantin, any thoughts on the results?

I will send out V6 which will fix issues reported so far. The function vs inline part is still open, need to close it soon.


More information about the dev mailing list