[dpdk-dev] [RFC 0/3] tqs: add thread quiescent state library
stephen at networkplumber.org
Sat Dec 1 00:44:17 CET 2018
On Fri, 30 Nov 2018 21:56:30 +0100
Mattias Rönnblom <mattias.ronnblom at ericsson.com> wrote:
> On 2018-11-30 03:13, Honnappa Nagarahalli wrote:
> >> Reinventing RCU is not helping anyone.
> > IMO, this depends on what the rte_tqs has to offer and what the requirements are. Before starting this patch, I looked at the liburcu APIs. I have to say, fairly quickly (no offense) I concluded that this does not address DPDK's needs. I took a deeper look at the APIs/code in the past day and I still concluded the same. My partial analysis (analysis of more APIs can be done, I do not have cycles at this point) is as follows:
> > The reader threads' information is maintained in a linked list. This linked list is protected by a mutex lock. Any additions/deletions/traversals of this list are blocking and cannot happen in parallel.
> > The API 'synchronize_rcu' (similar in functionality to the rte_tqs_check call) is a blocking call. There is no option provided to make it non-blocking. The writer spins while waiting for the grace period to end.
> Wouldn't the options be call_rcu, which rarely blocks, or defer_rcu()
> which never? Why would the average application want to wait for the
> grace period to be over anyway?
> > 'synchronize_rcu' also has a grace period lock. If I have multiple writers running on data plane threads, I cannot call this API to reclaim the memory in the worker threads as it will block other worker threads. This means an extra thread is required (on the control plane?) to do garbage collection, along with a method to push the pointers from worker threads to the garbage collection thread. This also means the time from delete to free increases, putting pressure on the amount of memory held up.
> > Since this API cannot be called concurrently by multiple writers, each writer has to wait for other writer's grace period to get over (i.e. multiple writer threads cannot overlap their grace periods).
> "Real" DPDK applications typically have to interact with the outside
> world using interfaces beyond DPDK packet I/O, and this is best done via
> an intermediate "control plane" thread running in the DPDK application.
> Typically, this thread would also be the RCU writer and "garbage
> collector", I would say.
> > This API also has to traverse the linked list which is not very well suited for calling on data plane.
> > I have not gone too much into rcu_thread_offline API. This again needs to be used in worker cores and does not look to be very optimal.
> > I have glanced at rcu_quiescent_state; it wakes up the thread calling 'synchronize_rcu', which seems like a good amount of code for the data plane.
> Wouldn't the typical DPDK lcore worker call rcu_quiescent_state() after
> processing a burst of packets? If so, I would more lean toward
> "negligible overhead", than "a good amount of code".
> I must admit I didn't look at your library in detail, but I must still
> ask: if TQS is basically RCU, why isn't it called RCU? And why aren't the
> API calls named in a similar manner?
We used liburcu with DPDK at Brocade. On the data path it was just a matter of calling
rcu_quiescent_state() in the packet handling loop. There were quite a few more cases where a
control thread needed to register or unregister itself with RCU. I think any library would have
that issue with user-supplied threads: you need both a "worry about me" and a
"don't worry about me" API in the library.
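To make the shape of such an API concrete, here is a toy quiescent-state tracker in C11. It is only a sketch under stated assumptions: every toy_* name is hypothetical, the reader table is a fixed-size array, and none of this is liburcu's or rte_tqs's actual code. A worker would call toy_quiescent_state() once per burst in its packet loop, and a writer can poll toy_grace_period_elapsed() without blocking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_READERS 8

/* Hypothetical toy state, for illustration only. */
static _Atomic uint64_t toy_gp = 1;                 /* current grace-period number */
static _Atomic uint64_t toy_seen[TOY_MAX_READERS];  /* 0 means "not tracked" */

/* "Worry about me": start tracking reader `id`. */
static void toy_register(unsigned id)
{
    assert(id < TOY_MAX_READERS);
    atomic_store(&toy_seen[id], atomic_load(&toy_gp));
}

/* "Don't worry about me": stop tracking reader `id`. */
static void toy_unregister(unsigned id)
{
    assert(id < TOY_MAX_READERS);
    atomic_store(&toy_seen[id], 0);
}

/* Reader holds no references right now, e.g. between packet bursts. */
static void toy_quiescent_state(unsigned id)
{
    assert(id < TOY_MAX_READERS);
    atomic_store(&toy_seen[id], atomic_load(&toy_gp));
}

/* Writer: call after unlinking an element; returns a token to poll on. */
static uint64_t toy_start_grace_period(void)
{
    return atomic_fetch_add(&toy_gp, 1) + 1;
}

/* Writer: non-blocking check that every tracked reader has passed
 * through a quiescent state since `token` was issued. */
static bool toy_grace_period_elapsed(uint64_t token)
{
    for (unsigned i = 0; i < TOY_MAX_READERS; i++) {
        uint64_t seen = atomic_load(&toy_seen[i]);
        if (seen != 0 && seen < token)
            return false;
    }
    return true;
}
```

An unregistered thread never holds up a grace period, which is why control threads that only occasionally touch the shared data need the register/unregister pair.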
There is also a tradeoff between call_rcu and defer_rcu in terms of which context the RCU
callback runs in. You really need a control thread to handle the RCU cleanup.
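As a rough illustration of control-thread cleanup, the sketch below keeps a retire list that only the control thread touches. It assumes each retired pointer was allocated with malloc() and is tagged with a grace-period number, and that something else tells the control thread which grace periods have completed. The names retire_push and retire_drain are hypothetical; this is not how call_rcu or defer_rcu are implemented, and how workers hand pointers to the control thread is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One pointer waiting for its grace period to complete. */
struct retired {
    void *ptr;
    uint64_t token;          /* grace-period number it must wait for */
    struct retired *next;
};

/* FIFO owned by the control thread only; no locking in this sketch. */
struct retire_list {
    struct retired *head;
    struct retired *tail;
};

/* Queue `ptr` for a later free(). Tokens are assumed to be pushed in
 * non-decreasing order. Returns false if the node cannot be allocated. */
static bool retire_push(struct retire_list *l, void *ptr, uint64_t token)
{
    assert(l != NULL);
    struct retired *r = malloc(sizeof(*r));
    if (r == NULL)
        return false;
    r->ptr = ptr;
    r->token = token;
    r->next = NULL;
    if (l->tail != NULL)
        l->tail->next = r;
    else
        l->head = r;
    l->tail = r;
    return true;
}

/* Free every entry whose grace period is <= `completed`.
 * Returns the number of entries freed. */
static size_t retire_drain(struct retire_list *l, uint64_t completed)
{
    assert(l != NULL);
    size_t freed = 0;
    while (l->head != NULL && l->head->token <= completed) {
        struct retired *r = l->head;
        l->head = r->next;
        if (l->head == NULL)
            l->tail = NULL;
        free(r->ptr);
        free(r);
        freed++;
    }
    return freed;
}
```

The control thread would call retire_drain() periodically, so the delay from delete to free depends on how often it runs.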
The point is that RCU steps into the application design, and liburcu seems flexible enough
and well documented enough to allow for more options.