[dpdk-dev] [PATCH v3 0/5] Add non-blocking ring
Honnappa Nagarahalli
Honnappa.Nagarahalli at arm.com
Fri Jan 25 06:20:58 CET 2019
Hi Gage,
Thank you for this patch. Arm (Ola Liljedahl) has worked on a non-blocking ring algorithm, and we were planning to add it to DPDK at some point this year. Would you be open to taking a look at the algorithm and collaborating?
I have yet to fully understand both algorithms, but Ola has reviewed your patch and can provide a quick overview of the differences here.
If you agree, we can send an RFC patch. You can review it and benchmark it on your platforms. I can also benchmark your patch on Arm platforms (perhaps once you fix the issue identified in the __rte_ring_do_nb_enqueue_mp function?). Maybe we can end up with a better combined algorithm.
Hi Thomas/Bruce,
Please let me know if this is OK, or if there is a better way to do this.
Thank you,
Honnappa
> -----Original Message-----
> From: dev <dev-bounces at dpdk.org> On Behalf Of Gage Eads
> Sent: Friday, January 18, 2019 9:23 AM
> To: dev at dpdk.org
> Cc: olivier.matz at 6wind.com; arybchenko at solarflare.com;
> bruce.richardson at intel.com; konstantin.ananyev at intel.com;
> stephen at networkplumber.org
> Subject: [dpdk-dev] [PATCH v3 0/5] Add non-blocking ring
>
> For some users, the rte ring's "non-preemptive" constraint is not acceptable;
> for example, if the application uses a mixture of pinned high-priority threads
> and multiplexed low-priority threads that share a mempool.
>
> This patchset introduces a non-blocking ring, on top of which a mempool can
> run.
> Crucially, the non-blocking algorithm relies on a 128-bit compare-and-swap,
> so it is currently limited to x86_64 machines. This is also an experimental API,
> so RING_F_NB users must build with the ALLOW_EXPERIMENTAL_API flag.
>
> The ring uses more compare-and-swap atomic operations than the regular rte
> ring:
> With no contention, an enqueue of n pointers uses (1 + 2n) CAS operations
> and a dequeue of n pointers uses 2. This algorithm has worse average-case
> performance than the regular rte ring (particularly a highly-contended ring
> with large bulk accesses), however:
> - For applications with preemptible pthreads, the regular rte ring's worst-case
> performance (i.e. one thread being preempted in the update_tail() critical
> section) is much worse than the non-blocking ring's.
> - Software caching can mitigate the average-case performance cost for
> ring-based algorithms. For example, a non-blocking ring based mempool (a
> likely use case for this ring) with per-thread caching.
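The uncontended CAS counts quoted above (1 + 2n per enqueue of n pointers, 2 per dequeue) can be turned into a small worked calculation. This is only arithmetic on the figures in the cover letter, not a measurement, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* CAS operations for an uncontended enqueue of n pointers: 1 + 2n. */
static inline size_t nb_enqueue_cas_ops(size_t n)
{
	return 1 + 2 * n;
}

/* CAS operations for an uncontended dequeue: 2, independent of n. */
static inline size_t nb_dequeue_cas_ops(size_t n)
{
	(void)n;
	return 2;
}

/*
 * Average CAS operations per pointer for one enqueue followed by one
 * dequeue of the same n pointers.
 */
static inline double nb_cas_per_pointer(size_t n)
{
	assert(n > 0);
	return (double)(nb_enqueue_cas_ops(n) + nb_dequeue_cas_ops(n)) /
	       (double)n;
}
```

With n = 1 a round trip costs 5 CAS operations per pointer; with n = 32 it costs 67/32, roughly 2.09. Larger bursts amortize the fixed part, but the enqueue side never drops below 2 per pointer, which is one reason a per-thread cache that keeps most operations off the ring is attractive.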
>
> The non-blocking ring is enabled via a new flag, RING_F_NB. For ease-of-use,
> existing ring enqueue/dequeue functions work with both "regular" and non-
> blocking rings.
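One plausible way to let a single enqueue call serve both ring types is to branch on the flag stored at creation time. The sketch below is a self-contained mock with hypothetical names (demo_ring, DEMO_RING_F_NB, demo_ring_enqueue_bulk); it does not use the DPDK headers and is not taken from the patchset. The two back ends only count calls so that the dispatch is visible.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RING_F_NB 0x1u /* hypothetical stand-in for RING_F_NB */

struct demo_ring {
	unsigned int flags;         /* fixed when the ring is created */
	unsigned int regular_calls; /* stands in for the regular path's work */
	unsigned int nb_calls;      /* stands in for the non-blocking path */
};

static inline unsigned int demo_enqueue_regular(struct demo_ring *r,
						void * const *objs,
						unsigned int n)
{
	(void)objs;
	r->regular_calls++;
	return n;
}

static inline unsigned int demo_enqueue_nb(struct demo_ring *r,
					   void * const *objs,
					   unsigned int n)
{
	(void)objs;
	r->nb_calls++;
	return n;
}

/* Single public entry point: callers never pick the algorithm. */
static inline unsigned int demo_ring_enqueue_bulk(struct demo_ring *r,
						  void * const *objs,
						  unsigned int n)
{
	assert(r != NULL);
	if (r->flags & DEMO_RING_F_NB)
		return demo_enqueue_nb(r, objs, n);
	return demo_enqueue_regular(r, objs, n);
}
```

The cost of this design is one predictable branch per call; the benefit is that code written against the existing enqueue/dequeue functions needs no changes when a ring is created with the new flag.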
>
> This patchset also adds non-blocking versions of ring_autotest and
> ring_perf_autotest, and a non-blocking ring based mempool.
>
> This patchset makes one API change; a deprecation notice will be posted in a
> separate commit.
>
> This patchset depends on the non-blocking stack patchset[1].
>
> [1] http://mails.dpdk.org/archives/dev/2019-January/123653.html
>
> v3:
> - Avoid the ABI break by putting 64-bit head and tail values in the same
> cacheline as struct rte_ring's prod and cons members.
> - Don't attempt to compile rte_atomic128_cmpset without
> ALLOW_EXPERIMENTAL_API, as this would break a large number of libraries.
> - Add a helpful warning to __rte_ring_do_nb_enqueue_mp() in case someone tries
> to use RING_F_NB without the ALLOW_EXPERIMENTAL_API flag.
> - Update the ring mempool to use experimental APIs
> - Clarify that RING_F_NB is only limited to x86_64 currently; ARMv8.1-A builds
> can eventually support it with the CASP instruction.
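The first v3 item (64-bit head and tail values sharing the cache line of the existing prod and cons members) can be sketched as a layout check. The sketch assumes the 64-bit values are overlaid on the 32-bit ones with a union and that a cache line is 64 bytes; the struct and member names are hypothetical and simplified, not copied from rte_ring.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64 /* assumed cache line size */

/* Simplified stand-ins for the 32-bit and 64-bit head/tail pairs. */
struct demo_headtail {
	volatile uint32_t head;
	volatile uint32_t tail;
	uint32_t single;
};

struct demo_headtail_64 {
	volatile uint64_t head;
	volatile uint64_t tail;
	uint32_t single;
};

/* Layout before the change: one cache line each for prod and cons. */
struct demo_ring_old {
	_Alignas(DEMO_CACHE_LINE) struct demo_headtail prod;
	_Alignas(DEMO_CACHE_LINE) struct demo_headtail cons;
};

/* Layout after: the 64-bit variants overlay the same cache lines. */
struct demo_ring_new {
	union {
		_Alignas(DEMO_CACHE_LINE) struct demo_headtail prod;
		struct demo_headtail_64 prod_64;
	};
	union {
		_Alignas(DEMO_CACHE_LINE) struct demo_headtail cons;
		struct demo_headtail_64 cons_64;
	};
};

/* The overlay must not move or grow anything existing users rely on. */
static_assert(sizeof(struct demo_ring_new) == sizeof(struct demo_ring_old),
	      "ring size changed");
static_assert(offsetof(struct demo_ring_new, cons) ==
	      offsetof(struct demo_ring_old, cons),
	      "cons offset changed");
```

Because each union is padded to a full cache line anyway, the wider members fit in space that was previously unused, which is how the size and member offsets can stay the same.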
>
> v2:
> - Merge separate docs commit into patch #5
> - Convert uintptr_t to size_t
> - Add a compile-time check for the size of size_t
> - Fix a space-after-typecast issue
> - Fix an unnecessary-parentheses checkpatch warning
> - Bump librte_ring's library version
>
> Gage Eads (5):
> ring: add 64-bit headtail structure
> ring: add a non-blocking implementation
> test_ring: add non-blocking ring autotest
> test_ring_perf: add non-blocking ring perf test
> mempool/ring: add non-blocking ring handlers
>
> doc/guides/prog_guide/env_abstraction_layer.rst | 2 +-
> drivers/mempool/ring/Makefile | 1 +
> drivers/mempool/ring/meson.build | 2 +
> drivers/mempool/ring/rte_mempool_ring.c | 58 ++-
> lib/librte_eventdev/rte_event_ring.h | 2 +-
> lib/librte_ring/Makefile | 3 +-
> lib/librte_ring/rte_ring.c | 72 ++-
> lib/librte_ring/rte_ring.h | 574 ++++++++++++++++++++++--
> lib/librte_ring/rte_ring_generic_64.h | 152 +++++++
> lib/librte_ring/rte_ring_version.map | 7 +
> test/test/test_ring.c | 57 ++-
> test/test/test_ring_perf.c | 19 +-
> 12 files changed, 874 insertions(+), 75 deletions(-)
> create mode 100644 lib/librte_ring/rte_ring_generic_64.h
>
> --
> 2.13.6