[PATCH v3 0/5] Stage-Ordered API and other extensions for ring library
    Konstantin Ananyev 
    konstantin.v.ananyev at yandex.ru
       
    Mon Sep 16 14:37:28 CEST 2024
    
    
  
From: Konstantin Ananyev <konstantin.ananyev at huawei.com>
v2 -> v3:
- fix compilation/doxygen complaints
- dropped patch:
  "examples/l3fwd: make ACL work in pipeline and eventdev modes": [2]
  As was mentioned in the patch description, it was way too big,
  controversial and incomplete. If the community is ok to introduce
  pipeline model into the l3fwd, then it is probably worth making it
  a separate patch series.
v1 -> v2:
- rename 'elmst/objst' to 'meta' (Morten)
- introduce new data-path APIs set: one with both meta{} and objs[],
  second with just objs[] (Morten)
- split data-path APIs into burst/bulk flavours (same as rte_ring)
- added dump function for rte_soring and improved dump() for rte_ring.
- dropped patch:
  "ring: minimize reads of the counterpart cache-line"
  - no performance gain observed
  - actually it does change behavior of conventional rte_ring
    enqueue/dequeue APIs:
    it could report fewer available/free entries than actually exist
    in the ring. As some other libs rely on that information, it would
    introduce problems.
The main aim of this series is to extend the ring library with a
new API that allows the user to create/use a Staged-Ordered-Ring (SORING)
abstraction. In addition to that, there are a few other patches that serve
different purposes:
- first two patches are just code reordering to de-duplicate
  and generalize existing rte_ring code.
- patch #3 extends rte_ring_dump() to correctly print head/tail metadata
  for different sync modes.
- next two patches introduce SORING API into the ring library and
  provide UT for it.
SORING overview
===============
Staged-Ordered-Ring (SORING) provides a SW abstraction for 'ordered' queues
with multiple processing 'stages'. It is based on conventional DPDK
rte_ring, re-uses many of its concepts, and even a substantial part of
its code.
It can be viewed as an 'extension' of rte_ring functionality.
Main SORING properties:
- circular ring buffer with fixed size objects
- producer, consumer plus multiple processing stages in between.
- allows object processing to be split into multiple stages.
- objects remain in the same ring while moving from one stage to the other,
  initial order is preserved, no extra copying needed.
- preserves the ingress order of objects within the queue across multiple
  stages
- each stage (and producer/consumer) can be served by a single thread
  or by multiple threads.
- number of stages, size and number of objects in the ring are
  configurable at ring initialization time.
Data-path API provides four main operations:
- enqueue/dequeue works in the same manner as for conventional rte_ring,
  all rte_ring synchronization types are supported.
- acquire/release - for each stage there is an acquire (start) and
  release (finish) operation. After some objects are 'acquired',
  the given thread can safely assume that it has exclusive ownership of
  these objects until it invokes 'release' for them.
  After 'release', objects can be 'acquired' by the next stage and/or
  dequeued by the consumer (in case of the last stage).
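
To illustrate how the four operations gate each other, below is a minimal
single-threaded toy model. It is only a sketch and NOT the rte_soring API:
all names (toy_soring, toy_enqueue, toy_acquire, toy_release, toy_dequeue),
the fixed sizes, and the in-order release are assumptions made for
illustration; the real library also handles multi-thread synchronization,
which is omitted here.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SIZE   8u  /* ring capacity (hypothetical, power of two) */
#define TOY_STAGES 2u  /* processing stages between producer and consumer */

/* Toy single-threaded model; all counters are free-running positions. */
struct toy_soring {
	uint32_t objs[TOY_SIZE];
	uint32_t prod;             /* objects enqueued so far */
	uint32_t cons;             /* objects dequeued so far */
	uint32_t acq[TOY_STAGES];  /* per stage: objects acquired so far */
	uint32_t rel[TOY_STAGES];  /* per stage: objects released so far */
};

static uint32_t toy_min(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Copy n objects starting at position pos out of the ring. */
static void toy_copy_out(const struct toy_soring *r, uint32_t pos,
		uint32_t *out, uint32_t n)
{
	for (uint32_t i = 0; i < n; i++)
		out[i] = r->objs[(pos + i) % TOY_SIZE];
}

/* Producer: limited by free space, as in a conventional ring. */
static uint32_t toy_enqueue(struct toy_soring *r, const uint32_t *in,
		uint32_t n)
{
	n = toy_min(n, TOY_SIZE - (r->prod - r->cons));
	for (uint32_t i = 0; i < n; i++)
		r->objs[(r->prod + i) % TOY_SIZE] = in[i];
	r->prod += n;
	return n;
}

/* Stage 0 may acquire what was enqueued; stage k what stage k-1 released. */
static uint32_t toy_acquire(struct toy_soring *r, uint32_t stage,
		uint32_t *out, uint32_t n)
{
	assert(stage < TOY_STAGES);
	uint32_t limit = (stage == 0) ? r->prod : r->rel[stage - 1];
	n = toy_min(n, limit - r->acq[stage]);
	toy_copy_out(r, r->acq[stage], out, n);
	r->acq[stage] += n;
	return n;
}

/* Release (in order, in this toy) up to n previously acquired objects. */
static uint32_t toy_release(struct toy_soring *r, uint32_t stage, uint32_t n)
{
	assert(stage < TOY_STAGES);
	n = toy_min(n, r->acq[stage] - r->rel[stage]);
	r->rel[stage] += n;
	return n;
}

/* Consumer: sees only objects released by the last stage. */
static uint32_t toy_dequeue(struct toy_soring *r, uint32_t *out, uint32_t n)
{
	n = toy_min(n, r->rel[TOY_STAGES - 1] - r->cons);
	toy_copy_out(r, r->cons, out, n);
	r->cons += n;
	return n;
}
```

In this model an object enqueued by the producer becomes visible to stage 1
only after stage 0 has released it, and to the consumer only after the last
stage has released it, so the ingress order is kept without moving objects
between rings.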
Expected use-case: applications that use a pipeline model
(probably with multiple stages) for packet processing, when preserving
incoming packet order is important.
The concept of 'ring with stages' is similar to DPDK OPDL eventdev PMD [1],
but the internals are different.
In particular, SORING maintains an internal array of 'states' for each
element in the ring that is shared by all threads/processes that access
the ring. That allows 'release' to avoid excessive waits on the tail value
and helps to improve performance and scalability.
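
One way to picture why a per-element state array removes the wait on the
tail: a thread that finishes early just marks its elements as done, and the
tail moves only over the contiguous finished prefix. The sketch below is a
simplified single-threaded model under that assumption; the names
(toy_stage, toy_stage_acquire, toy_stage_release) and the layout are
hypothetical and do not reflect the actual soring.c internals, which use
atomics and a different state encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLOTS 8u  /* ring capacity (hypothetical, power of two) */

struct toy_stage {
	bool done[SLOTS];  /* per-element state: true once released */
	uint32_t head;     /* next position to hand out via acquire */
	uint32_t tail;     /* positions before tail are visible downstream */
};

/* Hand out n positions; the first position serves as a release token. */
static uint32_t toy_stage_acquire(struct toy_stage *s, uint32_t n)
{
	uint32_t token = s->head;

	assert(s->head - s->tail + n <= SLOTS);
	s->head += n;
	return token;
}

/*
 * Mark n elements starting at token as finished. No waiting for earlier
 * batches: the tail advances only over the contiguous finished prefix,
 * so a late batch simply leaves its state behind for a later release.
 */
static void toy_stage_release(struct toy_stage *s, uint32_t token, uint32_t n)
{
	for (uint32_t i = 0; i < n; i++)
		s->done[(token + i) % SLOTS] = true;

	while (s->tail != s->head && s->done[s->tail % SLOTS]) {
		s->done[s->tail % SLOTS] = false;  /* reset slot for reuse */
		s->tail++;
	}
}
```

If batch A (positions 0-1) and batch B (positions 2-4) are acquired in that
order and B is released first, the tail stays at 0; once A is released the
tail jumps to 5 in a single step, so order is preserved while neither
release has to spin waiting for the other.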
In terms of performance, in our measurements rte_soring and
conventional rte_ring provide nearly identical numbers.
As an example, on our SUT (Intel ICX CPU @ 2.00GHz), with
l3fwd (--lookup=acl) in pipeline mode [2], both
rte_ring and rte_soring reach ~20Mpps for a single I/O lcore and the same
number of worker lcores.
[1] https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/DPDK-China2017-Ma-OPDL.pdf
[2] https://patchwork.dpdk.org/project/dpdk/patch/20240906131348.804-7-konstantin.v.ananyev@yandex.ru/
Eimear Morrissey (1):
  ring: make dump function more verbose
Konstantin Ananyev (4):
  ring: common functions for 'move head' ops
  ring: make copying functions generic
  ring/soring: introduce Staged Ordered Ring
  app/test: add unit tests for soring API
 .mailmap                           |   1 +
 app/test/meson.build               |   3 +
 app/test/test_ring_stress_impl.h   |   1 +
 app/test/test_soring.c             | 442 +++++++++++++++
 app/test/test_soring_mt_stress.c   |  40 ++
 app/test/test_soring_stress.c      |  48 ++
 app/test/test_soring_stress.h      |  35 ++
 app/test/test_soring_stress_impl.h | 827 +++++++++++++++++++++++++++++
 lib/ring/meson.build               |   4 +-
 lib/ring/rte_ring.c                |  87 ++-
 lib/ring/rte_ring.h                |  15 +
 lib/ring/rte_ring_c11_pvt.h        | 134 +----
 lib/ring/rte_ring_elem_pvt.h       | 181 +++++--
 lib/ring/rte_ring_generic_pvt.h    | 121 +----
 lib/ring/rte_ring_hts_elem_pvt.h   |  85 +--
 lib/ring/rte_ring_rts_elem_pvt.h   |  85 +--
 lib/ring/rte_soring.c              | 182 +++++++
 lib/ring/rte_soring.h              | 547 +++++++++++++++++++
 lib/ring/soring.c                  | 548 +++++++++++++++++++
 lib/ring/soring.h                  | 124 +++++
 lib/ring/version.map               |  26 +
 21 files changed, 3140 insertions(+), 396 deletions(-)
 create mode 100644 app/test/test_soring.c
 create mode 100644 app/test/test_soring_mt_stress.c
 create mode 100644 app/test/test_soring_stress.c
 create mode 100644 app/test/test_soring_stress.h
 create mode 100644 app/test/test_soring_stress_impl.h
 create mode 100644 lib/ring/rte_soring.c
 create mode 100644 lib/ring/rte_soring.h
 create mode 100644 lib/ring/soring.c
 create mode 100644 lib/ring/soring.h
-- 
2.35.3