[PATCH v5 0/4] add pointer compression API
Paul Szczepanek
paul.szczepanek at arm.com
Thu Feb 22 09:15:38 CET 2024
For some reason your email is not visible to me, even though it's in the
archive.
On 02/11/2024 16:32, Konstantin Ananyev konstantin.v.ananyev wrote:
> From one side the code itself is very small and straightforward,
> from other side - it is not clear to me what is intended usage for it
> within DPDK and its appliances?
> Konstantin
The intended usage is explained in the cover email (see below) and demonstrated
in the test supplied in the following patch: sending arrays of pointers
between cores, as happens in a forwarding example.
On 01/11/2023 18:12, Paul Szczepanek wrote:
> This patchset is proposing adding a new EAL header with utility functions
> that allow compression of arrays of pointers.
>
> When passing caches full of pointers between threads, memory containing
> the pointers is copied multiple times which is especially costly between
> cores. A compression method will allow us to shrink the memory size
> copied.
>
> The compression takes advantage of the fact that pointers are usually
> located in a limited memory region (like a mempool). We can compress them
> by converting them to offsets from a base memory address.
>
> Offsets can be stored in fewer bytes (dictated by the memory region size
> and alignment of the pointer). For example: an 8 byte aligned pointer
> which is part of a 32GB memory pool can be stored in 4 bytes. The API is
> very generic and does not assume mempool pointers, any pointer can be
> passed in.
>
> Compression relies on a few fast operations and, especially when vector
> instructions are leveraged, creates minimal overhead.
>
> The API accepts and returns arrays because the overhead means it is only
> worth it when done in bulk.
>
> A test is added that shows the potential performance gain from compression.
> In this test an array of pointers is passed through a ring between two cores.
> It shows the gain, which is dependent on the bulk operation size. In this
> synthetic test run on an Ampere Altra a substantial (up to 25%) performance
> gain is seen if done in bulk sizes larger than 32. At 32 it breaks even and
> lower sizes create a small (less than 5%) slowdown due to overhead.
>
> In a more realistic mock application running the L3 forwarding DPDK
> example that works in pipeline mode on two cores this translated into a
> ~5% throughput increase on an Ampere Altra.
>
> v2:
> * addressed review comments (style, explanations and typos)
> * lowered bulk iterations closer to original numbers to keep runtime short
> * fixed pointer size warning on 32-bit arch
> v3:
> * added 16-bit versions of compression functions and tests
> * added documentation of these new utility functions in the EAL guide
> v4:
> * added unit test
> * fix bug in NEON implementation of 32-bit decompress
> v5:
> * disable NEON and SVE implementation on AARCH32 due to wrong pointer size
>
> Paul Szczepanek (4):
> eal: add pointer compression functions
> test: add pointer compress tests to ring perf test
> docs: add pointer compression to the EAL guide
> test: add unit test for ptr compression
>
> .mailmap | 1 +
> app/test/meson.build | 1 +
> app/test/test_eal_ptr_compress.c | 108 ++++++
> app/test/test_ring.h | 94 ++++-
> app/test/test_ring_perf.c | 354 ++++++++++++------
> .../prog_guide/env_abstraction_layer.rst | 142 +++++++
> lib/eal/include/meson.build | 1 +
> lib/eal/include/rte_ptr_compress.h | 266 +++++++++++++
> 8 files changed, 843 insertions(+), 124 deletions(-)
> create mode 100644 app/test/test_eal_ptr_compress.c
> create mode 100644 lib/eal/include/rte_ptr_compress.h
>
> --
> 2.25.1
>
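
To make the offset scheme from the cover letter concrete, here is a minimal
scalar sketch. It is not the code from the patch: the sketch_* names and
signatures are made up for illustration, and the vector (NEON/SVE) paths are
left out. It assumes every pointer lies at or above the base, is aligned to
(1 << shift) bytes, and falls within the range that a 32-bit offset can cover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; these are NOT the functions
 * declared in rte_ptr_compress.h.
 *
 * Bytes addressable from the base with a 32-bit offset and a given shift.
 * With shift == 3 (8-byte alignment) this is 2^35 bytes, i.e. 32GB.
 */
#define SKETCH_RANGE_32(shift) ((uint64_t)1 << (32 + (shift)))

/* Store each pointer as a 32-bit offset from base, dropping alignment bits. */
static void
sketch_compress_32(const void *base, void *const *src, uint32_t *dst,
		size_t n, unsigned int shift)
{
	for (size_t i = 0; i < n; i++) {
		uintptr_t off = (uintptr_t)src[i] - (uintptr_t)base;

		dst[i] = (uint32_t)(off >> shift);
	}
}

/* Rebuild each pointer from its 32-bit offset and the same base and shift. */
static void
sketch_decompress_32(void *base, const uint32_t *src, void **dst,
		size_t n, unsigned int shift)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = (void *)((uintptr_t)base +
				((uintptr_t)src[i] << shift));
}

/* Round-trip four pointers into an 8-byte aligned pool; returns 1 on success. */
static int
sketch_roundtrip_demo(void)
{
	static uint64_t pool[1024];
	void *ptrs[4] = { &pool[0], &pool[1], &pool[512], &pool[1023] };
	uint32_t packed[4];
	void *restored[4];

	sketch_compress_32(pool, ptrs, packed, 4, 3);
	sketch_decompress_32(pool, packed, restored, 4, 3);

	for (size_t i = 0; i < 4; i++)
		if (restored[i] != ptrs[i])
			return 0;
	return 1;
}
```

On a 64-bit target the packed array is half the size of the pointer array,
which is the memory that no longer has to be copied between cores. The base
and shift are not stored with the data, so both sides have to agree on them
up front.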