[PATCH v14 3/6] ptr_compress: add pointer compression library
    David Marchand 
    david.marchand at redhat.com
       
    Mon Jun 10 17:18:56 CEST 2024
    
    
  
Hello,
On Fri, Jun 7, 2024 at 5:10 PM Paul Szczepanek <paul.szczepanek at arm.com> wrote:
>
> Add a new utility header for compressing pointers. The provided
> functions can store pointers as 32-bit or 16-bit offsets.
>
> The compression takes advantage of the fact that pointers are
> usually located in a limited memory region (like a mempool).
> We can compress them by converting them to offsets from a base
> memory address. Offsets can be stored in fewer bytes (dictated
> by the memory region size and alignment of the pointer).
> For example: an 8 byte aligned pointer which is part of a 32GB
> memory pool can be stored in 4 bytes.
>
> Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
> Signed-off-by: Paul Szczepanek <paul.szczepanek at arm.com>
> Signed-off-by: Kamalakshitha Aligeri <kamalakshitha.aligeri at arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
> Reviewed-by: Nathan Brown <nathan.brown at arm.com>
> Reviewed-by: Jack Bond-Preston <jack.bond-preston at foss.arm.com>
> Acked-by: Morten Brørup <mb at smartsharesystems.com>
> ---
>  MAINTAINERS                            |   4 +
>  doc/api/doxy-api-index.md              |   1 +
>  doc/api/doxy-api.conf.in               |   1 +
>  doc/guides/rel_notes/release_24_07.rst |   5 +
>  lib/meson.build                        |   1 +
>  lib/ptr_compress/meson.build           |   4 +
>  lib/ptr_compress/rte_ptr_compress.h    | 324 +++++++++++++++++++++++++
>  7 files changed, 340 insertions(+)
>  create mode 100644 lib/ptr_compress/meson.build
>  create mode 100644 lib/ptr_compress/rte_ptr_compress.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c9adff9846..27b2f03e6c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1694,6 +1694,10 @@ M: Chenbo Xia <chenbox at nvidia.com>
>  M: Gaetan Rivet <grive at u256.net>
>  F: lib/pci/
>
> +Pointer Compression
> +M: Paul Szczepanek <paul.szczepanek at arm.com>
> +F: lib/ptr_compress/
> +
>  Power management
>  M: Anatoly Burakov <anatoly.burakov at intel.com>
>  M: David Hunt <david.hunt at intel.com>
> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
> index 8c1eb8fafa..f9283154f8 100644
> --- a/doc/api/doxy-api-index.md
> +++ b/doc/api/doxy-api-index.md
> @@ -222,6 +222,7 @@ The public API headers are grouped by topics:
>    [config file](@ref rte_cfgfile.h),
>    [key/value args](@ref rte_kvargs.h),
>    [argument parsing](@ref rte_argparse.h),
> +  [ptr_compress](@ref rte_ptr_compress.h),
>    [string](@ref rte_string_fns.h),
>    [thread](@ref rte_thread.h)
>
> diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
> index 27afec8b3b..a8823c046f 100644
> --- a/doc/api/doxy-api.conf.in
> +++ b/doc/api/doxy-api.conf.in
> @@ -71,6 +71,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
>                            @TOPDIR@/lib/pipeline \
>                            @TOPDIR@/lib/port \
>                            @TOPDIR@/lib/power \
> +                          @TOPDIR@/lib/ptr_compress \
>                            @TOPDIR@/lib/rawdev \
>                            @TOPDIR@/lib/rcu \
>                            @TOPDIR@/lib/regexdev \
> diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
> index a69f24cf99..4711792e61 100644
> --- a/doc/guides/rel_notes/release_24_07.rst
> +++ b/doc/guides/rel_notes/release_24_07.rst
> @@ -55,6 +55,11 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================
>
> +* **Introduced pointer compression library.**
> +
> +  Library provides functions to compress and decompress arrays of pointers
> +  which can improve application performance under certain conditions.
> +  Performance test was added to help users evaluate performance on their setup.
Please, double empty line before a new section in the release notes.
>
>  Removed Items
>  -------------
> diff --git a/lib/meson.build b/lib/meson.build
> index 7c90602bf5..63becee142 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -14,6 +14,7 @@ libraries = [
>          'argparse',
>          'telemetry', # basic info querying
>          'eal', # everything depends on eal
> +        'ptr_compress',
>          'ring',
>          'rcu', # rcu depends on ring
>          'mempool',
> diff --git a/lib/ptr_compress/meson.build b/lib/ptr_compress/meson.build
> new file mode 100644
> index 0000000000..e92706a45f
> --- /dev/null
> +++ b/lib/ptr_compress/meson.build
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2024 Arm Limited
> +
> +headers = files('rte_ptr_compress.h')
> diff --git a/lib/ptr_compress/rte_ptr_compress.h b/lib/ptr_compress/rte_ptr_compress.h
> new file mode 100644
> index 0000000000..bf9cfb0661
> --- /dev/null
> +++ b/lib/ptr_compress/rte_ptr_compress.h
> @@ -0,0 +1,324 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2024 Arm Limited
> + */
> +
> +#ifndef RTE_PTR_COMPRESS_H
> +#define RTE_PTR_COMPRESS_H
> +
> +/**
> + * @file
> + * Pointer compression and decompression functions.
> + *
> + * When passing arrays full of pointers between threads, memory containing
> + * the pointers is copied multiple times which is especially costly between
> + * cores. These functions allow us to compress the pointers.
> + *
> + * Compression takes advantage of the fact that pointers are usually located in
> + * a limited memory region. We compress them by converting them to offsets from
> + * a base memory address. Offsets can be stored in fewer bytes.
> + *
> + * The compression functions come in two varieties: 32-bit and 16-bit.
> + *
> + * To determine how many bits are needed to compress the pointer calculate
Please add a ,
... to compress the pointer,  calculate ...
> + * the biggest offset possible (highest value pointer - base pointer)
> + * and shift the value right according to alignment (shift by exponent of the
> + * power of 2 of alignment: aligned by 4 - shift by 2, aligned by 8 - shift by
> + * 3, etc.). The resulting value must fit in either 32 or 16 bits.
> + *
> + * For usage example and further explanation please see "Pointer Compression" in
> + * doc/guides/prog_guide/ptr_compress_lib.rst
I don't like refering to a path in the sources: the rendered doc won't
link to the actual documentation page, and this path may get incorrect
in the future if for some reason the doc is renamed or moved.
The simpler is to state "please see this library documentation
programming guide".
> + */
> +
> +#include <stdint.h>
> +#include <inttypes.h>
> +
> +#include <rte_branch_prediction.h>
> +#include <rte_common.h>
> +#include <rte_debug.h>
> +#include <rte_vect.h>
> +#include <rte_mempool.h>
It is unneeded.
Please, don't create a dependency to the mempool library.
> +#include <rte_bitops.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * Calculate how many bits are required to store a value within a given range.
> + * This is to help decide which pointer compression functions can be used to
> + * store pointers contained within a memory range.
> + *
> + * @param range
> + *   The size of the range the value belongs to.
In my mind, a range takes the form of a set of two symbols.
Here, iiuc, what is needed/requested as an input is the number of
entries of a (contiguous) space.
So the naming is at least strange.
> + * @return
> + *   Number of bits required to store a value.
> + **/
> +#define RTE_PTR_COMPRESS_BITS_REQUIRED_TO_STORE_VALUE_IN_RANGE(range) \
> +       (((uint64_t)range) < 2 ? 1 : \
> +               (sizeof(uint64_t) * CHAR_BIT - rte_clz64((uint64_t)range - 1)))
But now, I don't see why we need to expose this macro (and its sister
RTE_PTR_COMPRESS_BIT_SHIFT_FROM_ALIGNMENT) at all.
Can you clarify the usecase?
> +
> +/**
> + * Calculate how many bits in the address can be dropped without losing any
> + * information thanks to the alignment of the address.
> + *
> + * @param alignment
> + *   Memory alignment.
> + * @return
> + *   Size of shift allowed without dropping any information from the pointer.
> + **/
> +#define RTE_PTR_COMPRESS_BIT_SHIFT_FROM_ALIGNMENT(alignment) \
> +       ((alignment) == 0 ? 0 : rte_ctz64((uint64_t)alignment))
> +
> +/**
> + * Determine if rte_ptr_compress_16_shift can be used to compress pointers
> + * that contain addresses of memory objects whose memory is aligned by
> + * a given amount and contained in a given memory range.
> + *
> + * @param mem_range
> + *   The size of the memory region that contains the objects pointed to.
Again, can you rephrase and not use the term "range"?
Expected input here is a number of pointers / elements in a space.
> + * @param obj_alignment
> + *   The alignment of objects pointed to.
> + * @return
> + *   1 if function can be used, 0 otherwise.
> + **/
> +#define RTE_PTR_COMPRESS_CAN_COMPRESS_16_SHIFT(mem_range, obj_alignment) \
> +       ((RTE_PTR_COMPRESS_BITS_REQUIRED_TO_STORE_VALUE_IN_RANGE(mem_range) - \
> +       RTE_PTR_COMPRESS_BIT_SHIFT_FROM_ALIGNMENT(obj_alignment)) <= 16 ? 1 : 0)
> +
> +/**
> + * Determine if rte_ptr_compress_32_shift can be used to compress pointers
> + * that contain addresses of memory objects whose memory is aligned by
> + * a given amount and contained in a given memory range.
> + *
> + * @param mem_range
> + *   The size of the memory region that contains the objects pointed to.
> + * @param obj_alignment
> + *   The alignment of objects pointed to.
> + * @return
> + *   1 if function can be used, 0 otherwise.
> + **/
> +#define RTE_PTR_COMPRESS_CAN_COMPRESS_32_SHIFT(mem_range, obj_alignment) \
> +       ((RTE_PTR_COMPRESS_BITS_REQUIRED_TO_STORE_VALUE_IN_RANGE(mem_range) - \
> +       RTE_PTR_COMPRESS_BIT_SHIFT_FROM_ALIGNMENT(obj_alignment)) <= 32 ? 1 : 0)
RTE_PTR_COMPRESS_CAN_COMPRESS_16_SHIFT and
RTE_PTR_COMPRESS_CAN_COMPRESS_32_SHIFT do the same job.
The only thing that differs is the size of the compressed space, so
maybe this size could be passed as an input.
No strong opinion though.
> +
> +/**
> + * Compress pointers into 32-bit offsets from base pointer.
> + *
> + * @note It is programmer's responsibility to ensure the resulting offsets fit
> + * into 32 bits. Alignment of the structures pointed to by the pointers allows
> + * us to drop bits from the offsets. This is controlled by the bit_shift
> + * parameter. This means that if structures are aligned by 8 bytes they must be
> + * within 32GB of the base pointer. If there is no such alignment guarantee they
> + * must be within 4GB.
> + *
> + * @param ptr_base
> + *   A pointer used to calculate offsets of pointers in src_table.
> + * @param src_table
> + *   A pointer to an array of pointers.
> + * @param dest_table
> + *   A pointer to an array of compressed pointers returned by this function.
> + * @param n
> + *   The number of objects to compress, must be strictly positive.
> + * @param bit_shift
> + *   Byte alignment of memory pointed to by the pointers allows for
> + *   bits to be dropped from the offset and hence widen the memory region that
> + *   can be covered. This controls how many bits are right shifted.
> + **/
> +static __rte_always_inline void
> +rte_ptr_compress_32_shift(void *ptr_base, void * const *src_table,
> +               uint32_t *dest_table, size_t n, uint8_t bit_shift)
> +{
> +       size_t i = 0;
> +#if defined RTE_HAS_SVE_ACLE && !defined RTE_ARCH_ARMv8_AARCH32
Note: DPDK usually split per architecture implementation in separate
header files, but I guess this is okay in the current form with #ifdef
for now..
[snip]
-- 
David Marchand
    
    
More information about the dev
mailing list