[PATCH v14 3/6] ptr_compress: add pointer compression library
Paul Szczepanek
paul.szczepanek at arm.com
Tue Jun 11 15:16:19 CEST 2024
On 10/06/2024 16:18, David Marchand wrote:
> Hello,
>
> On Fri, Jun 7, 2024 at 5:10 PM Paul Szczepanek <paul.szczepanek at arm.com> wrote:
>>
>> Add a new utility header for compressing pointers. The provided
>> functions can store pointers as 32-bit or 16-bit offsets.
>>
>> The compression takes advantage of the fact that pointers are
>> usually located in a limited memory region (like a mempool).
>> We can compress them by converting them to offsets from a base
>> memory address. Offsets can be stored in fewer bytes (dictated
>> by the memory region size and alignment of the pointer).
>> For example: an 8 byte aligned pointer which is part of a 32GB
>> memory pool can be stored in 4 bytes.
>>
>> Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
>> Signed-off-by: Paul Szczepanek <paul.szczepanek at arm.com>
>> Signed-off-by: Kamalakshitha Aligeri <kamalakshitha.aligeri at arm.com>
>> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
>> Reviewed-by: Nathan Brown <nathan.brown at arm.com>
>> Reviewed-by: Jack Bond-Preston <jack.bond-preston at foss.arm.com>
>> Acked-by: Morten Brørup <mb at smartsharesystems.com>
>> ---
>> MAINTAINERS | 4 +
>> doc/api/doxy-api-index.md | 1 +
>> doc/api/doxy-api.conf.in | 1 +
>> doc/guides/rel_notes/release_24_07.rst | 5 +
>> lib/meson.build | 1 +
>> lib/ptr_compress/meson.build | 4 +
>> lib/ptr_compress/rte_ptr_compress.h | 324 +++++++++++++++++++++++++
>> 7 files changed, 340 insertions(+)
>> create mode 100644 lib/ptr_compress/meson.build
>> create mode 100644 lib/ptr_compress/rte_ptr_compress.h
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index c9adff9846..27b2f03e6c 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -1694,6 +1694,10 @@ M: Chenbo Xia <chenbox at nvidia.com>
>> M: Gaetan Rivet <grive at u256.net>
>> F: lib/pci/
>>
>> +Pointer Compression
>> +M: Paul Szczepanek <paul.szczepanek at arm.com>
>> +F: lib/ptr_compress/
>> +
>> Power management
>> M: Anatoly Burakov <anatoly.burakov at intel.com>
>> M: David Hunt <david.hunt at intel.com>
>> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
>> index 8c1eb8fafa..f9283154f8 100644
>> --- a/doc/api/doxy-api-index.md
>> +++ b/doc/api/doxy-api-index.md
>> @@ -222,6 +222,7 @@ The public API headers are grouped by topics:
>> [config file](@ref rte_cfgfile.h),
>> [key/value args](@ref rte_kvargs.h),
>> [argument parsing](@ref rte_argparse.h),
>> + [ptr_compress](@ref rte_ptr_compress.h),
>> [string](@ref rte_string_fns.h),
>> [thread](@ref rte_thread.h)
>>
>> diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
>> index 27afec8b3b..a8823c046f 100644
>> --- a/doc/api/doxy-api.conf.in
>> +++ b/doc/api/doxy-api.conf.in
>> @@ -71,6 +71,7 @@ INPUT = @TOPDIR@/doc/api/doxy-api-index.md \
>> @TOPDIR@/lib/pipeline \
>> @TOPDIR@/lib/port \
>> @TOPDIR@/lib/power \
>> + @TOPDIR@/lib/ptr_compress \
>> @TOPDIR@/lib/rawdev \
>> @TOPDIR@/lib/rcu \
>> @TOPDIR@/lib/regexdev \
>> diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
>> index a69f24cf99..4711792e61 100644
>> --- a/doc/guides/rel_notes/release_24_07.rst
>> +++ b/doc/guides/rel_notes/release_24_07.rst
>> @@ -55,6 +55,11 @@ New Features
>> Also, make sure to start the actual text at the margin.
>> =======================================================
>>
>> +* **Introduced pointer compression library.**
>> +
>> + Library provides functions to compress and decompress arrays of pointers
>> + which can improve application performance under certain conditions.
>> + Performance test was added to help users evaluate performance on their setup.
>
> Please, double empty line before a new section in the release notes.
>
>
>>
>> Removed Items
>> -------------
>> diff --git a/lib/meson.build b/lib/meson.build
>> index 7c90602bf5..63becee142 100644
>> --- a/lib/meson.build
>> +++ b/lib/meson.build
>> @@ -14,6 +14,7 @@ libraries = [
>> 'argparse',
>> 'telemetry', # basic info querying
>> 'eal', # everything depends on eal
>> + 'ptr_compress',
>> 'ring',
>> 'rcu', # rcu depends on ring
>> 'mempool',
>> diff --git a/lib/ptr_compress/meson.build b/lib/ptr_compress/meson.build
>> new file mode 100644
>> index 0000000000..e92706a45f
>> --- /dev/null
>> +++ b/lib/ptr_compress/meson.build
>> @@ -0,0 +1,4 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(c) 2024 Arm Limited
>> +
>> +headers = files('rte_ptr_compress.h')
>> diff --git a/lib/ptr_compress/rte_ptr_compress.h b/lib/ptr_compress/rte_ptr_compress.h
>> new file mode 100644
>> index 0000000000..bf9cfb0661
>> --- /dev/null
>> +++ b/lib/ptr_compress/rte_ptr_compress.h
>> @@ -0,0 +1,324 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2024 Arm Limited
>> + */
>> +
>> +#ifndef RTE_PTR_COMPRESS_H
>> +#define RTE_PTR_COMPRESS_H
>> +
>> +/**
>> + * @file
>> + * Pointer compression and decompression functions.
>> + *
>> + * When passing arrays full of pointers between threads, memory containing
>> + * the pointers is copied multiple times which is especially costly between
>> + * cores. These functions allow us to compress the pointers.
>> + *
>> + * Compression takes advantage of the fact that pointers are usually located in
>> + * a limited memory region. We compress them by converting them to offsets from
>> + * a base memory address. Offsets can be stored in fewer bytes.
>> + *
>> + * The compression functions come in two varieties: 32-bit and 16-bit.
>> + *
>> + * To determine how many bits are needed to compress the pointer calculate
>
> Please add a ,
> ... to compress the pointer, calculate ...
>
>> + * the biggest offset possible (highest value pointer - base pointer)
>> + * and shift the value right according to alignment (shift by exponent of the
>> + * power of 2 of alignment: aligned by 4 - shift by 2, aligned by 8 - shift by
>> + * 3, etc.). The resulting value must fit in either 32 or 16 bits.
>> + *
>> + * For usage example and further explanation please see "Pointer Compression" in
>> + * doc/guides/prog_guide/ptr_compress_lib.rst
>
> I don't like refering to a path in the sources: the rendered doc won't
> link to the actual documentation page, and this path may get incorrect
> in the future if for some reason the doc is renamed or moved.
> The simpler is to state "please see this library documentation
> programming guide".
>
>
>> + */
>> +
>> +#include <stdint.h>
>> +#include <inttypes.h>
>> +
>> +#include <rte_branch_prediction.h>
>> +#include <rte_common.h>
>> +#include <rte_debug.h>
>> +#include <rte_vect.h>
>> +#include <rte_mempool.h>
>
> It is unneeded.
> Please, don't create a dependency to the mempool library.
>
>
I have addressed all of the above in v15.
>> +#include <rte_bitops.h>
>> +
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +/**
>> + * Calculate how many bits are required to store a value within a given range.
>> + * This is to help decide which pointer compression functions can be used to
>> + * store pointers contained within a memory range.
>> + *
>> + * @param range
>> + * The size of the range the value belongs to.
>
> In my mind, a range takes the form of a set of two symbols.
>
> Here, iiuc, what is needed/requested as an input is the number of
> entries of a (contiguous) space.
> So the naming is at least strange.
>
>> + * @return
>> + * Number of bits required to store a value.
>> + **/
>> +#define RTE_PTR_COMPRESS_BITS_REQUIRED_TO_STORE_VALUE_IN_RANGE(range) \
>> + (((uint64_t)range) < 2 ? 1 : \
>> + (sizeof(uint64_t) * CHAR_BIT - rte_clz64((uint64_t)range - 1)))
>
> But now, I don't see why we need to expose this macro (and its sister
> RTE_PTR_COMPRESS_BIT_SHIFT_FROM_ALIGNMENT) at all.
> Can you clarify the usecase?
>
>
>> +
>> +/**
>> + * Calculate how many bits in the address can be dropped without losing any
>> + * information thanks to the alignment of the address.
>> + *
>> + * @param alignment
>> + * Memory alignment.
>> + * @return
>> + * Size of shift allowed without dropping any information from the pointer.
>> + **/
>> +#define RTE_PTR_COMPRESS_BIT_SHIFT_FROM_ALIGNMENT(alignment) \
>> + ((alignment) == 0 ? 0 : rte_ctz64((uint64_t)alignment))
>> +
>> +/**
>> + * Determine if rte_ptr_compress_16_shift can be used to compress pointers
>> + * that contain addresses of memory objects whose memory is aligned by
>> + * a given amount and contained in a given memory range.
>> + *
>> + * @param mem_range
>> + * The size of the memory region that contains the objects pointed to.
>
> Again, can you rephrase and not use the term "range"?
> Expected input here is a number of pointers / elements in a space.
>
>
I hope I addressed all of the comments above about the use of range in
macros in v15. I have renamed them to better reflect their use. The
functions don't care about number of elements but instead about the
memory ranges that the pointers are constrained to - hence the range in
names originally. I agree that range name implies two values, I am now
calling it a mem_length to make it unambiguous. The parameter is the
length of the memory region.
All the macros exposed are useful - they allow you to easily determine
the params to use in the compression functions.
>> + * @param obj_alignment
>> + * The alignment of objects pointed to.
>> + * @return
>> + * 1 if function can be used, 0 otherwise.
>> + **/
>> +#define RTE_PTR_COMPRESS_CAN_COMPRESS_16_SHIFT(mem_range, obj_alignment) \
>> + ((RTE_PTR_COMPRESS_BITS_REQUIRED_TO_STORE_VALUE_IN_RANGE(mem_range) - \
>> + RTE_PTR_COMPRESS_BIT_SHIFT_FROM_ALIGNMENT(obj_alignment)) <= 16 ? 1 : 0)
>> +
>> +/**
>> + * Determine if rte_ptr_compress_32_shift can be used to compress pointers
>> + * that contain addresses of memory objects whose memory is aligned by
>> + * a given amount and contained in a given memory range.
>> + *
>> + * @param mem_range
>> + * The size of the memory region that contains the objects pointed to.
>> + * @param obj_alignment
>> + * The alignment of objects pointed to.
>> + * @return
>> + * 1 if function can be used, 0 otherwise.
>> + **/
>> +#define RTE_PTR_COMPRESS_CAN_COMPRESS_32_SHIFT(mem_range, obj_alignment) \
>> + ((RTE_PTR_COMPRESS_BITS_REQUIRED_TO_STORE_VALUE_IN_RANGE(mem_range) - \
>> + RTE_PTR_COMPRESS_BIT_SHIFT_FROM_ALIGNMENT(obj_alignment)) <= 32 ? 1 : 0)
>
> RTE_PTR_COMPRESS_CAN_COMPRESS_16_SHIFT and
> RTE_PTR_COMPRESS_CAN_COMPRESS_32_SHIFT do the same job.
> The only thing that differs is the size of the compressed space, so
> maybe this size could be passed as an input.
>
> No strong opinion though.
>
>
They are different and depending on your architecture you might not want
to use 16-bit even if you can fit the pointers in 16 bits due to
performance. I am keeping them separate.
>> +
>> +/**
>> + * Compress pointers into 32-bit offsets from base pointer.
>> + *
>> + * @note It is programmer's responsibility to ensure the resulting offsets fit
>> + * into 32 bits. Alignment of the structures pointed to by the pointers allows
>> + * us to drop bits from the offsets. This is controlled by the bit_shift
>> + * parameter. This means that if structures are aligned by 8 bytes they must be
>> + * within 32GB of the base pointer. If there is no such alignment guarantee they
>> + * must be within 4GB.
>> + *
>> + * @param ptr_base
>> + * A pointer used to calculate offsets of pointers in src_table.
>> + * @param src_table
>> + * A pointer to an array of pointers.
>> + * @param dest_table
>> + * A pointer to an array of compressed pointers returned by this function.
>> + * @param n
>> + * The number of objects to compress, must be strictly positive.
>> + * @param bit_shift
>> + * Byte alignment of memory pointed to by the pointers allows for
>> + * bits to be dropped from the offset and hence widen the memory region that
>> + * can be covered. This controls how many bits are right shifted.
>> + **/
>> +static __rte_always_inline void
>> +rte_ptr_compress_32_shift(void *ptr_base, void * const *src_table,
>> + uint32_t *dest_table, size_t n, uint8_t bit_shift)
>> +{
>> + size_t i = 0;
>> +#if defined RTE_HAS_SVE_ACLE && !defined RTE_ARCH_ARMv8_AARCH32
>
> Note: DPDK usually split per architecture implementation in separate
> header files, but I guess this is okay in the current form with #ifdef
> for now..
>
> [snip]
>
>
More information about the dev
mailing list