[RFC 0/3] lib/fastmem: fast small-object allocator
Mattias Rönnblom
hofors at lysator.liu.se
Mon May 25 12:36:39 CEST 2026
This RFC introduces fastmem, a general-purpose small-object allocator
for DPDK. It is intended to replace per-type mempools with a single
allocator that handles arbitrary sizes, grows on demand, and matches
mempool-level performance on the hot path.
Motivation
----------
DPDK applications commonly maintain many mempools, one per object
type (connections, sessions, timers, work items). Each must be sized
up front, wastes memory when over-provisioned, and cannot serve
objects of a different size. Fastmem removes these constraints by
accepting arbitrary sizes (up to 1 MiB) at runtime, backed by a slab
allocator that repurposes memory across size classes as demand shifts.
Design
------
Three-layer architecture:
1. Backing memory: 128 MiB IOVA-contiguous memzones from EAL,
   reserved lazily (or pre-reserved for deterministic latency).
2. Slabs: 2 MiB, 2 MiB-aligned regions carved from memzones.
   The alignment enables O(1) slab lookup from any object pointer
   via bitmask, with no radix tree or index structure. Slabs move
   freely between 18 power-of-2 size classes (8 B to 1 MiB).
3. Per-lcore caches: bounded LIFO stacks (no locks on the hot
   path). Cache misses trigger bulk transfers to/from the shared
   bin under a spinlock.
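
To illustrate the bitmask lookup in layer 2, here is a minimal sketch.
It is not taken from the patches: the slab header layout, the field
and function names, and the idea of keeping the IOVA base in the slab
header are assumptions made for illustration only. The heap-backed
slab_create_demo() merely stands in for carving a slab from a memzone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SLAB_SIZE ((uintptr_t)1 << 21)   /* 2 MiB, as stated in the RFC */
#define SLAB_MASK (~(SLAB_SIZE - 1))

/* Hypothetical header at the start of every slab (assumption). */
struct slab_hdr {
	uint64_t iova_base;   /* IOVA of the first byte of the slab */
	uint8_t class_idx;    /* size class the slab currently serves */
};

/* Demo stand-in for carving a slab from a memzone: a 2 MiB-aligned
 * heap block. A real allocator would get this memory from EAL. */
static struct slab_hdr *
slab_create_demo(uint64_t iova_base)
{
	struct slab_hdr *slab = aligned_alloc(SLAB_SIZE, SLAB_SIZE);

	if (slab != NULL) {
		slab->iova_base = iova_base;
		slab->class_idx = 0;
	}
	return slab;
}

/* Slabs are 2 MiB-aligned, so clearing the low 21 bits of any object
 * pointer yields the start of the slab that contains the object. */
static inline struct slab_hdr *
slab_from_ptr(const void *obj)
{
	return (struct slab_hdr *)((uintptr_t)obj & SLAB_MASK);
}

/* With an IOVA-contiguous slab, translation is base plus offset. */
static inline uint64_t
obj_virt2iova(const void *obj)
{
	const struct slab_hdr *slab = slab_from_ptr(obj);

	assert(slab != NULL);
	return slab->iova_base + ((uintptr_t)obj & (SLAB_SIZE - 1));
}
```

Both lookups are a mask plus at most one load, which is what makes a
radix tree or other index unnecessary under these assumptions.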
Key properties:
- Zero per-object metadata in the production build.
- NUMA-aware, with per-socket bins and free-slab pools.
- DMA-usable memory with O(1) virt-to-IOVA translation.
- Bulk alloc/free with all-or-nothing semantics.
- Backing memory never returned during lifetime (slabs recycled).
- Non-EAL threads supported (bypass cache, take bin lock).
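
The per-lcore cache layer and the non-EAL bypass can be pictured with
the following sketch. It is a simplified model, not the code in the
patches: the capacities, the refill/spill batch size, the C11
atomic_flag spinlock and all struct and function names are assumptions
chosen to keep the example self-contained.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_CAP 8               /* hypothetical per-lcore bound */
#define BULK (CACHE_CAP / 2)      /* hypothetical transfer batch */
#define BIN_CAP 64                /* hypothetical shared-bin capacity */

struct shared_bin {               /* shared between lcores, locked */
	atomic_flag lock;
	size_t n;
	void *objs[BIN_CAP];
};

struct lcore_cache {              /* private to one lcore, no lock */
	size_t n;
	void *objs[CACHE_CAP];
};

static void bin_lock(struct shared_bin *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		; /* spin */
}

static void bin_unlock(struct shared_bin *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Move up to 'want' objects from the bin into dst; returns the count. */
static size_t bin_get(struct shared_bin *b, void **dst, size_t want)
{
	bin_lock(b);
	size_t got = want < b->n ? want : b->n;
	for (size_t i = 0; i < got; i++)
		dst[i] = b->objs[--b->n];
	bin_unlock(b);
	return got;
}

/* Return n objects to the bin in one locked operation. */
static void bin_put(struct shared_bin *b, void *const *src, size_t n)
{
	bin_lock(b);
	assert(b->n + n <= BIN_CAP);
	for (size_t i = 0; i < n; i++)
		b->objs[b->n++] = src[i];
	bin_unlock(b);
}

/* Hot path: pop from the private LIFO; refill in bulk on a miss. */
static void *cache_alloc(struct lcore_cache *c, struct shared_bin *b)
{
	if (c->n == 0)
		c->n = bin_get(b, c->objs, BULK);
	return c->n > 0 ? c->objs[--c->n] : NULL;
}

/* Hot path: push to the private LIFO; when full, first spill the
 * BULK most recently freed objects to the bin. */
static void cache_free(struct lcore_cache *c, struct shared_bin *b, void *obj)
{
	if (c->n == CACHE_CAP) {
		c->n -= BULK;
		bin_put(b, &c->objs[c->n], BULK);
	}
	c->objs[c->n++] = obj;
}

/* Non-EAL threads have no cache: every call takes the bin lock. */
static void *nocache_alloc(struct shared_bin *b)
{
	void *obj = NULL;

	return bin_get(b, &obj, 1) == 1 ? obj : NULL;
}
```

In this model the lock is only taken once per BULK objects on the
cached path, while the uncached path pays for it on every call.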
API surface
-----------
rte_fastmem_init / deinit
rte_fastmem_reserve
rte_fastmem_set_limit / get_limit
rte_fastmem_alloc / alloc_socket
rte_fastmem_alloc_bulk / alloc_bulk_socket
rte_fastmem_free / free_bulk
rte_fastmem_virt2iova
rte_fastmem_cache_flush
rte_fastmem_max_size / classes
rte_fastmem_stats / stats_class / stats_lcore / stats_lcore_class
rte_fastmem_stats_reset
All APIs are marked __rte_experimental.
Performance
-----------
The single-object hot path is roughly 2-3x the cost of mempool
and an order of magnitude faster than rte_malloc. Under
multi-lcore contention, fastmem scales similarly to mempool,
while rte_malloc collapses.
Limitations
-----------
- Maximum allocation: 1 MiB. Larger requests should use rte_malloc.
- Power-of-2 classes only; worst-case internal fragmentation ~50%.
- Backing memory not reclaimable short of deinit.
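
The size limit and the fragmentation bound follow directly from the
power-of-2 class layout. The sketch below shows the arithmetic; the
function names and the rejection of zero-sized requests are
assumptions, and a real implementation would more likely use a
count-leading-zeros instruction than a loop.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_SHIFT 3                               /* 8 B */
#define MAX_SHIFT 20                              /* 1 MiB */
#define NUM_CLASSES (MAX_SHIFT - MIN_SHIFT + 1)   /* 18 classes */

/* Map a request size to a class index in [0, 17]; returns -1 when
 * the size is 0 or exceeds the 1 MiB maximum. */
static int size_to_class(size_t size)
{
	unsigned int shift = MIN_SHIFT;

	if (size == 0 || size > ((size_t)1 << MAX_SHIFT))
		return -1;
	while (((size_t)1 << shift) < size)
		shift++;
	return (int)(shift - MIN_SHIFT);
}

/* Object size served by a class: 8 B for class 0, doubling per class. */
static size_t class_size(int cls)
{
	assert(cls >= 0 && cls < NUM_CLASSES);
	return (size_t)1 << (cls + MIN_SHIFT);
}

/* Bytes lost to rounding up to the class size (internal fragmentation). */
static size_t wasted_bytes(size_t size)
{
	int cls = size_to_class(size);

	assert(cls >= 0);
	return class_size(cls) - size;
}
```

A request of 2^k + 1 bytes lands in the 2^(k+1) class and wastes
2^k - 1 bytes, just under half of the object. For example, 513 bytes
are served from the 1024-byte class with 511 bytes unused.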
Future work
-----------
- Lcore-affine allocations (false-sharing-free by construction).
- Mempool ops driver for transparent drop-in use.
- Pre-resolved allocator handle binding size class and socket,
eliminating per-call class lookup and enabling an inline
cache-hit fast path.
- Debug mode (cookies, double-free detection, poison-on-free).
- Telemetry integration.
- EAL integration, allowing EAL-internal subsystems to use
fastmem for their small-object allocations.
Mattias Rönnblom (3):
  doc: add fastmem programming guide
  lib: add fastmem library
  app/test: add fastmem test suite

 app/test/meson.build                  |    3 +
 app/test/test_fastmem.c               | 1682 +++++++++++++++++++++++++
 app/test/test_fastmem_perf.c          |  997 +++++++++++++++
 app/test/test_fastmem_profile.c       |  157 +++
 doc/api/doxy-api-index.md             |    1 +
 doc/api/doxy-api.conf.in              |    1 +
 doc/guides/prog_guide/fastmem_lib.rst |  301 +++++
 doc/guides/prog_guide/index.rst       |    1 +
 lib/fastmem/meson.build               |    6 +
 lib/fastmem/rte_fastmem.c             | 1486 ++++++++++++++++++++++
 lib/fastmem/rte_fastmem.h             |  644 ++++++++++
 lib/meson.build                       |    1 +
 12 files changed, 5280 insertions(+)
 create mode 100644 app/test/test_fastmem.c
 create mode 100644 app/test/test_fastmem_perf.c
 create mode 100644 app/test/test_fastmem_profile.c
 create mode 100644 doc/guides/prog_guide/fastmem_lib.rst
 create mode 100644 lib/fastmem/meson.build
 create mode 100644 lib/fastmem/rte_fastmem.c
 create mode 100644 lib/fastmem/rte_fastmem.h
--
2.43.0