[RFC 0/3] lib/fastmem: fast small-object allocator

Mattias Rönnblom hofors at lysator.liu.se
Mon May 25 21:39:20 CEST 2026


On 5/25/26 16:30, Stephen Hemminger wrote:
> On Mon, 25 May 2026 12:36:39 +0200
> Mattias Rönnblom <hofors at lysator.liu.se> wrote:
> 
>> This RFC introduces fastmem, a general-purpose small-object allocator
>> for DPDK. It is intended to replace per-type mempools with a single
>> allocator that handles arbitrary sizes, grows on demand, and matches
>> mempool-level performance on the hot path.
> 
> Makes sense, what about a simple wrapper inline to allow full replacement
> testing/performance A/B comparison?

Do you mean a mempool or a heap wrapper? Or both?

I haven't looked into what options there are with mempools. A mempool
driver should be possible, but then I guess one might attempt a
wholesale mempool-compatible API as well.

The role(s) fastmem could serve are:
a) An lcore/fast path small-object allocator when you don't know the
object size and/or count beforehand (i.e., what the cover letter says).
b) Do what mempools do, in addition to option a.
c) Do what the rte_malloc heap does, but lcore/fast path-friendly. In
other words, option a but with larger objects too.
d) Something that's both b and c.

I haven't really formed an opinion yet, other than that option a seems 
like a natural first step.

Fastmem is significantly slower than mempools for the moment. Claude
will tell you to inline, but that doesn't help (at least not in the
microbenchmarks). Then it will tell you to go remove the statistics,
which also doesn't help. (Latency is driven by data dependencies, so the
stats load/store/compute runs on resources that would otherwise have
been idle.)

What does help, however, is to pre-compute socket- and bin-related info
and put it into a handle, which the application may optionally use to
quickly retrieve objects of a certain size from a certain socket. That
is the fastmem_h row in the numbers below. Still slower than mempool,
though.
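
To illustrate the handle idea, here is a minimal sketch in plain C. It is
not fastmem code: every name (sketch_*) is hypothetical, the bins are
assumed to be power-of-two size classes backed by malloc(), and there is
no per-lcore caching or real NUMA awareness. It only shows the lookup
(socket and size-to-bin) being done once, at handle creation, instead of
on every allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layout: 10 power-of-two bins (8 B .. 4096 B) per socket. */
#define SKETCH_NUM_BINS 10
#define SKETCH_NUM_SOCKETS 2
#define SKETCH_MAX_SIZE 4096

struct sketch_bin {
	void *free_list;	/* singly linked list of free objects */
	size_t obj_size;	/* size of every object in this bin */
};

struct sketch_allocator {
	struct sketch_bin bins[SKETCH_NUM_SOCKETS][SKETCH_NUM_BINS];
};

/* The handle caches the result of the socket + size lookup. */
struct sketch_handle {
	struct sketch_bin *bin;
};

/* Map a size to the smallest power-of-two bin that fits it. */
static unsigned int sketch_size_to_bin(size_t size)
{
	unsigned int idx = 0;
	size_t cap = 8;

	while (cap < size) {
		cap <<= 1;
		idx++;
	}
	return idx;
}

static void sketch_init(struct sketch_allocator *a)
{
	for (int s = 0; s < SKETCH_NUM_SOCKETS; s++)
		for (int b = 0; b < SKETCH_NUM_BINS; b++) {
			a->bins[s][b].free_list = NULL;
			a->bins[s][b].obj_size = (size_t)8 << b;
		}
}

/* Pop from the bin's free list, falling back to malloc() when empty. */
static void *sketch_bin_alloc(struct sketch_bin *bin)
{
	void *obj = bin->free_list;

	if (obj != NULL) {
		bin->free_list = *(void **)obj;
		return obj;
	}
	return malloc(bin->obj_size);
}

/* Push the object back; its first bytes are reused as the list link. */
static void sketch_bin_free(struct sketch_bin *bin, void *obj)
{
	*(void **)obj = bin->free_list;
	bin->free_list = obj;
}

/* Generic path: resolves socket and bin on every call. */
static void *sketch_alloc(struct sketch_allocator *a, int socket, size_t size)
{
	if (socket < 0 || socket >= SKETCH_NUM_SOCKETS || size > SKETCH_MAX_SIZE)
		return NULL;
	return sketch_bin_alloc(&a->bins[socket][sketch_size_to_bin(size)]);
}

/* Handle path: resolve once here... */
static int sketch_handle_init(struct sketch_handle *h,
			      struct sketch_allocator *a, int socket,
			      size_t size)
{
	if (socket < 0 || socket >= SKETCH_NUM_SOCKETS || size > SKETCH_MAX_SIZE)
		return -1;
	h->bin = &a->bins[socket][sketch_size_to_bin(size)];
	return 0;
}

/* ...so the hot path is a single pointer dereference away from the bin. */
static void *sketch_alloc_h(struct sketch_handle *h)
{
	return sketch_bin_alloc(h->bin);
}

static void sketch_free_h(struct sketch_handle *h, void *obj)
{
	sketch_bin_free(h->bin, obj);
}
```

An application that always allocates, say, 64-byte objects from socket 0
would call sketch_handle_init() once at setup and then use
sketch_alloc_h()/sketch_free_h() on the fast path. The generic
sketch_alloc() remains for sizes not known in advance.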

> === Scenario 1: Single-object hot path — cycles per (alloc + free) ===
> allocator             8 B         64 B        256 B       1024 B       4096 B
> fastmem              16.9         16.7         17.7         17.6         17.9
> fastmem_h             9.5          9.4          9.5          9.5          9.4
> mempool               6.9          6.9          6.9          7.0          6.6
> rte_malloc           93.7         93.8         94.8        100.1        130.0
> libc                118.8        119.2         20.4         20.4        111.0
> 
> === Scenario 2: Batch alloc, FIFO free — cycles per alloc ===
> allocator             8 B         64 B        256 B       1024 B       4096 B
> fastmem              10.1         10.2         10.8         12.7         14.7
> fastmem_h             6.8          6.7          7.4          9.3         11.4
> mempool               4.2          4.1          4.1          4.1          4.1
> rte_malloc           58.6         58.5         62.1         67.5         68.5
> libc                104.8        104.6         73.7        203.9       1254.0

Intel(R) Xeon(R) Gold 6421N / Ubuntu 24.04 / clang

Best regards,
	Mattias

