[PATCH v6 3/4] mempool: fix cache flushing algorithm
Morten Brørup
mb at smartsharesystems.com
Sun Oct 9 16:31:50 CEST 2022
> From: Andrew Rybchenko [mailto:andrew.rybchenko at oktetlabs.ru]
> Sent: Sunday, 9 October 2022 15.38
>
> From: Morten Brørup <mb at smartsharesystems.com>
>
> Fix the rte_mempool_do_generic_put() cache flushing algorithm to
> keep hot objects in cache instead of cold ones.
>
> The algorithm was:
> 1. Add the objects to the cache.
> 2. Anything greater than the cache size (if it crosses the cache flush
> threshold) is flushed to the backend.
>
> Please note that the description in the source code said that it kept
> "cache min value" objects after flushing, but the function actually
> kept the cache full after flushing, which the above description
> reflects.
>
> Now, the algorithm is:
> 1. If the objects cannot be added to the cache without crossing the
> flush threshold, flush some cached objects to the backend to
> free up required space.
> 2. Add the objects to the cache.
>
> The most recent (hot) objects were flushed, leaving the oldest (cold)
> objects in the mempool cache. The bug degraded performance, because
> flushing prevented immediate reuse of the (hot) objects already in
> the CPU cache. Now, the existing (cold) objects in the mempool cache
> are flushed before the new (hot) objects are added to the mempool
> cache.
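My reading of the new put path, again as a simplified sketch (the helper name is made up, and the real patch may flush differently; this variant flushes the whole cache when the threshold would be crossed):

static void
new_generic_put_sketch(struct rte_mempool *mp, struct rte_mempool_cache *cache,
		void * const *obj_table, unsigned int n)
{
	void **cache_objs;

	if (cache->len + n <= cache->flushthresh) {
		/* Enough room: the new objects simply go on top. */
		cache_objs = &cache->objs[cache->len];
		cache->len += n;
	} else {
		/* 1. Flush the existing (cold) objects to the backend to
		 *    free up the required space.
		 */
		rte_mempool_ops_enqueue_bulk(mp, cache->objs, cache->len);
		cache_objs = &cache->objs[0];
		cache->len = n;
	}

	/* 2. Add the new (hot) objects to the cache. */
	rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
}

Either way, the objects just stored are the ones a subsequent get will return first, while they are still hot in the CPU cache.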
>
> Since nearby code is touched anyway, fix the flush threshold comparison
> to flush only if the threshold is really exceeded, not just reached.
> I.e. it must be "len > flushthresh", not "len >= flushthresh".
> Consider a flush multiplier of 1 instead of 1.5; the cache would
> already be flushed when reaching "size" objects, not when exceeding
> "size" objects. In other words, the cache would not be able to hold
> "size" objects, which is clearly a bug. The bug could degrade
> performance due to premature flushing.
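A hypothetical worked example: size = 32, flush multiplier 1, so flushthresh = 32. The cache holds 16 objects and a put of 16 arrives, so the resulting length is exactly 32. With ">=" that already triggers a flush, the 16 cold objects are pushed to the backend and the cache ends up with only the 16 new ones; with ">" the cache simply fills to its nominal 32 and nothing is flushed.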
>
> Since we never exceed the flush threshold now, the cache size in the
> mempool may be decreased from RTE_MEMPOOL_CACHE_MAX_SIZE * 3 to
> RTE_MEMPOOL_CACHE_MAX_SIZE * 2. In fact it could be
> CALC_CACHE_FLUSHTHRESH(RTE_MEMPOOL_CACHE_MAX_SIZE), but the flush
> threshold multiplier is internal.
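For the record, with the internal 1.5 multiplier the tight bound is CALC_CACHE_FLUSHTHRESH(RTE_MEMPOOL_CACHE_MAX_SIZE) = 1.5 * 512 = 768 entries (assuming the default RTE_MEMPOOL_CACHE_MAX_SIZE of 512), so RTE_MEMPOOL_CACHE_MAX_SIZE * 2 = 1024 leaves some headroom without exposing the multiplier in the public header.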
>
> Signed-off-by: Morten Brørup <mb at smartsharesystems.com>
> Signed-off-by: Andrew Rybchenko <andrew.rybchenko at oktetlabs.ru>
> ---
[...]
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -90,7 +90,7 @@ struct rte_mempool_cache {
> * Cache is allocated to this size to allow it to overflow in certain
> * cases to avoid needless emptying of cache.
> */
> - void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache objects */
> + void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2]; /**< Cache objects */
> } __rte_cache_aligned;
How much are we allowed to break the ABI here?
This patch reduces the size of the structure by removing a now unused part at the end, which should be harmless.
If we may also move the position of the objs array, I would add __rte_cache_aligned to the objs array. It makes no difference in the general case, but if get/put operations are always 32 objects, it will reduce the number of memory (or last level cache) accesses from five to four 64 B cache lines for every get/put operation.
uint32_t len; /**< Current cache count */
- /*
- * Cache is allocated to this size to allow it to overflow in certain
- * cases to avoid needless emptying of cache.
- */
- void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 3]; /**< Cache objects */
+ /**
+ * Cache objects
+ *
+ * Cache is allocated to this size to allow it to overflow in certain
+ * cases to avoid needless emptying of cache.
+ */
+ void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2] __rte_cache_aligned;
} __rte_cache_aligned;
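For reference, the five-versus-four figure assumes 64-bit pointers: 32 objects * 8 B = 256 B per burst. If objs[] starts on a 64 B boundary (and len stays a multiple of 32), each burst touches exactly four cache lines; an unaligned start lets the same 256 B straddle a fifth line.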
With or without the above suggested optimization...
Reviewed-by: Morten Brørup <mb at smartsharesystems.com>