[PATCH] mempool: split statistics from debug
    Morten Brørup 
    mb at smartsharesystems.com
       
    Sun Oct 30 21:29:13 CET 2022
    
    
  
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Sunday, 30 October 2022 17.13
> 
> On Sun, 30 Oct 2022 15:04:18 +0100
> Morten Brørup <mb at smartsharesystems.com> wrote:
> 
> > > From: Morten Brørup [mailto:mb at smartsharesystems.com]
> > > Sent: Sunday, 30 October 2022 12.55
> > >
> > > Split statistics from debug, to make mempool statistics available
> > > without
> > > the performance cost of continuously validating the cookies in the
> > > mempool
> > > elements.
> >
> > mempool_perf_autotest shows that the rate_persec drops to a third (-
> 66 %) when enabling full mempool debug - quite prohibitive!
> >
> > With this patch, the performance cost is much lower. I don't have
> detailed test results, but the initial tests without mempool cache show
> a performance drop of -11 %. Significant, but not prohibitive.
> 
> 
> One trick to avoid conditional in fast path would be to add a dummy
> stats[] per core.
> 
> Another would be to move the fast path get/put stats into the
> mempool_cache.
> The current model has stats[] per cpu always on another cache line.
I submitted a patch to move the likely get/put stats to the mempool cache [1], but retracted it shortly after. I realized that splitting the stats from the debug cookies had much higher performance effect, so we should do this first. We can move statistics counters into the mempool_cache as part two.
[1]: https://patchwork.dpdk.org/project/dpdk/patch/20221028064152.98341-2-mb@smartsharesystems.com/
Also, it might be too late for 22.11 to move stats to the mempool cache. However, splitting the stats from the debug cookies is extremely simple, so that should be accepted for 22.11 (if reviewed properly).
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 1f5707f46a21..87905b7286a6 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -236,8 +236,10 @@ struct rte_mempool {
>         struct rte_mempool_memhdr_list mem_list; /**< List of memory
> chunks */
> 
>  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
> -       /** Per-lcore statistics. */
> -       struct rte_mempool_debug_stats stats[RTE_MAX_LCORE];
> +       /** Per-lcore statistics.
> +        * Allocate one additional per-cpu slot for non-DPDK threads
> +        */
> +       struct rte_mempool_debug_stats stats[RTE_MAX_LCORE + 1];
Excellent! As a bonus, this also fixes a bug in the statistics: Non-DPDK thread operations were not counted.
>  #endif
>  }  __rte_cache_aligned;
> 
> @@ -302,10 +304,7 @@ struct rte_mempool {
>   */
>  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
>  #define RTE_MEMPOOL_STAT_ADD(mp, name, n) do {                  \
> -               unsigned __lcore_id = rte_lcore_id();           \
> -               if (__lcore_id < RTE_MAX_LCORE) {               \
> -                       mp->stats[__lcore_id].name += n;        \
> -               }                                               \
> +       (mp)->stats[rte_lcore_id()].name += n;
I suppose the index must be offset by one, for rte_lcore_id() returning LCORE_ID_ANY (=UINT32_MAX) to be stored at offset zero:
+       (mp)->stats[rte_lcore_id() + 1].name += n;
>         } while (0)
>  #else
>  #define RTE_MEMPOOL_STAT_ADD(mp, name, n) do {} while (0)
I have marked this patch as Changes Requested, and will submit a v2 patch series with this improvement - but not today, considering the local time.
NB: I am aware that the loop in rte_mempool_dump() in rte_mempool.c must also be updated to account for the additional slot in the stats array. I'll check if there are similar effects elsewhere in the library.
    
    
More information about the dev
mailing list