[RFC PATCH 0/5] Introduce mempool object new debug capabilities
    Morten Brørup 
    mb at smartsharesystems.com
       
    Mon Sep  1 17:34:43 CEST 2025
    
    
  
> From: Slava Ovsiienko [mailto:viacheslavo at nvidia.com]
> Sent: Monday, 25 August 2025 13.28
> 
> Hi, Morten
> 
> > From: Morten Brørup <mb at smartsharesystems.com>
> > Sent: Saturday, July 19, 2025 5:39 PM
> >
> > > From: Morten Brørup [mailto:mb at smartsharesystems.com]
> > > Sent: Monday, 7 July 2025 14.11
> > >
> > > > From: Shani Peretz [mailto:shperetz at nvidia.com]
> > > > Sent: Monday, 7 July 2025 07.45
> > > >
> > > > > From: Stephen Hemminger <stephen at networkplumber.org>
> > > > > Sent: Monday, 16 June 2025 18:30
> > > > >
> > > > > On Mon, 16 Jun 2025 10:29:05 +0300 Shani Peretz
> > > > > <shperetz at nvidia.com> wrote:
> > > > >
> > > > > > This feature is designed to monitor the lifecycle of mempool
> > > objects
> > > > > > as they move between the application and the PMD.
> > > > > >
> > > > > > It will allow us to track the operations and transitions of each
> > > > > > mempool object throughout the system, helping in debugging and
> > > > > understanding objects flow.
> > > > > >
> > > > > > The implementation include several key components:
> > > > > > 1. Added a bitmap to mempool's header (rte_mempool_objhdr)
> > > > > >    that represent the operations history.
> > > > > > 2. Added functions that allow marking operations on an
> > > > > >    mempool objects.
> > > > > > 3. Dumps the history to a file or the console
> > > > > >    (rte_mempool_objects_dump).
> > > > > > 4. Added python script that can parse, analyze the data and
> > > > > >    present it in an human readable format.
> > > > > > 5. Added compilation flag to enable the feature.
> > > > > >
> > > > >
> > > > > Could this not already be done with tracing infrastructure?
> > > >
> > > > Hey,
> > > > We did consider tracing but:
> > > > 	- It has limited capacity, which will result in older mbufs being
> > > > lost in the tracing output while they are still in use
> > > > 	- Some operations may be lost, and we might not capture the
> > > > complete picture due to trace misses caused by the performance
> > > overhead
> > > > of tracking on the datapath as far as I understand WDYT?
> > >
> > > This looks like an alternative trace infrastructure, just for mempool
> > > objects.
> 
> It is rather not alternative, but some orthogonal way to trace.
> We gather the life history per mbuf and it helps a lot with nailing the issues
> on the user's sites. From our practice - it helped instantly to find the mbuf
> double free,
> and core policy violations for rx/tx_burst routines in the user applications .
Great with some real life use cases, proving the value of this patch series!
<sidetracking>
The mbuf library has a performance optimization where it keeps ref_cnt==1 for free mbufs in the pool, so double free of mbufs cannot be detected (without this series). Some other patch series could make that performance optimization optional, so double free can be detected by the mbuf library at runtime. Although that would be a major API change, considering how many drivers bypass the mbuf library and call the mempool library directly for allocating/freeing mbufs.
</sidetracking>
What are "core policy violations for rx/tx_burst routines"?
(I'm trying to understand what other types of bugs this series has helped you track down.)
> 
> > > But the list of operations is limited to basic operations on mbuf
> > > mempool objects.
> 
> Agree, but this limitation is because of only mbuf life milestones
> are well-defined in DPDK:
> - alloc by app/tx_burst/tx queued|tx_busy/free by PMD|app
> - alloc  by PMD/rx replenish/rx_burst_returned/free by app
> 
> No other mempool-based object milestones are defined so well.
> So, do you think we should move this history data to the mbuf object
> itself ?
This series specifically targets mbufs, and it doesn't look useful for generic mempool objects, so yes, please move the history data to the mbuf object itself.
> 
> > > It lacks support for other operations on mbufs, e.g. IP
> > > fragmentation/defragmentation library operations, application specific
> 
> Yes, these ones can be added later, once we see the user's request.
> There are too many options to cover and limited field bit width to embrace all
> of them 😊
Agree.
> 
> > > operations, and transitions between the mempool cache and the mempool
> > > backing store.
> > > It also lacks support for operations on other mempool objects than
> > > mbufs.
> > >
> > > You might better off using the trace infrastructure, or something
> > > similar.
> We considered using the existing trace feature.
> We even tried in practice to achieve our debugging goals with trace
> on the user's sites:
> Trace:
> - is overwhelming - to many events with too many mbufs
> - is incomplete - sometimes there were longtime runs to catch the bug
> conditions
> and it was a little bit hard to record all the huge trace to the files in real
> time,
> usually we had gaps
> - is performance impacting. It records the full event information, that is
> redundant,
> queries timestamp (and this action might introduce execution barriers), uses
> more memory
> (as a large linear address range - it impacts cache performance). Sometimes it
> is crucially
> not to impact the datapath performance to have an issue repro onsite.
OK. Personally, I'm not a big fan of trace in the fast path either.
> 
> > > Using the trace infrastructure allows you to record more detailed
> > > information along with the transitions of "owners" of each mbuf.
> > >
> > > I'm not opposing this RFC, but I think it is very limited, and not
> > > sufficiently expandable.
> It was intentionally developed to be limited, very compact and very fast.
OK. I agree to focusing only on the lifecycle of mbufs with this series.
> 
> > >
> > > I get the point that trace can cause old events on active mbufs to be
> > > lost, and the concept of a trace buffer per mempool object is a good
> > > solution to that.
> > > But I think you need to be able to store much more information with
> > > each transition; at least a timestamp. And if you do that, you need rdtsc
> > > much more than 4 bits per event.
> > >
> > > Alternatively, if you do proceed with the RFC in the current form, I
> > > have two key suggestions:
> > > 1. Make it possible to register operations at runtime. (Look at
> > > dynamic mbuf fields for inspiration.) 2. Use 8 bits for the operation,
> > > instead of 4.
> OK, we can do:
> - narrow the gather history feature to mbuf only
Agree.
> - use the dynamic mbuf field for that
Agree.
> - registration would involve the extra memory access and affects the perf,
> I would prefer to have fixed set of events to record, at least on the feature
> firstintoduction.
OK, let's stick with a fixed set of event types for now.
Patch 1/5 has only 8 event types, so using 4 bits for each event history entry probably suffices.
That still leaves room for 8 more event types.
Suggest adding event types USR1 = 15, USR2 = 14, USR3 = 13, USR4 = 12 for application use.
> 
> > > And if you need a longer trace history, you can use the rte_bitset
> > > library instead of a single uint64_t.
> Would be not the best option for the perf ☹
Agree.
I suppose a trace history of 16 or 8 entries suffices for debugging.
> >
> > One more comment:
> > If this feature is meant for mbuf type mempool objects only, it might be
> > possible to add it to the mbuf library (and store the trace in the rte_mbuf
> > structure) instead of the mempool library.
> 
> > Although that would prevent tracing internal mempool library operations,
> > specifically moving the object between the mempool cache and mempool
> > backing store.
> 
> We can update free/alloc mbuf functions, so functionality will not be lost.
> 
> With best regards,
> Slava
> 
Please go ahead with this, Slava.
I'm looking forward to seeing the next version of this series. :-)
-Morten
    
    
More information about the dev
mailing list