[PATCH v5] graph: add optional profiling stats

saeed bishara saeed.bishara.os at gmail.com
Wed Jun 24 15:09:12 CEST 2026


On Wed, Jun 24, 2026 at 10:59 AM Morten Brørup <mb at smartsharesystems.com> wrote:
>
> +Pavan Nikhilesh, +Stephen Hemminger, +Wathsala Vithanage, +Bruce Richardson, +Thomas Monjalon
>
> > From: saeed bishara [mailto:saeed.bishara.os at gmail.com]
> > Sent: Tuesday, 23 June 2026 16.11
> >
> > > > Also, instead of adding a cache line for this profiling data, can we
> > > > share line 1, which is used solely for xstats?
> > >
> > > This profiling data is 4 indexes * 2 values * 8-byte fields, so one
> > > cache line in itself.
> > Makes sense.
> > btw, the default value of RTE_GRAPH_BURST_SIZE is 256, I suspect that
> > real applications will enforce smaller burst when pulling from input
> > devices (e.g. 32). Do you expect such cases to change
> > RTE_GRAPH_BURST_SIZE?
>
> Excellent question! I don't know.
> They should. E.g. an application optimized for latency should certainly not process bursts of 256 objects.
>
> IMO, the root problem is the lack of a unified burst size across DPDK, which causes every library to be designed with its own optimal burst size.
> E.g. the Mbuf library uses 64 (for rte_pktmbuf_free_bulk()), and the Graph library uses 256.
>
> There has been an attempt at introducing a unified burst size [1] for DPDK, but it met a lot of resistance, so it still needs to be refined before we can reach a conclusion.
> The drivers supposedly can report an "optimal" burst size at run-time, which the application can then use. But the application is unable to configure its internal burst sizes if one driver reports 64 and another reports 32.
> I'm strongly in favor of a build time constant, used across DPDK. The default value should work reasonably well across drivers and libraries.
> And if an application wants to optimize for performance (either throughput or latency), the developer should experiment to find the optimal value.
> Furthermore, designing for a build time constant max burst size throughout DPDK might provide performance benefits in itself, as the compiler can optimize for this.
>
> [1]: https://inbox.dpdk.org/dev/KdOygM96Qb6d6ADK1-AcnA@monjalon.net/
>
> Now, back to your question...
> As a workaround, I can sample Graph node performance data for 32 objects, instead of sampling for RTE_GRAPH_BURST_SIZE / 2.
I see, so there is no simple static parameter here. What about
tracking the max burst and reporting the calls/cycles for that case?
The user would also learn what that max burst was, and how often it occurred.
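
A minimal sketch of what I mean, not a proposal for the actual patch. The
struct and function names below are hypothetical and are not part of
rte_graph; the cycle count is assumed to be measured by the caller around
the node's process function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node stats; these names do not exist in rte_graph. */
struct max_burst_stats {
	uint16_t max_burst;   /* largest burst seen so far */
	uint64_t calls;       /* calls that processed exactly max_burst objects */
	uint64_t cycles;      /* cycles spent in those calls */
	uint64_t total_calls; /* all calls, to derive how often max_burst occurred */
};

/*
 * Record one process call of nb_objs objects that took the given number
 * of cycles. When a new maximum is seen, the counters restart, so calls
 * and cycles always describe the current max_burst only.
 */
static inline void
max_burst_stats_update(struct max_burst_stats *s, uint16_t nb_objs,
		uint64_t cycles)
{
	assert(s != NULL);

	s->total_calls++;
	if (nb_objs > s->max_burst) {
		s->max_burst = nb_objs;
		s->calls = 0;
		s->cycles = 0;
	}
	if (nb_objs == s->max_burst) {
		s->calls++;
		s->cycles += cycles;
	}
}
```

With this, cycles / calls gives the average cost of a max-size burst, and
calls / total_calls tells how often that burst size occurred. One drawback
is that a single unusually large burst resets the counters, so the reported
case may be rare.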

saeed

