[PATCH v7] mempool: test performance with larger bursts
Bruce Richardson
bruce.richardson at intel.com
Tue Jun 18 15:21:34 CEST 2024
On Mon, Jun 10, 2024 at 10:56:00AM +0200, Morten Brørup wrote:
> PING (again) for review.
>
> Many applications use bursts of more than 32 packets,
> and some applications buffer more than 512 packets.
>
> This patch updates the mempool perf test accordingly.
>
> -Morten
>
> > From: Morten Brørup [mailto:mb at smartsharesystems.com]
> > Sent: Thursday, 4 April 2024 11.27
> >
> > PING for review. This patch is relatively trivial.
> >
> > > From: Morten Brørup [mailto:mb at smartsharesystems.com]
> > > Sent: Saturday, 2 March 2024 21.04
> > >
> > > Bursts of up to 64, 128 and 256 packets are not uncommon, so increase the
> > > maximum tested get and put burst sizes from 32 to 256.
> > > For convenience, also test get and put burst sizes of
> > > RTE_MEMPOOL_CACHE_MAX_SIZE.
> > >
> > > Some applications keep more than 512 objects, so increase the maximum
> > > number of kept objects from 512 to 32768, still in jumps of factor four.
> > > This exceeds the typical mempool cache size of 512 objects, so the test
> > > also exercises the mempool driver.
> > >
> > > Increased the precision of rate_persec calculation by timing the actual
> > > duration of the test, instead of assuming it took exactly 5 seconds.
> > >
> > > Added cache guard to per-lcore stats structure.
> > >
> > > Signed-off-by: Morten Brørup <mb at smartsharesystems.com>
> > > Acked-by: Chengwen Feng <fengchengwen at huawei.com>
> > > ---
> > >
> > > v7:
> > > * Increase max burst size to 256. (Inspired by Honnappa)
> > > v6:
> > > * Do not test with more lcores than available. (Thomas)
> > > v5:
> > > * Increased N, to reduce measurement overhead with large numbers of kept
> > > objects.
> > > * Increased precision of rate_persec calculation.
> > > * Added missing cache guard to per-lcore stats structure.
This looks ok to me. However, the test itself takes a very long time to
run, at 5 seconds per iteration. One suggestion I have is to reduce the
5 seconds to 1 second. Given that we are looking at millions of iterations
each time, the difference in results should not be that great, I'd hope. A
very quick test of the delta on my end shows variance of only a couple of
percent in the first couple of results.
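
As a rough illustration of why a shorter run is workable once the rate is
computed from the measured duration (as the patch describes) rather than an
assumed 5 seconds, here is a minimal standalone sketch. It is not the code
from the patch: now_sec(), rate_per_sec() and measure_rate() are hypothetical
names, the workload is a dummy loop, and the C11 wall clock stands in for the
cycle counter a DPDK test would more likely use (e.g. rte_get_timer_cycles()
with rte_get_timer_hz()).

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current time in seconds from the C11 wall clock.
 * Assumption: a stand-in for a cycle counter in a real perf test. */
static double now_sec(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Rate based on the measured duration, not the nominal test length. */
static double rate_per_sec(uint64_t ops, double elapsed_sec)
{
	return elapsed_sec > 0.0 ? (double)ops / elapsed_sec : 0.0;
}

/* Run a dummy workload in batches until at least target_sec has passed,
 * then report operations per second over the time actually taken. */
static double measure_rate(double target_sec)
{
	volatile uint64_t sink = 0;
	uint64_t ops = 0;
	const double start = now_sec();
	double elapsed;

	do {
		for (int i = 0; i < 1000; i++)
			sink += (uint64_t)i;
		ops += 1000;
		elapsed = now_sec() - start;
	} while (elapsed < target_sec);

	return rate_per_sec(ops, elapsed);
}
```

Calling measure_rate(1.0) and measure_rate(5.0) should give comparable
rates, because any overshoot past the target duration is divided out
instead of being attributed to a fixed 5 seconds.
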
With or without this suggestion.
Acked-by: Bruce Richardson <bruce.richardson at intel.com>