[dpdk-dev] [PATCH v1] event/sw: performance improvements

Van Haaren, Harry harry.van.haaren at intel.com
Tue Oct 6 09:59:37 CEST 2020


> -----Original Message-----
> From: Jerin Jacob <jerinjacobk at gmail.com>
> Sent: Monday, October 5, 2020 5:35 PM
> To: Nicolau, Radu <radu.nicolau at intel.com>
> Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>; Richardson, Bruce
> <bruce.richardson at intel.com>; Ananyev, Konstantin
> <konstantin.ananyev at intel.com>; Van Haaren, Harry
> <harry.van.haaren at intel.com>; dev at dpdk.org; jerinj at marvell.com; nd
> <nd at arm.com>
> Subject: Re: [dpdk-dev] [PATCH v1] event/sw: performance improvements
> 
> On Tue, Sep 29, 2020 at 2:32 PM Nicolau, Radu <radu.nicolau at intel.com> wrote:
> >
> >
> > On 9/28/2020 5:02 PM, Honnappa Nagarahalli wrote:
> > > <snip>
> > >>> Add minimum burst throughout the scheduler pipeline and a flush counter.
> > >>> Replace ring API calls with local single threaded implementation where
> > >>> possible.
> > >>>
> > >>> Signed-off-by: Radu Nicolau mailto:radu.nicolau at intel.com
> > >>>
> > >>> Thanks for the patch, a few comments inline.
> > >>>
> 
> > >>> Why not make these APIs part of the rte_ring library? You could further
> > >> optimize them by keeping the indices on the same cacheline.
> > >>> I'm not sure there is any need for non thread-safe rings outside this
> > >> particular case.
> > >>> [Honnappa] I think if we add the APIs, we will find the use cases.
> > >>> But, more than that, I understand that rte_ring structure is exposed to the
> > >> application. The reason for doing that is the inline functions that rte_ring
> > >> provides. IMO, we should still maintain modularity and should not use the
> > >> internals of the rte_ring structure outside of the library.
> > >>> +1 to that.
> > >>>
> > >>> BTW, is there any real perf benefit from such micor-optimisation?
> > >> I'd tend to view these as use-case specific, and I'm not sure we should clutter
> > >> up the ring library with yet more functions, especially since they can't be
> > >> mixed with the existing enqueue/dequeue functions, since they don't use
> > >> the head pointers.
> > > IMO, the ring library is pretty organized with the recent addition of HTS/RTS
> modes. This can be one of the modes and should allow us to use the existing
> functions (though additional functions are required as well).
> > > The other concern I have is, this implementation can be further optimized by
> using a single cache line for the pointers. It uses 2 cache lines just because of the
> layout of the rte_ring structure.
> > > There was a question earlier about the performance improvements of this
> patch? Are there any % performance improvements that can be shared?
> > > It is also possible to change the above functions to use the head/tail pointers
> from producer or the consumer cache line alone to check for perf differences.
> >
> > I don't have a % for the final improvement for this change alone, but
> > there was some improvement in the memory overhead measurable during
> > development, which very likely resulted in the whole optimization having
> > more headroom.
> >
> > I agree that this may be further optimized, maybe by having a local
> > implementation of a ring-like container instead.
> 
> Have we decided on the next steps for this patch? Is the plan to
> supersede this patch and have different
> one in rte_ring subsystem,

My preference is to merge this version of the patch;
1) The ring helper functions are stripped to the SW PMD usage, and not valid to use in the general.
2) Adding static inline APIs in an LTS without extensive doesn't seem a good idea.

If Honnappa is OK with the above solution for 20.11, we can see about moving the rings part of the
code to rte_ring library location in 21.02, and give ourselves some time to settle the usage/API before
the next LTS.

Regards, -Harry



More information about the dev mailing list