[dpdk-dev] [PATCH v1] event/sw: performance improvements

Jerin Jacob jerinjacobk at gmail.com
Mon Oct 5 18:35:18 CEST 2020


On Tue, Sep 29, 2020 at 2:32 PM Nicolau, Radu <radu.nicolau at intel.com> wrote:
>
>
> On 9/28/2020 5:02 PM, Honnappa Nagarahalli wrote:
> > <snip>
> >>> Add minimum burst throughout the scheduler pipeline and a flush counter.
> >>> Replace ring API calls with local single threaded implementation where
> >>> possible.
> >>>
> >>> Signed-off-by: Radu Nicolau mailto:radu.nicolau at intel.com
> >>>
> >>> Thanks for the patch, a few comments inline.
> >>>

> >>> Why not make these APIs part of the rte_ring library? You could further
> >> optimize them by keeping the indices on the same cacheline.
> >>> I'm not sure there is any need for non thread-safe rings outside this
> >> particular case.
> >>> [Honnappa] I think if we add the APIs, we will find the use cases.
> >>> But, more than that, I understand that rte_ring structure is exposed to the
> >> application. The reason for doing that is the inline functions that rte_ring
> >> provides. IMO, we should still maintain modularity and should not use the
> >> internals of the rte_ring structure outside of the library.
> >>> +1 to that.
> >>>
> >>> BTW, is there any real perf benefit from such micor-optimisation?
> >> I'd tend to view these as use-case specific, and I'm not sure we should clutter
> >> up the ring library with yet more functions, especially since they can't be
> >> mixed with the existing enqueue/dequeue functions, since they don't use
> >> the head pointers.
> > IMO, the ring library is pretty organized with the recent addition of HTS/RTS modes. This can be one of the modes and should allow us to use the existing functions (though additional functions are required as well).
> > The other concern I have is, this implementation can be further optimized by using a single cache line for the pointers. It uses 2 cache lines just because of the layout of the rte_ring structure.
> > There was a question earlier about the performance improvements of this patch? Are there any % performance improvements that can be shared?
> > It is also possible to change the above functions to use the head/tail pointers from producer or the consumer cache line alone to check for perf differences.
>
> I don't have a % for the final improvement for this change alone, but
> there was some improvement in the memory overhead measurable during
> development, which very likely resulted in the whole optimization having
> more headroom.
>
> I agree that this may be further optimized, maybe by having a local
> implementation of a ring-like container instead.

Have we decided on the next steps for this patch? Is the plan to
supersede this patch and have different
one in rte_ring subsystem,


>


More information about the dev mailing list