[RFC PATCH] eventdev: ensure 16-byte alignment for events

Jerin Jacob jerinjacobk at gmail.com
Fri Oct 6 09:19:42 CEST 2023


On Thu, Oct 5, 2023 at 6:52 PM Bruce Richardson
<bruce.richardson at intel.com> wrote:
>
> On Thu, Oct 05, 2023 at 06:41:34PM +0530, Jerin Jacob wrote:
> > On Thu, Oct 5, 2023 at 6:01 PM Bruce Richardson
> > <bruce.richardson at intel.com> wrote:
> > >
> > > On Thu, Oct 05, 2023 at 12:51:00PM +0100, Bruce Richardson wrote:
> > > > The event structure in DPDK is 16-bytes in size, and events are
> > > > regularly passed as parameters directly rather than being passed as
> > > > pointers. To help compiler optimize correctly, we can explicitly request
> > > > 16-byte alignment for events, which means that we should be able
> > > > to do aligned vector loads/stores (e.g. with SSE or Neon) when working
> > > > with those events.
> > > >
> > > > Signed-off-by: Bruce Richardson <bruce.richardson at intel.com>
> > > > ---
> > > >  lib/eventdev/rte_eventdev.h | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> > > > index 2ba8a7b090..bb0d59b059 100644
> > > > --- a/lib/eventdev/rte_eventdev.h
> > > > +++ b/lib/eventdev/rte_eventdev.h
> > > > @@ -1344,7 +1344,7 @@ struct rte_event {
> > > >               struct rte_event_vector *vec;
> > > >               /**< Event vector pointer. */
> > > >       };
> > > > -};
> > > > +} __rte_aligned(16);
> > > >
> > >
> >
> > + Eventdev driver maintainers for review and for performance testing.
> >
> > > Looking for feedback on this idea - hence the fact this is going as an RFC.
> >
> > Are you seeing any performance improvement ? Look like only DLB2
> > driver only using SEE or AVX512 instructions.
> >
>
> The idea would be that the driver code (and eventdev code) should not need
> to use SSE directly. If we mark the event struct as aligned, it should help
> encourage the compiler to use these instructions under-the-hood. For
> example, when copying an event, the compiler should be emitting 128-bit
> loads and stores for most platforms.

With limited testing, there is no performance regression is seen. In
fact, the compiler is generating the
same instruction stream on both cases.

If there are no objections from others, Please send v1 with "ABI
change" update in doc/guides/rel_notes/release_23_11.rst.

With above change,
Acked-by: Jerin Jacob <jerinj at marvell.com>

NOTE: I already made PR to Thomas for rc1. Since is needs to part of
rc1, I need to sync with @Thomas Monjalon  as well.







>
> /Bruce


More information about the dev mailing list