[dpdk-dev] [PATCH 10/17] librte_acl: add AVX2 as new rte_acl_classify() method

Neil Horman nhorman at tuxdriver.com
Mon Dec 15 21:20:43 CET 2014


On Mon, Dec 15, 2014 at 04:33:47PM +0000, Ananyev, Konstantin wrote:
> Hi Neil,
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > Sent: Monday, December 15, 2014 4:00 PM
> > To: Ananyev, Konstantin
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 10/17] librte_acl: add AVX2 as new rte_acl_classify() method
> > 
> > On Sun, Dec 14, 2014 at 06:10:52PM +0000, Konstantin Ananyev wrote:
> > > Introduce new classify() method that uses AVX2 instructions.
> > > From my measurements:
> > > On HSW boards when processing >= 16 packets per call,
> > > AVX2 method outperforms it's SSE counterpart by 10-25%,
> > > (depending on the ruleset).
> > > At runtime, this method is selected as default one on HW that supports AVX2.
> > >
> > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev at intel.com>
> > > ---
> > >  lib/librte_acl/Makefile       |   9 +
> > >  lib/librte_acl/acl.h          |   4 +
> > >  lib/librte_acl/acl_run.h      |   2 +-
> > >  lib/librte_acl/acl_run_avx2.c |  58 +++++
> > >  lib/librte_acl/acl_run_avx2.h | 305 ++++++++++++++++++++++++
> > >  lib/librte_acl/acl_run_sse.c  | 537 +-----------------------------------------
> > >  lib/librte_acl/acl_run_sse.h  | 533 +++++++++++++++++++++++++++++++++++++++++
> > >  lib/librte_acl/rte_acl.c      |   5 +-
> > >  lib/librte_acl/rte_acl.h      |   2 +
> > >  9 files changed, 917 insertions(+), 538 deletions(-)
> > >  create mode 100644 lib/librte_acl/acl_run_avx2.c
> > >  create mode 100644 lib/librte_acl/acl_run_avx2.h
> > >  create mode 100644 lib/librte_acl/acl_run_sse.h
> > >
> > > diff --git a/lib/librte_acl/Makefile b/lib/librte_acl/Makefile
> > > index 65e566d..223ec31 100644
> > > --- a/lib/librte_acl/Makefile
> > > +++ b/lib/librte_acl/Makefile
> > > @@ -45,8 +45,17 @@ SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_bld.c
> > >  SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_gen.c
> > >  SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_scalar.c
> > >  SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_sse.c
> > > +SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_avx2.c
> > >
> > >  CFLAGS_acl_run_sse.o += -msse4.1
> > > +ifeq ($(CC), icc)
> > > +CFLAGS_acl_run_avx2.o += -march=core-avx2
> > > +else ifneq ($(shell \
> > > +test $(GCC_MAJOR_VERSION) -le 4 -a $(GCC_MINOR_VERSION) -le 6 && echo 1), 1)
> > > +CFLAGS_acl_run_avx2.o += -mavx2
> > > +else
> > > +CFLAGS_acl_run_avx2.o += -msse4.1
> > > +endif
> > >
> > This seems broken.  You've unilaterally included acl_run_avx2.c in the build
> > list above, but only enable -mavx2 if the compiler is at least gcc 4.6.
> 
> Actually 4.7 (before that version, as I know,  gcc doesn't support avx2) 
> 
> >  Unless
> > you want to make gcc 4.6 a requirement for building,
> 
> I believe DPDK is required to be buildable by gcc 4.6
> As I remember, we have to support it all way down to gcc 4.3.
> 
> > you need to also exclude
> > the file above from the build list.
> 
> That means that for  gcc 4.6 and below rte_acl_classify_avx2() would not be defined.
> And then at runtime, I have to check for that somehow and (re)populate classify_fns[]. 
> Doesn't seems like a good way to me.
There are plenty of ways around that.

At a minimum you could make the classify_fns array the one place that you need
to add an ifdef __AVX__ call.

You could also create a secondary definition of rte_acl_classify_avx2, and mark
it as a weak symbol, which only returns -EOPNOTSUPP.  That would be good, since
the right thing will just automatically happen then if you don't build the
actual avx2 classification code

> Instead, I prefer to always build acl_run_avx2.c,
But you can't do that.  You just said above that you need to support down to gcc
4.3.  I see you've worked around that with some additional ifdef __AVX__
instructions, but in so doing you ignore the possibiity that sse isn't
supported, so you need to add __SSE__ checks now as well.  ifdeffing that much
just isn't scalable.  And for your effort, you get an AVX2 classification path
that potentially doesn't actually do vectorized classification.

It really seems better to me to not build the code if the compiler doesn't
support the instruction set it was meant to enable, and change the
classification function pointer to something that informs the user of the lack
of support at run time.

> but for old compilers that don't support AVX2 -
> rte_acl_classify_avx2() would simply be identical to rte_acl_classify_sse(). 
> 
That doesn't make sense to me, for two reasons:

1) What if the machine being targeted doesn't support sse either?

2) If an application selects an AVX2 classifier, I as a developer expect to
either get AVX2 based classification, or an error indicating that I can't do
AVX2 classification, not a silent performance degradation down to scalar
classifcation.

> >  That in turn I think allows you to remove a
> > bunch of the ifdeffing that you've done in some of the avx2 specific files.
> 
> Actually there are not many of them.
> One in acl_run_avx2.h and another in acl_run_avx2.c.
> 
2 in acl_run_avx2.h and 1 in rte_acl_osdep_alone.h, which is really 3 more than
you need if you just do an intellegent weak classifier function defintion.

Neil


More information about the dev mailing list