[dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types
Jerin Jacob Kollanukkaran
jerinj at marvell.com
Wed Apr 10 18:07:53 CEST 2019
On Wed, 2019-04-10 at 11:52 -0400, Aaron Conole wrote:
> Jerin Jacob Kollanukkaran <jerinj at marvell.com> writes:
>
> > On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote:
> > > ---------------------------------------------------------------
> > > ----
> > > ---
> > > Compiler complains of argument type mismatch, like:
> >
> > Can you share more details on how to reproduce this issue?
>
> It will be generated using the meson build after enabling the neon
> extension support (which isn't currently happening on ARM using meson
> as
> the build environment).
Can you share the patch to enable this for testing.
Since the additional memcpy in fastpath, I need to check the overhead
and check the possibility to avoid the memcpy to case.
>
> > We already have
> > CFLAGS_acl_run_neon.o += -flax-vector-conversions
> > in the Makefile.
> >
> > If you are taking out -flax-vector-conversions the correct way to
> > fix will be use vreinterpret*.
> >
> > For me the code looks clean, If unnecessary casting is avoided.
>
> I agree. I merely make explicit the casts that the compiler will be
> implicitly introducing.
>
> > > ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
> > > ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-
> > > vector-
> > > conversions
> > > to permit conversions between vectors with differing
> > > element
> > > types
> > > or numbers of subparts
> > > node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
> > > ^
> > > ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible
> > > type
> > > for
> > > argument 2 of ‘vbicq_s32’
> > >
> > > Signed-off-by: Aaron Conole <aconole at redhat.com>
> > > ---
> > > lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++---------
> > > ----
> > > --
> > > 1 file changed, 27 insertions(+), 19 deletions(-)
> > >
> > >
> > >
> > > /*
> > > @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx,
> > > const uint8_t **data,
> > > acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
> > > acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
> > >
> > > + memset(&input0, 0, sizeof(input0));
> > > + memset(&input1, 0, sizeof(input1));
> >
> > Why this memset only required for arm64? If it real issue,
> > Shouldn't
> > it required for x86 and ppc ?
>
> No. Please see the following lines (which is due to the ARM neon
> intrinsic for setting individual lanes):
>
> while (flows.started > 0) {
> /* Gather 4 bytes of input data for each stream. */
> input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> input0, 0);
> input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
> input1, 0);
>
> Note: the first time through this loop, input0 and input1 appear on
> the
> rhs of the assignment before appearing on the lhs. This will
> generate
> an uninitialized value warning, even though the assignments are to
> individual lanes of the vector.
>
> I squelched the warning from the compiler in the most brute-force way
> possible. Perhaps it would be better to use a static initialization
> for
> the vector but this code was intended to be RFC and to generate
> feedback.
>
> I guess one alternate approach could be:
>
> static const int32x4_t ZERO_VEC;
> int32x4_t input0 = ZERO_VEC, input1 = ZERO_VEC;
>
> ...
>
> int32x4_t input = ZERO_VEC;
>
> This would have the benefit of keeping the initializer as 'fast' as
> possible (although I recall a memset under a certain size threshold
> is
> the same effect, but not certain).
>
> Either way, I prefer it to squelching the warning, since the warning
> has been found to catch legitimate errors many times.
I will get back to this after reproducing the issue locally.
More information about the dev
mailing list