[dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types

Aaron Conole aconole at redhat.com
Wed Apr 10 17:52:38 CEST 2019


Jerin Jacob Kollanukkaran <jerinj at marvell.com> writes:

> On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote:
>> -------------------------------------------------------------------
>> ---
>> Compiler complains of argument type mismatch, like:
>
> Can you share more details on how to reproduce this issue?

It will be generated using the meson build after enabling the neon
extension support (which isn't currently happening on ARM using meson as
the build environment).

> We already have
> CFLAGS_acl_run_neon.o += -flax-vector-conversions
> in the Makefile.
>
> If you are taking out -flax-vector-conversions the correct way to
> fix will be use vreinterpret*.
>
> For me the code looks clean, If unnecessary casting is avoided.

I agree.  I merely make explicit the casts that the compiler will be
implicitly introducing.

>
>> 
>>    ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
>>    ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-vector-
>> conversions
>>       to permit conversions between vectors with differing element
>> types
>>       or numbers of subparts
>>      node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
>>      ^
>>    ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible type
>> for
>>       argument 2 of ‘vbicq_s32’
>> 
>> Signed-off-by: Aaron Conole <aconole at redhat.com>
>> ---
>>  lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++-------------
>> --
>>  1 file changed, 27 insertions(+), 19 deletions(-)
>> 
>> 
>>  
>>  /*
>> @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx,
>> const uint8_t **data,
>>  	acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
>>  	acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
>>  
>> +	memset(&input0, 0, sizeof(input0));
>> +	memset(&input1, 0, sizeof(input1));
>
> Why this memset only required for arm64? If it real issue, Shouldn't
> it required for x86 and ppc ?

No.  Please see the following lines (which is due to the ARM neon
intrinsic for setting individual lanes):

	while (flows.started > 0) {
		/* Gather 4 bytes of input data for each stream. */
		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);

Note: the first time through this loop, input0 and input1 appear on the
rhs of the assignment before appearing on the lhs.  This will generate
an uninitialized value warning, even though the assignments are to
individual lanes of the vector.

I squelched the warning from the compiler in the most brute-force way
possible.  Perhaps it would be better to use a static initialization for
the vector but this code was intended to be RFC and to generate
feedback.

I guess one alternate approach could be:

   static const int32x4_t ZERO_VEC;
   int32x4_t input0 = ZERO_VEC, input1 = ZERO_VEC;

   ...

   int32x4_t input = ZERO_VEC;

This would have the benefit of keeping the initializer as 'fast' as
possible (although I recall a memset under a certain size threshold is
the same effect, but not certain).

Either way, I prefer it to squelching the warning, since the warning
has been found to catch legitimate errors many times.


More information about the dev mailing list