[dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler

Jerin Jacob Kollanukkaran jerinj at marvell.com
Tue Jun 11 16:24:57 CEST 2019


> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>
> Sent: Tuesday, June 11, 2019 6:58 AM
> To: Jerin Jacob Kollanukkaran <jerinj at marvell.com>; dev at dpdk.org
> Cc: thomas at monjalon.net; Gavin Hu (Arm Technology China)
> <Gavin.Hu at arm.com>; msantana at redhat.com; aconole at redhat.com;
> stable at dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>;
> nd <nd at arm.com>; nd <nd at arm.com>
> Subject: [EXT] RE: [dpdk-dev] [PATCH] acl: fix build issue with some arm64
> compiler
> 
> > >
> > > >
> > > > Since it used in fastpath, a temp variable would be additional
> > > > cost for no reason.
> > > Then, I would suggest we can go with using 'vdupq_n_s32'.
> >
> > We have to form uint64x2_t with 4 x uint32_t variable, How does
> > 'vdupq_n_s32' help here?
> We would use 'vdupq_n_s32' only for the first initialization, the rest of the code
> remains the same (see the diff below)
> 
> > Can you share code snippet without any temp variable?
> diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h index
> 01b9766d8..b3196cd12 100644
> --- a/lib/librte_acl/acl_run_neon.h
> +++ b/lib/librte_acl/acl_run_neon.h
> @@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
> uint8_t **data,
> 
>         while (flows.started > 0) {
>                 /* Gather 4 bytes of input data for each stream. */
> -               input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
> -               input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
> +               input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
> +               input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 4));
> 
>                 input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
>                 input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1); @@ -
> 242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t
> **data,
> 
>         while (flows.started > 0) {
>                 /* Gather 4 bytes of input data for each stream. */
> -               input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
> +               input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
>                 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
>                 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
>                 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
> 
> My understanding is that the generated code for both your patch and my
> changes above is the same. Above suggested changes will conform to ACLE
> recommendation.

Though instructions are different. Effective cycles are same even though
First dup updates the four positions.
To make forward progress send the v2 based on the updated logic
 just to make ACLE  Spec happy, I don’t see any real reason to do it though 😊

http://patches.dpdk.org/patch/54656/




More information about the dev mailing list