[dpdk-dev] [PATCH v3 18/18] lpm: choose vector path at runtime

Ananyev, Konstantin konstantin.ananyev at intel.com
Thu Oct 8 16:40:24 CEST 2020


> 
> Hi Ciara,
> 
> 
> On 30/09/2020 14:04, Ciara Power wrote:
> > When choosing the vector path, max SIMD bitwidth is now checked to
> > ensure a vector path is allowable. To do this, rather than the vector
> > lookup functions being called directly from apps, a generic lookup
> > function is called which will call the vector functions if suitable.
> >
> > Signed-off-by: Ciara Power <ciara.power at intel.com>
> > ---
> >   lib/librte_lpm/rte_lpm.h         | 57 ++++++++++++++++++++++++++------
> >   lib/librte_lpm/rte_lpm_altivec.h |  2 +-
> >   lib/librte_lpm/rte_lpm_neon.h    |  2 +-
> >   lib/librte_lpm/rte_lpm_sse.h     |  2 +-
> >   4 files changed, 50 insertions(+), 13 deletions(-)
> >
> > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> > index 03da2d37e0..edba7cafd5 100644
> > --- a/lib/librte_lpm/rte_lpm.h
> > +++ b/lib/librte_lpm/rte_lpm.h
> > @@ -397,8 +397,18 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t *ips,
> >   /* Mask four results. */
> >   #define	 RTE_LPM_MASKX4_RES	UINT64_C(0x00ffffff00ffffff)
> >
> > +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
> > +#include "rte_lpm_neon.h"
> > +#elif defined(RTE_ARCH_PPC_64)
> > +#include "rte_lpm_altivec.h"
> > +#else
> > +#include "rte_lpm_sse.h"
> > +#endif
> > +
> >   /**
> > - * Lookup four IP addresses in an LPM table.
> > + * Lookup four IP addresses in an LPM table individually by calling the
> > + * lookup function for each IP. This is used when lookupx4 is called but
> > + * the vector path is not suitable.
> >    *
> >    * @param lpm
> >    *   LPM object handle
> > @@ -417,16 +427,43 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t *ips,
> >    *   if lookup would fail.
> >    */
> >   static inline void
> > -rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> > -	uint32_t defv);
> > +rte_lpm_lookupx4_scalar(struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> > +	uint32_t defv)
> > +{
> > +	int i;
> > +	for (i = 0; i < 4; i++)
> > +		if (rte_lpm_lookup(lpm, ((uint32_t *) &ip)[i], &hop[i]) < 0)
> > +			hop[i] = defv; /* lookupx4 is expected to set defv on failure */
> > +}
> >
> > -#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
> > -#include "rte_lpm_neon.h"
> > -#elif defined(RTE_ARCH_PPC_64)
> > -#include "rte_lpm_altivec.h"
> > -#else
> > -#include "rte_lpm_sse.h"
> > -#endif
> > +/**
> > + * Lookup four IP addresses in an LPM table.
> > + *
> > + * @param lpm
> > + *   LPM object handle
> > + * @param ip
> > + *   Four IPs to be looked up in the LPM table
> > + * @param hop
> > + *   Next hop of the most specific rule found for IP (valid on lookup hit only).
> > + *   This is a 4 element array of two byte values.
> > + *   If the lookup was successful for the given IP, then the least
> > + *   significant byte of the corresponding element is the actual next hop
> > + *   and the most significant byte is zero.
> > + *   If the lookup for the given IP failed, then the corresponding element
> > + *   contains the default value, see description of the next parameter.
> > + * @param defv
> > + *   Default value to populate into corresponding element of hop[] array,
> > + *   if lookup would fail.
> > + */
> > +static inline void
> > +rte_lpm_lookupx4(struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> > +	uint32_t defv)
> > +{
> > +	if (rte_get_max_simd_bitwidth() >= RTE_MAX_128_SIMD)
> > +		rte_lpm_lookupx4_vec(lpm, ip, hop, defv);
> > +	else
> > +		rte_lpm_lookupx4_scalar(lpm, ip, hop, defv);
> > +}
> 
> I'm afraid this will lead to a drop in performance. rte_lpm_lookupx4 is
> used in the hot path, and the bulk size is too small to amortize the cost
> of this extra logic.

I do share Vladimir's concern regarding performance here.
As I said in the other mail - there seems to be little point in inserting
these checks into an inline SSE-specific function, as SSE is enabled
by default for all x86 builds.

As another, more generic thought - it might be better to avoid
these checks in other public SIMD-specific inline functions (if any) too.
If such a function gets called from some .c file, then that SIMD ISA is
already enabled for that file, and I think the check should be left for
the caller to do, as in the sketch below.
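
To make that concrete, here is a minimal sketch of the caller-side
alternative. It assumes this patchset is applied, so
rte_get_max_simd_bitwidth(), RTE_MAX_128_SIMD and the _vec/_scalar
variants exist; the lpm_lookupx4_fn typedef, the wrapper and
app_init_lookup() below are illustrative names, not part of the patch:

#include <rte_lpm.h>

typedef void (*lpm_lookupx4_fn)(struct rte_lpm *lpm, xmm_t ip,
		uint32_t hop[4], uint32_t defv);

/* thin wrapper: the _vec variants take a const lpm pointer, so
 * forward through a signature shared with the scalar path */
static void
lookupx4_vec_wrap(struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
		uint32_t defv)
{
	rte_lpm_lookupx4_vec(lpm, ip, hop, defv);
}

static lpm_lookupx4_fn app_lookupx4;

static void
app_init_lookup(void)
{
	/* one branch at startup instead of one per lookupx4 call */
	if (rte_get_max_simd_bitwidth() >= RTE_MAX_128_SIMD)
		app_lookupx4 = lookupx4_vec_wrap;
	else
		app_lookupx4 = rte_lpm_lookupx4_scalar;
}

That way the bitwidth check is paid once at startup, and the hot path
calls through app_lookupx4 with no per-packet branch.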
 
> >
> >   #ifdef __cplusplus
> >   }
> > diff --git a/lib/librte_lpm/rte_lpm_altivec.h b/lib/librte_lpm/rte_lpm_altivec.h
> > index 228c41b38e..82142d3351 100644
> > --- a/lib/librte_lpm/rte_lpm_altivec.h
> > +++ b/lib/librte_lpm/rte_lpm_altivec.h
> > @@ -16,7 +16,7 @@ extern "C" {
> >   #endif
> >
> >   static inline void
> > -rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> > +rte_lpm_lookupx4_vec(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> >   	uint32_t defv)
> >   {
> >   	vector signed int i24;
> > diff --git a/lib/librte_lpm/rte_lpm_neon.h b/lib/librte_lpm/rte_lpm_neon.h
> > index 6c131d3125..14b184515d 100644
> > --- a/lib/librte_lpm/rte_lpm_neon.h
> > +++ b/lib/librte_lpm/rte_lpm_neon.h
> > @@ -16,7 +16,7 @@ extern "C" {
> >   #endif
> >
> >   static inline void
> > -rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> > +rte_lpm_lookupx4_vec(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> >   	uint32_t defv)
> >   {
> >   	uint32x4_t i24;
> > diff --git a/lib/librte_lpm/rte_lpm_sse.h b/lib/librte_lpm/rte_lpm_sse.h
> > index 44770b6ff8..cb5477c6cf 100644
> > --- a/lib/librte_lpm/rte_lpm_sse.h
> > +++ b/lib/librte_lpm/rte_lpm_sse.h
> > @@ -15,7 +15,7 @@ extern "C" {
> >   #endif
> >
> >   static inline void
> > -rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> > +rte_lpm_lookupx4_vec(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
> >   	uint32_t defv)
> >   {
> >   	__m128i i24;
> >
> 
> --
> Regards,
> Vladimir

