[PATCH v2 2/2] lpm: add a scalar version of lookupx4 function

Bruce Richardson bruce.richardson at intel.com
Mon May 30 12:42:00 CEST 2022


On Mon, May 30, 2022 at 10:00:34AM +0200, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson at intel.com]
> > Sent: Monday, 30 May 2022 09.52
> > 
> > On Fri, May 27, 2022 at 01:15:20PM -0700, Stephen Hemminger wrote:
> > > On Fri, 27 May 2022 20:18:22 +0200
> > > Stanislaw Kardach <kda at semihalf.com> wrote:
> > >
> > > > +static inline void
> > > > +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t
> > hop[4],
> > > > +		uint32_t defv)
> > > > +{
> > > > +	uint32_t nh;
> > > > +	int i, ret;
> > > > +
> > > > +	for (i = 0; i < 4; i++) {
> > > > +		ret = rte_lpm_lookup(lpm, ((rte_xmm_t)ip).u32[i], &nh);
> > > > +		hop[i] = (ret == 0) ? nh : defv;
> > > > +	}
> > > > +}
> > >
> > > For performance, manually unroll the loop.
> > 
> > Given a constant 4x iterations, will compilers not unroll this
> > automatically. I think the loop is a little clearer if it can be kept
> > 
> > /Bruce
> 
> If in doubt, add this and look at the assembler output:
> 
> #define REVIEW_INLINE_FUNCTIONS 1
> 
> #if REVIEW_INLINE_FUNCTIONS /* For compiler output review purposes only. */
> #pragma GCC diagnostic push
> #pragma GCC diagnostic ignored "-Wmissing-prototypes"
> void review_rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], uint32_t defv)
> {
> 	rte_lpm_lookupx4(lpm, ip, hop, defv);
> }
> #pragma GCC diagnostic pop
> #endif /* REVIEW_INLINE_FUNCTIONS */
> 

Used godbolt.org to check and indeed the function is not unrolled.
(Gcc 11.2, with flags "-O3 -march=icelake-server").

Manually unrolling changes the assembly generated in interesting ways. For
example, it appears to generate more cmov-type instructions for the
miss/default-value case rather than using branches as in the looped
version. Whether this is better or not may depend upon usecase - if one
expects most lpm lookup entries to hit, then having (predictable) branches
may well be cheaper.

In any case, I'll withdraw any object to unrolling, but I'm still not
convinced it's necessary.

/Bruce


More information about the dev mailing list