[dpdk-dev] [PATCH v5 4/8] net/ether: use bitops to speedup comparison
Olivier Matz
olivier.matz at 6wind.com
Tue Jul 2 11:26:15 CEST 2019
On Tue, Jul 02, 2019 at 09:53:14AM +0200, Olivier Matz wrote:
> Hi,
>
> On Mon, Jun 24, 2019 at 01:44:31PM -0700, Stephen Hemminger wrote:
> > Using bit operations like or and xor is faster than a loop
> > on all architectures. Really just explicit unrolling.
> >
> > Similar cast to uint16 unaligned is already done in
> > other functions here.
> >
> > Signed-off-by: Stephen Hemminger <stephen at networkplumber.org>
> > Reviewed-by: Andrew Rybchenko <arybchenko at solarflare.com>
> > ---
> > lib/librte_net/rte_ether.h | 17 +++++++----------
> > 1 file changed, 7 insertions(+), 10 deletions(-)
> >
> > diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h
> > index 8edc7e217b25..feb35a33c94b 100644
> > --- a/lib/librte_net/rte_ether.h
> > +++ b/lib/librte_net/rte_ether.h
> > @@ -81,11 +81,10 @@ struct rte_ether_addr {
> > static inline int rte_is_same_ether_addr(const struct rte_ether_addr *ea1,
> > const struct rte_ether_addr *ea2)
> > {
> > - int i;
> > - for (i = 0; i < RTE_ETHER_ADDR_LEN; i++)
> > - if (ea1->addr_bytes[i] != ea2->addr_bytes[i])
> > - return 0;
> > - return 1;
> > + const unaligned_uint16_t *w1 = (const uint16_t *)ea1;
> > + const unaligned_uint16_t *w2 = (const uint16_t *)ea2;
> > +
> > + return ((w1[0] ^ w2[0]) | (w1[1] ^ w2[1]) | (w1[2] ^ w2[2])) == 0;
> > }
> >
> > /**
> > @@ -100,11 +99,9 @@ static inline int rte_is_same_ether_addr(const struct rte_ether_addr *ea1,
> > */
> > static inline int rte_is_zero_ether_addr(const struct rte_ether_addr *ea)
> > {
> > - int i;
> > - for (i = 0; i < RTE_ETHER_ADDR_LEN; i++)
> > - if (ea->addr_bytes[i] != 0x00)
> > - return 0;
> > - return 1;
> > + const unaligned_uint16_t *w = (const uint16_t *)ea;
> > +
> > + return (w[0] | w[1] | w[2]) == 0;
> > }
> >
> > /**
>
> I wonder if using memcmp() isn't faster with recent compilers (gcc >= 7).
> I tried it quickly, and it seems the generated code is good (no call):
> https://godbolt.org/z/9MOL7g
>
> It would avoid the use of unaligned_uint16_t, and the next patch that
> adds the alignment constraint.
As pointed out by Konstantin privately (I guess he wanted to do a
reply-all), the size of addr_bytes is wrong in my previous link (8
instead of 6). Thanks for catching it.
With 6, the gcc code is not as good: there is still no call to memcmp(),
but there are some jumps. With the latest clang, the generated code is
nice: https://godbolt.org/z/nfptnY
More information about the dev
mailing list