[dpdk-dev] Odp.: [PATCH v3] examples/l3fwd: em path performance fix

Czekaj, Maciej Maciej.Czekaj at caviumnetworks.com
Tue Mar 15 20:42:57 CET 2016



________________________________________
Od: Kulasek, TomaszX <tomaszx.kulasek at intel.com>
Wysłane: 15 marca 2016 17:06
Do: Thomas Monjalon; Czekaj, Maciej
DW: dev at dpdk.org
Temat: RE: [dpdk-dev] [PATCH v3] examples/l3fwd: em path performance fix

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, March 15, 2016 15:50
> To: Kulasek, TomaszX <tomaszx.kulasek at intel.com>; Maciej Czekaj
> <maciej.czekaj at caviumnetworks.com>
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3] examples/l3fwd: em path performance fix
>
> 2016-03-15 14:31, Kulasek, TomaszX:
> > From: Kulasek, TomaszX
> > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > There is an error:
> > > > examples/l3fwd/l3fwd_em_hlm_sse.h:72:38: error:
> > > >         incompatible type for argument 2 of ‘_mm_and_si128’
> > >
> > > It's caused by
> > >
> > > commit 64d3955de1de4d7879a0930a6d2f501369d3445a
> > > Author: Maciej Czekaj <maciej.czekaj at caviumnetworks.com>
> > > Date:   Thu Mar 10 17:06:22 2016 +0100
> > >
> > >     examples/l3fwd: fix ARM build
> > >
> > >     Enable NEON support in exact match mode.
> > >     l3fwd example did not compile on ARM due to SSE2 instrincics used
> > >     in generic part.
> > >     Some instrinsins were used to initialize data structures and
> > > those were
> > >     replaced by ordinary structure initalization.
> > >     All SSE2 intrinsics used in forwarding, i.e. masking the IP/TCP
> header
> > >     are moved to single inline function and made arch-specific.
> > >
> > >     Signed-off-by: Maciej Czekaj <maciej.czekaj at caviumnetworks.com>
> > >
> > > Which doesn't include rework of l3fwd_em_hlm_sse.h file.
> > >
> > > When you compile it now with global "#define HASH_MULTI_LOOKUP 1"
> > > and alternative classification is used, and compilation will also fail
> now.
> > >
> > > I need a little bit more time to investigate it, because I'm not an
> > > expert in ARM. It will be nice if Maciej will help me in that.
> > >
> > > Tomasz
> >
> > Will be that ok for you to disable this path for arm?
>
> Please, what do you mean?
> Maciej, have you looked at this issue?

This fix uses platform specific part of code which wasn't reworked in previous patch for ARM. It causes compilation error.
What I mean, is to leave current classification path for ARM and turn on alternative only for Intel platform.

Like that:

60 +#if defined(NO_HASH_MULTI_LOOKUP) || defined(__ARM_NEON)
61  #include "l3fwd_em_sse.h"
62  #else
63  #include "l3fwd_em_hlm_sse.h"

Thanks guys for pointing this out. The issue is that after my patch mask0, mask1 and mask2 are now defined as:

static rte_xmm_t mask0;
static rte_xmm_t mask1;
static rte_xmm_t mask2;

rte_xmm_t is a union with xmm_t field inside.

Apparently, I overlooked the HASH_MULTI_LOOKUP define

I can provide a quick fix for that, I need to rename all maskN references to maskN.x, to point out to xmm_t variable. E.g. the following diff is fixing the compilation.

diff --git a/examples/l3fwd/l3fwd_em_hlm_sse.h b/examples/l3fwd/l3fwd_em_hlm_sse.h
index d3388da..eb23163 100644
--- a/examples/l3fwd/l3fwd_em_hlm_sse.h
+++ b/examples/l3fwd/l3fwd_em_hlm_sse.h
@@ -77,14 +77,14 @@ em_get_dst_port_ipv4x8(struct lcore_conf *qconf, struct rte_mbuf *m[8],
                                sizeof(struct ether_hdr) +
                                offsetof(struct ipv4_hdr, time_to_live)));
 
-       key[0].xmm = _mm_and_si128(data[0], mask0);
-       key[1].xmm = _mm_and_si128(data[1], mask0);
-       key[2].xmm = _mm_and_si128(data[2], mask0);
-       key[3].xmm = _mm_and_si128(data[3], mask0);
-       key[4].xmm = _mm_and_si128(data[4], mask0);
-       key[5].xmm = _mm_and_si128(data[5], mask0);
-       key[6].xmm = _mm_and_si128(data[6], mask0);
-       key[7].xmm = _mm_and_si128(data[7], mask0);
+       key[0].xmm = _mm_and_si128(data[0], mask0.x);
+       key[1].xmm = _mm_and_si128(data[1], mask0.x);
+       key[2].xmm = _mm_and_si128(data[2], mask0.x);
+       key[3].xmm = _mm_and_si128(data[3], mask0.x);
+       key[4].xmm = _mm_and_si128(data[4], mask0.x);
+       key[5].xmm = _mm_and_si128(data[5], mask0.x);
+       key[6].xmm = _mm_and_si128(data[6], mask0.x);
+       key[7].xmm = _mm_and_si128(data[7], mask0.x);
 
        const void *key_array[8] = {&key[0], &key[1], &key[2], &key[3],
                                &key[4], &key[5], &key[6], &key[7]};
@@ -175,14 +175,14 @@ em_get_dst_port_ipv6x8(struct lcore_conf *qconf, struct rte_mbuf *m[8],
        int32_t ret[8];
        union ipv6_5tuple_host key[8];
 
-       get_ipv6_5tuple(m[0], mask1, mask2, &key[0]);
-       get_ipv6_5tuple(m[1], mask1, mask2, &key[1]);
-       get_ipv6_5tuple(m[2], mask1, mask2, &key[2]);
-       get_ipv6_5tuple(m[3], mask1, mask2, &key[3]);
-       get_ipv6_5tuple(m[4], mask1, mask2, &key[4]);
-       get_ipv6_5tuple(m[5], mask1, mask2, &key[5]);
-       get_ipv6_5tuple(m[6], mask1, mask2, &key[6]);
-       get_ipv6_5tuple(m[7], mask1, mask2, &key[7]);
+       get_ipv6_5tuple(m[0], mask1.x, mask2.x, &key[0]);
+       get_ipv6_5tuple(m[1], mask1.x, mask2.x, &key[1]);
+       get_ipv6_5tuple(m[2], mask1.x, mask2.x, &key[2]);
+       get_ipv6_5tuple(m[3], mask1.x, mask2.x, &key[3]);
+       get_ipv6_5tuple(m[4], mask1.x, mask2.x, &key[4]);
+       get_ipv6_5tuple(m[5], mask1.x, mask2.x, &key[5]);
+       get_ipv6_5tuple(m[6], mask1.x, mask2.x, &key[6]);
+       get_ipv6_5tuple(m[7], mask1.x, mask2.x, &key[7]);
 
        const void *key_array[8] = {&key[0], &key[1], &key[2], &key[3],
                        &key[4], &key[5], &key[6], &key[7]};

Would you like me to re-post the patch?

Thanks
Maciej











More information about the dev mailing list