[dpdk-dev] Re: [PATCH v3] examples/l3fwd: em path performance fix
Czekaj, Maciej
Maciej.Czekaj at caviumnetworks.com
Tue Mar 15 20:42:57 CET 2016
________________________________________
From: Kulasek, TomaszX <tomaszx.kulasek at intel.com>
Sent: 15 March 2016 17:06
To: Thomas Monjalon; Czekaj, Maciej
Cc: dev at dpdk.org
Subject: RE: [dpdk-dev] [PATCH v3] examples/l3fwd: em path performance fix
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, March 15, 2016 15:50
> To: Kulasek, TomaszX <tomaszx.kulasek at intel.com>; Maciej Czekaj
> <maciej.czekaj at caviumnetworks.com>
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3] examples/l3fwd: em path performance fix
>
> 2016-03-15 14:31, Kulasek, TomaszX:
> > From: Kulasek, TomaszX
> > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > There is an error:
> > > > examples/l3fwd/l3fwd_em_hlm_sse.h:72:38: error:
> > > > incompatible type for argument 2 of ‘_mm_and_si128’
> > >
> > > It's caused by
> > >
> > > commit 64d3955de1de4d7879a0930a6d2f501369d3445a
> > > Author: Maciej Czekaj <maciej.czekaj at caviumnetworks.com>
> > > Date: Thu Mar 10 17:06:22 2016 +0100
> > >
> > > examples/l3fwd: fix ARM build
> > >
> > > Enable NEON support in exact match mode.
> > > l3fwd example did not compile on ARM due to SSE2 intrinsics used
> > > in generic part.
> > > Some intrinsics were used to initialize data structures and
> > > those were replaced by ordinary structure initialization.
> > > All SSE2 intrinsics used in forwarding, i.e. masking the IP/TCP
> > > header, are moved to single inline function and made arch-specific.
> > >
> > > Signed-off-by: Maciej Czekaj <maciej.czekaj at caviumnetworks.com>
> > >
> > > Which doesn't include rework of l3fwd_em_hlm_sse.h file.
> > >
> > > If you now compile with a global "#define HASH_MULTI_LOOKUP 1", the
> > > alternative classification is used and compilation fails as well.
> > >
> > > I need a little more time to investigate it, because I'm not an
> > > expert in ARM. It would be nice if Maciej could help me with that.
> > >
> > > Tomasz
> >
> > Would it be OK for you to disable this path for ARM?
>
> Please, what do you mean?
> Maciej, have you looked at this issue?
This fix uses a platform-specific part of the code that wasn't reworked in the previous patch for ARM, which causes the compilation error.
What I mean is to keep the current classification path for ARM and enable the alternative one only on Intel platforms.
Like this:
+#if defined(NO_HASH_MULTI_LOOKUP) || defined(__ARM_NEON)
 #include "l3fwd_em_sse.h"
 #else
 #include "l3fwd_em_hlm_sse.h"
Thanks guys for pointing this out. The issue is that after my patch mask0, mask1 and mask2 are now defined as:
static rte_xmm_t mask0;
static rte_xmm_t mask1;
static rte_xmm_t mask2;
rte_xmm_t is a union with an xmm_t field inside.
Apparently, I overlooked the HASH_MULTI_LOOKUP define.
I can provide a quick fix for that: all maskN references need to be renamed to maskN.x so that they refer to the xmm_t member. E.g. the following diff fixes the compilation.
diff --git a/examples/l3fwd/l3fwd_em_hlm_sse.h b/examples/l3fwd/l3fwd_em_hlm_sse.h
index d3388da..eb23163 100644
--- a/examples/l3fwd/l3fwd_em_hlm_sse.h
+++ b/examples/l3fwd/l3fwd_em_hlm_sse.h
@@ -77,14 +77,14 @@ em_get_dst_port_ipv4x8(struct lcore_conf *qconf, struct rte_mbuf *m[8],
sizeof(struct ether_hdr) +
offsetof(struct ipv4_hdr, time_to_live)));
- key[0].xmm = _mm_and_si128(data[0], mask0);
- key[1].xmm = _mm_and_si128(data[1], mask0);
- key[2].xmm = _mm_and_si128(data[2], mask0);
- key[3].xmm = _mm_and_si128(data[3], mask0);
- key[4].xmm = _mm_and_si128(data[4], mask0);
- key[5].xmm = _mm_and_si128(data[5], mask0);
- key[6].xmm = _mm_and_si128(data[6], mask0);
- key[7].xmm = _mm_and_si128(data[7], mask0);
+ key[0].xmm = _mm_and_si128(data[0], mask0.x);
+ key[1].xmm = _mm_and_si128(data[1], mask0.x);
+ key[2].xmm = _mm_and_si128(data[2], mask0.x);
+ key[3].xmm = _mm_and_si128(data[3], mask0.x);
+ key[4].xmm = _mm_and_si128(data[4], mask0.x);
+ key[5].xmm = _mm_and_si128(data[5], mask0.x);
+ key[6].xmm = _mm_and_si128(data[6], mask0.x);
+ key[7].xmm = _mm_and_si128(data[7], mask0.x);
const void *key_array[8] = {&key[0], &key[1], &key[2], &key[3],
&key[4], &key[5], &key[6], &key[7]};
@@ -175,14 +175,14 @@ em_get_dst_port_ipv6x8(struct lcore_conf *qconf, struct rte_mbuf *m[8],
int32_t ret[8];
union ipv6_5tuple_host key[8];
- get_ipv6_5tuple(m[0], mask1, mask2, &key[0]);
- get_ipv6_5tuple(m[1], mask1, mask2, &key[1]);
- get_ipv6_5tuple(m[2], mask1, mask2, &key[2]);
- get_ipv6_5tuple(m[3], mask1, mask2, &key[3]);
- get_ipv6_5tuple(m[4], mask1, mask2, &key[4]);
- get_ipv6_5tuple(m[5], mask1, mask2, &key[5]);
- get_ipv6_5tuple(m[6], mask1, mask2, &key[6]);
- get_ipv6_5tuple(m[7], mask1, mask2, &key[7]);
+ get_ipv6_5tuple(m[0], mask1.x, mask2.x, &key[0]);
+ get_ipv6_5tuple(m[1], mask1.x, mask2.x, &key[1]);
+ get_ipv6_5tuple(m[2], mask1.x, mask2.x, &key[2]);
+ get_ipv6_5tuple(m[3], mask1.x, mask2.x, &key[3]);
+ get_ipv6_5tuple(m[4], mask1.x, mask2.x, &key[4]);
+ get_ipv6_5tuple(m[5], mask1.x, mask2.x, &key[5]);
+ get_ipv6_5tuple(m[6], mask1.x, mask2.x, &key[6]);
+ get_ipv6_5tuple(m[7], mask1.x, mask2.x, &key[7]);
const void *key_array[8] = {&key[0], &key[1], &key[2], &key[3],
&key[4], &key[5], &key[6], &key[7]};
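
For illustration only, here is a minimal standalone sketch of why the .x suffix is needed. It assumes an x86 build with SSE2; demo_xmm_t is a simplified, hypothetical stand-in for rte_xmm_t (the real union may have more members), and the mask value and the mask_key() helper are made up for the example rather than taken from l3fwd.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, x86 only */
#include <stdint.h>

/* On x86, DPDK's xmm_t is a typedef for __m128i. */
typedef __m128i xmm_t;

/* Simplified stand-in for rte_xmm_t (assumption: reduced to two members). */
typedef union {
	xmm_t    x;
	uint32_t u32[4];
} demo_xmm_t;

/*
 * Mask set up with an ordinary structure initializer instead of an
 * intrinsic. The value is illustrative: keep bits 8..15 of the first
 * 32-bit word and all bits of the other three words.
 */
static demo_xmm_t mask0 = {
	.u32 = {0x0000ff00, 0xffffffff, 0xffffffff, 0xffffffff}
};

/* Hypothetical helper: AND four 32-bit words with mask0. */
static void
mask_key(const uint32_t in[4], uint32_t out[4])
{
	__m128i data = _mm_loadu_si128((const __m128i *)in);

	/*
	 * _mm_and_si128(data, mask0) does not compile, because mask0 is a
	 * union and the intrinsic expects __m128i. Selecting the .x member
	 * passes the vector itself.
	 */
	__m128i key = _mm_and_si128(data, mask0.x);

	_mm_storeu_si128((__m128i *)out, key);
}
```

Calling mask_key() on {0x12345678, 0xdeadbeef, 0, 0xffffffff} leaves only bits 8..15 of the first word and passes the other three words through unchanged.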
Would you like me to re-post the patch?
Thanks
Maciej