[dpdk-dev] [EXT] Re: [PATCH v2] examples/l3fwd: fix unaligned memory access

Jerin Jacob Kollanukkaran jerinj at marvell.com
Fri Jul 26 12:58:08 CEST 2019



> -----Original Message-----
> From: David Christensen <drc at linux.vnet.ibm.com>
> Sent: Friday, July 26, 2019 3:36 AM
> To: Jerin Jacob Kollanukkaran <jerinj at marvell.com>; Bruce Richardson
> <bruce.richardson at intel.com>; hgovindh
> <hariprasad.govindharajan at intel.com>
> Cc: Remy Horton <remy.horton at intel.com>; Marko Kovacevic
> <marko.kovacevic at intel.com>; Ori Kam <orika at mellanox.com>; Pablo de
> Lara <pablo.de.lara.guarch at intel.com>; Radu Nicolau
> <radu.nicolau at intel.com>; Akhil Goyal <akhil.goyal at nxp.com>; Tomasz
> Kantecki <tomasz.kantecki at intel.com>; dev at dpdk.org;
> maciej.czekaj at caviumnetworks.com; stable at dpdk.org; Gavin Hu
> <gavin.hu at arm.com>
> Subject: [EXT] Re: [dpdk-dev] [PATCH v2] examples/l3fwd: fix unaligned
> memory access
> 
> 
>>>> Fix unaligned memory access when reading IPv6 header which leads to
> >>>> segmentation fault by changing aligned memory read to unaligned
> >>>> memory read.
> >>>>
> >>>> Bugzilla ID: 279
> >>>> Fixes: 64d3955de1de ("examples/l3fwd: fix ARM build")
> >>>> Cc: maciej.czekaj at caviumnetworks.com
> >>>> Cc: stable at dpdk.org
> >>>> Signed-off-by: hgovindh <hariprasad.govindharajan at intel.com>
> >>>> ---
> >>>> V2: Added functions which will do unaligned load based on the
> >>>> underlying architecture
> >>>> ---
> >>>> ---
> >>>>   examples/l3fwd/l3fwd_em.c | 26 ++++++++++++++++++++++++--
> >>>>   1 file changed, 24 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/examples/l3fwd/l3fwd_em.c
> b/examples/l3fwd/l3fwd_em.c
> >>>> index fa8f82be6..f2641586b 100644
> >>>> --- a/examples/l3fwd/l3fwd_em.c
> >>>> +++ b/examples/l3fwd/l3fwd_em.c
> >>>> @@ -244,6 +244,29 @@ em_mask_key(void *key, xmm_t
> mask)  #error No
> >>>> vector engine (SSE, NEON, ALTIVEC) available, check your toolchain
> >>>> #endif
> >>>>
> >>>> +#if defined(RTE_MACHINE_CPUFLAG_SSE2) static inline xmm_t
> >>>> +em_load_key(void *key) {
> >>>> +    return _mm_loadu_si128((__m128i *)(key)); } #elif
> >>>> +defined(RTE_MACHINE_CPUFLAG_NEON)
> >>>> +static inline xmm_t
> >>>> +em_load_key(void *key)
> >>>> +{
> >>>> +    return vld1q_s32((int32_t *)key); } #elif
> >>>> +defined(RTE_MACHINE_CPUFLAG_ALTIVEC)
> >>>> +static inline xmm_t
> >>>> +em_load_key(void *key)
> >>>> +{
> >>>> +    return vec_ld(0, (xmm_t *)(key)); }
> >>
> >> Added power pc maintainer
> >
> >> Not sure all architecture need SIMD instructions for access to
> >> unaligned memory location.
> >>
> >> @hgovindh,
> >> Could you provide exact setup details for reproducing this issue, I
> >> can test it on arm64.
> >> Like l3fwd command, Traffic generator traffic pattern
> >
> > The vec_ld() function requires 16 byte alignment.  (My understanding
> > is that GCC code will mask the lower four bits of the address to
> > enforce the requirement:
> > https://gcc.gcc.gnu.narkive.com/cJndcMpR/vec-ld-versus-vec-vsx-ld-on-p
> > ower8)
> >   Power 8 and later processors support the vec_vsx_ld() function which
> > does not have the same memory alignment requirements.
> >
> > I'll need to try and reproduce the original bug to see what code is
> > actually being generated.  Outside of vector instructions I wouldn't
> > expect to see errors with unaligned data references.
> 
> Tested original bugzilla 279 on Power 9 system with RHEL 7.6 and gcc 4.8.5, no
> segmentation fault observed after 30 minutes (observed segmentation fault
> on Intel system immediately).
> 
> Code dissassembly:
> (gdb) info line l3fwd_em.c:290
> Line 290 of "/home/davec/src/dpdk/examples/l3fwd/l3fwd_em.c" starts at
> address 0x10146fbc <em_main_loop+1660>
>     and ends at 0x10146fc0 <em_main_loop+1664>.
> (gdb) disass /m 0x10146fbc,0x10146fc0
> Dump of assembler code from 0x10146fbc to 0x10146fc0:
> 290		key.xmm[1] = *(xmm_t *)data1;
>     0x0000000010146fbc <em_main_loop+1660>:	li      r7,20
> 
> End of assembler dump.
> 
> Since vector element ordering is different on Intel vs Power/ARM, suggest
> only applying vector operation to Intel code at this time otherwise additional
> steps may be required to modify MASK values to match the new vector
> operations.

On arm64, Generated assembly is following. Where LDUR and STR works
With unaligned memory(i.e no need for special handling).
I would suggest to have eal function to abstract The difference between x86 vs Power/ARM
to avoid ifdef clutter in all the applications.

             key.xmm[1] = *(xmm_t *)data1;
   0x00000000004ebed4 <+1188>:  60 40 c1 3c     ldur    q0, [x3, #20]
   0x00000000004ebedc <+1196>:  a0 73 80 3d     str     q0, [x29, #448]
   0x00000000004ec064 <+1588>:  41 40 c1 3c     ldur    q1, [x2, #20]
   0x00000000004ec06c <+1596>:  a1 73 80 3d     str     q1, [x29, #448]

> 
> Dave



More information about the dev mailing list