[dpdk-dev] [PATCH v3] Implement memcmp using SIMD intrinsics
Bruce Richardson
bruce.richardson at intel.com
Fri Jun 12 11:03:35 CEST 2015
On Fri, Jun 12, 2015 at 10:30:56AM +0200, Ondřej Bílka wrote:
> On Mon, May 18, 2015 at 01:01:42PM -0700, Ravi Kerur wrote:
> > Background:
> > After preliminary discussion with John (Zhihong) and Tim from Intel, it was
> > decided that it would be beneficial to use AVX/SSE intrinsics for memcmp,
> > similar to the memcpy that had been implemented. In addition, we decided to use
> > librte_hash as a test candidate to test both functionality and performance.
> >
> > Further discussions led to a complete implementation of memory
> > comparison, and the v3 code reflects that.
> >
> > Testing was conducted on an Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, Ubuntu 14.04,
> > x86_64, 16GB DDR3 system.
> >
> > Ravi Kerur (1):
> > Implement memcmp using Intel SIMD intrinsics.
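For readers unfamiliar with the approach, here is a minimal sketch of what an SSE2 equality check looks like for a fixed 16-byte key, the kind of key a hash table lookup compares. This is not the code from the patch, which is not reproduced in this message; it assumes an x86/x86-64 target with SSE2 and uses the standard emmintrin.h intrinsics, and key16_equal_sse2 is a hypothetical name.

```c
#include <emmintrin.h> /* SSE2 intrinsics; x86/x86-64 only */

/* Hypothetical helper: equality test for a 16-byte key using SSE2.
 * Returns 1 if all 16 bytes match, 0 otherwise. */
static int key16_equal_sse2(const void *a, const void *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a); /* unaligned 16-byte load */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i eq = _mm_cmpeq_epi8(va, vb);   /* 0xFF in each byte lane that matches */
    return _mm_movemask_epi8(eq) == 0xFFFF; /* one mask bit per lane; all set means equal */
}
```

Even when the keys differ in the very first byte, this path still performs two full 16-byte loads, a vector compare and a mask extraction, which is the overhead the reply below argues against.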
>
> As my previous mail got lost, I am resending it.
>
> In short, you shouldn't use SSE2/AVX2 for memcmp at all. In 95% of calls
> you find an inequality in the first 8 bytes, so SSE2 just adds unnecessary
> overhead compared with checking those bytes with a single 64-bit load and
> compare:
>
> 190: 48 8b 4e 08 mov 0x8(%rsi),%rcx
> 194: 48 39 4f 08 cmp %rcx,0x8(%rdi)
> 198: 75 f3 jne 18d <memeq30+0xd>
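The scalar alternative described above can be sketched in C as follows. This is an illustration under stated assumptions, not Ondřej's implementation (which is snipped from this archive): memeq_scalar is a hypothetical name, it follows memcmp's convention of returning 0 for equal buffers, and it reports only equality, not ordering. With optimization enabled on x86-64, each memcpy into a uint64_t typically compiles to a single mov, giving a load-and-compare like the disassembly above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 0 if the n bytes at a and b are equal,
 * nonzero otherwise. Only equality is reported, not ordering. */
static int memeq_scalar(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a, *pb = b;
    uint64_t x, y;

    /* Compare 8 bytes at a time; memcpy avoids unaligned-access
     * undefined behaviour and is usually lowered to one load. */
    while (n >= 8) {
        memcpy(&x, pa, 8);
        memcpy(&y, pb, 8);
        if (x != y)
            return 1; /* per the claim above, most mismatches exit on the first word */
        pa += 8;
        pb += 8;
        n -= 8;
    }
    /* Tail: fewer than 8 bytes remain, compare them one by one. */
    while (n--) {
        if (*pa++ != *pb++)
            return 1;
    }
    return 0;
}
```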
>
> Also, since yours is a full memcmp, does your gcc optimize
> if (memcmp(x, y, n))
> into a plain equality test, as it does with mine?
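The question matters because a full memcmp must report ordering, not just equality. The sketch below is again hypothetical (memcmp_words is an illustrative name, not code from either implementation) and shows the extra work: on a mismatch, the 8-byte words must be compared as if loaded big-endian so that the sign reflects the first differing byte. It assumes GCC or Clang for __builtin_bswap64 and the __BYTE_ORDER__ macros. When a caller only tests the result for zero, that ordering step is wasted, so a compiler that can see the definition may be able to drop it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load 8 bytes so that the first byte in memory becomes the most
 * significant byte; integer comparison then matches byte order.
 * Assumes GCC/Clang for __builtin_bswap64 and __BYTE_ORDER__. */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, 8);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap64(v);
#endif
    return v;
}

/* Hypothetical full comparison with memcmp semantics: <0, 0 or >0,
 * bytes compared as unsigned char. */
static int memcmp_words(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a, *pb = b;

    while (n >= 8) {
        uint64_t x = load_be64(pa), y = load_be64(pb);
        if (x != y)
            /* Extra work vs. equality: the byte swap makes the sign
             * reflect the first differing byte. */
            return x < y ? -1 : 1;
        pa += 8;
        pb += 8;
        n -= 8;
    }
    for (; n; n--, pa++, pb++) {
        if (*pa != *pb)
            return *pa < *pb ? -1 : 1;
    }
    return 0;
}
```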
>
> So please also run the implementation below in your benchmark; my guess is
> that it will be faster.
>
<snip for brevity>
Thanks for the contribution. It's very informative!
/Bruce