[dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms

Wodkowski, PawelX pawelx.wodkowski at intel.com
Mon Jan 26 15:43:04 CET 2015


Hi,

I must say: great work.

I have some small comments:

> +/**
> + * Macro for copying unaligned block from one location to another,
> + * 47 bytes leftover maximum,
> + * locations should not overlap.
> + * Requirements:
> + * - Store is aligned
> + * - Load offset is <offset>, which must be immediate value within [1, 15]
> + * - For <src>, make sure <offset> bit backwards & <16 - offset> bit forwards are available for loading
> + * - <dst>, <src>, <len> must be variables
> + * - __m128i <xmm0> ~ <xmm8> must be pre-defined
> + */
> +#define MOVEUNALIGNED_LEFT47(dst, src, len, offset) \
> +{ \
...
> +}

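Why not wrap the body in do { ... } while (0) or ({ ... })? A bare { ... } block
can have unpredictable side effects, for example when the macro call is followed
by ';' in an if/else without braces.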
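
To illustrate the concern, here is a minimal sketch. COPY_BARE, COPY_SAFE and
pick() are hypothetical names made up for this example; they are not taken from
the patch. The bare-block form expands to "{ ... }", so the ';' written after
the call becomes a separate empty statement and detaches a following 'else'.

```c
#include <stddef.h>

/* Hypothetical macros for illustration only; not the macros from the patch. */

/* Bare-block form: "COPY_BARE(a, b);" expands to "{ ... };". */
#define COPY_BARE(dst, src) { (dst) = (src); }

/* do/while(0) form: "COPY_SAFE(a, b);" expands to exactly one statement. */
#define COPY_SAFE(dst, src) do { (dst) = (src); } while (0)

/*
 * With COPY_BARE the following would not compile, because the ';' after
 * the macro call terminates the 'if' and leaves the 'else' orphaned:
 *
 *     if (cond)
 *         COPY_BARE(out, 1);
 *     else
 *         COPY_BARE(out, 2);
 */
static int pick(int cond)
{
	int out = 0;

	/* The same shape is fine with the do/while(0) form. */
	if (cond)
		COPY_SAFE(out, 1);
	else
		COPY_SAFE(out, 2);
	return out;
}
```

The ({ ... }) alternative is a GCC/Clang statement expression, so it is only an
option if the code already depends on those compilers.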

Second:
Why do you completely replace
#define rte_memcpy(dst, src, n)              \
	({ (__builtin_constant_p(n)) ?       \
	memcpy((dst), (src), (n)) :          \
	rte_memcpy_func((dst), (src), (n)); })

with an inline rte_memcpy()? This construct can help the compiler decide
which version to use: the (static?) inline implementation or a call to an
external function.
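
A small sketch of that dispatch pattern, assuming GCC or Clang (both
__builtin_constant_p and the ({ ... }) statement expression are compiler
extensions). my_memcpy, my_memcpy_func and copy_demo are hypothetical stand-ins
for this example, not the DPDK functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an out-of-line, hand-optimized copy routine. */
static void *my_memcpy_func(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/*
 * Same shape as the macro quoted above: when the size is a compile-time
 * constant, use plain memcpy() so the compiler can expand it inline;
 * otherwise call the out-of-line routine.
 */
#define my_memcpy(dst, src, n)              \
	({ (__builtin_constant_p(n)) ?      \
	memcpy((dst), (src), (n)) :         \
	my_memcpy_func((dst), (src), (n)); })

/* Copies 4 bytes with a literal size, then runtime_len more bytes. */
static void copy_demo(char *dst, const char *src, size_t runtime_len)
{
	my_memcpy(dst, src, 4);                   /* constant size */
	my_memcpy(dst + 4, src + 4, runtime_len); /* size known only at run time */
}
```

Which branch is taken for a non-literal size depends on the optimization level,
since the compiler may still prove the value constant after inlining.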

Did you try the 'extern inline' type? It could help reduce compilation time.


