[dpdk-dev] [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy
xiaoyun.li at intel.com
Tue Oct 3 01:10:43 CEST 2017
> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, October 3, 2017 00:39
> To: Li, Xiaoyun <xiaoyun.li at intel.com>; Richardson, Bruce
> <bruce.richardson at intel.com>
> Cc: Lu, Wenzhuo <wenzhuo.lu at intel.com>; Zhang, Helin
> <helin.zhang at intel.com>; dev at dpdk.org
> Subject: RE: [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy
> > -----Original Message-----
> > From: Li, Xiaoyun
> > Sent: Monday, October 2, 2017 5:13 PM
> > To: Ananyev, Konstantin <konstantin.ananyev at intel.com>; Richardson,
> Bruce <bruce.richardson at intel.com>
> > Cc: Lu, Wenzhuo <wenzhuo.lu at intel.com>; Zhang, Helin
> <helin.zhang at intel.com>; dev at dpdk.org; Li, Xiaoyun <xiaoyun.li at intel.com>
> > Subject: [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy
> > This patch dynamically selects memcpy functions at run-time based
> > on the CPU flags that the current machine supports. It uses function
> > pointers which are bound to the corresponding functions at constructor time.
> > In addition, the AVX512 instruction set is compiled only if users
> > enable it in the config and the compiler supports it.
> > Signed-off-by: Xiaoyun Li <xiaoyun.li at intel.com>
> > ---
> > v2
> > * Use gcc function multi-versioning to avoid compilation issues.
> > * Add macros for AVX512 and AVX2. Only if users enable AVX512 and the
> > compiler supports it, the AVX512 codes would be compiled. Only if the
> > compiler supports AVX2, the AVX2 codes would be compiled.
> > v3
> > * Reduce function calls by keeping only rte_memcpy_xxx.
> > * Add a condition: when the copy size is small, use the inline code path.
> > Otherwise, use the dynamic code path.
> > * To support attribute target, clang version must be greater than 3.7.
> > Otherwise, the SSE/AVX code path is chosen, the same as before.
> > * Move two macro functions to the top of the code since they are
> > used in both the inline SSE/AVX and dynamic SSE/AVX code.
> > v4
> > * Split rte_memcpy.h into several .c files and modify makefiles to compile
> > the AVX2 and AVX512 files.
> Could you explain to me why, instead of reusing the existing rte_memcpy() code
> to generate _sse/_avx2/_avx512f flavors, you keep pushing changes with 3
> separate implementations?
> Obviously that is much more expensive in terms of maintenance and doesn't
> look like a feasible solution to me.
> Is the existing rte_memcpy() implementation not good enough in terms of
> functionality and/or performance?
> If so, can you outline these problems and try to fix them first?
I only merged the many small helper functions into one function in each of those 3 separate implementations.
The existing code is entirely inline, including rte_memcpy() itself, so the compiler turns every
rte_memcpy() call into basic instructions like xmm0=xxx.
That is fine for the existing code. But with run-time dispatch, the same structure would bring lots of
function calls and cause a perf drop.
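
For illustration, the scheme described in the patch (function pointers bound at
constructor time, with small copies kept on an inline path) might look roughly
like the sketch below. This is not the code from the patch: every name here
(copy_sse, copy_avx2, copy_avx512f, copy_ptr, copy_select, copy_dispatch,
INLINE_THRESHOLD) is hypothetical, the threshold value is an assumption, and
the per-ISA bodies are plain memcpy() stand-ins. It assumes GCC or clang on x86
for __builtin_cpu_supports().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not the patch's code. */
typedef void *(*memcpy_fn)(void *dst, const void *src, size_t n);

/* Stand-ins for the per-ISA implementations. In a real build each one
 * would be a single non-inline function, compiled with the matching
 * target flags, so that only one call is made per copy. */
static void *copy_sse(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

static void *copy_avx2(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

static void *copy_avx512f(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Default to the baseline flavor; the constructor may upgrade it. */
static memcpy_fn copy_ptr = copy_sse;

/* Runs before main() and binds the pointer once, based on CPU flags. */
__attribute__((constructor))
static void copy_select(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        copy_ptr = copy_avx512f;
    else if (__builtin_cpu_supports("avx2"))
        copy_ptr = copy_avx2;
#endif
}

/* Assumed cut-off between the inline path and the dispatched path. */
#define INLINE_THRESHOLD 128

static inline void *copy_dispatch(void *dst, const void *src, size_t n)
{
    if (n <= INLINE_THRESHOLD) {
        /* Small copy: stay inline and avoid the indirect call. */
        unsigned char *d = dst;
        const unsigned char *s = src;
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
        return dst;
    }
    /* Large copy: one indirect call to the flavor chosen at startup. */
    return copy_ptr(dst, src, n);
}
```

The point of the sketch is that each dispatched flavor is one function, so a
large copy costs a single indirect call rather than a chain of calls to small
helpers.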