[dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on IA platform
Yang, Zhiyong
zhiyong.yang at intel.com
Fri Dec 16 03:15:39 CET 2016
Hi, Konstantin:
> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, December 15, 2016 6:54 PM
> To: Yang, Zhiyong <zhiyong.yang at intel.com>; Thomas Monjalon
> <thomas.monjalon at 6wind.com>
> Cc: dev at dpdk.org; yuanhan.liu at linux.intel.com; Richardson, Bruce
> <bruce.richardson at intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch at intel.com>
> Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on
> IA platform
>
> Hi Zhiyong,
>
> > -----Original Message-----
> > From: Yang, Zhiyong
> > Sent: Thursday, December 15, 2016 6:51 AM
> > To: Yang, Zhiyong <zhiyong.yang at intel.com>; Ananyev, Konstantin
> > <konstantin.ananyev at intel.com>; Thomas Monjalon
> > <thomas.monjalon at 6wind.com>
> > Cc: dev at dpdk.org; yuanhan.liu at linux.intel.com; Richardson, Bruce
> > <bruce.richardson at intel.com>; De Lara Guarch, Pablo
> > <pablo.de.lara.guarch at intel.com>
> > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset
> > on IA platform
> >
> > Hi, Thomas, Konstantin:
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yang, Zhiyong
> > > Sent: Sunday, December 11, 2016 8:33 PM
> > > To: Ananyev, Konstantin <konstantin.ananyev at intel.com>; Thomas
> > > Monjalon <thomas.monjalon at 6wind.com>
> > > Cc: dev at dpdk.org; yuanhan.liu at linux.intel.com; Richardson, Bruce
> > > <bruce.richardson at intel.com>; De Lara Guarch, Pablo
> > > <pablo.de.lara.guarch at intel.com>
> > > > Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset
> > > > on IA platform
> > >
> > > Hi, Konstantin, Bruce:
> > >
> > > > -----Original Message-----
> > > > From: Ananyev, Konstantin
> > > > Sent: Thursday, December 8, 2016 6:31 PM
> > > > To: Yang, Zhiyong <zhiyong.yang at intel.com>; Thomas Monjalon
> > > > <thomas.monjalon at 6wind.com>
> > > > Cc: dev at dpdk.org; yuanhan.liu at linux.intel.com; Richardson, Bruce
> > > > <bruce.richardson at intel.com>; De Lara Guarch, Pablo
> > > > <pablo.de.lara.guarch at intel.com>
> > > > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce
> > > > rte_memset on IA platform
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Yang, Zhiyong
> > > > > Sent: Thursday, December 8, 2016 9:53 AM
> > > > > To: Ananyev, Konstantin <konstantin.ananyev at intel.com>; Thomas
> > > > > Monjalon <thomas.monjalon at 6wind.com>
> > > > > Cc: dev at dpdk.org; yuanhan.liu at linux.intel.com; Richardson, Bruce
> > > > > <bruce.richardson at intel.com>; De Lara Guarch, Pablo
> > > > > <pablo.de.lara.guarch at intel.com>
> > > > > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce
> > > > > rte_memset on IA platform
> > > > >
> > > > > extern void *(*__rte_memset_vector)(void *s, int c, size_t n);
> > > >
> > > > static inline void*
> > > > rte_memset_huge(void *s, int c, size_t n) {
> > > > return __rte_memset_vector(s, c, n); }
> > > >
> > > > static inline void *
> > > > rte_memset(void *s, int c, size_t n) {
> > > > > if (n < XXX)
> > > > return rte_memset_scalar(s, c, n);
> > > > else
> > > > return rte_memset_huge(s, c, n); }
> > > >
> > > > > XXX could be either a define or a variable, so that it
> > > > > can be set up at startup, depending on the architecture.
> > > >
> > > > Would that work?
> > > > Konstantin
> > > >
> > I have implemented the code for choosing the functions at run time.
> > rte_memcpy is used more frequently, So I test it at run time.
> >
> > typedef void *(*rte_memcpy_vector_t)(void *dst, const void *src,
> > size_t n);
> > extern rte_memcpy_vector_t rte_memcpy_vector;
> >
> > static inline void *
> > rte_memcpy(void *dst, const void *src, size_t n)
> > {
> > return rte_memcpy_vector(dst, src, n);
> > }
> >
> > In order to reduce the overhead at run time, I assign the function
> > address to var rte_memcpy_vector before main() starts to init the var.
> >
> > static void __attribute__((constructor))
> > rte_memcpy_init(void)
> > {
> > if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> > {
> > rte_memcpy_vector = rte_memcpy_avx2;
> > }
> > else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
> > {
> > rte_memcpy_vector = rte_memcpy_sse;
> > }
> > else
> > {
> > rte_memcpy_vector = memcpy;
> > }
> >
> > }
>
> I thought we discussed a bit different approach.
> In which rte_memcpy_vector() (rte_memset_vector) would be called only
> after some cutoff point, i.e:
>
> void
> rte_memcpy(void *dst, const void *src, size_t len) {
> if (len < N) memcpy(dst, src, len);
> else rte_memcpy_vector(dst, src, len);
> }
>
> If you just always call rte_memcpy_vector() for every len, then it means that
> the compiler most likely always has to generate a proper call (no inlining
> happening).
> For small length(s) the price of the extra function call would probably
> outweigh any potential gain from the SSE/AVX2 implementation.
>
> Konstantin
Yes. In fact, in my tests rte_memset is far better than glibc memset for small lengths.
For large lengths, rte_memset is only a bit better than memset,
because memset uses AVX2/SSE, too. Of course, it will use AVX512 on future machines.
> For small length(s) price of extra function would probably outweigh any
> potential gain.
This is the key point. I think rte_memset should include the scalar optimization, not only the vector optimization.
The value of rte_memset is that it is always inlined, so it is better for small lengths,
whereas in some cases we cannot be sure that memset is always inlined by the compiler.
It seems that choosing the function at run time will lose these gains.
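For illustration, here is a minimal sketch of how the cutoff idea and run-time selection could be combined: small lengths stay on an inlined path, and only larger ones go through the function pointer. All names (sketch_memset, sketch_memset_vector, SKETCH_MEMSET_CUTOFF) and the cutoff value of 64 are hypothetical and not taken from the patch; libc memset stands in for the AVX2/SSE variants.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff; a real value would have to come from measurement. */
#define SKETCH_MEMSET_CUTOFF 64

typedef void *(*sketch_memset_fn)(void *s, int c, size_t n);

/* Implementation used for lengths at or above the cutoff. */
static sketch_memset_fn sketch_memset_vector = memset;

/* Runs before main(), so the pointer is resolved once, not per call. */
static void __attribute__((constructor))
sketch_memset_init(void)
{
	/*
	 * Placeholder: a real build would check CPU flags here and select
	 * an AVX2 or SSE variant. This sketch always uses libc memset.
	 */
	sketch_memset_vector = memset;
}

static inline void *
sketch_memset(void *s, int c, size_t n)
{
	if (n < SKETCH_MEMSET_CUTOFF) {
		/* Small lengths: plain byte loop that the compiler can inline. */
		unsigned char *p = s;
		size_t i;

		for (i = 0; i < n; i++)
			p[i] = (unsigned char)c;
		return s;
	}
	/* Large lengths: one indirect call, amortized over many bytes. */
	return sketch_memset_vector(s, c, n);
}
```

With this shape the indirect call is only paid where the amount of data is large enough to hide it, which is the trade-off discussed above.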
The following was measured on Haswell with the patch code.
** rte_memset() - memset perf tests
   (C = compile-time constant) **
======= ================= ================
   Size   memset in cache    memset in mem
(bytes)           (ticks)          (ticks)
------- ----------------- ----------------
============== 32B aligned ===============
      3        3 -    8        19 -  128
      4        4 -    8        13 -  128
      8        2 -    7        19 -  128
      9        2 -    7        19 -  127
     12        2 -    7        19 -  127
     17        3 -    8        19 -  132
     64        3 -    8        28 -  168
    128        7 -   13        54 -  200
    255        8 -   20       100 -  223
    511       14 -   20       187 -  314
   1024       24 -   29       328 -  379
   8192      198 -  225      1829 - 2193
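
A rough way to reproduce this kind of measurement outside the DPDK test suite is sketched below. It is a stand-in under stated assumptions, not the test code from the patch: it reports average nanoseconds per call via POSIX clock_gettime() rather than ticks, and sketch_time_memset is a hypothetical name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Average nanoseconds per memset() call for a given size.
 * buf must point to at least max(size, 1) writable bytes.
 */
static double
sketch_time_memset(void *buf, size_t size, unsigned int iterations)
{
	struct timespec start, end;
	volatile unsigned char sink = 0;
	unsigned int i;

	if (iterations == 0)
		return 0.0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < iterations; i++) {
		memset(buf, (int)(i & 0xff), size);
		/* Read back one byte so the call is not optimized away. */
		sink ^= *(volatile unsigned char *)buf;
	}
	clock_gettime(CLOCK_MONOTONIC, &end);
	(void)sink;

	return ((double)(end.tv_sec - start.tv_sec) * 1e9 +
		(double)(end.tv_nsec - start.tv_nsec)) / iterations;
}
```

Calling it with the same buffer repeatedly approximates the "in cache" column; an "in mem" figure would need a buffer much larger than the last-level cache and a different offset on each iteration.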
Thanks
Zhiyong