[dpdk-dev] [RFC] ring: make ring implementation non-inlined

Ananyev, Konstantin konstantin.ananyev at intel.com
Sat Mar 21 02:03:35 CET 2020



> -----Original Message-----
> From: Stephen Hemminger <stephen at networkplumber.org>
> Sent: Friday, March 20, 2020 5:55 PM
> To: Ananyev, Konstantin <konstantin.ananyev at intel.com>
> Cc: dev at dpdk.org; olivier.matz at 6wind.com; honnappa.nagarahalli at arm.com; jerinj at marvell.com; drc at linux.vnet.ibm.com
> Subject: Re: [RFC] ring: make ring implementation non-inlined
> 
> On Fri, 20 Mar 2020 16:41:38 +0000
> Konstantin Ananyev <konstantin.ananyev at intel.com> wrote:
> 
> > As was discussed here:
> > http://mails.dpdk.org/archives/dev/2020-February/158586.html
> > this RFC aims to hide the ring internals in a .c file and make all
> > ring functions non-inlined. In theory that might help to
> > maintain ABI stability in the future.
> > This is just a POC to measure the impact of the proposed idea;
> > a proper implementation would definitely need some extra effort.
> > On an IA box (SKX) ring_perf_autotest shows ~20-30 extra cycles per
> > enqueue+dequeue pair. On more realistic code, I suspect
> > the impact might be a bit higher.
> > For MP/MC bulk transfers the degradation seems quite small,
> > but for SP/SC and/or small transfers it is more than noticeable
> > (see exact numbers below).
> > From my perspective we should probably keep it inlined for now
> > to avoid any unanticipated performance degradation,
> > but I'm interested to see perf results and opinions from
> > other interested parties.
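> >
> > To make the tradeoff concrete, the change is roughly of this shape
> > (a toy sketch only, not the actual patch; all names below are made
> > up for illustration, and memory ordering is elided for brevity).
> > Today the fast path is a static inline in the header, so callers
> > see the struct layout and the compiler can fold the whole enqueue
> > into the call site:
> >
> > /* status quo: definition lives in the header, inlined into callers */
> > #include <stdint.h>
> >
> > struct toy_ring {
> >         uint32_t head;  /* producer index, kept masked */
> >         uint32_t tail;  /* consumer index, kept masked */
> >         uint32_t mask;  /* size - 1; size is a power of two */
> >         void *slots[];
> > };
> >
> > static inline int
> > toy_ring_sp_enqueue(struct toy_ring *r, void *obj)
> > {
> >         uint32_t head = r->head;
> >
> >         /* full when advancing head would catch the tail
> >          * (one slot is deliberately kept empty) */
> >         if (((head + 1) & r->mask) == r->tail)
> >                 return -1;
> >         r->slots[head] = obj;
> >         r->head = (head + 1) & r->mask;
> >         return 0;
> > }
> >
> > With the RFC the header would carry only an opaque declaration,
> > and the definition above would move into a .c file. The layout can
> > then change without breaking the ABI, but every enqueue pays a
> > real function call:
> >
> > /* proposed: layout and definition hidden in the .c file */
> > struct toy_ring;
> > int toy_ring_sp_enqueue(struct toy_ring *r, void *obj);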
> >
> > Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
> > ring_perf_autotest (without patch/with patch)
> >
> > ### Testing single element enq/deq ###
> > legacy APIs: SP/SC: single: 8.75/43.23
> > legacy APIs: MP/MC: single: 56.18/80.44
> >
> > ### Testing burst enq/deq ###
> > legacy APIs: SP/SC: burst (size: 8): 37.36/53.37
> > legacy APIs: SP/SC: burst (size: 32): 93.97/117.30
> > legacy APIs: MP/MC: burst (size: 8): 78.23/91.45
> > legacy APIs: MP/MC: burst (size: 32): 131.59/152.49
> >
> > ### Testing bulk enq/deq ###
> > legacy APIs: SP/SC: bulk (size: 8): 37.29/54.48
> > legacy APIs: SP/SC: bulk (size: 32): 92.68/113.01
> > legacy APIs: MP/MC: bulk (size: 8): 78.40/93.50
> > legacy APIs: MP/MC: bulk (size: 32): 131.49/154.25
> >
> > ### Testing empty bulk deq ###
> > legacy APIs: SP/SC: bulk (size: 8): 4.00/16.86
> > legacy APIs: MP/MC: bulk (size: 8): 7.01/15.55
> >
> > ### Testing using two hyperthreads ###
> > legacy APIs: SP/SC: bulk (size: 8): 10.64/17.56
> > legacy APIs: MP/MC: bulk (size: 8): 15.30/16.69
> > legacy APIs: SP/SC: bulk (size: 32): 5.84/7.09
> > legacy APIs: MP/MC: bulk (size: 32): 6.34/7.54
> >
> > ### Testing using two physical cores ###
> > legacy APIs: SP/SC: bulk (size: 8): 24.34/42.40
> > legacy APIs: MP/MC: bulk (size: 8): 70.34/71.82
> > legacy APIs: SP/SC: bulk (size: 32): 12.67/14.68
> > legacy APIs: MP/MC: bulk (size: 32): 22.41/17.93
> >
> > ### Testing single element enq/deq ###
> > elem APIs: element size 16B: SP/SC: single: 10.65/41.96
> > elem APIs: element size 16B: MP/MC: single: 44.33/81.36
> >
> > ### Testing burst enq/deq ###
> > elem APIs: element size 16B: SP/SC: burst (size: 8): 39.20/58.52
> > elem APIs: element size 16B: SP/SC: burst (size: 32): 123.19/142.79
> > elem APIs: element size 16B: MP/MC: burst (size: 8): 80.72/101.36
> > elem APIs: element size 16B: MP/MC: burst (size: 32): 169.21/185.38
> >
> > ### Testing bulk enq/deq ###
> > elem APIs: element size 16B: SP/SC: bulk (size: 8): 41.64/58.46
> > elem APIs: element size 16B: SP/SC: bulk (size: 32): 122.74/142.52
> > elem APIs: element size 16B: MP/MC: bulk (size: 8): 80.60/103.14
> > elem APIs: element size 16B: MP/MC: bulk (size: 32): 169.39/186.67
> >
> > ### Testing empty bulk deq ###
> > elem APIs: element size 16B: SP/SC: bulk (size: 8): 5.01/17.17
> > elem APIs: element size 16B: MP/MC: bulk (size: 8): 6.01/14.80
> >
> > ### Testing using two hyperthreads ###
> > elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.02/17.18
> > elem APIs: element size 16B: MP/MC: bulk (size: 8): 16.81/21.14
> > elem APIs: element size 16B: SP/SC: bulk (size: 32): 7.87/9.01
> > elem APIs: element size 16B: MP/MC: bulk (size: 32): 8.22/10.57
> >
> > ### Testing using two physical cores ###
> > elem APIs: element size 16B: SP/SC: bulk (size: 8): 27.00/51.94
> > elem APIs: element size 16B: MP/MC: bulk (size: 8): 78.24/74.48
> > elem APIs: element size 16B: SP/SC: bulk (size: 32): 15.41/16.14
> > elem APIs: element size 16B: MP/MC: bulk (size: 32): 18.72/21.64
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev at intel.com>
> 
> What is the impact with LTO? I suspect the compiler might have a
> chance to get the speed back with LTO.

It might, but LTO is not enabled by default,
so I don't see much point in digging any further here.
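
That said, for anyone who wants to measure it: LTO can be switched on
per build with the meson build system (b_lto is a standard meson base
option; the exact commands below are an assumption about the local
setup):

        meson build -Db_lto=true
        ninja -C build

and then re-run ring_perf_autotest from the dpdk-test binary to
compare against the numbers above.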

