[dpdk-dev] [PATCH] vhost: batch used descriptors chains write-back with packed ring

Michael S. Tsirkin mst at redhat.com
Wed Dec 19 17:10:44 CET 2018


On Wed, Dec 19, 2018 at 10:16:24AM +0100, Maxime Coquelin wrote:
> 
> 
> On 12/12/18 7:53 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 12, 2018 at 05:34:31PM +0100, Maxime Coquelin wrote:
> > > Hi Ilya,
> > > 
> > > On 12/12/18 4:23 PM, Ilya Maximets wrote:
> > > > On 12.12.2018 11:24, Maxime Coquelin wrote:
> > > > > Instead of writing back descriptors chains in order, let's
> > > > > write the first chain flags last in order to improve batching.
> > > > > 
> > > > > With Kernel's pktgen benchmark, ~3% performance gain is measured.
> > > > > 
> > > > > Signed-off-by: Maxime Coquelin <maxime.coquelin at redhat.com>
> > > > > ---
> > > > >    lib/librte_vhost/virtio_net.c | 39 +++++++++++++++++++++--------------
> > > > >    1 file changed, 24 insertions(+), 15 deletions(-)
> > > > > 
> > > > 
> > > > Hi.
> > > > I made some rough testing on my ARMv8 system with this patch and v1 of it.
> > > > Here is the performance difference with current master:
> > > >       v1: +1.1 %
> > > >       v2: -3.6 %
> > > > 
> > > > So, write barriers are quiet heavy in practice.
> > > 
> > > Thanks for testing it on ARM. Indeed, SMP WMB is heavier on ARM.
> > 
> > Besides your ideas for improving packed rings, maybe we should switch to
> > load_acquite/store_release?
> > 
> > See
> > 	virtio: use smp_load_acquire/smp_store_release
> > 
> > which worked fine but as I only tested on x86 did not result in any gains.
> > 
> 
> Thanks for the pointer.
> We'll look into it for v19.05, as -rc1 for v19.02 is planned for end of
> week, so it will be too late to introduce such changes.
> 
> Regards,
> Maxime

That's not the only option BTW.  For loads, another option it to work
the value into an indirect dependency which does not need
a barrier.

For example:

#define OPTIMIZER_HIDE_VAR(var)                                         \
        __asm__ ("" : "=r" (var) : "0" (var))

unsigned empty = last_used == idx->used;
if (!empty) {
	OPTIMIZER_HIDE_VAR(empty);
	desc = used->ring[last_used + empty];
}


See linux for definitions of OPTIMIZER_HIDE_VAR.

One side effect of this is that this also blocks code speculation.
which can be a good or a bad thing for performance,
but can be a good thing for security.


-- 
MST


More information about the dev mailing list