[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

Wang, Zhihong zhihong.wang at intel.com
Tue Aug 23 12:43:36 CEST 2016



> -----Original Message-----
> From: Wang, Zhihong
> Sent: Tuesday, August 23, 2016 10:31 AM
> To: Maxime Coquelin <maxime.coquelin at redhat.com>; dev at dpdk.org
> Cc: yuanhan.liu at linux.intel.com
> Subject: RE: [PATCH v3 0/5] vhost: optimize enqueue
> 
> 
> 
> > -----Original Message-----
> > From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
> > Sent: Monday, August 22, 2016 6:02 PM
> > To: Wang, Zhihong <zhihong.wang at intel.com>; dev at dpdk.org
> > Cc: yuanhan.liu at linux.intel.com
> > Subject: Re: [PATCH v3 0/5] vhost: optimize enqueue
> >
> >
> > On 08/22/2016 10:11 AM, Maxime Coquelin wrote:
> > > Hi Zhihong,
> > >
> > > On 08/19/2016 07:43 AM, Zhihong Wang wrote:
> > > > This patch set optimizes the vhost enqueue function.
> > > >
> > > > It implements the vhost logic from scratch in a single function
> > > > designed for high performance and good maintainability, and improves
> > > > CPU efficiency significantly by optimizing cache access, which means:
> > > >
> > > >  *  For fast frontends (e.g. DPDK virtio pmd), higher performance
> > > >     (maximum throughput) can be achieved.
> > > >
> > > >  *  For slow frontends (e.g. kernel virtio-net), better scalability can
> > > >     be achieved: each vhost core can support more connections since it
> > > >     takes fewer cycles to handle each single frontend.
> > > >
> > > > The main optimization techniques are:
> > > >
> > > >  1. Reorder code to reduce CPU pipeline stall cycles.
> > > >
> > > >  2. Batch update the used ring for better efficiency.
> > > >
> > > >  3. Prefetch descriptors to hide cache latency.
> > > >
> > > >  4. Remove unnecessary volatile attributes to allow compiler optimization.
> > >
> > > Thanks for these details; they help in understanding where the perf
> > > gain comes from.
> > > I would suggest adding this information as comments in the code
> > > where/if it makes sense. If it is a more general comment, at least add
> > > it to the commit message of the patch introducing it.
> > > Indeed, adding it to the cover letter is fine, but the information is
> > > lost as soon as the series is applied.
> > >
> > > You don't mention any figures, so I set up a benchmark on my side to
> > > evaluate your series. It indeed shows an interesting performance gain.
> > >
> > > My setup consists of one host running a guest.
> > > The guest generates as many 64-byte packets as possible using
> > > pktgen-dpdk. The host forwards received packets back to the guest
> > > using testpmd on a vhost pmd interface. The guest's vCPUs are pinned to
> > > physical CPUs.
> > >
> > > I tested it with and without your v1 patch, with and without the
> > > rx-mergeable feature turned ON.
> > > Results are the average of 8 runs of 60 seconds:
> > >
> > > Rx-Mergeable ON                                 :  7.72 Mpps
> > > Rx-Mergeable ON  + "vhost: optimize enqueue" v1 :  9.19 Mpps
> > > Rx-Mergeable OFF                                : 10.52 Mpps
> > > Rx-Mergeable OFF + "vhost: optimize enqueue" v1 : 10.60 Mpps
> > >
> > I forgot to add: before this series, I think we should first fix the
> > Windows bug. Otherwise we will need a dedicated fix for the stable branch.
> 
> Okay, I'll try to fix it, though I can't make any promises at present.
> 
> I tried once but stopped, since we don't have enough debug info from the
> frontend side, so basically I was debugging the backend based on guesses.

Hi Maxime, Yuanhan,

I've identified the root cause. Do you think it makes sense to put the fix
in the same patch set, or send it as a separate patch?


Thanks
Zhihong

> 
> 
> >
> > Regards,
> > Maxime


