[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
Wang, Zhihong
zhihong.wang at intel.com
Tue Aug 23 04:31:29 CEST 2016
> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
> Sent: Monday, August 22, 2016 6:02 PM
> To: Wang, Zhihong <zhihong.wang at intel.com>; dev at dpdk.org
> Cc: yuanhan.liu at linux.intel.com
> Subject: Re: [PATCH v3 0/5] vhost: optimize enqueue
>
>
> On 08/22/2016 10:11 AM, Maxime Coquelin wrote:
> > Hi Zhihong,
> >
> > On 08/19/2016 07:43 AM, Zhihong Wang wrote:
> > > This patch set optimizes the vhost enqueue function.
> > >
> > > It reimplements the vhost enqueue logic from scratch as a single
> > > function designed for high performance and good maintainability,
> > > and significantly improves CPU efficiency by optimizing cache
> > > access, which means:
> > >
> > > * For fast frontends (e.g. the DPDK virtio PMD), higher performance
> > >   (maximum throughput) can be achieved.
> > >
> > > * For slow frontends (e.g. kernel virtio-net), better scalability can be
> > >   achieved: each vhost core can support more connections, since it takes
> > >   fewer cycles to handle each individual frontend.
> > >
> > > The main optimization techniques are:
> > >
> > > 1. Reorder code to reduce CPU pipeline stall cycles.
> > >
> > > 2. Batch updates to the used ring for better efficiency.
> > >
> > > 3. Prefetch descriptors to hide cache latency.
> > >
> > > 4. Remove a useless volatile attribute to allow compiler optimization.
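> > >
> > > To illustrate 2 and 3, here is a minimal sketch of the idea (the
> > > desc_idx[] array and the copy helper here are illustrative, not the
> > > exact patch code; needs rte_prefetch.h and rte_atomic.h):
> > >
> > >     uint16_t used_idx = vq->last_used_idx & (vq->size - 1);
> > >     uint32_t i;
> > >
> > >     for (i = 0; i < count; i++) {
> > >         /* 3. Prefetch the next descriptor to hide cache latency. */
> > >         if (i + 1 < count)
> > >             rte_prefetch0(&vq->desc[desc_idx[i + 1]]);
> > >
> > >         copy_mbuf_to_desc(dev, vq, pkts[i], desc_idx[i]);
> > >
> > >         /* 2. Stage used ring entries for the whole burst instead
> > >          *    of updating the used index once per packet.
> > >          */
> > >         vq->used->ring[(used_idx + i) & (vq->size - 1)].id =
> > >             desc_idx[i];
> > >         vq->used->ring[(used_idx + i) & (vq->size - 1)].len =
> > >             pkts[i]->pkt_len + dev->vhost_hlen;
> > >     }
> > >
> > >     /* One write barrier and one used index update per burst. */
> > >     rte_smp_wmb();
> > >     vq->used->idx += count;
> > >     vq->last_used_idx += count;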
> >
> > Thanks for these details, they are helpful for understanding where the
> > perf gain comes from.
> > I would suggest adding this information as comments in the code
> > where/if it makes sense. If it is more of a general comment, at least
> > add it to the commit message of the patch introducing it.
> > Indeed, adding it to the cover letter is fine, but the information is
> > lost as soon as the series is applied.
> >
> > You don't mention any figures, so I set up a benchmark on my side to
> > evaluate your series. It indeed shows an interesting performance gain.
> >
> > My setup consists of one host running a guest.
> > The guest generates as many 64-byte packets as possible using
> > pktgen-dpdk. The host forwards received packets back to the guest
> > using testpmd on a vhost PMD interface. The guest's vCPUs are pinned
> > to physical CPUs.
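> >
> > For reference, the host-side testpmd invocation would look roughly as
> > follows (core mask, socket path and ring sizes here are illustrative,
> > not my exact options):
> >
> >     testpmd -c 0xf -n 4 --socket-mem 1024 \
> >         --vdev 'eth_vhost0,iface=/tmp/vhost-user.sock,queues=1' -- \
> >         -i --nb-cores=1 --txd=256 --rxd=256
> >     testpmd> start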
> >
> > I tested with and without your v1 patch, and with and without the
> > rx-mergeable feature turned on.
> > Results are the average of 8 runs of 60 seconds each:
> >
> > Rx-Mergeable ON                                 :  7.72 Mpps
> > Rx-Mergeable ON  + "vhost: optimize enqueue" v1 :  9.19 Mpps
> > Rx-Mergeable OFF                                : 10.52 Mpps
> > Rx-Mergeable OFF + "vhost: optimize enqueue" v1 : 10.60 Mpps
> >
> I forgot to add that, before this series, I think we should first fix the
> Windows bug. Otherwise we will need a dedicated fix for the stable branch.
Okay, I'll try to fix it, though I can't make any promises at present.
I tried once but stopped, since we don't have enough debug info from the
frontend side; basically I was debugging the backend based on guesses.
>
> Regards,
> Maxime