[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

Wang, Zhihong zhihong.wang at intel.com
Wed Oct 12 14:22:08 CEST 2016



> -----Original Message-----
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> Sent: Wednesday, October 12, 2016 10:53 AM
> To: Wang, Zhihong <zhihong.wang at intel.com>; Jianbo Liu <jianbo.liu at linaro.org>
> Cc: Maxime Coquelin <maxime.coquelin at redhat.com>; dev at dpdk.org; Thomas
> Monjalon <thomas.monjalon at 6wind.com>
> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
> 
> On Thu, Sep 22, 2016 at 01:47:45PM +0800, Jianbo Liu wrote:
> > On 22 September 2016 at 10:29, Yuanhan Liu <yuanhan.liu at linux.intel.com>
> wrote:
> > > On Wed, Sep 21, 2016 at 08:54:11PM +0800, Jianbo Liu wrote:
> > >> >> > My setup consists of one host running a guest.
> > >> >> > The guest generates as much 64bytes packets as possible using
> > >> >>
> > >> >> Have you tested with other different packet size?
> > >> >> My testing shows that performance is dropping when packet size is more
> > >> >> than 256.
> > >> >
> > >> >
> > >> > Hi Jianbo,
> > >> >
> > >> > Thanks for reporting this.
> > >> >
> > >> >  1. Are you running the vector frontend with mrg_rxbuf=off?
> > >> >
> > Yes, my testing uses mrg_rxbuf=off, but not the vector frontend PMD.
> >
> > >> >  2. Could you please specify what CPU you're running? Is it Haswell
> > >> >     or Ivy Bridge?
> > >> >
> > It's an ARM server.
> >
> > >> >  3. How many percentage of drop are you seeing?
> > The testing result:
> > size (bytes)    improvement (%)
> >   64              3.92
> >  128             11.51
> >  256             24.16
> >  512            -13.79
> > 1024            -22.51
> > 1500            -12.22
> > A correction: performance drops when packet size is 512 bytes or larger.
> 
> I have thought about this twice. Unfortunately, I think I may have to NACK
> this series.
> 
> Merging two code paths into one is really good: as you stated, it improves
> the maintainability. But only if we see no performance regression on either
> path after the refactor. Unfortunately, that's not the case here: it hurts
> the performance of one code path (non-mrg Rx).
> 
> That makes me think we perhaps should not do the code path merge at all. I
> think that also aligns with what you said before (internally): we could do
> the merge only if it gives comparable performance before and after.
> 
> Besides that, I don't quite like the approach you took in patch 2 (rewrite
> enqueue): you made a lot of changes in one patch. That means if something
> goes wrong, it is hard to narrow down which change introduced the
> regression. Worse, that's exactly the situation we are in now. Weeks have
> passed, and I see no progress.
> 
> That's the reason we like the idea of "one patch only does one thing, an
> atomic thing".


Yuanhan, folks,

Thanks for the analysis. I disagree here, though.

I analyze, develop, and benchmark on x86 platforms, where this patch
works great.

I've been trying to analyze it on ARM too, but that takes time and I
have a tight schedule. Also, since the ARM perf issue only surfaced
at v6, I might not be able to resolve it in time. However, that's
what I have to do for this patch to be merged in this or the next
release.

In the meantime, may I suggest we consider the possibility of having
dedicated code for **perf critical paths** on different
architectures?

It can be hard for one person to have both the knowledge and the
development environment for multiple archs at the same time.

Moreover, different optimization techniques may be required on
different archs, so it's hard, and often unnecessary, to make a single
function work well on all of them; sometimes it's just not the right
thing to do.


Thanks
Zhihong


> 
> So I will apply the first patch (it's a bug-fix patch) and ask you to
> refactor the rest without the code path merge.
> 
> I think we could still have a maintainable code base if we introduce
> more common helper functions that can be used on both Rx paths, or even
> on the Tx path (such as update_used_ring or shadow_used_ring).
> 
> It's a bit late for too many changes in v16.11. I think you could just
> apply patch 6 (vhost: optimize cache access) to the old mrg-Rx code path,
> if that also helps performance. Let us handle the rest in the next
> release, such as the shadow used ring.
> 
> Thanks.
> 
> 	--yliu

