[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

Wang, Zhihong zhihong.wang at intel.com
Thu Oct 13 03:21:21 CEST 2016

Previous message: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
Next message: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]


> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, October 12, 2016 11:31 PM
> To: Wang, Zhihong <zhihong.wang at intel.com>
> Cc: Yuanhan Liu <yuanhan.liu at linux.intel.com>; Jianbo Liu
> <jianbo.liu at linaro.org>; Maxime Coquelin <maxime.coquelin at redhat.com>;
> dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
> 
> Sorry guys, you lost me in the discussion.
> 
> Is there some regression only on ARM?

ARM is what we see, no info on ppc yet.

> Does it need some work specifically on memcpy for ARM,
> or vhost for ARM?
> Who can work on ARM optimization?

These are still open questions, Jiaobo who reported this doesn't
have time for more testing now according to the reply.

I'm trying to do some test in the hope to identify the root cause.

> 
> More comments below.
> 
> 2016-10-12 12:22, Wang, Zhihong:
> > From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> > > > It's an ARM server.
> > > >
> > > > >> >  3. How many percentage of drop are you seeing?
> > > > The testing result:
> > > > size (bytes)     improvement (%)
> > > > 64                   3.92
> > > > 128                 11.51
> > > > 256                  24.16
> > > > 512                  -13.79
> > > > 1024                -22.51
> > > > 1500                -12.22
> > > > A correction is that performance is dropping if byte size is larger than
> 512.
> > >
> > > I have thought of this twice. Unfortunately, I think I may need NACK this
> > > series.
> > >
> > > Merging two code path into one is really good: as you stated, it improves
> > > the maintainability. But only if we see no performance regression on both
> > > path after the refactor. Unfortunately, that's not the case here: it hurts
> > > the performance for one code path (non-mrg Rx).
> 
> +1
> 
> > > That makes me think we may should not do the code path merge at all. I
> think
> > > that also aligns with what you have said before (internally): we could do
> the
> > > merge if it gives comparable performance before and after that.
> > >
> > > Besides that, I don't quite like the way you did in patch 2 (rewrite
> enqueue):
> > > you made a lot of changes in one patch. That means if something wrong
> > > happened,
> > > it is hard to narrow down which change introduces that regression. Badly,
> > > that's exactly what we met here. Weeks have been passed, I see no
> progress.
> 
> +1, it is important to have simple patches making changes step by step.
> 
> > > That's the reason we like the idea of "one patch only does one thing, an
> > > atomic thing".
> >
> >
> > Yuanhan, folks,
> >
> > Thanks for the analysis. I disagree here though.
> >
> > I analyze, develop, benchmark on x86 platforms, where this patch
> > works great.
> >
> > I've been trying to analyze on ARM too but it takes time and I've
> > had a schedule. Also since the ARM perf issue comes when it's
> > v6 already, I might not be able to make it in time. However
> > that's what I have to do for this patch to be merged in this
> > or the next release.
> >
> > In the meantime, may I suggest we consider the possibility to
> > have dedicated codes for **perf critical paths** for different
> > kinds of architecture?
> 
> Yes that's what we do in several parts of DPDK.
> 
> > It can be hard for a person to have both the knowledge and the
> > development environment for multiple archs at the same time.
> 
> Yes we do not expect you work on ARM.
> So if nobody work on the ARM issue, you could make 2 code paths
> in order to allow your optimization for x86 only.
> But that's not the preferred way.
> And you must split your rework to better identify which part is
> a regression on ARM.
> 
> > Moreover, different optimization techniques might be required for
> > different archs, so it's hard and unnecessary to make a function
> > works for all archs, sometimes it's just not the right thing to do.
> 
> Yes sometimes. Please help us to be convinced for this case.
> 
> > > So I will apply the first patch (it's a bug fixing patch) and ask you to
> > > refactor the rest, without the code path merge.
> > >
> > > I think we could still have a good maintainability code base if we introduce
> > > more common helper functions that can be used on both Rx path, or
> even on
> > > Tx path (such as update_used_ring, or shadow_used_ring).
> 
> Yes it is a good step.
> And the code path merge could be reconsidered later.
> 
> > > It's a bit late for too many changes for v16.11. I think you could just
> > > grab patch 6 (vhost: optimize cache access) to the old mrg-Rx code path,
> > > if that also helps the performance? Let us handle the left in next release,
> > > such as shadow used ring.
> 
> Thank you

Previous message: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
Next message: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the dev mailing list