[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

Wang, Zhihong zhihong.wang at intel.com
Mon Sep 26 07:25:05 CEST 2016



> -----Original Message-----
> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
> Sent: Monday, September 26, 2016 1:13 PM
> To: Wang, Zhihong <zhihong.wang at intel.com>
> Cc: Thomas Monjalon <thomas.monjalon at 6wind.com>; dev at dpdk.org; Yuanhan
> Liu <yuanhan.liu at linux.intel.com>; Maxime Coquelin
> <maxime.coquelin at redhat.com>
> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
> 
> On 25 September 2016 at 13:41, Wang, Zhihong <zhihong.wang at intel.com>
> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> >> Sent: Friday, September 23, 2016 9:41 PM
> >> To: Jianbo Liu <jianbo.liu at linaro.org>
> >> Cc: dev at dpdk.org; Wang, Zhihong <zhihong.wang at intel.com>; Yuanhan Liu
> >> <yuanhan.liu at linux.intel.com>; Maxime Coquelin
> >> <maxime.coquelin at redhat.com>
> ....
> > This patch does help on ARM for small packets like 64B-sized ones,
> > which actually shows that x86 and ARM behave similarly with
> > respect to the caching optimization in this patch.
> >
> > My estimation is based on:
> >
> >  1. The last patch is for mrg_rxbuf=on, and since you said it helps
> >     perf, we can ignore it for now while we discuss mrg_rxbuf=off
> >
> >  2. Vhost enqueue perf =
> >     Ring overhead + Virtio header overhead + Data memcpy overhead
> >     (see the toy model right after this list)
> >
> >  3. This patch helps small-packet traffic, which means it helps
> >     ring + virtio header operations
> >
> >  4. So, when you say perf drops when packet size is larger than
> >     512B, this is most likely caused by memcpy on ARM not working
> >     well with this patch
> >
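To make point 2 concrete, here is a toy model of that cost split.
The cycle numbers are made-up placeholders, not measurements from
either platform:

    #include <stdio.h>

    /* Toy model of: enqueue cost = ring + virtio header + memcpy.
     * All cycle counts are invented, for illustration only. */
    int main(void)
    {
        const double ring_cycles = 50.0;     /* assumed ring overhead   */
        const double hdr_cycles = 20.0;      /* assumed header overhead */
        const double cycles_per_byte = 0.25; /* assumed copy cost/byte  */
        const int sizes[] = { 64, 128, 256, 512, 1024, 1518 };
        unsigned i;

        for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
            double copy = cycles_per_byte * sizes[i];
            double total = ring_cycles + hdr_cycles + copy;
            printf("%5dB packet: memcpy = %4.1f%% of enqueue cost\n",
                   sizes[i], 100.0 * copy / total);
        }
        return 0;
    }

With any numbers of roughly this shape, 64B packets are dominated by
ring + header cost (what this patch targets), while 512B+ packets are
dominated by the data copy -- which is why I read the >512B drop on
ARM as a memcpy effect.
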
> > I'm not saying glibc's memcpy is not good enough; it's just that
> > this is a rather special use case. And since on x86 we see that a
> > specialized memcpy + this patch gives significantly better
> > performance than other combinations, we suggest hand-crafting a
> > specialized memcpy for it.
> >
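To be clear about what "specialized" means here, a minimal sketch of
the idea -- illustration only, this is not the code in this series,
and vhost_copy is a made-up name:

    #include <stdint.h>
    #include <string.h>

    /*
     * A copy specialized for vhost: assume (as vhost can arrange)
     * that the destination has room to round the length up to a
     * whole 16B chunk. A general-purpose memcpy must never write
     * past len, so it pays for branchy tail handling on every call.
     */
    static inline void
    vhost_copy(void *dst, const void *src, size_t len)
    {
        uint8_t *d = dst;
        const uint8_t *s = src;
        size_t n;

        /* Fixed 16B chunks: the constant size lets the compiler
         * emit a vector load/store pair per iteration, and the
         * over-copy of up to 15B replaces all tail code. */
        for (n = 0; n < len; n += 16)
            memcpy(d + n, s + n, 16);
    }

The point is that vhost knows things about its buffers that a
general-purpose memcpy cannot assume, and exploiting them is where
the gain (or, if the copy doesn't suit the CPU, the loss) comes from.
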
> > Of course on ARM this is still just my speculation, and we need to
> > either prove it or find the actual root cause.
> >
> > It would be **REALLY HELPFUL** if you could test this patch on ARM
> > for the mrg_rxbuf=on cases, to see whether it is in fact helpful to
> > ARM at all, since mrg_rxbuf=on is the more widely used case.
> >
> Actually it's worse than mrg_rxbuf=off.

I meant comparing the perf of the original vs. the original + patch,
with mrg_rxbuf turned on. Is there any perf improvement?


