[dpdk-dev] [PATCH] vhost: adaptively batch small guest memory copies

Maxime Coquelin maxime.coquelin at redhat.com
Thu Sep 7 19:47:57 CEST 2017

Hi Tiwei,

On 08/24/2017 04:19 AM, Tiwei Bie wrote:
> This patch adaptively batches the small guest memory copies.
> By batching the small copies, the efficiency of executing the
> memory LOAD instructions can be improved greatly, because the
> memory LOAD latency can be effectively hidden by the pipeline.
> We saw great performance boosts for small packets PVP test.
> This patch improves the performance for small packets, and has
> distinguished the packets by size. So although the performance
> for big packets doesn't change, it makes it relatively easy to
> do some special optimizations for the big packets too.
> Signed-off-by: Tiwei Bie <tiwei.bie at intel.com>
> Signed-off-by: Zhihong Wang <zhihong.wang at intel.com>
> Signed-off-by: Zhiyong Yang <zhiyong.yang at intel.com>
> ---
> This optimization depends on the CPU internal pipeline design.
> So further tests (e.g. ARM) from the community are appreciated.
>   lib/librte_vhost/vhost.c      |   2 +-
>   lib/librte_vhost/vhost.h      |  13 +++
>   lib/librte_vhost/vhost_user.c |  12 +++
>   lib/librte_vhost/virtio_net.c | 240 ++++++++++++++++++++++++++++++++----------
>   4 files changed, 209 insertions(+), 58 deletions(-)
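
For readers following the thread, the batching idea described in the
commit message can be sketched roughly as below. This is only an
illustrative sketch and not the code from the patch: the names
copy_elem, copy_batch, batch_copy and batch_flush, as well as the
SMALL_COPY_MAX and BATCH_MAX values, are hypothetical. Small copies are
recorded instead of executed, and are later performed back to back in
one tight loop; large copies are done immediately.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limits, chosen only for illustration. */
#define SMALL_COPY_MAX 256  /* copies above this size are not batched */
#define BATCH_MAX      64   /* maximum number of deferred copies */

/* One deferred copy request. */
struct copy_elem {
	void *dst;
	const void *src;
	size_t len;
};

/* A fixed-size queue of deferred small copies. */
struct copy_batch {
	struct copy_elem elems[BATCH_MAX];
	unsigned int n;
};

/* Perform all deferred copies in one tight loop and empty the batch. */
static void
batch_flush(struct copy_batch *b)
{
	unsigned int i;

	assert(b->n <= BATCH_MAX);
	for (i = 0; i < b->n; i++)
		memcpy(b->elems[i].dst, b->elems[i].src, b->elems[i].len);
	b->n = 0;
}

/*
 * Copy now if the request is large, otherwise defer it.
 * Assumes that source and destination regions of different
 * requests do not overlap, so reordering the copies is safe.
 */
static void
batch_copy(struct copy_batch *b, void *dst, const void *src, size_t len)
{
	if (len > SMALL_COPY_MAX) {
		memcpy(dst, src, len);
		return;
	}
	if (b->n == BATCH_MAX)
		batch_flush(b);
	b->elems[b->n].dst = dst;
	b->elems[b->n].src = src;
	b->elems[b->n].len = len;
	b->n++;
}
```

In such a scheme the caller has to call batch_flush() before the source
buffers are released and before the destination is made visible to the
other side, since deferred data has not been written until then.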

I ran some PVP benchmarks with your patch.
First I tried my standard PVP setup, with io forwarding on the host and
macswap on the guest in bidirectional mode.

With this, I notice no improvement (18.8Mpps), but I think that is
because the guest is the bottleneck here.
So I changed my setup to do csum forwarding on the host side, so that the
host's PMD threads are more loaded.

In this case, I notice a great improvement: I get 18.8Mpps with your
patch instead of 14.8Mpps without! Great work!

Reviewed-by: Maxime Coquelin <maxime.coquelin at redhat.com>


