[dpdk-dev] [PATCH] vhost: adaptively batch small guest memory copies
maxime.coquelin at redhat.com
Thu Sep 7 19:47:57 CEST 2017
On 08/24/2017 04:19 AM, Tiwei Bie wrote:
> This patch adaptively batches the small guest memory copies.
> By batching the small copies, the efficiency of executing the
> memory LOAD instructions can be improved greatly, because the
> memory LOAD latency can be effectively hidden by the pipeline.
> We saw great performance boosts in small-packet PVP tests.
> This patch improves the performance for small packets, and it
> distinguishes packets by size. So although the performance for
> big packets doesn't change, it becomes relatively easy to do
> some special optimizations for big packets too.
> Signed-off-by: Tiwei Bie <tiwei.bie at intel.com>
> Signed-off-by: Zhihong Wang <zhihong.wang at intel.com>
> Signed-off-by: Zhiyong Yang <zhiyong.yang at intel.com>
> This optimization depends on the CPU's internal pipeline design.
> So further tests (e.g. on ARM) from the community are appreciated.
> lib/librte_vhost/vhost.c | 2 +-
> lib/librte_vhost/vhost.h | 13 +++
> lib/librte_vhost/vhost_user.c | 12 +++
> lib/librte_vhost/virtio_net.c | 240 ++++++++++++++++++++++++++++++++----------
> 4 files changed, 209 insertions(+), 58 deletions(-)
I did some PVP benchmarks with your patch.
First I tried my standard PVP setup, with io forwarding on the host and
macswap in the guest in bidirectional mode.
With this, I noticed no improvement (18.8Mpps), but I think this is
because the guest is the bottleneck here.
So I changed my setup to do csum forwarding on the host side, so that
the host's PMD threads are more loaded.
In this case, I noticed a great improvement: I get 18.8Mpps with your
patch instead of 14.8Mpps without! Great work!
Reviewed-by: Maxime Coquelin <maxime.coquelin at redhat.com>