[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
Wang, Zhihong
zhihong.wang at intel.com
Thu Oct 13 08:02:04 CEST 2016
> -----Original Message-----
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> Sent: Thursday, October 13, 2016 1:33 PM
> To: Wang, Zhihong <zhihong.wang at intel.com>
> Cc: Jianbo Liu <jianbo.liu at linaro.org>; Thomas Monjalon
> <thomas.monjalon at 6wind.com>; Maxime Coquelin
> <maxime.coquelin at redhat.com>; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
>
> On Wed, Oct 12, 2016 at 12:22:08PM +0000, Wang, Zhihong wrote:
> > > > >> > 3. What percentage of drop are you seeing?
> > > > The testing result:
> > > > size (bytes)    improvement (%)
> > > >     64              3.92
> > > >    128             11.51
> > > >    256             24.16
> > > >    512            -13.79
> > > >   1024            -22.51
> > > >   1500            -12.22
> > > > A correction: performance drops when the packet size is larger than 512 bytes.
> > >
> > > I have thought about this twice. Unfortunately, I think I may need to NACK
> > > this series.
> > >
> > > Merging the two code paths into one is really good: as you stated, it improves
> > > maintainability. But only if we see no performance regression on either
> > > path after the refactor. Unfortunately, that's not the case here: it hurts
> > > the performance of one code path (non-mrg Rx).
> > >
> > > That makes me think we should not do the code path merge at all. I think
> > > that also aligns with what you said before (internally): we could do the
> > > merge if it gives comparable performance before and after.
> > >
> > > Besides that, I don't quite like what you did in patch 2 (rewrite enqueue):
> > > you made a lot of changes in one patch. That means if something goes wrong,
> > > it is hard to narrow down which change introduced the regression. Sadly,
> > > that's exactly what we've met here. Weeks have passed, and I see no
> > > progress.
> > >
> > > That's the reason we like the idea of "one patch only does one thing, an
> > > atomic thing".
> >
> >
> > Yuanhan, folks,
> >
> > Thanks for the analysis. I disagree here though.
> >
> > I analyze, develop, and benchmark on x86 platforms, where this patch
> > works great.
>
> Yes, that's a great effort! With your hard work, we know what the bottleneck
> is and how it could be improved.
>
> However, you don't have to refactor the code (merge the two code paths into
> one) to apply those improvements. From what I know, there are two factors
> in this patchset that could improve the performance:
>
> - copy hdr together with packet data
>
> - shadow the used ring updates and apply them at once
>
> The overall performance boost I got with your v6 patchset on the mrg-Rx
> code path is about 27% (in the PVP case). And I have just applied the 1st
> optimization; it alone yields about a 20% boost. The rest could be covered
> if we apply the 2nd optimization (I guess).
>
> That would be a clean way to optimize vhost mergeable Rx path:
>
> - you don't touch the non-mrg Rx path (well, you could apply the
> shadow_used_ring trick to it as well)
>
> This would at least make sure we have no performance regression
> issue like the one reported by the ARM guys.
>
> - you don't refactor the code
>
> The rewrite from scratch could introduce other issues besides the
> performance regression. We may just not know them yet.
>
>
> Make sense to you? If you agree, I think we could still make it into
> this release: these would be small changes after all. For example,
> below is a patch that applies the 1st optimization tip on top of
> dpdk-next-virtio.
Thanks for this great idea. I think it's a better way to do it.
I'll start making the patch then.
>
> --yliu
>
> ---------------------------------------------------------------
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 8a151af..0ddb5af 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -379,7 +379,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
> uint16_t end_idx, struct rte_mbuf *m,
> struct buf_vector *buf_vec)
> {
> - struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
> + struct virtio_net_hdr_mrg_rxbuf *virtio_hdr;
> uint32_t vec_idx = 0;
> uint16_t start_idx = vq->last_used_idx;
> uint16_t cur_idx = start_idx;
> @@ -388,6 +388,8 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
> uint32_t desc_offset, desc_avail;
> uint32_t cpy_len;
> uint16_t desc_idx, used_idx;
> + uint16_t num_buffers = end_idx - start_idx;
> + int hdr_copied = 0;
>
> if (unlikely(m == NULL))
> return 0;
> @@ -399,16 +401,11 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
> if (buf_vec[vec_idx].buf_len < dev->vhost_hlen || !desc_addr)
> return 0;
>
> - rte_prefetch0((void *)(uintptr_t)desc_addr);
> + virtio_hdr = (struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr;
> + rte_prefetch0(virtio_hdr);
>
> - virtio_hdr.num_buffers = end_idx - start_idx;
> LOG_DEBUG(VHOST_DATA, "(%d) RX: num merge buffers %d\n",
> - dev->vid, virtio_hdr.num_buffers);
> -
> - virtio_enqueue_offload(m, &virtio_hdr.hdr);
> - copy_virtio_net_hdr(dev, desc_addr, virtio_hdr);
> - vhost_log_write(dev, buf_vec[vec_idx].buf_addr, dev->vhost_hlen);
> - PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0);
> + dev->vid, num_buffers);
>
> desc_avail = buf_vec[vec_idx].buf_len - dev->vhost_hlen;
> desc_offset = dev->vhost_hlen;
> @@ -450,6 +447,15 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
> mbuf_avail = rte_pktmbuf_data_len(m);
> }
>
> + if (hdr_copied == 0) {
> + virtio_hdr->num_buffers = num_buffers;
> + virtio_enqueue_offload(m, &virtio_hdr->hdr);
> + vhost_log_write(dev, buf_vec[vec_idx].buf_addr, dev->vhost_hlen);
> + PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0);
> +
> + hdr_copied = 1;
> + }
> +
> cpy_len = RTE_MIN(desc_avail, mbuf_avail);
> rte_memcpy((void *)((uintptr_t)(desc_addr + desc_offset)),
> rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),