[dpdk-dev] [PATCH 4/6] vhost: add Tx zero copy

Yuanhan Liu yuanhan.liu at linux.intel.com
Tue Aug 23 16:31:47 CEST 2016


On Tue, Aug 23, 2016 at 04:04:30PM +0200, Maxime Coquelin wrote:
> 
> 
> On 08/23/2016 10:10 AM, Yuanhan Liu wrote:
> >The basic idea of Tx zero copy is, instead of copying data from the
> >desc buf, here we let the mbuf reference the desc buf addr directly.
> >
> >Doing so, however, has one major issue: we can't update the used ring
> >at the end of rte_vhost_dequeue_burst. Because we don't do the copy
> >here, an update of the used ring would let the driver to reclaim the
> >desc buf. As a result, DPDK might reference a stale memory region.
> >
> >To update the used ring properly, this patch does several tricks:
> >
> >- when mbuf references a desc buf, refcnt is added by 1.
> >
> >  This is to pin lock the mbuf, so that a mbuf free from the DPDK
> >  won't actually free it, instead, refcnt is subtracted by 1.
> >
> >- We chain all those mbuf together (by tailq)
> >
> >  And we check it every time on the rte_vhost_dequeue_burst entrance,
> >  to see if the mbuf is freed (when refcnt equals to 1). If that
> >  happens, it means we are the last user of this mbuf and we are
> >  safe to update the used ring.
> >
> >- "struct zcopy_mbuf" is introduced, to associate an mbuf with the
> >  right desc idx.
> >
> >Tx zero copy is introduced for performance reason, and some rough tests
> >show about 40% perfomance boost for packet size 1400B. FOr small packets,
> >(e.g. 64B), it actually slows a bit down. That is expected because this
> >patch introduces some extra works, and it outweighs the benefit from
> >saving few bytes copy.
> >
> >Signed-off-by: Yuanhan Liu <yuanhan.liu at linux.intel.com>
> >---
> > lib/librte_vhost/vhost.c      |   2 +
> > lib/librte_vhost/vhost.h      |  21 ++++++
> > lib/librte_vhost/vhost_user.c |  41 +++++++++-
> > lib/librte_vhost/virtio_net.c | 169 +++++++++++++++++++++++++++++++++++++-----
> > 4 files changed, 214 insertions(+), 19 deletions(-)
> >
> ...
> 
> > rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> > 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
> >@@ -823,6 +943,30 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> > 	if (unlikely(vq->enabled == 0))
> > 		return 0;
> >
> >+	if (dev->tx_zero_copy) {
> >+		struct zcopy_mbuf *zmbuf, *next;
> >+		int nr_updated = 0;
> >+
> >+		for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list);
> >+		     zmbuf != NULL; zmbuf = next) {
> >+			next = TAILQ_NEXT(zmbuf, next);
> >+
> >+			if (mbuf_is_consumed(zmbuf->mbuf)) {
> >+				used_idx = vq->last_used_idx++ & (vq->size - 1);
> >+				update_used_ring(dev, vq, used_idx,
> >+						 zmbuf->desc_idx);
> >+				nr_updated += 1;
> >+
> >+				TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next);
> >+				rte_pktmbuf_free(zmbuf->mbuf);
> >+				put_zmbuf(zmbuf);
> >+				vq->nr_zmbuf -= 1;
> >+			}
> Shouldn't you break the loop here as soon as a mbuf is not consumed?

I have thought of that as well, as a micro optimization. But I was
wondering what if a heading mbuf is pin locked by the DPDK APP? Then
the whole chain would be blocked. This should be rare, but I think
we should think of the worst case.

Besides that, the performance boost I got is quite decent, that I think
we could drop this micro optimization.

> Indeed, they might not be consumed sequentially, and would cause
> last_used_idx to be incremented whereas it shouldn't.

I think the out of order used vring update won't be an issue here.
Well, there might be some problems for reconnect. The trick the
commit 0823c1cb0a73 ("vhost: workaround stale vring base") introduced
assumes that used vring will always be updated in order.

	--yliu


More information about the dev mailing list