[dpdk-dev] [PATCH] vhost: improve dirty pages logging performance

Maxime Coquelin maxime.coquelin at redhat.com
Mon Apr 30 17:59:54 CEST 2018


This patch caches all dirty pages logging until the used ring index
is updated. These dirty pages won't be accessed by the guest as
long as the host doesn't give them back to it by updating the
index.

The goal of this optimization is to fix a performance regression
introduced when the vhost library started to use atomic operations
to set bits in the shared dirty log map. While the fix was valid
as previous implementation wasn't safe against concurent accesses,
contention was induced.

With this patch, during migration, we have:
1. Less atomic operations as only a single atomic OR operation
per 32 pages.
2. Less atomic operations as during a burst, the same page will
be marked dirty only once.
3. Less write memory barriers.

Fixes: 897f13a1f726 ("vhost: make page logging atomic")

Cc: stable at dpdk.org

Suggested-by: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin at redhat.com>
---

Hi,

This series was tested with migrating a guest while running PVP
benchmark at 1Mpps with both ovs-dpdk and testpmd as vswitch.

With this patch we recover the packet drops regressions seen since
the use of atomic operations to log dirty pages.

Some numbers:

A) PVP Live migration using testpmd (Single queue pair)
-------------------------------------------------------

Without patch:
=========================Stream Rate: 1Mpps==============
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       1Mpps      134     18966        16   11628963.0
 1       1Mpps      125     18790        16    8436300.0
 2       1Mpps      122     19171        15   13881342.0
 3       1Mpps      132     18913        15   12079492.0
<------------------------Summary------------------------>
   Max   1Mpps      134     19171        16     13881342
   Min   1Mpps      122     18790        15      8436300
  Mean   1Mpps      128     18960        15     11506524
Median   1Mpps      128     18939        15     11854227
 Stdev       0     5.68    158.81      0.58   2266371.52

=========================================================

With patch:
=========================Stream Rate: 1Mpps==============
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       1Mpps      119     13521        15     478174.0
 1       1Mpps      116     14084        14     452018.0
 2       1Mpps      122     13908        14     464486.0
 3       1Mpps      129     13787        16     478234.0
<------------------------Summary------------------------>
   Max   1Mpps      129     14084        16       478234
   Min   1Mpps      116     13521        14       452018
  Mean   1Mpps      121     13825        14       468228
Median   1Mpps      120     13847        14       471330
 Stdev       0     5.57    236.52      0.96     12593.78
=========================================================


B) OVS-DPDK migration with 2 queue pairs
----------------------------------------

Without patch:
=======================Stream Rate: 1Mpps================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       1Mpps      146     20270       116   15937394.0
 1       1Mpps      150     25617        16   11120370.0
 2       1Mpps      138     40983       115   24779971.0
 3       1Mpps      161     20435        17   15363519.0
<------------------------Summary------------------------>
   Max   1Mpps      161     40983       116     24779971
   Min   1Mpps      138     20270        16     11120370
  Mean   1Mpps      148     26826        66     16800313
Median   1Mpps      148     23026        66     15650456
 Stdev       0     9.57    9758.9     57.16   5737179.93

=========================================================

With patch:
=======================Stream Rate: 1Mpps================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       1Mpps      155     18915        17     481330.0
 1       1Mpps      156     21097        18     370556.0
 2       1Mpps      156     49610        15     369291.0
 3       1Mpps      144     31124        15     361914.0
<------------------------Summary------------------------>
   Max   1Mpps      156     49610        18       481330
   Min   1Mpps      144     18915        15       361914
  Mean   1Mpps      152     30186        16       395772
Median   1Mpps      155     26110        16       369923
 Stdev       0     5.85  13997.82       1.5     57165.33
=========================================================

C) OVS-DPDK migration with single queue pair
--------------------------------------------

Without patch:
=======================Stream Rate: 1Mpps================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       1Mpps      129     17411        15   11105414.0
 1       1Mpps      130     16544        15    8028438.0
 2       1Mpps      132     15202        15    7491584.0
 3       1Mpps      133     18100        15    8385047.0
<------------------------Summary------------------------>
   Max   1Mpps      133     18100        15     11105414
   Min   1Mpps      129     15202        15      7491584
  Mean   1Mpps      131     16814        15      8752620
Median   1Mpps      131     16977        15      8206742
 Stdev       0     1.83   1249.22       0.0   1610941.84
=========================================================

With patch:
=======================Stream Rate: 1Mpps================
No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss
 0       1Mpps      117     12966        14     275873.0
 1       1Mpps      127     13398        14     460323.0
 2       1Mpps      119     13216        14     280831.0
 3       1Mpps      131     13544        15     465951.0
<------------------------Summary------------------------>
   Max   1Mpps      131     13544        15       465951
   Min   1Mpps      117     12966        14       275873
  Mean   1Mpps      123     13281        14       370744
Median   1Mpps      123     13307        14       370577
 Stdev       0     6.61     249.2       0.5     106729.6
=========================================================



 lib/librte_vhost/vhost.h      | 96 +++++++++++++++++++++++++++++++++++++++++++
 lib/librte_vhost/virtio_net.c | 29 ++++++++-----
 2 files changed, 115 insertions(+), 10 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 891978131..0d9129504 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -36,6 +36,8 @@
 
 #define BUF_VECTOR_MAX 256
 
+#define VHOST_LOG_CACHE_NR 32
+
 /**
  * Structure contains buffer address, length and descriptor index
  * from vring to do scatter RX.
@@ -69,6 +71,14 @@ struct batch_copy_elem {
 	uint64_t log_addr;
 };
 
+/*
+ * Structure that contains the info for batched dirty logging.
+ */
+struct log_cache_entry {
+	uint32_t offset;
+	uint32_t val;
+};
+
 /**
  * Structure contains variables relevant to RX/TX virtqueues.
  */
@@ -112,6 +122,9 @@ struct vhost_virtqueue {
 	struct batch_copy_elem	*batch_copy_elems;
 	uint16_t		batch_copy_nb_elems;
 
+	struct log_cache_entry log_cache[VHOST_LOG_CACHE_NR];
+	uint16_t log_cache_nb_elem;
+
 	rte_rwlock_t	iotlb_lock;
 	rte_rwlock_t	iotlb_pending_lock;
 	struct rte_mempool *iotlb_pool;
@@ -340,6 +353,89 @@ vhost_log_write(struct virtio_net *dev, uint64_t addr, uint64_t len)
 	}
 }
 
+static __rte_always_inline void
+vhost_log_cache_sync(struct virtio_net *dev, struct vhost_virtqueue *vq)
+{
+	uint32_t *log_base;
+	int i;
+
+	if (likely(((dev->features & (1ULL << VHOST_F_LOG_ALL)) == 0) ||
+		   !dev->log_base))
+		return;
+
+	log_base = (uint32_t *)(uintptr_t)dev->log_base;
+
+	/* To make sure guest memory updates are committed before logging */
+	rte_smp_wmb();
+
+	for (i = 0; i < vq->log_cache_nb_elem; i++) {
+		struct log_cache_entry *elem = vq->log_cache + i;
+
+		__sync_fetch_and_or(log_base + elem->offset, elem->val);
+	}
+
+	vq->log_cache_nb_elem = 0;
+}
+
+static __rte_always_inline void
+vhost_log_cache_page(struct virtio_net *dev, struct vhost_virtqueue *vq,
+					 uint64_t page)
+{
+	uint32_t bit_nr = page % 32;
+	uint32_t offset = page / 32;
+	int i;
+
+	for (i = 0; i < vq->log_cache_nb_elem; i++) {
+		struct log_cache_entry *elem = vq->log_cache + i;
+
+		if (elem->offset == offset) {
+			elem->val |= (1U << bit_nr);
+			return;
+		}
+	}
+
+	if (unlikely(i >= VHOST_LOG_CACHE_NR)) {
+		/*
+		 * No more room for a new log cache entry,
+		 * so write the dirty log map directly.
+		 */
+		rte_smp_wmb();
+		vhost_log_page((uint8_t *)(uintptr_t)dev->log_base, page);
+
+		return;
+	}
+
+	vq->log_cache[i].offset = offset;
+	vq->log_cache[i].val = (1U << bit_nr);
+}
+
+static __rte_always_inline void
+vhost_log_cache_write(struct virtio_net *dev, struct vhost_virtqueue *vq,
+					  uint64_t addr, uint64_t len)
+{
+	uint64_t page;
+
+	if (likely(((dev->features & (1ULL << VHOST_F_LOG_ALL)) == 0) ||
+		   !dev->log_base || !len))
+		return;
+
+	if (unlikely(dev->log_size <= ((addr + len - 1) / VHOST_LOG_PAGE / 8)))
+		return;
+
+	page = addr / VHOST_LOG_PAGE;
+	while (page * VHOST_LOG_PAGE < addr + len) {
+		vhost_log_cache_page(dev, vq, page);
+		page += 1;
+	}
+}
+
+static __rte_always_inline void
+vhost_log_cache_used_vring(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		     uint64_t offset, uint64_t len)
+{
+	vhost_log_cache_write(dev, vq, vq->log_guest_addr + offset, len);
+}
+
 static __rte_always_inline void
 vhost_log_used_vring(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		     uint64_t offset, uint64_t len)
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 5fdd4172b..ac42cdbd0 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -78,7 +78,7 @@ do_flush_shadow_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	rte_memcpy(&vq->used->ring[to],
 			&vq->shadow_used_ring[from],
 			size * sizeof(struct vring_used_elem));
-	vhost_log_used_vring(dev, vq,
+	vhost_log_cache_used_vring(dev, vq,
 			offsetof(struct vring_used, ring[to]),
 			size * sizeof(struct vring_used_elem));
 }
@@ -106,6 +106,8 @@ flush_shadow_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq)
 
 	rte_smp_wmb();
 
+	vhost_log_cache_sync(dev, vq);
+
 	*(volatile uint16_t *)&vq->used->idx += vq->shadow_used_idx;
 	vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx),
 		sizeof(vq->used->idx));
@@ -130,7 +132,7 @@ do_data_copy_enqueue(struct virtio_net *dev, struct vhost_virtqueue *vq)
 
 	for (i = 0; i < count; i++) {
 		rte_memcpy(elem[i].dst, elem[i].src, elem[i].len);
-		vhost_log_write(dev, elem[i].log_addr, elem[i].len);
+		vhost_log_cache_write(dev, vq, elem[i].log_addr, elem[i].len);
 		PRINT_PACKET(dev, (uintptr_t)elem[i].dst, elem[i].len, 0);
 	}
 }
@@ -251,7 +253,7 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		virtio_enqueue_offload(m,
 				(struct virtio_net_hdr *)(uintptr_t)desc_addr);
 		PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0);
-		vhost_log_write(dev, desc_gaddr, dev->vhost_hlen);
+		vhost_log_cache_write(dev, vq, desc_gaddr, dev->vhost_hlen);
 	} else {
 		struct virtio_net_hdr vnet_hdr;
 		uint64_t remain = dev->vhost_hlen;
@@ -274,7 +276,7 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 					(void *)(uintptr_t)src, len);
 
 			PRINT_PACKET(dev, (uintptr_t)dst, (uint32_t)len, 0);
-			vhost_log_write(dev, guest_addr, len);
+			vhost_log_cache_write(dev, vq, guest_addr, len);
 			remain -= len;
 			guest_addr += len;
 			dst += len;
@@ -355,7 +357,8 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 							desc_offset)),
 				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
 				cpy_len);
-			vhost_log_write(dev, desc_gaddr + desc_offset, cpy_len);
+			vhost_log_cache_write(dev, vq, desc_gaddr + desc_offset,
+					cpy_len);
 			PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset),
 				     cpy_len, 0);
 		} else {
@@ -444,7 +447,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 		vq->used->ring[used_idx].id = desc_indexes[i];
 		vq->used->ring[used_idx].len = pkts[i]->pkt_len +
 					       dev->vhost_hlen;
-		vhost_log_used_vring(dev, vq,
+		vhost_log_cache_used_vring(dev, vq,
 			offsetof(struct vring_used, ring[used_idx]),
 			sizeof(vq->used->ring[used_idx]));
 	}
@@ -504,6 +507,8 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 
 	rte_smp_wmb();
 
+	vhost_log_cache_sync(dev, vq);
+
 	*(volatile uint16_t *)&vq->used->idx += count;
 	vq->last_used_idx += count;
 	vhost_log_used_vring(dev, vq,
@@ -767,7 +772,8 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 					PRINT_PACKET(dev, (uintptr_t)dst,
 							(uint32_t)len, 0);
-					vhost_log_write(dev, guest_addr, len);
+					vhost_log_cache_write(dev, vq,
+							guest_addr, len);
 
 					remain -= len;
 					guest_addr += len;
@@ -776,7 +782,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			} else {
 				PRINT_PACKET(dev, (uintptr_t)hdr_addr,
 						dev->vhost_hlen, 0);
-				vhost_log_write(dev, hdr_phys_addr,
+				vhost_log_cache_write(dev, vq, hdr_phys_addr,
 						dev->vhost_hlen);
 			}
 
@@ -790,7 +796,8 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
 							desc_offset)),
 				rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
 				cpy_len);
-			vhost_log_write(dev, desc_gaddr + desc_offset, cpy_len);
+			vhost_log_cache_write(dev, vq, desc_gaddr + desc_offset,
+					cpy_len);
 			PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset),
 				cpy_len, 0);
 		} else {
@@ -1320,7 +1327,7 @@ update_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq,
 {
 	vq->used->ring[used_idx].id  = desc_idx;
 	vq->used->ring[used_idx].len = 0;
-	vhost_log_used_vring(dev, vq,
+	vhost_log_cache_used_vring(dev, vq,
 			offsetof(struct vring_used, ring[used_idx]),
 			sizeof(vq->used->ring[used_idx]));
 }
@@ -1335,6 +1342,8 @@ update_used_idx(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	rte_smp_wmb();
 	rte_smp_rmb();
 
+	vhost_log_cache_sync(dev, vq);
+
 	vq->used->idx += count;
 	vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx),
 			sizeof(vq->used->idx));
-- 
2.14.3



More information about the dev mailing list