[dpdk-dev] [PATCH] vhost: improve dirty pages logging performance

Tiwei Bie tiwei.bie at intel.com
Thu May 3 13:56:34 CEST 2018


On Mon, Apr 30, 2018 at 05:59:54PM +0200, Maxime Coquelin wrote:
> This patch caches all dirty pages logging until the used ring index
> is updated. These dirty pages won't be accessed by the guest as
> long as the host doesn't give them back to it by updating the
> index.

The sentence below, quoted from the commit message above, isn't the
reason why we can cache the dirty page logging, right?

"""
These dirty pages won't be accessed by the guest as
long as the host doesn't give them back to it by updating the
index.
"""

> 
> The goal of this optimization is to fix a performance regression
> introduced when the vhost library started to use atomic operations
> to set bits in the shared dirty log map. While the fix was valid
> as the previous implementation wasn't safe against concurrent accesses,
> contention was induced.
> 
> With this patch, during migration, we have:
> 1. Fewer atomic operations, as only a single atomic OR operation is
> needed per 32 pages.

Why not do it per 64 pages?

> 2. Fewer atomic operations, as during a burst the same page will
> be marked dirty only once.
> 3. Fewer write memory barriers.
> 
> Fixes: 897f13a1f726 ("vhost: make page logging atomic")
> 
> Cc: stable at dpdk.org
> 
> Suggested-by: Michael S. Tsirkin <mst at redhat.com>
> Signed-off-by: Maxime Coquelin <maxime.coquelin at redhat.com>
> ---
> 
> Hi,
> 
> This series was tested by migrating a guest while running the PVP
> benchmark at 1Mpps with both ovs-dpdk and testpmd as vswitch.

If the throughput is higher (e.g. with more cores
and queues), will the live migration fail because
dirty pages are generated faster?

> 
> With this patch we recover from the packet drop regressions seen since
> the use of atomic operations to log dirty pages.
[...]
>  
> +static __rte_always_inline void
> +vhost_log_cache_sync(struct virtio_net *dev, struct vhost_virtqueue *vq)
> +{
> +	uint32_t *log_base;
> +	int i;
> +
> +	if (likely(((dev->features & (1ULL << VHOST_F_LOG_ALL)) == 0) ||
> +		   !dev->log_base))
> +		return;
> +
> +	log_base = (uint32_t *)(uintptr_t)dev->log_base;
> +
> +	/* To make sure guest memory updates are committed before logging */
> +	rte_smp_wmb();

It seems that __sync_fetch_and_or() can be considered a full
barrier [1]. So do we really need this rte_smp_wmb()?

Also, according to the same doc [1], the __sync_* builtins seem
to be deprecated in favor of the __atomic_* ones.

[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html

> +
> +	for (i = 0; i < vq->log_cache_nb_elem; i++) {
> +		struct log_cache_entry *elem = vq->log_cache + i;
> +
> +		__sync_fetch_and_or(log_base + elem->offset, elem->val);
> +	}
> +
> +	vq->log_cache_nb_elem = 0;
> +}
> +
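For the barrier question above, here is a minimal sketch of the flush loop written with the `__atomic` builtins instead of `__sync_fetch_and_or()`. The standalone signature (log base, cache array and count passed in directly instead of `dev`/`vq`) and the name `log_cache_sync_atomic()` are hypothetical simplifications. Each OR uses `__ATOMIC_RELEASE`, which keeps earlier loads and stores, including the guest buffer writes, ordered before that OR, so the explicit `rte_smp_wmb()` should no longer be needed on this path. Whether release is enough, or `__ATOMIC_SEQ_CST` (roughly what the `__sync_` version provides) is wanted, depends on how the reader of the log synchronizes, which this sketch does not model.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Field names follow the quoted vhost_log_cache_sync(). */
struct log_cache_entry {
	uint32_t offset; /* index of the 32-bit word in the dirty log */
	uint32_t val;    /* dirty bits to set in that word */
};

/*
 * Hypothetical standalone variant of vhost_log_cache_sync().
 * No rte_smp_wmb(): the release ordering on each OR keeps earlier
 * stores (e.g. packet data copied into guest buffers) from being
 * reordered after the log update.
 */
static void log_cache_sync_atomic(uint32_t *log_base,
				  struct log_cache_entry *cache,
				  int *nb_elem)
{
	int i;

	assert(nb_elem != NULL);
	if (log_base == NULL)
		return;

	for (i = 0; i < *nb_elem; i++)
		__atomic_fetch_or(log_base + cache[i].offset, cache[i].val,
				  __ATOMIC_RELEASE);

	*nb_elem = 0;
}
```

Bits already set in the log are preserved, since the OR only adds the cached bits to each word.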
[...]

