[dpdk-dev] [PATCH] vhost_user: protect active rings from async ring changes

Maxime Coquelin maxime.coquelin at redhat.com
Fri Dec 8 11:11:40 CET 2017


Hi Jianfeng,

On 12/08/2017 09:51 AM, Tan, Jianfeng wrote:
> 
> 
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
>> Sent: Friday, December 8, 2017 4:36 PM
>> To: Tan, Jianfeng; Victor Kaplansky; dev at dpdk.org; yliu at fridaylinux.org; Bie,
>> Tiwei
>> Cc: stable at dpdk.org; jfreiman at redhat.com
>> Subject: Re: [PATCH] vhost_user: protect active rings from async ring
>> changes
>>
>>
>>
>> On 12/08/2017 03:14 AM, Tan, Jianfeng wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
>>>> Sent: Thursday, December 7, 2017 6:02 PM
>>>> To: Tan, Jianfeng; Victor Kaplansky; dev at dpdk.org; yliu at fridaylinux.org;
>> Bie,
>>>> Tiwei
>>>> Cc: stable at dpdk.org; jfreiman at redhat.com
>>>> Subject: Re: [PATCH] vhost_user: protect active rings from async ring
>>>> changes
>>>>
>>>>
>>>>
>>>> On 12/07/2017 10:33 AM, Tan, Jianfeng wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Victor Kaplansky [mailto:vkaplans at redhat.com]
>>>>>> Sent: Wednesday, December 6, 2017 9:56 PM
>>>>>> To: dev at dpdk.org; yliu at fridaylinux.org; Bie, Tiwei; Tan, Jianfeng;
>>>>>> vkaplans at redhat.com
>>>>>> Cc: stable at dpdk.org; jfreiman at redhat.com; Maxime Coquelin
>>>>>> Subject: [PATCH] vhost_user: protect active rings from async ring
>> changes
>>>>>>
>>>>>> When performing live migration or memory hot-plugging,
>>>>>> the changes to the device and vrings made by the message handler
>>>>>> are done independently from vring usage by the PMD threads.
>>>>>>
>>>>>> This causes for example segfauls during live-migration
>>>>>
>>>>> segfauls ->segfaults?
>>>>>
>>>>>> with MQ enabled, but in general virtually any request
>>>>>> sent by qemu changing the state of the device can cause
>>>>>> problems.
>>>>>>
>>>>>> These patches fix all the above issues by adding a spinlock
>>>>>> to every vring and requiring the message handler to start the operation
>>>>>> only after ensuring that all PMD threads related to the divece
>>>>>
>>>>> Another typo: divece.
>>>>>
>>>>>> are out of critical section accessing the vring data.
>>>>>>
>>>>>> Each vring has its own lock in order not to create contention
>>>>>> between PMD threads of different vrings and to prevent
>>>>>> performance degradation when scaling the queue pair number.
>>>>>
>>>>> I also wonder how much overhead it brings.
>>>>>
>>>>> Instead of locking each vring, can we just wait a while (10us for
>>>>> example) after calling the destroy_device() callback so that every PMD
>>>>> thread has enough time to get out of the critical area?
>>>>
>>>> No, because the device is not destroyed in the cases where this is needed.
>>>> Actually, once destroy_device() is called, it is likely that the
>>>> application has taken care that the rings aren't being processed anymore
>>>> before returning from the callback (this is at least the case with the
>>>> Vhost PMD).
>>>
>>> OK, I did not put it the right way as there are multiple cases above:
>>> migration and memory hot plug. Let me try again:
>>>
>>> Whenever a vhost thread handles a message affecting PMD threads (like
>>> SET_MEM_TABLE, GET_VRING_BASE, etc.), we can clear the dev flag -
>>> VIRTIO_DEV_RUNNING, and wait for a while so that the PMD threads skip out
>>> of those critical areas. After the message handling, set the flag -
>>> VIRTIO_DEV_RUNNING again.
>>
>> I think you mean clearing vq's enabled flag, because PMD threads never
>> check the VIRTIO_DEV_RUNNING flag.
> 
> Ah, yes.
> 
>>
>>> I suppose it can work, based on the assumption that PMD threads work in
>>> polling mode and will get out of the critical area quickly and reliably.
>>
>> That sounds very fragile, because if the CPUs aren't perfectly isolated,
>> your PMD thread can be preempted for interrupt handling, for example.
>>
>> Or what if for some reason the PMD thread CPU stalls for a short while?
>>
>> The latter is unlikely, but if it happens, it will be hard to debug.
>>
>> Let's see first the performance impact of using the spinlock. It might
>> not be that important because 99.9999% of the time, it will not even
>> spin.
> 
> Fair enough.

I did some benchmarks on my Broadwell test bench (see patch below), and
it seems that, depending on the benchmark, performance is on par with or
better than without the spinlock (TxOnly gains about 2%, the other cases
are within noise)! I guess this is because with the spinlock there is
better batching and fewer concurrent accesses on the rings, but I'm not
sure.

Please find my results below (CPU E5-2667 v4 @ 3.20GHz):

       Bench         v17.11     v17.11 + spinlock
  ---------------- ----------- -------------------
   PVP Bidir run1   19.29Mpps   19.26Mpps
   PVP Bidir run2   19.26Mpps   19.28Mpps
   TxOnly           18.47Mpps   18.83Mpps
   RxOnly           13.83Mpps   13.83Mpps
   IO Loopback      7.94Mpps    7.97Mpps
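
Note that the benchmark patch below only takes the new lock on the
datapath side (virtio_net.c). Just to illustrate the other half of the
scheme being discussed, the message handler could take every vring's
lock before processing a request that changes the device state, along
these lines (a rough sketch only, not part of the patch; the helper
names are made up):

/*
 * Sketch only, assuming the vhost.h context of the patch below: take and
 * release all per-vring access_lock spinlocks around a state-changing
 * request (e.g. SET_MEM_TABLE, GET_VRING_BASE), so the handler proceeds
 * only once every PMD thread is out of its enqueue/dequeue critical
 * section. Helper names are hypothetical.
 */
static void
vhost_user_lock_all_vrings(struct virtio_net *dev)
{
	uint32_t i;

	for (i = 0; i < dev->nr_vring; i++)
		rte_spinlock_lock(&dev->virtqueue[i]->access_lock);
}

static void
vhost_user_unlock_all_vrings(struct virtio_net *dev)
{
	uint32_t i;

	for (i = 0; i < dev->nr_vring; i++)
		rte_spinlock_unlock(&dev->virtqueue[i]->access_lock);
}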



Patch:
---------------------------------------------------------

From 7e18cefce682235558fed66a1fb87ab937ec9297 Mon Sep 17 00:00:00 2001
From: Maxime Coquelin <maxime.coquelin at redhat.com>
Date: Fri, 8 Dec 2017 04:21:25 -0500
Subject: [PATCH] vhost: spinlock benchmarking

Signed-off-by: Maxime Coquelin <maxime.coquelin at redhat.com>
---
  lib/librte_vhost/vhost.c      |  2 ++
  lib/librte_vhost/vhost.h      |  2 ++
  lib/librte_vhost/virtio_net.c | 12 ++++++++++++
  3 files changed, 16 insertions(+)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 4f8b73a..d9752cf 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -219,6 +219,8 @@ struct virtio_net *
  	vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
  	vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD;

+	rte_spinlock_init(&vq->access_lock);
+
  	vhost_user_iotlb_init(dev, vring_idx);
  	/* Backends are set to -1 indicating an inactive device. */
  	vq->backend = -1;
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 1cc81c1..56242a8 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -131,6 +131,8 @@ struct vhost_virtqueue {
  	struct batch_copy_elem	*batch_copy_elems;
  	uint16_t		batch_copy_nb_elems;

+	rte_spinlock_t access_lock;
+
  	rte_rwlock_t	iotlb_lock;
  	rte_rwlock_t	iotlb_pending_lock;
  	struct rte_mempool *iotlb_pool;
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 6fee16e..50cc3de 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -329,6 +329,8 @@
  	if (unlikely(vq->enabled == 0))
  		return 0;

+	rte_spinlock_lock(&vq->access_lock);
+
  	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
  		vhost_user_iotlb_rd_lock(vq);

@@ -419,6 +421,8 @@
  	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
  		vhost_user_iotlb_rd_unlock(vq);

+	rte_spinlock_unlock(&vq->access_lock);
+
  	return count;
  }

@@ -654,6 +658,8 @@
  	if (unlikely(vq->enabled == 0))
  		return 0;

+	rte_spinlock_lock(&vq->access_lock);
+
  	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
  		vhost_user_iotlb_rd_lock(vq);

@@ -715,6 +721,8 @@
  	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
  		vhost_user_iotlb_rd_unlock(vq);

+	rte_spinlock_unlock(&vq->access_lock);
+
  	return pkt_idx;
  }

@@ -1183,6 +1191,8 @@
  	if (unlikely(vq->enabled == 0))
  		return 0;

+	rte_spinlock_lock(&vq->access_lock);
+
  	vq->batch_copy_nb_elems = 0;

  	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
@@ -1356,6 +1366,8 @@
  	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
  		vhost_user_iotlb_rd_unlock(vq);

+	rte_spinlock_unlock(&vq->access_lock);
+
  	if (unlikely(rarp_mbuf != NULL)) {
  		/*
  		 * Inject it to the head of "pkts" array, so that switch's mac
-- 
1.8.3.1

