[dpdk-dev] [PATCH 03/21] vhost: protect virtio_net device struct

Maxime Coquelin maxime.coquelin at redhat.com
Tue Sep 5 13:00:42 CEST 2017



On 09/05/2017 12:07 PM, Tiwei Bie wrote:
> On Tue, Sep 05, 2017 at 11:24:14AM +0200, Maxime Coquelin wrote:
>> On 09/05/2017 06:45 AM, Tiwei Bie wrote:
>>> On Thu, Aug 31, 2017 at 11:50:05AM +0200, Maxime Coquelin wrote:
>>>> virtio_net device might be accessed while being reallocated
>>>> in case of NUMA awareness. This case might be theoretical,
>>>> but it will be needed anyway to protect vring pages against
>>>> invalidation.
>>>>
>>>> The virtio_net devs are now protected with a readers/writers
>>>> lock, so that before reallocating the device, it is ensured
>>>> that it is not being referenced by the processing threads.
>>>>
>>> [...]
>>>> +struct virtio_net *
>>>> +get_device(int vid)
>>>> +{
>>>> +	struct virtio_net *dev;
>>>> +
>>>> +	rte_rwlock_read_lock(&vhost_devices[vid].lock);
>>>> +
>>>> +	dev = __get_device(vid);
>>>> +	if (unlikely(!dev))
>>>> +		rte_rwlock_read_unlock(&vhost_devices[vid].lock);
>>>> +
>>>> +	return dev;
>>>> +}
>>>> +
>>>> +void
>>>> +put_device(int vid)
>>>> +{
>>>> +	rte_rwlock_read_unlock(&vhost_devices[vid].lock);
>>>> +}
>>>> +
>>>
>>> This patch introduces a per-device rwlock which needs to be acquired
>>> unconditionally in the data path. So for each vhost device, the IO
>>> threads of different queues will need to acquire/release this lock
>>> on every enqueue and dequeue operation, which will cause cache
>>> contention when multiple queues are enabled and handled by different
>>> cores. With this patch alone, I saw a ~7% performance drop when enabling
>>> 6 queues in a 64-byte iofwd loopback test. Is there any way to avoid
>>> introducing this lock into the data path?
>>
>> First, I'd like to thank you for running the MQ test.
>> I agree it may have a performance impact in this case.
>>
>> This lock currently has two purposes:
>> 1. Prevent referencing a freed virtio_net struct in case of numa_realloc.
>> 2. Protect vring pages against invalidation.
>>
>> For 2., it can be handled with the per-vq IOTLB lock instead (this was
>> not possible in my early prototypes, which had a per-device IOTLB cache).
>>
>> For 1., this is an existing problem, so we might consider it acceptable
>> to keep the current behaviour. Maybe it could be improved by reallocating
>> only when VQ0 is not on the right NUMA node, since the other VQs are not
>> yet initialized at that point.
>>
>> If we do this we might be able to get rid of this lock altogether, but I
>> need some more time to make sure I'm not missing something.
>>
>> What do you think?
>>
> 
> Cool. So it's possible that the lock in the data path will be
> acquired only when the IOMMU feature is enabled. That would be
> great!
> 
> Besides, I just did a very simple MQ test to verify my thoughts.
> Lei (CC'ed in this mail) may do a thorough performance test of
> this patch set to evaluate its performance impact.

I'll try to post the v2 this week, including the proposed change.
Maybe it would be better if Lei waits for the v2.
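
To give an idea, the numa_realloc part of the change could look roughly
like the sketch below. This is only a sketch, not the actual v2 code:
numa_realloc()'s signature and the virtqueue index check are assumptions
here, and the existing reallocation logic is elided.

static struct virtio_net *
numa_realloc(struct virtio_net *dev, int index)
{
	/*
	 * Only consider reallocating the device when the first
	 * virtqueue is being set up: the other virtqueues are not
	 * yet initialized at this point. Once the device has been
	 * handed to the processing threads it is never freed or
	 * reallocated, so the per-device rwlock could be dropped
	 * from the data path.
	 */
	if (index != 0)
		return dev;

	/* ... existing NUMA-node check and reallocation ... */
	return dev;
}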

Thanks,
Maxime

> Best regards,
> Tiwei Bie
> 

