[dpdk-dev] [PATCH v2] vhost: batch used descs chains write-back with packed ring

Maxime Coquelin maxime.coquelin at redhat.com
Thu Dec 20 09:49:55 CET 2018



On 12/20/18 5:44 AM, Tiwei Bie wrote:
> On Wed, Dec 19, 2018 at 10:29:52AM +0100, Maxime Coquelin wrote:
>> Instead of writing back descriptors chains in order, let's
>> write the first chain flags last in order to improve batching.
>>
>> With Kernel's pktgen benchmark, ~3% performance gain is measured.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin at redhat.com>
>> ---
>>
>> V2:
>> Revert back to initial implementation to have a write
>> barrier before every descs flags store, but still
>> store first desc flags last. (Missing barrier reported
>> by Ilya)
>>
>>
>>   lib/librte_vhost/virtio_net.c | 19 ++++++++++++++++---
>>   1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
>> index 8c657a101..de436af79 100644
>> --- a/lib/librte_vhost/virtio_net.c
>> +++ b/lib/librte_vhost/virtio_net.c
>> @@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>   {
>>   	int i;
>>   	uint16_t used_idx = vq->last_used_idx;
>> +	uint16_t head_idx = vq->last_used_idx;
>> +	uint16_t head_flags = 0;
>>   
>>   	/* Split loop in two to save memory barriers */
>>   	for (i = 0; i < vq->shadow_used_idx; i++) {
>> @@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>   			flags &= ~VRING_DESC_F_AVAIL;
>>   		}
>>   
>> -		vq->desc_packed[vq->last_used_idx].flags = flags;
>> +		if (i > 0) {
>> +			vq->desc_packed[vq->last_used_idx].flags = flags;
>>   
>> -		vhost_log_cache_used_vring(dev, vq,
>> +			vhost_log_cache_used_vring(dev, vq,
>>   					vq->last_used_idx *
>>   					sizeof(struct vring_packed_desc),
>>   					sizeof(struct vring_packed_desc));
>> +		} else {
>> +			head_idx = vq->last_used_idx;
>> +			head_flags = flags;
>> +		}
>>   
>>   		vq->last_used_idx += vq->shadow_used_packed[i].count;
>>   		if (vq->last_used_idx >= vq->size) {
>> @@ -140,7 +147,13 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>   		}
>>   	}
>>   
>> -	rte_smp_wmb();
>> +	vq->desc_packed[head_idx].flags = head_flags;
>> +
>> +	vhost_log_cache_used_vring(dev, vq,
>> +				vq->last_used_idx *
> 
> Should be head_idx.

Oh yes, thanks for spotting this.
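
To make the fix concrete, here is a minimal standalone sketch of the
intended behaviour: the head chain's flags are stored last, and the
dirty-log offset for that store is computed from head_idx, not from
vq->last_used_idx (which has already advanced past the whole batch).
This is not the DPDK code. struct desc, struct ring, log_used() and
flush_chains() are hypothetical stand-ins. The sketch uses a single
flags value, replaces the wrap-counter handling with a plain modulo,
and leaves out the memory barriers to keep the focus on the index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8

/* Simplified stand-in for struct vring_packed_desc (assumption). */
struct desc {
	uint16_t id;
	uint16_t flags;
};

/* Simplified stand-in for the virtqueue plus a record of logged offsets. */
struct ring {
	struct desc desc[RING_SIZE];
	uint16_t last_used_idx;
	size_t logged_off[RING_SIZE];	/* offsets passed to log_used(), in call order */
	int nb_logged;
};

/* Hypothetical logging hook: only records the offset it was given. */
static void
log_used(struct ring *r, size_t off)
{
	r->logged_off[r->nb_logged++] = off;
}

/* counts[i] is the number of descriptors in chain i. */
static void
flush_chains(struct ring *r, const uint16_t *counts, int n, uint16_t flags)
{
	uint16_t head_idx = r->last_used_idx;
	uint16_t head_flags = 0;
	int i;

	assert(n > 0 && n <= RING_SIZE);

	for (i = 0; i < n; i++) {
		if (i > 0) {
			/* Non-head chains: store flags and log right away. */
			r->desc[r->last_used_idx].flags = flags;
			log_used(r, r->last_used_idx * sizeof(struct desc));
		} else {
			/* Head chain: remember it, store its flags last. */
			head_idx = r->last_used_idx;
			head_flags = flags;
		}
		r->last_used_idx = (r->last_used_idx + counts[i]) % RING_SIZE;
	}

	/* The head store is logged at head_idx, not at last_used_idx. */
	r->desc[head_idx].flags = head_flags;
	log_used(r, head_idx * sizeof(struct desc));
}
```

With last_used_idx starting at 2 and chains of 3, 1 and 2 descriptors,
the offsets logged are those of slots 5, 6 and finally 2. Using
last_used_idx for the final log call would have logged slot 0 instead.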

> 
>> +				sizeof(struct vring_packed_desc),
>> +				sizeof(struct vring_packed_desc));
>> +
>>   	vq->shadow_used_idx = 0;
> 
> A wmb() is needed before log_cache_sync?

I think you're right. I wrongly assumed we already had a barrier in the
cache sync function.
It does not matter much on x86, but I think it would be preferable to
add the barrier in vhost_log_cache_sync(), only if logging is enabled.

What do you think?
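
As a rough illustration of that proposal, the sketch below places the
write barrier inside the sync function, after the early return taken
when logging is disabled, so the non-logging path pays nothing. It is a
standalone model, not the vhost code: struct log_cache, its fields and
log_cache_sync() are hypothetical, and the C11 release fence only
stands in for rte_smp_wmb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a per-queue dirty-log cache. */
struct log_cache {
	bool log_enabled;	/* assumption: set when dirty logging is active */
	uint64_t pending;	/* dirty-page bits cached locally */
	uint64_t shared;	/* stand-in for the shared dirty log */
	int barriers;		/* counts issued barriers, for illustration only */
};

static void
log_cache_sync(struct log_cache *lc)
{
	/* Logging disabled: return before the barrier, so it costs nothing. */
	if (!lc->log_enabled)
		return;

	/*
	 * Make earlier stores (e.g. descriptor flags) visible before the
	 * dirty bits are published. Stand-in for rte_smp_wmb().
	 */
	atomic_thread_fence(memory_order_release);
	lc->barriers++;

	/* Publish the cached dirty bits and clear the cache. */
	lc->shared |= lc->pending;
	lc->pending = 0;
}
```

Calling log_cache_sync() with log_enabled false leaves pending
untouched and issues no barrier. With log_enabled true it issues one
barrier and then moves the pending bits into the shared log.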

>>   	vhost_log_cache_sync(dev, vq);
>>   }
>> -- 
>> 2.17.2
>>

