[dpdk-dev] [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path

Maxime Coquelin maxime.coquelin at redhat.com
Tue Apr 28 16:50:27 CEST 2020



On 4/28/20 4:43 PM, Liu, Yong wrote:
> 
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin at redhat.com>
>> Sent: Tuesday, April 28, 2020 9:46 PM
>> To: Liu, Yong <yong.liu at intel.com>; Ye, Xiaolong <xiaolong.ye at intel.com>;
>> Wang, Zhihong <zhihong.wang at intel.com>
>> Cc: dev at dpdk.org; Honnappa Nagarahalli
>> <Honnappa.Nagarahalli at arm.com>; jerinj at marvell.com
>> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path
>>
>>
>>
>> On 4/28/20 3:01 PM, Liu, Yong wrote:
>>>>> Maxime,
>>>>> Thanks for pointing it out; it will add an extra cache miss in the datapath,
>>>>> and its impact on performance is around 1% in the loopback case.
>>>> Ok, thanks for doing the test. I'll try to run some PVP benchmarks
>>>> on my side, because when doing IO loopback the cache pressure is
>>>> much lower.
>>>>
>>>>> While the benefit of the vectorized path will be more than that number.
>>>> Ok, but I disagree for two reasons:
>>>>  1. You have to keep in mind that non-vectorized is the default and
>>>> encouraged mode to use. Indeed, the vectorized path takes a lot of
>>>> shortcuts, like not checking the header length (so no error stats), etc.
>>>>
>>> Ok, I will keep the non-vectorized path the same as before.
>>>
>>>>  2. It's like saying it's OK that it degrades by 5% on $CPU_VENDOR_A
>>>> because the gain is 20% on $CPU_VENDOR_B.
>>>>
>>>> In case we see more degradation in real-world scenarios, you might
>>>> want to consider using ifdefs to avoid adding padding in the
>>>> non-vectorized case, like you did to differentiate the Virtio PMD from
>>>> the Virtio-user PMD in patch 7.
>>>>
>>> Maxime,
>>> The performance difference is so slight that I ignored it; it looks like a
>>> sampling error.
>>
>> Agreed for IO loopback, but it adds one more cache line access per burst,
>> which might be noticeable in some real-life use cases.
>>
>>> It may not be suitable to add a new configuration option for such a
>>> setting, which is only used inside the driver.
>>
>> Wait, the Virtio-user #ifdef is based on the defconfig options? How can
>> it work since both Virtio PMD and Virtio-user PMD can be selected at the
>> same time?
>>
>> I thought it was a define set before the header inclusion and unset
>> afterwards, but I didn't check carefully.
>>
> 
> Maxime,
> The difference between the Virtio PMD and Virtio-user PMD buffer addresses is handled by vq->offset.
> 
> When virtio PMD is running, offset will be set to buf_iova.
> vq->offset = offsetof(struct rte_mbuf, buf_iova);
> 
> When virtio_user PMD is running, offset will be set to buf_addr.
> vq->offset = offsetof(struct rte_mbuf, buf_addr);
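
For illustration, a minimal sketch of how such a per-queue offset can be
resolved at run time; the struct and function names below are simplified
stand-ins, not the actual driver code:

#include <stdint.h>
#include <rte_mbuf.h>

/* Hypothetical, trimmed-down stand-in for struct virtqueue. */
struct vq_stub {
	uint16_t offset; /* offsetof(struct rte_mbuf, buf_iova) or buf_addr */
};

/* Return the buffer address the device should use, reading the mbuf
 * field located vq->offset bytes into the mbuf (on a 64-bit build both
 * buf_iova and buf_addr are 8 bytes). The same binary then works for
 * both the Virtio PMD and the Virtio-user PMD. */
static inline uint64_t
vq_stub_mbuf_addr(struct rte_mbuf *mb, struct vq_stub *vq)
{
	return *(uint64_t *)((uintptr_t)mb + vq->offset);
}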

Ok, but below is a build-time check:

+#ifdef RTE_VIRTIO_USER
+	__m128i flag_offset = _mm_set_epi64x(flags_temp, (uint64_t)vq->offset);
+#else
+	__m128i flag_offset = _mm_set_epi64x(flags_temp, 0);
+#endif

So how can it work for a single build for both Virtio and Virtio-user?
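
One possible runtime alternative, sketched here under the assumption that
vq->offset holds a valid value for both PMDs (the helper name is
illustrative, not taken from the patch): always load the per-virtqueue
offset instead of deciding at build time, so a single binary covers both
Virtio and Virtio-user.

#include <stdint.h>
#include <emmintrin.h> /* SSE2: __m128i, _mm_set_epi64x */

/* Build the flags/offset vector from the per-queue offset at run time
 * rather than via #ifdef, so the choice no longer depends on whether
 * RTE_VIRTIO_USER was defined when this file was compiled. */
static inline __m128i
build_flag_offset(uint64_t flags_temp, uint16_t vq_offset)
{
	/* High lane: descriptor flags; low lane: mbuf field offset. */
	return _mm_set_epi64x((int64_t)flags_temp, (int64_t)vq_offset);
}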

>>> The Virtio driver can check whether the virtqueue is using the vectorized
>>> path at initialization, and will use the padded structure if it is.
>>> I have added and tested some code, and now the performance came back.
>>> Since the code has changed in the initialization process, it needs some
>>> time for regression checking.
>>
>> Ok, works for me.
>>
>> I am investigating a linkage issue with your series, which does not
>> happen systematically (see below, it happens also with clang). David
>> pointed me to some Intel patches removing the usage of __rte_weak,
>> could it be related?
>>
> 
> I checked David's patch; it only changed the i40e driver. Meanwhile, the __rte_weak attribute should still be in virtio_rxtx.c.
> I will follow David's patch and eliminate the usage of the weak attribute.

Yeah, I meant the issue below could be linked to __rte_weak, not that the
i40e patch was the cause of this problem.
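
For context, an illustrative sketch of the __rte_weak pattern under
discussion; the function signature follows the usual Rx-burst shape and
may not match the series exactly:

#include <stdint.h>
#include <rte_common.h> /* __rte_weak, RTE_SET_USED */
#include <rte_mbuf.h>

/* Weak fallback in the generic file (e.g. virtio_rxtx.c): returns no
 * packets when no vectorized implementation is linked in. A strong
 * definition compiled from the vectorized source file is expected to
 * override it at link time. */
__rte_weak uint16_t
virtio_recv_pkts_packed_vec(void *rx_queue,
			    struct rte_mbuf **rx_pkts,
			    uint16_t nb_pkts)
{
	RTE_SET_USED(rx_queue);
	RTE_SET_USED(rx_pkts);
	RTE_SET_USED(nb_pkts);
	return 0;
}

With static linking, whether the strong vectorized definition actually
overrides this weak stub can depend on link order and archive extraction,
which would explain a linkage issue that does not happen systematically;
selecting the Rx function explicitly at queue setup avoids relying on the
linker picking the strong symbol.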



