[dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for packed vring desc avail flags
maxime.coquelin at redhat.com
Tue Sep 10 12:17:06 CEST 2019
Thanks Yinan for reporting the regression and Gavin for the analysis.
On 9/10/19 11:48 AM, Gavin Hu (Arm Technology China) wrote:
> Hi Yinan,
> We have done a comparative analysis and found that, with the old code, the compiler eliminated the if (weak_barriers)/else branches on x86, because rte_smp_wmb and rte_cio_wmb are identical there.
> With the new code, i.e. with Joyce's patches applied, the branches are no longer eliminated, which requires additional CPU cycles and causes the slight degradation on x86.
> The patches improved performance on aarch64 by about 9%, as indicated in the cover letter. While I think over a solution to the degradation on x86, could you help answer:
> 1. Is rte_cio_wmb sufficient for the non-weak-barrier case (HW offloading)?
> I ask because in the Intel NIC PMDs it is almost never used; rte_wmb is more widely used to notify the NIC device. Is there any difference between a virtio-ring-compatible smartNIC device (or vDPA?) and i40e-like devices?
> 2. If rte_cio_wmb is not sufficient for this case and is replaced by a stronger barrier, like sfence, then the compiler will not eliminate the branches; the problem then becomes one of correct barrier use rather than of the degradation.
> Any comments are welcome!
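To make the branch-elimination point concrete, here is a minimal sketch. It is not the code from the patch: every `sketch_*` name is hypothetical, and the two barrier macros are assumed to be plain compiler barriers, consistent with the statement above that rte_smp_wmb and rte_cio_wmb are identical on x86. It uses the GCC/Clang `__atomic` builtins.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the DPDK macros. Assumption: on x86 both
 * reduce to a compiler barrier, so the two are identical there. */
#define sketch_compiler_barrier() __asm__ __volatile__("" ::: "memory")
#define sketch_smp_wmb() sketch_compiler_barrier()
#define sketch_cio_wmb() sketch_compiler_barrier()

/* Old style: both arms expand to the same thing, so the compiler is
 * free to drop the test of weak_barriers altogether. */
static inline void
sketch_store_flags_old(uint16_t *flags, uint16_t val, int weak_barriers)
{
	if (weak_barriers)
		sketch_smp_wmb();
	else
		sketch_cio_wmb();
	*flags = val;
}

/* New style: one arm is a C11 store-release, the other a barrier plus
 * a plain store. The arms now differ, so the compiler may have to keep
 * the branch on weak_barriers. */
static inline void
sketch_store_flags_new(uint16_t *flags, uint16_t val, int weak_barriers)
{
	if (weak_barriers) {
		__atomic_store_n(flags, val, __ATOMIC_RELEASE);
	} else {
		sketch_cio_wmb();
		*flags = val;
	}
}
```

Both helpers store the same value; the difference is only in what the compiler can fold. Comparing the generated assembly of the two (for example with `objdump -d`) would be one way to check whether the branch really survives on a given toolchain.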
It may be worth having Yinan try rte_wmb instead of rte_cio_wmb
without the series applied, just to confirm this is caused by the extra
> Best Regards,
>> -----Original Message-----
>> From: Wang, Yinan <yinan.wang at intel.com>
>> Sent: Tuesday, September 10, 2019 11:54 AM
>> To: Maxime Coquelin <maxime.coquelin at redhat.com>; Joyce Kong (Arm
>> Technology China) <Joyce.Kong at arm.com>; dev at dpdk.org
>> Cc: nd <nd at arm.com>; Bie, Tiwei <tiwei.bie at intel.com>; Wang, Zhihong
>> <zhihong.wang at intel.com>; amorenoz at redhat.com; Wang, Xiao W
>> <xiao.w.wang at intel.com>; Liu, Yong <yong.liu at intel.com>;
>> jfreimann at redhat.com; Honnappa Nagarahalli
>> <Honnappa.Nagarahalli at arm.com>; Gavin Hu (Arm Technology China)
>> <Gavin.Hu at arm.com>
>> Subject: RE: [dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for packed vring
>> desc avail flags
>> Hi Joyce,
>> I just tested the performance impact of your patch set on code base commit id
>> d03d8622db48918d14bfe805641b1766ecc40088. After applying your v3 patch
>> set, seven paths of the vhost/virtio PVP test show a performance drop, as below:
>> PVP vhost/virtio 1c1q test            before patch   after patch
>> test_perf_pvp_inorder_mergeable       7.603          7.474
>> test_perf_pvp_inorder_no_mergeable    7.642          7.525
>> test_perf_pvp_mergeable               7.556          7.431
>> test_perf_pvp_normal                  7.554          7.478
>> test_perf_pvp_vector_rx               7.581          7.469
>> test_perf_pvp_virtio11_mergeable      7.068          6.905
>> test_perf_pvp_virtio11_normal         7.088          6.888
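For scale, the relative drop per path can be worked out from the figures above; it comes to between about 1% and 3%. The small helper below is only an illustrative sketch: the `sketch_*` names are hypothetical, and the numbers are copied from the table with their units left unspecified, as in the report.

```c
#include <stddef.h>

/* Before/after pairs copied from the table above (units as reported). */
static const double sketch_results[][2] = {
	{7.603, 7.474}, /* inorder_mergeable */
	{7.642, 7.525}, /* inorder_no_mergeable */
	{7.556, 7.431}, /* mergeable */
	{7.554, 7.478}, /* normal */
	{7.581, 7.469}, /* vector_rx */
	{7.068, 6.905}, /* virtio11_mergeable */
	{7.088, 6.888}, /* virtio11_normal */
};

/* Relative drop, in percent, when going from `before` to `after`. */
static inline double
sketch_drop_percent(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Largest relative drop, in percent, across all rows of the table. */
static inline double
sketch_max_drop_percent(void)
{
	double max = 0.0;
	size_t n = sizeof(sketch_results) / sizeof(sketch_results[0]);

	for (size_t i = 0; i < n; i++) {
		double d = sketch_drop_percent(sketch_results[i][0],
					       sketch_results[i][1]);
		if (d > max)
			max = d;
	}
	return max;
}
```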
>>> -----Original Message-----
>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Maxime Coquelin
>>> Sent: September 9, 2019 18:10
>>> To: Joyce Kong <joyce.kong at arm.com>; dev at dpdk.org
>>> Cc: nd at arm.com; Bie, Tiwei <tiwei.bie at intel.com>; Wang, Zhihong
>>> <zhihong.wang at intel.com>; amorenoz at redhat.com; Wang, Xiao W
>>> <xiao.w.wang at intel.com>; Liu, Yong <yong.liu at intel.com>;
>>> jfreimann at redhat.com; honnappa.nagarahalli at arm.com;
>> gavin.hu at arm.com
>>> Subject: Re: [dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for packed
>>> desc avail flags
>>> On 9/9/19 11:14 AM, Joyce Kong wrote:
>>>> If VIRTIO_F_ORDER_PLATFORM(36) is not negotiated, the
>>>> frontend and backend are assumed to be implemented in software, that
>>>> is, they can run on identical CPUs in an SMP configuration.
>>>> Thus a weak form of memory barriers like rte_smp_r/wmb, rather than
>>>> rte_cio_r/wmb, is sufficient for this case (vq->hw->weak_barriers == 1)
>>>> and yields better performance.
>>>> For the above case, this patch yields even better performance
>>>> by replacing the two-way barriers with C11 one-way barriers for avail
>>>> flags in packed ring.
>>>> Meanwhile, a read barrier is required to ensure ordering between reads
>>>> of the descriptor's flags and its content. With C11, a load-acquire can
>>>> enforce the ordering instead of an rmb barrier.
>>>> Signed-off-by: Joyce Kong <joyce.kong at arm.com>
>>>> Reviewed-by: Gavin Hu <gavin.hu at arm.com>
>>>> Reviewed-by: Phil Yang <phil.yang at arm.com>
>>>> drivers/net/virtio/virtio_rxtx.c | 13 +++++++------
>>>> drivers/net/virtio/virtio_user/virtio_user_dev.c | 6 +++++-
>>>> drivers/net/virtio/virtqueue.h | 11 +++++++++++
>>>> lib/librte_vhost/vhost.h | 2 +-
>>>> lib/librte_vhost/virtio_net.c | 11 +++++------
>>>> 5 files changed, 29 insertions(+), 14 deletions(-)
>>> Reviewed-by: Maxime Coquelin <maxime.coquelin at redhat.com>
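The one-way barrier idea in the commit message can be sketched as a store-release/load-acquire pair on the descriptor flags. This is a simplified illustration, not the patch itself: the struct layout, the `SKETCH_F_AVAIL` bit and all `sketch_*` names are assumptions made for the example, and the packed ring's wrap counter is ignored. It uses the GCC/Clang `__atomic` builtins.

```c
#include <stdint.h>

/* Hypothetical descriptor, loosely modelled on a packed ring entry. */
struct sketch_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

/* Illustrative "available" bit; an assumption for this sketch. */
#define SKETCH_F_AVAIL ((uint16_t)(1u << 7))

/* Producer: fill in the content, then publish it with a store-release
 * on flags. The release keeps the content writes from being reordered
 * after the flags write; it does not order later accesses (one-way). */
static inline void
sketch_publish(struct sketch_desc *d, uint64_t addr, uint32_t len,
	       uint16_t id)
{
	d->addr = addr;
	d->len = len;
	d->id = id;
	__atomic_store_n(&d->flags, SKETCH_F_AVAIL, __ATOMIC_RELEASE);
}

/* Consumer: load-acquire on flags, then read the content. The acquire
 * keeps the content reads from being reordered before the flags read,
 * which is the job a separate rmb would otherwise do.
 * Returns 1 and sets *len if the descriptor is available, else 0. */
static inline int
sketch_consume(struct sketch_desc *d, uint32_t *len)
{
	uint16_t flags = __atomic_load_n(&d->flags, __ATOMIC_ACQUIRE);

	if (!(flags & SKETCH_F_AVAIL))
		return 0;
	*len = d->len;
	return 1;
}
```

In this sketch the ordering is attached to the flags access itself, so no standalone wmb or rmb is needed on the weak-barrier path.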