[dpdk-dev] [PATCH v1 00/14] vhost packed ring performance optimization
Marvin Liu
yong.liu at intel.com
Thu Sep 5 18:14:07 CEST 2019
The packed ring has a more compact ring format and can therefore
significantly reduce the number of cache misses, which can lead to
better performance. This has been proven in the virtio user driver: on a
normal E5 Xeon CPU, single-core performance can rise by 12%.
http://mails.dpdk.org/archives/dev/2018-April/095470.html
However, vhost performance decreased with packed ring.
Analysis showed that most of the extra cost came from calculating each
descriptor flag, which depends on the ring wrap counter. Moreover, both
frontend and backend need to write the same descriptors, which causes
cache contention. In particular, while vhost is running its enqueue
function, the virtio packed ring refill function may write the same
cache line. This extra cache cost reduces the benefit gained from fewer
cache misses.
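To illustrate why the flag calculation depends on the wrap counter, here is a minimal sketch of the packed ring availability test. The AVAIL (bit 7) and USED (bit 15) flag positions follow the virtio 1.1 packed ring layout; the names DESC_F_AVAIL, DESC_F_USED, desc_is_avail, used_flags and ring_advance are illustrative assumptions and are not taken from this patch set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bit positions from the virtio 1.1 packed ring layout.
 * The macro names are illustrative, not the ones used by the patches. */
#define DESC_F_AVAIL (1u << 7)
#define DESC_F_USED  (1u << 15)

/* A descriptor is available to the backend when its AVAIL bit matches
 * the current wrap counter and its USED bit does not. */
static inline bool desc_is_avail(uint16_t flags, bool wrap_counter)
{
    bool avail = (flags & DESC_F_AVAIL) != 0;
    bool used = (flags & DESC_F_USED) != 0;

    return avail == wrap_counter && used != wrap_counter;
}

/* Flags the backend writes when returning a descriptor: both bits are
 * set equal to the wrap counter, so the value changes on every lap. */
static inline uint16_t used_flags(bool wrap_counter)
{
    return wrap_counter ? (uint16_t)(DESC_F_AVAIL | DESC_F_USED) : 0;
}

/* Advance a ring index by one, flipping the wrap counter at the end of
 * the ring (hypothetical helper). */
static inline uint16_t ring_advance(uint16_t idx, uint16_t ring_size,
                                    bool *wrap_counter)
{
    if (++idx >= ring_size) {
        idx = 0;
        *wrap_counter = !*wrap_counter;
    }
    return idx;
}
```

Because the expected flag value flips each time the ring wraps, a per-descriptor check like this cannot be reduced to a constant comparison, which is the extra cost described above.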
To optimize vhost packed ring performance, the vhost enqueue and dequeue
functions will be split into a fast path and a normal path.
Several methods are used in the fast path:
Unroll the burst loop function into more pieces.
Handle descriptors in one cache line simultaneously.
Check up front whether I/O space can be copied directly into mbuf
space and vice versa.
Check up front whether descriptor mapping is successful.
Separate the vhost descriptor update functions for enqueue and
dequeue.
Buffer as many dequeue used descriptors as possible.
Update enqueue used descriptors by cache line.
Cache the memory region structure for fast address translation.
Disable software prefetch if hardware can do better.
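The per-cache-line handling and the up-front checks can be pictured with the sketch below. It assumes a 64-byte cache line and the 16-byte packed descriptor layout from virtio 1.1, so four descriptors share one line. All names here (struct packed_desc, BATCH_SIZE, batch_is_ready) are hypothetical and only approximate the idea; they are not the functions added by this series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64
#define DESC_F_AVAIL (1u << 7)
#define DESC_F_USED  (1u << 15)

/* 16-byte packed ring descriptor, as laid out in virtio 1.1. */
struct packed_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t id;
    uint16_t flags;
};

/* Number of descriptors that fit in one cache line (4 here). */
#define BATCH_SIZE (CACHE_LINE_SIZE / sizeof(struct packed_desc))

/* Hypothetical precheck for the fast path: returns true only when a
 * whole cache line of descriptors starting at idx can be handled
 * together. Any failure means the caller uses the normal path. */
static bool batch_is_ready(const struct packed_desc *ring,
                           uint16_t ring_size, uint16_t idx,
                           bool wrap_counter, uint32_t pkt_len)
{
    size_t i;

    /* The batch must start on a cache line boundary. */
    if (idx % BATCH_SIZE != 0)
        return false;

    /* The batch must not cross the end of the ring. */
    if ((size_t)idx + BATCH_SIZE > ring_size)
        return false;

    /* Fixed trip count, so the compiler can unroll this loop. */
    for (i = 0; i < BATCH_SIZE; i++) {
        uint16_t flags = ring[idx + i].flags;
        bool avail = (flags & DESC_F_AVAIL) != 0;
        bool used = (flags & DESC_F_USED) != 0;

        if (avail != wrap_counter || used == wrap_counter)
            return false;
        /* Each buffer must hold the packet in a single copy. */
        if (ring[idx + i].len < pkt_len)
            return false;
    }
    return true;
}
```

Doing all checks before touching any data keeps the fast path free of partial-failure handling: either the whole batch qualifies, or the packets go through the single-packet normal path.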
With all these methods applied, single-core vhost PvP performance with
64B packets on Xeon 8180 improves by 40%.
Marvin Liu (14):
vhost: add single packet enqueue function
vhost: add burst enqueue function for packed ring
vhost: add single packet dequeue function
vhost: add burst dequeue function
vhost: rename flush shadow used ring functions
vhost: flush vhost enqueue shadow ring by burst
vhost: add flush function for burst enqueue
vhost: buffer vhost dequeue shadow ring
vhost: split enqueue and dequeue flush functions
vhost: optimize Rx function of packed ring
vhost: add burst and single zero dequeue functions
vhost: optimize Tx function of packed ring
vhost: cache address translation result
vhost: check whether disable software pre-fetch
lib/librte_vhost/Makefile | 6 +
lib/librte_vhost/rte_vhost.h | 27 +
lib/librte_vhost/vhost.h | 13 +
lib/librte_vhost/virtio_net.c | 1094 +++++++++++++++++++++++++++------
4 files changed, 944 insertions(+), 196 deletions(-)
--
2.17.1