[dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple rx/tx processing

Tan, Jianfeng jianfeng.tan at intel.com
Tue Oct 27 02:44:09 CET 2015



> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie
> Sent: Sunday, October 25, 2015 11:35 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v5 0/7] virtio ring layout optimization and simple
> rx/tx processing
> 
> Changes in v5:
> - Call __rte_pktmbuf_prefree_seg to check refcnt when free mbufs
> 
> Changes in v4:
> - Fix the error in virtio tx ring layout ascii chart in the commit message
> - Move virtio_xmit_cleanup ahead to free descriptors earlier
> - Test merge-able feature when select simple rx/tx functions
> 
> Changes in v3:
> - Remove unnecessary NULL test for rte_free
> - Remove unnecessary assign of local var after free
> - Remove return at the end of void function
> - Remove always_inline attribute for virtio_xmit_cleanup
> - Reword some commit messages
> - Add TODO in the commit message of simple tx patch
> 
> Changes in v2:
> - Remove the configure macro
> - Enable simple R/TX processing when user specifies simple txq flags
> - Reword some comments and commit messages
> 
> In a DPDK-based switching environment, vhost usually runs on a dedicated core
> while virtio processing in the guest VMs runs on other cores.
> Take RX for example: with the generic implementation, for each guest buffer,
> a) the virtio driver allocates a descriptor from the free descriptor list
> b) modifies the avail ring entry to point to the allocated descriptor
> c) frees the descriptor after the packet is received
> 
> When vhost fetches the avail ring, it needs to fetch the modified cache line
> from the virtio core, which is a heavy cost on current CPUs.
> 
> The idea of this optimization is:
>     allocate a fixed descriptor for each entry of the avail ring, so the avail
> ring stays constant at run time.
> This removes the cache line transfer of the avail ring from the virtio core to
> the vhost core.
> (Note we can't avoid the cache transfer for the descriptors themselves.)
> Besides, the descriptor allocation and free operations are eliminated.
> This also makes vector processing possible, which further accelerates the
> processing.
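The fixed-index setup described above can be sketched as follows. This is a minimal illustration under the standard split-virtqueue layout, not the patch's actual code; the struct and function names are ours:

```c
#include <stdint.h>

/* Simplified avail ring, following the split-virtqueue layout
 * (256 entries, matching the example below). */
struct vring_avail {
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[256];
};

/* One-time init: entry i of the avail ring always points to descriptor i.
 * After this, the avail ring contents never change; only avail->idx moves,
 * so the vhost core no longer pulls a dirtied cache line per entry. */
static void fixed_avail_init(struct vring_avail *avail, uint16_t n)
{
    for (uint16_t i = 0; i < n; i++)
        avail->ring[i] = i;
}
```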
> 
> This is the layout for the avail ring(take 256 ring entries for example), with
> each entry pointing to the descriptor with the same index.
>                     avail
>                     idx
>                     +
>                     |
> +----+----+---+-------------+------+
> | 0  | 1  | 2 | ... |  254  | 255  |  avail ring
> +-+--+-+--+-+-+---------+---+--+---+
>   |    |    |       |   |      |
>   |    |    |       |   |      |
>   v    v    v       |   v      v
> +-+--+-+--+-+-+---------+---+--+---+
> | 0  | 1  | 2 | ... |  254  | 255  |  desc ring
> +----+----+---+-------------+------+
>                     |
>                     |
> +----+----+---+-------------+------+
> | 0  | 1  | 2 |     |  254  | 255  |  used ring
> +----+----+---+-------------+------+
>                     |
>                     +
> 
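With the avail ring frozen as above, an RX refill only has to repoint the descriptor at a fresh buffer and publish the new index. A hedged sketch (illustrative names, not the patch code; physical-address translation and barriers are omitted):

```c
#include <stdint.h>

struct vring_desc  { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vring_avail { uint16_t flags; uint16_t idx; uint16_t ring[256]; };

/* RX refill with a fixed avail ring: only the descriptor's buffer address
 * and avail->idx change; avail->ring[slot] is already 'slot' forever. */
static void rx_refill(struct vring_desc *desc, struct vring_avail *avail,
                      uint16_t idx, uint64_t buf_addr, uint32_t buf_len)
{
    uint16_t slot = idx & 255;   /* ring size 256, a power of two */
    desc[slot].addr = buf_addr;
    desc[slot].len  = buf_len;
    avail->idx = idx + 1;        /* publish the new buffer to the backend */
}
```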
> This is the ring layout for TX.
> As we need one virtio header for each xmit packet, we have 128 slots
> available.
> 
>                          ++
>                          ||
>                          ||
> +-----+-----+-----+--------------+------+------+------+
> |  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>    |     |            |  ||  |      |             |
>    v     v            v  ||  v      v             v
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
> | 128 | 129 | ... |  255 || 128  | 129  | ...  | 255  |   desc ring for virtio_net_hdr
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>    |     |            |  ||  |      |             |
>    v     v            v  ||  v      v             v
> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
> |  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
> +-----+-----+-----+--------------+------+------+------+
>                          ||
>                          ||
>                          ++
> 
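The chained header/data TX layout above can be sketched as a one-time setup. Again a hedged illustration under assumed names (only `VRING_DESC_F_NEXT` is the standard virtio chain flag); header addresses and per-packet data setup are simplified:

```c
#include <stdint.h>

#define VRING_DESC_F_NEXT 1  /* standard virtio flag: chain continues via 'next' */

struct vring_desc { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };

/* One-time TX setup for a 256-entry ring: descriptors 128..255 carry the
 * virtio_net_hdr and chain to descriptors 0..127, which carry packet data.
 * Avail ring entry i is fixed to header descriptor 128 + i, which yields
 * the 128 usable TX slots mentioned above. */
static void fixed_tx_init(struct vring_desc *desc, uint64_t hdr_base,
                          uint32_t hdr_len)
{
    for (uint16_t i = 0; i < 128; i++) {
        desc[128 + i].addr  = hdr_base + (uint64_t)i * hdr_len; /* per-slot header */
        desc[128 + i].len   = hdr_len;
        desc[128 + i].flags = VRING_DESC_F_NEXT;
        desc[128 + i].next  = i;            /* chain to the data descriptor */
        /* desc[i].addr/len are filled per packet at transmit time. */
    }
}
```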
> 
> A performance boost can be observed only when the virtio backend isn't the
> bottleneck, or in the VM2VM case.
> Several vhost optimization patches will also be submitted later.
> 
> 
> Huawei Xie (7):
>   virtio: add virtio_rxtx.h header file
>   virtio: add software rx ring, fake_buf into virtqueue
>   virtio: rx/tx ring layout optimization
>   virtio: fill RX avail ring with blank mbufs
>   virtio: virtio vec rx
>   virtio: simple tx routine
>   virtio: pick simple rx/tx func
> 
>  drivers/net/virtio/Makefile             |   2 +-
>  drivers/net/virtio/virtio_ethdev.c      |  12 +-
>  drivers/net/virtio/virtio_ethdev.h      |   5 +
>  drivers/net/virtio/virtio_rxtx.c        |  56 ++++-
>  drivers/net/virtio/virtio_rxtx.h        |  39 +++
>  drivers/net/virtio/virtio_rxtx_simple.c | 414 ++++++++++++++++++++++++++++++++
>  drivers/net/virtio/virtqueue.h          |   5 +
>  7 files changed, 529 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/net/virtio/virtio_rxtx.h
>  create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c
> 
> --
> 1.8.1.4


Acked-by: Jianfeng Tan <jianfeng.tan at intel.com>

Thanks,
Jianfeng

