[dpdk-dev] [PATCH v9 7/9] net/virtio: add vectorized packed ring Tx path

Maxime Coquelin maxime.coquelin at redhat.com
Fri Apr 24 15:35:48 CEST 2020



On 4/24/20 3:33 PM, Liu, Yong wrote:
> 
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin at redhat.com>
>> Sent: Friday, April 24, 2020 8:30 PM
>> To: Liu, Yong <yong.liu at intel.com>; Ye, Xiaolong <xiaolong.ye at intel.com>;
>> Wang, Zhihong <zhihong.wang at intel.com>
>> Cc: dev at dpdk.org; Van Haaren, Harry <harry.van.haaren at intel.com>
>> Subject: Re: [PATCH v9 7/9] net/virtio: add vectorized packed ring Tx path
>>
>>
>>
>> On 4/24/20 11:24 AM, Marvin Liu wrote:
>>> Optimize packed ring Tx path alike Rx path. Split Tx path into batch and
>>
>> s/alike/like/ ?
>>
>>> single Tx functions. Batch function is further optimized by AVX512
>>> instructions.
>>>
>>> Signed-off-by: Marvin Liu <yong.liu at intel.com>
>>>
>>> diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
>>> index 5c112cac7..b7d52d497 100644
>>> --- a/drivers/net/virtio/virtio_ethdev.h
>>> +++ b/drivers/net/virtio/virtio_ethdev.h
>>> @@ -108,6 +108,9 @@ uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
>>>  uint16_t virtio_recv_pkts_packed_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
>>>  		uint16_t nb_pkts);
>>>
>>> +uint16_t virtio_xmit_pkts_packed_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>>> +		uint16_t nb_pkts);
>>> +
>>>  int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
>>>
>>>  void virtio_interrupt_handler(void *param);
>>> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
>>> index cf18fe564..f82fe8d64 100644
>>> --- a/drivers/net/virtio/virtio_rxtx.c
>>> +++ b/drivers/net/virtio/virtio_rxtx.c
>>> @@ -2175,3 +2175,11 @@ virtio_recv_pkts_packed_vec(void *rx_queue __rte_unused,
>>>  {
>>>  	return 0;
>>>  }
>>> +
>>> +__rte_weak uint16_t
>>> +virtio_xmit_pkts_packed_vec(void *tx_queue __rte_unused,
>>> +			    struct rte_mbuf **tx_pkts __rte_unused,
>>> +			    uint16_t nb_pkts __rte_unused)
>>> +{
>>> +	return 0;
>>> +}
>>> diff --git a/drivers/net/virtio/virtio_rxtx_packed_avx.c b/drivers/net/virtio/virtio_rxtx_packed_avx.c
>>> index 8a7b459eb..c023ace4e 100644
>>> --- a/drivers/net/virtio/virtio_rxtx_packed_avx.c
>>> +++ b/drivers/net/virtio/virtio_rxtx_packed_avx.c
>>> @@ -23,6 +23,24 @@
>>> #define PACKED_FLAGS_MASK ((0ULL | VRING_PACKED_DESC_F_AVAIL_USED) << \
>>>  	FLAGS_BITS_OFFSET)
>>>
>>> +/* reference count offset in mbuf rearm data */
>>> +#define REFCNT_BITS_OFFSET ((offsetof(struct rte_mbuf, refcnt) - \
>>> +	offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE)
>>> +/* segment number offset in mbuf rearm data */
>>> +#define SEG_NUM_BITS_OFFSET ((offsetof(struct rte_mbuf, nb_segs) - \
>>> +	offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE)
>>> +
>>> +/* default rearm data */
>>> +#define DEFAULT_REARM_DATA (1ULL << SEG_NUM_BITS_OFFSET | \
>>> +	1ULL << REFCNT_BITS_OFFSET)
>>> +
>>> +/* id bits offset in packed ring desc higher 64bits */
>>> +#define ID_BITS_OFFSET ((offsetof(struct vring_packed_desc, id) - \
>>> +	offsetof(struct vring_packed_desc, len)) * BYTE_SIZE)
>>> +
>>> +/* net hdr short size mask */
>>> +#define NET_HDR_MASK 0x3F
>>> +
>>>  #define PACKED_BATCH_SIZE (RTE_CACHE_LINE_SIZE / \
>>>  	sizeof(struct vring_packed_desc))
>>>  #define PACKED_BATCH_MASK (PACKED_BATCH_SIZE - 1)
>>> @@ -47,6 +65,48 @@
>>>  	for (iter = val; iter < num; iter++)
>>>  #endif
>>>
>>> +static inline void
>>> +virtio_xmit_cleanup_packed_vec(struct virtqueue *vq)
>>> +{
>>> +	struct vring_packed_desc *desc = vq->vq_packed.ring.desc;
>>> +	struct vq_desc_extra *dxp;
>>> +	uint16_t used_idx, id, curr_id, free_cnt = 0;
>>> +	uint16_t size = vq->vq_nentries;
>>> +	struct rte_mbuf *mbufs[size];
>>> +	uint16_t nb_mbuf = 0, i;
>>> +
>>> +	used_idx = vq->vq_used_cons_idx;
>>> +
>>> +	if (!desc_is_used(&desc[used_idx], vq))
>>> +		return;
>>> +
>>> +	id = desc[used_idx].id;
>>> +
>>> +	do {
>>> +		curr_id = used_idx;
>>> +		dxp = &vq->vq_descx[used_idx];
>>> +		used_idx += dxp->ndescs;
>>> +		free_cnt += dxp->ndescs;
>>> +
>>> +		if (dxp->cookie != NULL) {
>>> +			mbufs[nb_mbuf] = dxp->cookie;
>>> +			dxp->cookie = NULL;
>>> +			nb_mbuf++;
>>> +		}
>>> +
>>> +		if (used_idx >= size) {
>>> +			used_idx -= size;
>>> +			vq->vq_packed.used_wrap_counter ^= 1;
>>> +		}
>>> +	} while (curr_id != id);
>>> +
>>> +	for (i = 0; i < nb_mbuf; i++)
>>> +		rte_pktmbuf_free(mbufs[i]);
>>> +
>>> +	vq->vq_used_cons_idx = used_idx;
>>> +	vq->vq_free_cnt += free_cnt;
>>> +}
>>> +
>>
>>
>> I think you can re-use the inlined non-vectorized cleanup function here,
>> or use your implementation in the non-vectorized path.
>> BTW, do you know why we have to pass the num argument in the
>> non-vectorized case? I don't remember.
>>
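(For illustration, a minimal sketch of what such reuse could look
like, assuming virtio_xmit_cleanup_packed(vq, num, in_order) from
virtio_rxtx.c is made visible to this file, e.g. moved into
virtqueue.h as an inline. The wrapper name and the batch-alignment
policy below are hypothetical.)

#include "virtqueue.h"

/* Hypothetical wrapper: reclaim used Tx descriptors through the
 * existing non-vectorized cleanup instead of a duplicated copy. */
static inline void
virtio_xmit_try_cleanup_packed_vec(struct virtqueue *vq, uint16_t need)
{
	/* Round up to full batches so the AVX512 Tx function always
	 * finds a contiguous run of PACKED_BATCH_SIZE free slots. */
	uint16_t num = RTE_ALIGN_CEIL(need, PACKED_BATCH_SIZE);

	if (vq->vq_free_cnt < num)
		virtio_xmit_cleanup_packed(vq, num - vq->vq_free_cnt, 0);
}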
> 
> Maxime,
> This is a simplified version of the xmit cleanup function. It is based on the concept that the backend will update used ids in bursts, which also matches the frontend's requirement.
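(Stated concretely, the simplification trusts the buffer id found in
the head used descriptor to bound the walk, so a single desc_is_used()
check covers the whole burst of chains. For contrast, here is a hedged
sketch of a per-chain walk that re-tests the used flag on every chain
head; it is loosely modeled on the non-vectorized cleanup and is not
part of the patch.)

#include "virtqueue.h"

/* Sketch: per-chain cleanup that re-checks the used flag on every
 * chain head instead of assuming a whole burst was flagged at once. */
static inline void
virtio_xmit_cleanup_packed_percheck(struct virtqueue *vq)
{
	struct vring_packed_desc *desc = vq->vq_packed.ring.desc;
	uint16_t used_idx = vq->vq_used_cons_idx;
	uint16_t size = vq->vq_nentries;
	struct vq_desc_extra *dxp;

	while (desc_is_used(&desc[used_idx], vq)) {
		dxp = &vq->vq_descx[desc[used_idx].id];
		if (dxp->cookie != NULL) {
			rte_pktmbuf_free(dxp->cookie);
			dxp->cookie = NULL;
		}
		vq->vq_free_cnt += dxp->ndescs;
		used_idx += dxp->ndescs;
		if (used_idx >= size) {
			used_idx -= size;
			vq->vq_packed.used_wrap_counter ^= 1;
		}
	}

	vq->vq_used_cons_idx = used_idx;
}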

And what if the backend doesn't follow that concept?
Is it then just slower, or broken?

> I just found that the original version works better in the loopback case. Will adapt it in the next version.
> 
> Thanks,
> Marvin
> 
>> Maxime
> 


