[dpdk-dev] [PATCH 0/4] Support DMA-accelerated Tx operations for vhost-user PMD

Maxime Coquelin maxime.coquelin at redhat.com
Thu Mar 26 09:47:53 CET 2020



On 3/26/20 9:25 AM, Hu, Jiayu wrote:
> Hi Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin at redhat.com>
>> Sent: Thursday, March 26, 2020 3:53 PM
>> To: Hu, Jiayu <jiayu.hu at intel.com>; dev at dpdk.org
>> Cc: Ye, Xiaolong <xiaolong.ye at intel.com>; Wang, Zhihong
>> <zhihong.wang at intel.com>
>> Subject: Re: [PATCH 0/4] Support DMA-accelerated Tx operations for vhost-
>> user PMD
>>
>> Hi Jiayu,
>>
>> On 3/19/20 12:47 PM, Hu, Jiayu wrote:
>>
>>>>
>>>> Ok, so what about:
>>>>
>>>> Introducing a pair of callbacks in struct virtio_net for DMA enqueue and
>>>> dequeue.
>>>>
>>>> lib/librte_vhost/ioat.c which would implement dma_enqueue and
>>>> dma_dequeue callback for IOAT. As it will live in the vhost lib
>>>> directory, it will be easy to refactor the code to share as much as
>>>> possible and so avoid code duplication.
>>>>
>>>> In rte_vhost_enqueue/dequeue_burst, if the dma callback is set, then
>>>> call it instead of the SW datapath. It adds a few cycle, but this is
>>>> much more sane IMHO.
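>>>>
>>>> As a rough sketch (all names here are placeholders, not an existing
>>>> API):
>>>>
>>>> struct virtio_net {
>>>> 	...
>>>> 	/* Set by the DMA backend (e.g. ioat.c); NULL means SW datapath */
>>>> 	uint16_t (*dma_enqueue)(struct virtio_net *dev, uint16_t queue_id,
>>>> 			struct rte_mbuf **pkts, uint16_t count);
>>>> 	uint16_t (*dma_dequeue)(struct virtio_net *dev, uint16_t queue_id,
>>>> 			struct rte_mempool *mbuf_pool,
>>>> 			struct rte_mbuf **pkts, uint16_t count);
>>>> };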
>>>
>>> The problem is that the current semantics of the rte_vhost_enqueue/
>>> dequeue APIs conflict with the I/OAT-accelerated data path. To
>>> improve performance, the I/OAT works asynchronously: the CPU just
>>> submits copy jobs to the I/OAT without waiting for their completion.
>>> For rte_vhost_enqueue_burst, users cannot reuse the enqueued pktmbufs
>>> when it returns, as the I/OAT may still be using them. For
>>> rte_vhost_dequeue_burst, users will not get incoming packets while
>>> the I/OAT is still performing the packet copies. As you can see,
>>> enabling I/OAT acceleration changes the semantics of the two APIs.
>>> Keeping the same API names but changing their semantics may confuse
>>> users, IMHO.
>>
>> Ok, so it is basically the same as zero-copy for the dequeue path,
>> right?
>> If a new API is necessary, then it would be better to add it to the
>> Vhost library for async enqueue/dequeue.
>> It could also be used for Tx zero-copy, and the sync version would
>> then save some cycles, as we could remove the zero-copy support there.
>>
>> What do you think?
> 
> Yes, you are right. The better way is to provide new APIs with
> asynchronous semantics in the vhost library. In addition, the vhost
> library should provide DMA operation callbacks, to avoid relying on
> vendor-specific APIs. The asynchronous APIs may look like
> rte_vhost_try_enqueue_burst() and rte_vhost_get_completed_packets():
> the first one performs the enqueue logic, and the second one returns
> to users the pktmbufs whose copies have all completed. What do you
> think?
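> 
> For instance (signatures tentative, mirroring the existing burst APIs):
> 
> /* Submit copy jobs to the DMA engine. Enqueued mbufs must not be
>  * reused or freed by the caller until returned as completed. */
> uint16_t rte_vhost_try_enqueue_burst(int vid, uint16_t queue_id,
> 		struct rte_mbuf **pkts, uint16_t count);
> 
> /* Return the mbufs whose DMA copies have all completed. */
> uint16_t rte_vhost_get_completed_packets(int vid, uint16_t queue_id,
> 		struct rte_mbuf **pkts, uint16_t count);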

That looks good to me, great!
The only thing left is the naming of the APIs. I need to think more
about it, but that does not prevent starting work on the implementation.

Regarding the initialization, I was thinking we could introduce new
flags to rte_vhost_driver_register:
- RTE_VHOST_USER_TX_DMA
- RTE_VHOST_USER_RX_DMA

Well, only Tx can be implemented for now, but the Rx flag can be
reserved.
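
Usage could then look like this (assuming the two flags above are added
to the existing flags parameter of rte_vhost_driver_register):

	/* Enable DMA-accelerated Tx; the Rx flag stays reserved for now. */
	uint64_t flags = RTE_VHOST_USER_TX_DMA;

	if (rte_vhost_driver_register(socket_path, flags) < 0)
		rte_exit(EXIT_FAILURE, "vhost driver registration failed\n");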

The thing I'm not clear on is how we fall back to the sync API when no
DMA is available.

Should the user still call rte_vhost_try_enqueue_burst(), which, if no
DMA is available, would call rte_vhost_enqueue_burst() directly, so that
rte_vhost_get_completed_packets() then returns all the mbufs?
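
Something like this, maybe (pseudo-code, using the names proposed
above; the bookkeeping for rte_vhost_get_completed_packets() is not
shown):

uint16_t
rte_vhost_try_enqueue_burst(int vid, uint16_t queue_id,
		struct rte_mbuf **pkts, uint16_t count)
{
	struct virtio_net *dev = get_device(vid);

	if (dev->dma_enqueue == NULL) {
		/* No DMA available: copy synchronously and record the
		 * mbufs as completed, so that the next call to
		 * rte_vhost_get_completed_packets() returns them all. */
		return rte_vhost_enqueue_burst(vid, queue_id, pkts, count);
	}

	return dev->dma_enqueue(dev, queue_id, pkts, count);
}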

Thanks,
Maxime

> Thanks,
> Jiayu
> 
>>
>> I really object to implementing vring handling in the Vhost PMD; this
>> is the role of the Vhost library.
>>
>> Thanks,
>> Maxime
> 


