[dpdk-users] Run-to-completion or Pipe-line for QAT PMD in DPDK

Trahe, Fiona fiona.trahe at intel.com
Fri Jan 18 14:13:43 CET 2019

Hi Alex,

> -----Original Message-----
> From: users [mailto:users-bounces at dpdk.org] On Behalf Of Changchun Zhang
> Sent: Thursday, January 17, 2019 11:01 PM
> To: users at dpdk.org
> Subject: [dpdk-users] Run-to-completion or Pipe-line for QAT PMD in DPDK
> Hi,
> I have a user question on using the QAT device in DPDK.
> In a real design, after calling enqueue_burst() on the specified queue pair on one of the lcores,
> which of the following is usually done?
> 1.     Run-to-completion: call dequeue_burst() and wait for the device to finish the
> crypto operation,
> 2.     or pipeline: return right after enqueue_burst() and release the CPU,
> then call dequeue_burst() from another thread function?
> Option 1 is more synchronous and can be seen in all the DPDK crypto examples, while option 2 is
> asynchronous, and I have never seen it in any reference design, unless I missed something.
Option 2 is not possible with QAT - the dequeue must be called in the same thread as the enqueue. The
enqueue/dequeue path is optimised without atomics for best performance - if this is a problem let us know.
However, best performance does not come from option 1 either, i.e. not from a synchronous blocking method.
If you enqueue and then go straight to dequeue, you're not getting the full benefit of the
cycles freed up by offloading.
It's best to enqueue a burst, then go and do some other work, such as collecting more requests for the
next enqueue or other processing, and then dequeue. Take and process whatever ops are dequeued - this
will not necessarily match the number you enqueued; it depends on how quickly you call the dequeue.
Don't wait until all the enqueued ops are dequeued before enqueuing the next batch.
So it's asynchronous, but in the same thread.
You'll get the best throughput when you keep the input filled up, so the device always has operations to
work on, and dequeue a burst regularly. Dequeuing too often wastes cycles on the overhead of calling the API;
dequeuing too slowly causes the device to back up. Ideally, tune for your application to find the sweet spot
between these two extremes.
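
As a rough illustration of the pattern described above, here is a minimal sketch in plain C. It does not
use DPDK at all: mock_enqueue_burst() and mock_dequeue_burst() are hypothetical stand-ins for
rte_cryptodev_enqueue_burst() and rte_cryptodev_dequeue_burst(), and the ring size, burst size, device
rate and the dequeue_every knob are all made-up values chosen only to make the behaviour visible.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE   64u  /* hypothetical queue pair depth */
#define BURST       16u  /* hypothetical burst size for enqueue and dequeue */
#define DEVICE_RATE 8u   /* hypothetical ops the mock device completes per tick */

/* Mock of one queue pair; NOT a DPDK structure. */
struct mock_qp {
    uint32_t inflight;   /* ops enqueued and not yet dequeued */
    uint32_t done;       /* subset of inflight the device has completed */
};

/* Stand-in for an enqueue burst: accepts fewer than n ops if the ring is full. */
static uint16_t mock_enqueue_burst(struct mock_qp *qp, uint16_t n)
{
    uint32_t space = RING_SIZE - qp->inflight;
    uint16_t accepted = n < space ? n : (uint16_t)space;
    qp->inflight += accepted;
    return accepted;
}

/* Stand-in for a dequeue burst: returns only ops that are already completed. */
static uint16_t mock_dequeue_burst(struct mock_qp *qp, uint16_t n)
{
    uint16_t got = n < qp->done ? n : (uint16_t)qp->done;
    qp->done -= got;
    qp->inflight -= got;
    return got;
}

/* The mock device makes progress while the application does other work. */
static void mock_device_tick(struct mock_qp *qp)
{
    uint32_t pending = qp->inflight - qp->done;
    qp->done += DEVICE_RATE < pending ? DEVICE_RATE : pending;
}

struct run_stats {
    uint32_t enqueued;
    uint32_t dequeued;
    uint32_t iterations;     /* loop passes needed to finish all ops */
    uint32_t dequeue_calls;  /* how many times the dequeue API was called */
};

/* Asynchronous, single-thread loop: enqueue, do other work, dequeue now and then. */
static struct run_stats run_same_thread_async(uint32_t total_ops, uint32_t dequeue_every)
{
    struct mock_qp qp = {0, 0};
    struct run_stats st = {0, 0, 0, 0};

    assert(dequeue_every > 0);
    while (st.dequeued < total_ops) {
        st.iterations++;

        /* 1. Keep the input filled: enqueue without waiting for earlier bursts. */
        uint32_t left = total_ops - st.enqueued;
        uint16_t want = (uint16_t)(left < BURST ? left : BURST);
        st.enqueued += mock_enqueue_burst(&qp, want);

        /* 2. "Other work" happens here; meanwhile the device processes ops. */
        mock_device_tick(&qp);

        /* 3. Dequeue on every Nth pass and take whatever is ready. */
        if (st.iterations % dequeue_every == 0) {
            st.dequeued += mock_dequeue_burst(&qp, BURST);
            st.dequeue_calls++;
        }
    }
    assert(qp.inflight == 0);  /* everything enqueued was also dequeued */
    return st;
}
```

Calling run_same_thread_async(1000, n) with different n shows the trade-off in this toy model only: a
small n makes many dequeue calls that return partly filled bursts, while a large n leaves completed ops
sitting in the ring so that enqueues are refused and more loop passes are needed. Real numbers depend on
the device and the application, so this is something to measure rather than take from the sketch.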
