[dpdk-dev] [RFC] Accelerator API to chain packet processing functions

Coyle, David david.coyle at intel.com
Fri Feb 7 13:38:09 CET 2020

Hi Jerin, see below

> On Thu, Feb 6, 2020 at 10:01 PM Coyle, David <david.coyle at intel.com>
> wrote:
> Hi David,
> > >
> > >
> > > > > > - XGS-PON MAC: Crypto-CRC-BIP
> > > > > >         - Order:
> > > > > >                 - Downstream: CRC, Encrypt, BIP
> > > > >
> > > > > I understand if the chain has two operations then it may
> > > > > possible to have handcrafted SW code to do both operations in one
> pass.
> > > > > I understand the spec is agnostic on a number of passes it does
> > > > > require to enable the xfrom but To understand the SW/HW
> > > > > capability, In the above case, "CRC, Encrypt, BIP", It is done
> > > > > in one pass in SW or three passes in SW or one pass using HW?
> > > >
> > > > [DC] The CRC, Encrypt, BIP is also currently done as 1 pass in
> > > > AESNI MB
> > > library SW.
> > > > However, this could also be performed as a single pass in a HW
> > > > accelerator
> > >
> > > As a specification, cascading the xform chains make sense.
> > > Do we have any HW that does support chaining the xforms more than
> "two"
> > > in one pass?
> > > i.e real chaining function where two blocks of HWs work hand in hand
> > > for chaining.
> > > If none, it may be better to abstract as synonymous API(No dequeue,
> > > no
> > > enqueue) for the CPU use case.
> >
> > [DC] I'm not aware of any HW that supports this at the moment, but that's
> not to say it couldn't in the future - if anyone else has any examples though,
> please feel free to share.
> > Regardless, I don't see why we would introduce a different API for SW
> devices and HW devices.
> There is a risk in drafting API that meant for HW without any HW exists.
> Because there could be inefficiency on the metadata and fast path API for
> both models.
> For example, In the case of CPU based scheme, it will be pure overhead
> emulate the "queue"(the enqueue and dequeue) for the sake of abstraction
> where CPU works better in the synchronous model and I have doubt that the
> session-based scheme will work for HW or not as both difference  HW needs
> to work hand in hand(IOMMU aspects for two PCI device)

[DC] I understand what you are saying about the overhead of emulating the "sw queue" but this same model is already used in many of the existing device PMDs.
In the case of SW devices, such as AESNI-MB or NULL for crypto or zlib for compression, the enqueue/dequeue in the PMD is emulated through an rte_ring which is very efficient.
The accelerator API will use the existing device PMDs so keeping the same model seems like a sensible approach.

From an application's point of view, this abstraction of the underlying device type is important for usability and maintainability -  the application doesn't need to know
the device type as such and therefore doesn't need to make different API calls. 

The enqueue/dequeue type API was also used with QAT in mind. While QAT HW doesn't support these xform chains at the moment, it could potentially do so in the future.
As a side note, as part of the work of adding the accelerator API, the QAT PMD will be updated to support the DOCSIS Crypto-CRC accelerator xform chain, where the Crypto
is done on QAT HW and the CRC will be done in SW, most likely through a call to the optimized rte_net_crc library. This will give a consistent API for the DOCSIS-MAC data-plane
pipeline prototype we have developed, which uses both AESNI-MB and QAT for benchmarks.

We will take your feedback on the enqueue/dequeue approach for SW devices into consideration though during development.

Finally, I'm unsure what you mean by this line:

	"I have doubt that the session-based scheme will work for HW or not as both difference  HW needs to work hand in hand(IOMMU aspects for two PCI device)"

What do mean by different HW working "hand in hand" and "two PCI device"?
The intention is that 1 HW device (or it's PMD) would have to support the accel xform chain

> Having said that, I agree with the need for use case and API for CPU case. Till
> we find a HW spec, we need to make the solution as CPU specific and latter
> extend based on HW metadata required.
> Accelerator API sounds like HW accelerator and there is no HW support then
> it may not good. We can change the API that works for the use cases that we
> know how it works efficiently.
> > It would be up to each underlying PMD to decide if/how it supports a
> > particular accelerator xform chain, but from an application's point of
> > view, the accelerator API is always the same
> >
> >

More information about the dev mailing list