[dpdk-dev] [RFC] Accelerator API to chain packet processing functions
jerinjacobk at gmail.com
Tue Feb 18 06:30:59 CET 2020
On Thu, Feb 13, 2020 at 5:14 PM Doherty, Declan
<declan.doherty at intel.com> wrote:
> On 06/02/2020 5:13 PM, Jerin Jacob wrote:
> > On Thu, Feb 6, 2020 at 10:01 PM Coyle, David <david.coyle at intel.com> wrote:
> > Hi David,
> >>>>>> - XGS-PON MAC: Crypto-CRC-BIP
> >>>>>> - Order:
> >>>>>> - Downstream: CRC, Encrypt, BIP
> >>>>> I understand that if the chain has two operations, it may be possible to
> >>>>> have handcrafted SW code do both operations in one pass.
> >>>>> I understand the spec is agnostic about the number of passes required
> >>>>> to perform the xform, but to understand the SW/HW capability:
> >>>>> in the above case, "CRC, Encrypt, BIP", is it done in one pass in SW,
> >>>>> three passes in SW, or one pass using HW?
> >>>> [DC] The CRC, Encrypt, BIP is also currently done as 1 pass in AESNI MB
> >>>> library SW.
> >>>> However, this could also be performed as a single pass in a HW
> >>>> accelerator
> >>> As a specification, cascading the xform chains makes sense.
> >>> Do we have any HW that supports chaining more than "two" xforms
> >>> in one pass?
> >>> i.e. a real chaining function where two HW blocks work hand in hand for
> >>> chaining.
> >>> If none, it may be better to abstract it as a synchronous API (no dequeue, no
> >>> enqueue) for the CPU use case.
> >> [DC] I'm not aware of any HW that supports this at the moment, but that's not to say it couldn't in the future - if anyone else has any examples though, please feel free to share.
> >> Regardless, I don't see why we would introduce a different API for SW devices and HW devices.
> > There is a risk in drafting an API meant for HW when no such HW
> > exists, because there could be inefficiency in the metadata and fast
> > path API for both models.
> > For example, in the case of a CPU-based scheme, it is pure overhead
> > to emulate the "queue" (the enqueue and dequeue) for the sake of
> > abstraction, when the
> > CPU works better in the synchronous model. I also doubt whether the
> > session-based scheme will work for HW, as two different HW devices
> > would need to work hand in hand (IOMMU aspects for two PCI devices).
> We do have some prototypes in hardware which can do operation chaining,
> but in the cases we have looked at, it is a single multi-function
> accelerator device, which means the orchestration (order, passing of
> data etc.) of the chained operations is handled within the device itself.
> As a result, we didn't see issues with shared session data or with
> moving data along discrete independent stages of a hardware pipeline.
> Although if you wanted to offer this type of chained offload, I think we
> would need the driver to handle this for the user, rather than the
> application needing to understand how the hardware pipeline is interacting.
Yes. The application should not need to understand the specifics.
The only question is how to make this generic so that any hardware/SW
pipeline can work.
Currently, we have rte_security, which works on ethdev and cryptodev.
This new spec
is going to work on rte_cryptodev and rte_compressdev. If we then need another
pipeline that works with rte_cryptodev, rte_compressdev and ethdev,
we would need to invent yet another library.
I agree with the need for the hardware/SW pipeline.
As Stephen suggested, why not look for a general abstraction for HW/SW pipelines?
Marvell had a similar problem in abstracting various HW/SW pipelines.
Here is a proposal
for a generic HW/SW pipeline.
If the focus is only on a specific case, say "CRC + something else",
it is better to have an API for just that, and
better not to call it an accelerator API for the packet processing pipeline,
as that has a much bigger scope.
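
To make the synchronous, CPU-specific option concrete, here is a rough
sketch of what a narrow "CRC, Encrypt, BIP" call without enqueue/dequeue
might look like. Every name in it is hypothetical and none of it is DPDK
API. The XOR "cipher" is only a placeholder for a real cipher, the BIP is
assumed to be a plain XOR over 32-bit words, and the stages run as
separate loops inside one call rather than as a true single pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not DPDK API. */
enum chain_op { CHAIN_OP_CRC32, CHAIN_OP_XOR_CIPHER, CHAIN_OP_BIP32 };

struct chain_result {
    uint32_t crc; /* CRC-32 of the buffer as seen by the CRC stage */
    uint32_t bip; /* parity of the buffer as seen by the BIP stage */
};

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by Ethernet). */
static uint32_t crc32_calc(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Placeholder for a real cipher: XOR every byte with a one-byte key. */
static void xor_cipher(uint8_t *p, size_t len, uint8_t key)
{
    for (size_t i = 0; i < len; i++)
        p[i] ^= key;
}

/* Assumed BIP: XOR of big-endian 32-bit words, tail zero-padded. */
static uint32_t bip32_calc(const uint8_t *p, size_t len)
{
    uint32_t bip = 0;

    for (size_t i = 0; i < len; i++)
        bip ^= (uint32_t)p[i] << (8 * (3 - (i % 4)));
    return bip;
}

/*
 * Synchronous chain: applies the ops in order on the caller's buffer and
 * returns when done. No queue is emulated. Returns 0 on success, -1 if
 * an unknown op is found.
 */
int chain_process_sync(const enum chain_op *ops, size_t n_ops,
                       uint8_t *buf, size_t len, uint8_t key,
                       struct chain_result *res)
{
    assert(ops != NULL && buf != NULL && res != NULL);

    for (size_t i = 0; i < n_ops; i++) {
        switch (ops[i]) {
        case CHAIN_OP_CRC32:
            res->crc = crc32_calc(buf, len);
            break;
        case CHAIN_OP_XOR_CIPHER:
            xor_cipher(buf, len, key);
            break;
        case CHAIN_OP_BIP32:
            res->bip = bip32_calc(buf, len);
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```

With the ops given as CRC, cipher, BIP, the CRC in this sketch is taken
over the input bytes and the BIP over the ciphered bytes, simply because
the stages run in that order on the same buffer.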
Just my 2c.
> > Having said that, I agree with the need for the use case and an API for the CPU
> > case. Until we find a HW spec, we need to make the solution CPU
> > specific and later extend it based on the HW metadata required.
> > "Accelerator API" sounds like a HW accelerator, and if there is no HW support
> > then the name may not be good. We can change the API to one that works for the use
> > cases that we know how to handle efficiently.
> >> It would be up to each underlying PMD to decide if/how it supports a particular accelerator xform chain, but from an application's point of view, the accelerator API is always the same