[dpdk-dev] [RFC] Accelerator API to chain packet processing functions
declan.doherty at intel.com
Thu Feb 13 12:44:38 CET 2020
On 06/02/2020 5:13 PM, Jerin Jacob wrote:
> On Thu, Feb 6, 2020 at 10:01 PM Coyle, David <david.coyle at intel.com> wrote:
> Hi David,
>>>>>> - XGS-PON MAC: Crypto-CRC-BIP
>>>>>> - Order:
>>>>>> - Downstream: CRC, Encrypt, BIP
>>>>> I understand that if the chain has two operations then it may be possible to
>>>>> have handcrafted SW code to do both operations in one pass.
>>>>> I understand the spec is agnostic on the number of passes
>>>>> required to enable the xform, but to understand the SW/HW capability:
>>>>> in the above case, "CRC, Encrypt, BIP", is it done in one pass in SW,
>>>>> three passes in SW, or one pass using HW?
>>>> [DC] The CRC, Encrypt, BIP is also currently done as 1 pass in AESNI MB
>>>> library SW.
>>>> However, this could also be performed as a single pass in a HW
>>> As a specification, cascading the xform chains makes sense.
>>> Do we have any HW that supports chaining more than "two" xforms
>>> in one pass?
>>> i.e real chaining function where two blocks of HWs work hand in hand for
>>> If none, it may be better to abstract it as a synchronous API (no dequeue, no
>>> enqueue) for the CPU use case.
>> [DC] I'm not aware of any HW that supports this at the moment, but that's not to say it couldn't in the future - if anyone else has any examples though, please feel free to share.
>> Regardless, I don't see why we would introduce a different API for SW devices and HW devices.
> There is a risk in drafting an API that is meant for HW when no such HW
> exists, because there could be inefficiency in the metadata and fast
> path API for both models.
> For example, in the case of a CPU-based scheme, it will be pure overhead
> to emulate the "queue" (the enqueue and dequeue) for the sake of
> abstraction, where the
> CPU works better in the synchronous model. I also doubt whether the
> session-based scheme will work for HW, as two different HW blocks
> need to work hand in hand (IOMMU aspects for two PCI devices).
We do have some prototypes in hardware which can do operation chaining,
but in the cases we have looked at, it is a single multi-function
accelerator device, which means the orchestration (order, passing of
data etc.) of the chained operations is handled within the device itself.
As a result, we didn't see issues with shared session data or with
moving data along the discrete, independent stages of a hardware
pipeline.
Although, if you wanted to offer this type of chained offload, I think we
would need the driver to handle this for the user, rather than requiring
the application to understand how the hardware pipeline stages interact.
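As a rough illustration of the point above, here is a minimal sketch in which the application only states the order of the chain and the driver decides how to sequence the stages. Every name in it (xform_chain, chain_result, drv_process_chain) is hypothetical and is not part of the RFC or of any existing DPDK API. The stage bodies are toy stand-ins (a byte-sum checksum, an XOR "cipher", an XOR parity), not real CRC, encryption or BIP implementations.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transform types, named after the XGS-PON chain in this thread. */
enum xform_type { XFORM_CRC, XFORM_ENCRYPT, XFORM_BIP };

#define MAX_CHAIN 4

/* Hypothetical chain descriptor: the application only states the order. */
struct xform_chain {
    enum xform_type stage[MAX_CHAIN];
    size_t nb_stages;
    uint8_t key;            /* key for the toy XOR "cipher" */
};

/* Hypothetical per-operation results handed back to the application. */
struct chain_result {
    uint8_t crc;            /* toy byte-sum checksum, NOT a real CRC */
    uint8_t bip;            /* XOR parity over the bytes seen by the BIP stage */
};

/*
 * Driver-side entry point (assumed name). How the stages are sequenced is
 * hidden here: this sketch loops over the buffer once per stage, whereas a
 * real PMD could fuse the stages into one pass or hand the whole chain to
 * a multi-function device.
 */
static int drv_process_chain(const struct xform_chain *c, uint8_t *buf,
                             size_t len, struct chain_result *res)
{
    if (c == NULL || buf == NULL || res == NULL || c->nb_stages > MAX_CHAIN)
        return -1;

    res->crc = 0;
    res->bip = 0;

    for (size_t s = 0; s < c->nb_stages; s++) {
        switch (c->stage[s]) {
        case XFORM_CRC:
            for (size_t i = 0; i < len; i++)
                res->crc = (uint8_t)(res->crc + buf[i]);
            break;
        case XFORM_ENCRYPT:
            for (size_t i = 0; i < len; i++)
                buf[i] ^= c->key;
            break;
        case XFORM_BIP:
            for (size_t i = 0; i < len; i++)
                res->bip ^= buf[i];
            break;
        default:
            return -1;      /* unknown stage: reject the chain */
        }
    }
    return 0;
}
```

The application would fill in a chain such as { XFORM_CRC, XFORM_ENCRYPT, XFORM_BIP } and make a single call. It never sees whether the driver ran one pass or three, or whether the work was done in SW or HW.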
> Having said that, I agree with the need for the use case and an API for the CPU
> case. Till we find a HW spec, we need to make the solution CPU
> specific and later extend it based on the HW metadata required.
> "Accelerator API" sounds like a HW accelerator, and if there is no HW support
> then it may not be good. We can change the API so that it works for the use
> cases that we know how to handle efficiently.
>> It would be up to each underlying PMD to decide if/how it supports a particular accelerator xform chain, but from an application's point of view, the accelerator API is always the same