[dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

Doherty, Declan declan.doherty at intel.com
Tue Jan 23 16:35:58 CET 2018


On 16/01/2018 8:22 AM, Shahaf Shuler wrote:
> Thursday, January 11, 2018 11:45 PM, John Daley:
>> Hi Declan and Shahaf,
>>
>>> I can't see how the existing
>>> ethdev API could be used for statistics as a single ethdev could be
>>> supporting many concurrent TEPs, therefore we would either need to use
>>> the extended stats with many entries, one for each TEP, or if we treat
>>> a TEP as an attribute of a port in a similar manner to the way
>>> rte_security manages an IPsec SA, the state of each TEP can be
>>> monitored and managed independently of both the overall port and the
>>> flows being transported on that endpoint.
>>
>> Assuming we can define one rte_flow rule per TEP, does what you propose
>> give us anything more than just using the COUNT action?
> 
> I agree with John here, and I am also not sure we need such an assumption.
> 
> If I get it right, the API proposed here is to have a tunnel endpoint, which is a logical port on top of an ethdev port. The TEP is able to receive and monitor specific tunneled traffic, for example VXLAN, GENEVE and more.
> For example, VXLAN TEP can have multiple flows with different VNIs all under the same context.
> 
> Now, with the current rte_flow APIs, we can do exactly the same and give the application the full flexibility to group the tunnel flows into a logical TEP.
> With this suggestion the application will:
> 1. Create rte_flow rules for the pattern it wants to receive.
> 2. In case it is interested in counting, a COUNT action will be added to the flow.
> 3. In case header manipulation is required, a DECAP/ENCAP/REWRITE action will be added to the flow.
> 4. Grouping of flows into a logical TEP will be done at the application layer, simply by keeping the relevant rte_flow rules in some dedicated struct. With it, create/destroy TEP can be translated to create/destroy of the flow rules. A statistics query can be done by querying each flow's count and summing the results. Note that some devices can support the same counter for multiple flows. Even though it is not yet exposed in rte_flow, this can be an interesting optimization.

As I responded to John's mail, I think this approach fails on devices
which also support switching offload. As the flows never hit the host
application configuring the TEP and flows, there is no easy way to sum
those statistics. Flows are also transitory in terms of runtime, so it
would not be possible to keep accurate statistics over a period of time.
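
To make the transitory-flow concern concrete, here is a minimal sketch of
the application-layer grouping described above. All names (app_flow,
app_tep and the helper functions) are hypothetical stand-ins and not part
of rte_flow or any DPDK API; "hits" stands for whatever a COUNT query
would return. It shows that summing live flows alone loses history once a
flow is destroyed, and that keeping it requires the application to observe
every flow teardown, which it cannot do when the hw switches and retires
flows on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TEP_MAX_FLOWS 8

/* Hypothetical stand-in for one flow rule carrying a COUNT action. */
struct app_flow {
	uint64_t hits;  /* value a counter query would return */
	int active;     /* 0 once the rule has been destroyed */
};

/* Hypothetical application-level grouping of flows into a logical TEP. */
struct app_tep {
	struct app_flow flows[TEP_MAX_FLOWS];
	uint64_t retired_hits;  /* counts folded in from destroyed flows */
};

/* Sum the counters of the flows that still exist. */
static uint64_t app_tep_live_hits(const struct app_tep *tep)
{
	uint64_t sum = 0;

	assert(tep != NULL);
	for (size_t i = 0; i < TEP_MAX_FLOWS; i++)
		if (tep->flows[i].active)
			sum += tep->flows[i].hits;
	return sum;
}

/* Destroy a flow, saving its last known count in the TEP total. */
static void app_tep_flow_destroy(struct app_tep *tep, size_t idx)
{
	assert(tep != NULL && idx < TEP_MAX_FLOWS);
	if (!tep->flows[idx].active)
		return;
	tep->retired_hits += tep->flows[idx].hits;
	tep->flows[idx].hits = 0;
	tep->flows[idx].active = 0;
}

/* Lifetime total: retired flows plus flows that are still live. */
static uint64_t app_tep_total_hits(const struct app_tep *tep)
{
	return tep->retired_hits + app_tep_live_hits(tep);
}
```

With two flows at 10 and 5 hits, destroying the first leaves the live sum
at 5 while the lifetime total stays at 15, but only because the
application was the one to destroy the flow.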


> 
>>>
>>>> As for the capabilities - what specifically did you have in mind? The
>>>> current
>>> usage you show with tep is with rte_flow rules. There are no
>>> capabilities currently for rte_flow supported actions/pattern. To
>>> check such capabilities application uses rte_flow_validate.
>>>
>>> I envisaged that the application should be able to see if an ethdev
>>> can support TEP in the rx/tx offloads, and then the
>>> rte_tep_capabilities would allow applications to query what tunnel
>>> endpoint protocols are supported etc. I would like a simple mechanism
>>> to allow users to see if a particular tunnel endpoint type is
>>> supported without having to build actual flows to validate.
>>
>> I can see the value of that, but in the end wouldn't the API call
>> rte_flow_validate anyways? Maybe we don't add the layer now or maybe it
>> doesn't really belong in DPDK? I'm in favor of deferring the capabilities API
>> until we know it's really needed.  I hate to see special capabilities APIs start
>> sneaking in after we decided to go the rte_flow_validate route and users are
>> starting to get used to it.
> 
> I don't see how it is different from any other rte_flow creation.
> We don't hold caps for device ability to filter packets according to VXLAN or GENEVE items. Why should we start now?

I don't know, possibly if it makes adoption of the features easier for 
the end user.

> 
> We already have rte_flow_validate. I think part of the reason for it was that the number of different capabilities possible with rte_flow is huge. I think this is also the case with the TEP capabilities (even though it is still not clear to me what exactly they will include).

It may be that we only need to advertise that we are capable of
encap/decap services, but it would be good to have input from downstream
users on what they would like to see.
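
As a rough illustration of how small such an advertisement could be, here
is a sketch of a per-protocol encap/decap capability query. Every name in
it (tep_caps, tep_proto, the TEP_CAP_* bits, tep_supports) is hypothetical
and exists neither in DPDK nor in the RFC; it only shows the kind of check
an application could make without building a flow to validate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits; these are not DPDK definitions. */
#define TEP_CAP_ENCAP (UINT32_C(1) << 0)
#define TEP_CAP_DECAP (UINT32_C(1) << 1)

/* Hypothetical list of tunnel endpoint protocols a port could report. */
enum tep_proto {
	TEP_PROTO_VXLAN,
	TEP_PROTO_GENEVE,
	TEP_PROTO_NVGRE,
	TEP_PROTO_MAX
};

/* Hypothetical per-port capability report: one bitmask per protocol. */
struct tep_caps {
	uint32_t proto_caps[TEP_PROTO_MAX];
};

/* Return 1 if every capability bit in 'wanted' is set for 'proto'. */
static int tep_supports(const struct tep_caps *caps, enum tep_proto proto,
			uint32_t wanted)
{
	assert(caps != NULL);
	if ((unsigned int)proto >= TEP_PROTO_MAX)
		return 0;
	return (caps->proto_caps[proto] & wanted) == wanted;
}
```

An application would call tep_supports() once per protocol it cares about
and fall back to software encap/decap where the answer is 0.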

> 
>>>
>>>> Regarding the creation/destroy of tep. Why not simply use rte_flow
>>>> API
>>> and avoid this extra control?
>>>> For example - with 17.11 APIs, application can put the port in
>>>> isolate mode,
>>> and insert a flow_rule to catch only IPv4 VXLAN traffic and direct to
>>> some queue/do RSS. Such operation, per my understanding, will create a
>>> tunnel endpoint. What are the down sides of doing it with the current
>> APIs?
>>>
>>> That doesn't enable encapsulation and decapsulation of the outer
>>> tunnel endpoint in the hw as far as I know. Apart from the inability
>>> to monitor the endpoint statistics I mentioned above, it would also
>>> require that you redefine the endpoint's parameters every time you
>>> wish to add a new flow to it. I think having the rte_tep object
>>> semantics should also simplify the ability to enable a full vswitch
>>> offload of TEP where the hw is handling both encap/decap and switching to
>> a particular port.
>>
>> If we have the ingress/decap and egress/encap actions and 1 rte_flow rule
>> per TEP and use the COUNT action, I think we get all but the last bit. For that,
>> perhaps the application could keep ingress and egress rte_flow templates for
>> each tunnel type (VxLAN, GRE, ..). Then copying the template and filling in
>> the outer packet info and tunnel Id is all that would be required. We could
>> also define these in rte_flow.h?
>>
>>>
>>>>
>>>>>
>>>>>
>>>>> To direct traffic flows to hw terminated tunnel endpoint the
>>>>> rte_flow API is enhanced to add a new flow item type. This contains
>>>>> a pointer to the TEP context as well as the overlay flow id to
>>>>> which the traffic flow is
>>> associated.
>>>>>
>>>>> struct rte_flow_item_tep {
>>>>>                  struct rte_tep *tep;
>>>>>                  uint32_t flow_id;
>>>>> };
>>>>
>>>> Can you provide a more detailed definition of the flow id? Which
>>>> field from the packet headers does it refer to?
>>>> In your examples below it looks like it is to match the VXLAN VNI in
>>>> the case of VXLAN; what about the other protocols? Also, why not use
>>>> the existing VXLAN item?
>>>
>>> I have only been looking initially at a couple of the tunnel endpoint
>>> protocols, namely Geneve, NvGRE, and VxLAN, but the idea here is to
>>> allow the user to define the VNI in the case of Geneve and VxLAN and
>>> the VSID in the case of NvGRE on a per flow basis, as per my
>>> understanding these are used to identify the source/destination hosts
>>> on the overlay network independently from the endpoint they are
>>> transported across.
>>>
>>> The VxLAN item is used in the creation of the TEP object. Using the
>>> TEP object just removes the need for the user to constantly redefine
>>> all the tunnel parameters, and I also think that, depending on the hw
>>> implementation, it may simplify the driver's work if it knows the exact
>>> endpoint the action is for instead of having to look it up on each flow
>>> addition.
>>>
>>>>
>>>> Generally I like the idea of separating the encap/decap context from
>>>> the action. However, it looks like the rte_flow_item has a double
>>>> meaning in this RFC, once for the classification and once for the action.
>>>> Off the top of my head I would think of an API which separates
>>>> those and re-uses the existing flow items. Something like:
>>>>
>>>>    struct rte_flow_item pattern[] = {
>>>>                   { /* set of already existing pattern items */ },
>>>>                   { ... },
>>>>                   { .type = RTE_FLOW_ITEM_TYPE_END } };
>>>>
>>>>    encap_ctx = create_encap_context(pattern);
>>>>
>>>>    struct rte_flow_action actions[] = {
>>>>                   { .type = RTE_FLOW_ITEM_ENCAP, .conf = encap_ctx } };
>>>
>>> I'm not sure I fully understand what you're asking here, but in general
>>> for encap you would only define the inner part of the packet in the
>>> match pattern criteria and the actual outer tunnel headers would be
>>> defined in the action.
>>>
>>> I guess there is some replication in the decap side as proposed, as
>>> the TEP object is used in both the pattern and the action, possibly
>>> you could get away with having no TEP object defined in the action
>>> data, but I prefer keeping the API symmetrical for encap/decap actions
>>> at the cost of some extra verbosity.
>>>
>>>>
>>> ...
>>>>
> 


