[dpdk-dev] [RFC] Generic flow director/filtering/classification API

John Fastabend john.fastabend at gmail.com
Wed Aug 10 18:46:27 CEST 2016


On 16-08-10 06:37 AM, Adrien Mazarguil wrote:
> On Tue, Aug 09, 2016 at 02:47:44PM -0700, John Fastabend wrote:
>> On 16-08-04 06:24 AM, Adrien Mazarguil wrote:
>>> On Wed, Aug 03, 2016 at 12:11:56PM -0700, John Fastabend wrote:
> [...]
>>>> The problem is keeping priorities in order and/or possibly breaking
>>>> rules apart (e.g. you have an L2 table and an L3 table) becomes very
>>>> complex to manage at the driver level. I think it's easier for the
>>>> application, which has some context, to do this. An application that
>>>> "knows" it is a router, for example, will likely be able to pack rules
>>>> better than a PMD will.
>>>
>>> I don't think most applications know they are L2 or L3 routers. They may not
>>> know more than the pattern provided to the PMD, which may indeed end at an
>>> L2 or L3 protocol. If the application simply chooses a table based on this
>>> information, then the PMD could have easily done the same.
>>>
>>
>> But when we start thinking about encap/decap, it's natural to start
>> using this interface to implement various forwarding dataplanes. And one
>> common way to organize a switch is into TEP, router, switch (mac/vlan),
>> ACL tables, etc. In fact we see this topology starting to show up in the
>> NICs now.
>>
>> Further each table may be "managed" by a different entity. In which
>> case the software will want to manage the physical and virtual networks
>> separately.
>>
>> It doesn't make sense to me to require a software aggregator object to
>> marshal the rules into a flat table only for a PMD to split them apart
>> again.
> 
> OK, my point was mostly about handling basic cases easily and making sure
> applications do not have to bother with petty HW details when they do not
> want to, yet still get maximum performance by having the PMD make the most
> appropriate choices automatically.
> 
> You've convinced me that in many cases PMDs won't be able to optimize
> efficiently and that conscious applications will know better. The API has to
> provide the ability to do so. I think it's fine as long as it is not
> mandatory.
> 

Great. I also agree that making the table feature _not_ mandatory for many
use cases will be helpful. I'm just making sure we get all the use cases I
know of covered.

>>> I understand the issue is what happens when applications really want to
>>> define e.g. L2/L3/L2 rules in this specific order (or any ordering that
>>> cannot be satisfied by HW due to table constraints).
>>>
>>> By exposing tables, in such a case applications should move all rules from
>>> L2 to a L3 table themselves (assuming this is even supported) to guarantee
>>> ordering between rules, or fail to add them. This is basically what the PMD
>>> could have done, possibly in a more efficient manner in my opinion.
>>
>> I disagree with the more efficient comment :)
>>
>> If the software layer is working on L2/TEP/ACL/router layers, merging
>> them just to pull them back apart is not going to be more efficient.
> 
> Moving flow rules around cannot be efficient by definition, however I think
> that attempting to describe table capabilities may be as complicated as
> describing HW bit-masking features. Applications may get it wrong as a
> result while a PMD would not make any mistake.
> 
> Your use case is valid though, if the application already groups rules, then
> sharing this information with the PMD would make sense from a performance
> standpoint.
> 
>>> Let's assume two opposite scenarios for this discussion:
>>>
>>> - App #1 is a command-line interface directly mapped to flow rules, which
>>>   basically gets slow random input from users depending on how they want to
>>>   configure their traffic. All rules differ considerably (L2, L3, L4, some
>>>   with incomplete bit-masks, etc). All in all, few but complex rules with
>>>   specific priorities.
>>>
>>
>> Agree with this, and in this case the application should sit behind any
>> physical/virtual network and not be issuing rules like encap/decap/etc.
>> This application either sits on the physical function and "owns" the
>> hardware resource or sits behind a virtual switch.
>>
>>
>>> - App #2 is something like OVS, creating and deleting a large number of very
>>>   specific (without incomplete bit-masks) and mostly identical
>>>   single-priority rules automatically and very frequently.
>>>
>>
>> Maybe for OVS, but not all virtual switches are built with flat tables
>> at the bottom like this. Nor is it necessarily optimal.
>>
>> Another application (the one I'm concerned about :) would be built as
>> a pipeline, something like
>>
>> 	ACL -> TEP -> ACL -> VEB -> ACL
>>
>> If I have hardware that supports a TEP hardware block, an ACL hardware
>> block, and a VEB block, for example, I don't want to merge my control
>> plane into a single table. The merging in this case is just pure
>> overhead/complexity for no gain.
> 
> It could be done by dedicating priority ranges to each stage in the
> pipeline, but then it would be clunky. OK then, let's discuss the best
> approach to implement this.
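
To make the clunkiness concrete: dedicating priority ranges means carving
one flat priority space into fixed per-stage slices up front, something
like the sketch below (names and sizes are illustrative only, not part of
any proposed API). Once the slices are fixed, a stage that outgrows its
slice can't grow without renumbering every rule behind it, which is
exactly the kind of bookkeeping I'd rather avoid:

  #include <stdint.h>

  /* Reserve a fixed slice of the flat priority space per stage. */
  #define STAGE_WIDTH 1024u

  enum pipeline_stage {
          STAGE_ACL_PRE = 0,
          STAGE_TEP,
          STAGE_ACL_MID,
          STAGE_VEB,
          STAGE_ACL_POST,
  };

  /* Map a (stage, stage-local priority) pair onto the flat space. */
  static inline uint32_t
  stage_priority(enum pipeline_stage stage, uint32_t local_prio)
  {
          return (uint32_t)stage * STAGE_WIDTH + local_prio;
  }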
> 
> [...]
>>>> It's not about mask vs. no mask. The devices with multiple tables that I
>>>> have don't have these mask limitations. It's about how to optimally pack
>>>> the rules and who implements that logic. I think it's best done in the
>>>> application, where I have the context.
>>>>
>>>> Is there a way to omit the table field if the PMD is expected to do
>>>> a best effort, and add the table field if the user wants explicit
>>>> control over table management? This would support both models. I at
>>>> least would like to have explicit control over rule population in my
>>>> pipeline for use cases where I'm building a pipeline on top of the
>>>> hardware.
>>>
>>> Yes, that's a possibility. Perhaps the table ID to use could be specified
>>> as a meta pattern item? We'd still need methods to report how many tables
>>> exist and perhaps some way to report their limitations; these could be
>>> added later through a separate set of functions.
>>
>> Sure, I think a meta pattern item would be fine, or put it in the API
>> call directly, something like
>>
>>   rte_flow_create(port_id, pattern, actions);
>>   rte_flow_create_table(port_id, table_id, pattern, actions);
> 
> I suggest using a common method for both cases; either seems fine to me,
> as long as a default table value can be provided (zero) when applications
> do not care.
> 

Works for me; just use zero as the default when the application has no
preference and expects the PMD to do the table mapping.
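
Spelling the two variants out with the zero default, something like the
sketch below. Types and signatures are illustrative only, not the RFC's
exact definitions; nothing here is final:

  /* PMD chooses placement; equivalent to table_id == 0 below. */
  struct rte_flow *
  rte_flow_create(uint8_t port_id,
                  const struct rte_flow_item pattern[],
                  const struct rte_flow_action actions[]);

  /* Table-aware variant: table_id == 0 means "no preference, let the
   * PMD do the mapping"; a nonzero ID pins the rule to that table. */
  struct rte_flow *
  rte_flow_create_table(uint8_t port_id, uint32_t table_id,
                        const struct rte_flow_item pattern[],
                        const struct rte_flow_action actions[]);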

> Now about table management, I think there is no need to expose table
> capabilities (in case they have different capabilities); instead, the
> specification could provide guidelines encouraging application writers to
> group similar rules in tables. As previously discussed, flow rule
> priorities would be specific to the table they are assigned to.

This seems sufficient to me.

> 
> Like flow rules, tables could be prioritized through their index, with
> index 0 having the highest priority. Like flow rule priorities, table
> indices wouldn't have to be contiguous.
> 
> If this works for you, how about renaming "tables" to "groups"?
> 

Works for me. And actually I like renaming them "groups" as this seems
more neutral with respect to how the hardware actually implements a group.
For example, I've worked on hardware with multiple Tunnel Endpoint engines,
but we exposed them as a single "group" to simplify the user interface.
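
As a usage illustration, the pipeline above could then pin rules to groups
explicitly, with priorities scoped per group. The function name and
arguments below are hypothetical, following the create-variant sketch
earlier:

  /* Group indices mirror pipeline order; group 0 is matched first. */
  flow = rte_flow_create_group(port_id, /* group */ 0, /* prio */ 0,
                               acl_pattern, acl_actions);
  flow = rte_flow_create_group(port_id, /* group */ 1, /* prio */ 0,
                               tep_pattern, tep_actions);
  /* Priorities order rules within group 2 only, not across groups. */
  flow = rte_flow_create_group(port_id, /* group */ 2, /* prio */ 7,
                               veb_pattern, veb_actions);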

.John


