[dpdk-dev] [RFC] Generic flow director/filtering/classification API

John Fastabend john.fastabend at gmail.com
Tue Aug 9 23:47:44 CEST 2016


On 16-08-04 06:24 AM, Adrien Mazarguil wrote:
> On Wed, Aug 03, 2016 at 12:11:56PM -0700, John Fastabend wrote:
>> [...]
>>
>>>>>>>> The proposal looks very good.  It satisfies most of the features
>>>>>>>> supported by Chelsio NICs.  We are looking for suggestions on exposing
>>>>>>>> more additional features supported by Chelsio NICs via this API.
>>>>>>>>
>>>>>>>> Chelsio NICs have two regions in which filters can be placed -
>>>>>>>> Maskfull and Maskless regions.  As their names imply, the maskfull
>>>>>>>> region can accept masks to match a range of values, whereas the
>>>>>>>> maskless region doesn't accept any masks and hence performs stricter
>>>>>>>> exact matches.  Filters without masks can also be placed in the
>>>>>>>> maskfull region.  By default, the maskless region has higher priority
>>>>>>>> than the maskfull region.
>>>>>>>> However, the priority between the two regions is configurable.
>>>>>>>
>>>>>>> I understand this configuration affects the entire device. Just to be clear,
>>>>>>> assuming some filters are already configured, are they affected by a change
>>>>>>> of region priority later?
>>>>>>>
>>>>>>
>>>>>> Both the regions exist at the same time in the device.  Each filter can
>>>>>> either belong to maskfull or the maskless region.
>>>>>>
>>>>>> The priority is configured at time of filter creation for every
>>>>>> individual filter and cannot be changed while the filter is still
>>>>>> active. If priority needs to be changed for a particular filter then,
>>>>>> it needs to be deleted first and re-created.
>>>>>
>>>>> Could you model this as two tables and add a table_id to the API? This
>>>>> way user space could populate the table it chooses. We would have to add
>>>>> some capabilities attributes to "learn" if tables support masks or not
>>>>> though.
>>>>>
>>>>
>>>> This approach sounds interesting.
>>>
>>> Now I understand the idea behind these tables, however from an application
>>> point of view I still think it's better if the PMD could take care of flow
>>> rules optimizations automatically. Think about it, PMDs have exactly a
>>> single kind of device they know perfectly well to manage, while applications
>>> want the best possible performance out of any device in the most generic
>>> fashion.
>>
>> The problem is keeping priorities in order and/or possibly breaking
>> rules apart (e.g. you have an L2 table and an L3 table) becomes very
>> complex to manage at the driver level. I think it's easier for the
>> application, which has some context, to do this. The application "knows"
>> if it's a router, for example, and will likely be able to pack rules
>> better than a PMD will.
> 
> I don't think most applications know they are L2 or L3 routers. They may not
> know more than the pattern provided to the PMD, which may indeed end at a L2
> or L3 protocol. If the application simply chooses a table based on this
> information, then the PMD could have easily done the same.
> 

But when we start thinking about encap/decap, it's natural to start
using this interface to implement various forwarding data planes. And one
common way to organize a switch is into TEP, router, switch
(mac/vlan), ACL tables, etc. In fact we see this topology starting to
show up in NICs now.

Further, each table may be "managed" by a different entity, in which
case the software will want to manage the physical and virtual networks
separately.

It doesn't make sense to me to require a software aggregator object to
marshal the rules into a flat table only for a PMD to split them apart
again.

> I understand the issue is what happens when applications really want to
> define e.g. L2/L3/L2 rules in this specific order (or any ordering that
> cannot be satisfied by HW due to table constraints).
> 
> By exposing tables, in such a case applications should move all rules from
> L2 to a L3 table themselves (assuming this is even supported) to guarantee
> ordering between rules, or fail to add them. This is basically what the PMD
> could have done, possibly in a more efficient manner in my opinion.

I disagree with the "more efficient" comment :)

If the software layer is working on L2/TEP/ACL/router layers, merging
them just to pull them back apart is not going to be more efficient.

> 
> Let's assume two opposite scenarios for this discussion:
> 
> - App #1 is a command-line interface directly mapped to flow rules, which
>   basically gets slow random input from users depending on how they want to
>   configure their traffic. All rules differ considerably (L2, L3, L4, some
>   with incomplete bit-masks, etc). All in all, few but complex rules with
>   specific priorities.
> 

Agreed, and in this case the application should be behind any physical or
virtual network and not issuing rules like encap/decap, etc. This
application either sits on the physical function and "owns" the hardware
resource or sits behind a virtual switch.


> - App #2 is something like OVS, creating and deleting a large number of very
>   specific (without incomplete bit-masks) and mostly identical
>   single-priority rules automatically and very frequently.
> 

Maybe for OVS, but not all virtual switches are built with flat tables
at the bottom like this. Nor is it necessarily optimal.

Another application (the one I'm concerned about :) would be built as
a pipeline, something like

	ACL -> TEP -> ACL -> VEB -> ACL

If I have hardware that supports a TEP hardware block, an ACL hardware
block, and a VEB block, for example, I don't want to merge my control
plane into a single table. The merging in this case is just pure
overhead/complexity for no gain.
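
Roughly, I picture each stage in that pipeline mapping onto its own
table, something like the sketch below (the stage names and the table id
notion are made up for illustration; none of this is in the RFC as
posted):

  #include <stdint.h>

  /* One entry per hardware block in the pipeline. */
  enum pipeline_stage {
      STAGE_ACL_INGRESS, /* ACL in front of the tunnel endpoint */
      STAGE_TEP,         /* tunnel encap/decap */
      STAGE_ACL_TENANT,  /* ACL applied after decap */
      STAGE_VEB,         /* mac/vlan switching */
      STAGE_ACL_EGRESS,  /* final ACL before transmit */
      STAGE_MAX
  };

  /* Each stage gets its own table id, so whoever owns a stage can add
   * and remove rules there without touching the other stages. */
  static const uint32_t stage_to_table_id[STAGE_MAX] = {
      [STAGE_ACL_INGRESS] = 0,
      [STAGE_TEP]         = 1,
      [STAGE_ACL_TENANT]  = 2,
      [STAGE_VEB]         = 3,
      [STAGE_ACL_EGRESS]  = 4,
  };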

> Actual applications will certainly be a mix of both.
> 
> For app #1, users would have to be aware of these tables and base their
> filtering decisions according to them. Reporting tables capabilities, making
> sure priorities between tables are well configured will be their
> responsibility. Obviously applications may take care of these details for
> them, but the end result will be the same. At some point, some combination
> won't be possible. Getting there was only more complicated from
> users/applications point of view.
> 
> For app #2 if the first rule can be created then subsequent rules shouldn't
> be a problem until their number reaches device limits. Selecting the proper
> table to use for these can easily be done by the PMD.
> 

But it requires rewriting my pipeline software to be useful, and this I
want to avoid. Using my TEP example again, I'll need something in
software to catch every VEB/ACL rule and append the rest of the rule,
creating wide rules. For my use cases it's not a very user-friendly API.
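
To put rough numbers on the merging overhead (the counts below are
completely made up; they are only meant to show the shape of the
problem):

  #include <stdio.h>

  int main(void)
  {
      unsigned int tep_rules = 1000; /* tunnel endpoints (VNIs) */
      unsigned int veb_rules = 512;  /* mac/vlan forwarding entries */

      /* Separate TEP and VEB tables: each rule is installed once. */
      printf("per-table entries: %u\n", tep_rules + veb_rules);

      /* One flat table: the aggregator has to emit a wide rule for
       * every (tunnel, mac) combination to keep the same semantics. */
      printf("flat-table entries: %u\n", tep_rules * veb_rules);
      return 0;
  }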

>>>>> I don't see how the PMD can sort this out in any meaningful way and it
>>>>> has to be exposed to the application that has the intelligence to 'know'
>>>>> priorities between masks and non-masks filters. I'm sure you could come
>>>>> up with something but it would be less than ideal in many cases I would
>>>>> guess and we can't have the driver getting priorities wrong or we may
>>>>> not get the correct behavior.
>>>
>>> It may be solved by having the PMD maintain a SW state to quickly know which
>>> rules are currently created and in what state the device is so basically the
>>> application doesn't have to perform this work.
>>>
>>> This API allows applications to express basic needs such as "redirect
>>> packets matching this pattern to that queue". It must not deal with HW
>>> details and limitations in my opinion. If a request cannot be satisfied,
>>> then the rule cannot be created. No help from the application must be
>>> expected by PMDs, otherwise it opens the door to the same issues as the
>>> legacy filtering APIs.
>>
>> This depends on the application and what/how it wants to manage the
>> device. If the application manages a pipeline with some set of tables,
>> then mapping this down to a single table, which then the PMD has to
>> unwind back to a multi-table topology to me seems like a waste.
> 
> Of course, only I am not sure applications will behave differently if they
> are aware of HW tables. I fear it will make things more complicated for
> them and they will just stick with the most capable table all the time, but
> I agree it should be easier for PMDs.
> 

On the other hand, if the API doesn't match my software pipeline, the
complexity/overhead of merging it just to tear it apart again may
prohibit use of the API in these cases.

>>> [...]
>>>>>> Unfortunately, our maskfull region is extremely small too compared to
>>>>>> maskless region.
>>>>>>
>>>>>
>>>>> To me this means a userspace application would want to pack it
>>>>> carefully to get the full benefit. So you need some mechanism to specify
>>>>> the "region" hence the above table proposal.
>>>>>
>>>>
>>>> Right. Makes sense.
>>>
>>> I do not agree, applications should not be aware of it. Note this case can
>>> be handled differently, so that rules do not have to be moved back and forth
>>> between both tables. If the first created rule requires a maskfull entry,
>>> then all subsequent rules will be entered into that table. Otherwise no
>>> maskfull entry can be created as long as there is one maskless entry. When
>>> either table is full, no more rules may be added. Would that work for you?
>>>
>>
>> It's not about mask vs. no mask. The devices with multiple tables that I
>> have don't have these mask limitations. It's about how to optimally pack
>> the rules and who implements that logic. I think it's best done in the
>> application, where I have the context.
>>
>> Is there a way to omit the table field if the PMD is expected to do
>> a best effort and add the table field if the user wants explicit
>> control over table management? This would support both models. I at least
>> would like to have explicit control over rule population in my pipeline
>> for use cases where I'm building a pipeline on top of the hardware.
> 
> Yes that's a possibility. Perhaps the table ID to use could be specified as
> a meta pattern item? We'd still need methods to report how many tables exist
> and perhaps some way to report their limitations; these could be added
> later through a separate set of functions.

Sure, I think a meta pattern item would be fine, or put it in the API call
directly, something like

  rte_flow_create(port_id, pattern, actions);
  rte_flow_create_table(port_id, table_id, pattern, actions);
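
Something like the following is what I have in mind for the meta item
variant (RTE_FLOW_ITEM_TYPE_TABLE and rte_flow_item_table are made-up
names; nothing here is part of the RFC as posted):

  #include <stdint.h>

  /* Hypothetical meta item carrying the target table for a rule. */
  struct rte_flow_item_table {
      uint32_t id; /* hardware table/stage to install the rule into */
  };

  /*
   * The table selector would simply be the first entry of the pattern:
   *
   *   struct rte_flow_item_table table = { .id = 3 }; // e.g. VEB stage
   *   struct rte_flow_item pattern[] = {
   *       { .type = RTE_FLOW_ITEM_TYPE_TABLE, .spec = &table },
   *       { .type = RTE_FLOW_ITEM_TYPE_ETH,   .spec = &eth_spec },
   *       { .type = RTE_FLOW_ITEM_TYPE_END },
   *   };
   *   rte_flow_create(port_id, pattern, actions);
   *
   * while the explicit rte_flow_create_table() form above keeps the
   * pattern untouched and moves the table choice into the call itself.
   * PMDs that only expose a single table could simply reject the meta
   * item.
   */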


> 
> [...]
>>>>> For this adding a meta-data item seems simplest to me. And if you want
>>>>> to make the default to be only a single port that would maybe make it
>>>>> easier for existing apps to port from flow director. Then if an
>>>>> application cares it can create a list of ports if needed.
>>>>>
>>>>
>>>> Agreed.
>>>
>>> However although I'm not opposed to adding dedicated meta items, remember
>>> applications will not automatically benefit from the increased performance
>>> if a single PMD implements this feature, their maintainers will probably not
>>> bother with it.
>>>
>>
>> Unless, as we noted in the other thread, the application is closely bound
>> to its hardware for capability reasons. In this case it would make sense
>> to implement it.
> 
> Sure.
> 
> [...]
> 


