[dpdk-dev] [RFC] Generic flow director/filtering/classification API

Adrien Mazarguil adrien.mazarguil at 6wind.com
Tue Jul 19 15:12:19 CEST 2016


On Tue, Jul 19, 2016 at 08:11:48AM +0000, Lu, Wenzhuo wrote:
> Hi Adrien,
> Thanks for your clarification.  Most of my questions are clear, but still something may need to be discussed, comment below.

Hi Wenzhuo,

Please see below.

[...]
> > > > Requirements for a new API:
> > > >
> > > > - Flexible and extensible without causing API/ABI problems for existing
> > > >   applications.
> > > > - Should be unambiguous and easy to use.
> > > > - Support existing filtering features and actions listed in `Filter types`_.
> > > > - Support packet alteration.
> > > > - In case of overlapping filters, their priority should be well documented.
> > > Does that mean we don't guarantee the consistent of priority? The priority can
> > be different on different NICs. So the behavior of the actions  can be different.
> > Right?
> > 
> > No, the intent is precisely to define what happens in order to get a consistent
> > result across different devices, and document cases with undefined behavior.
> > There must be no room left for interpretation.
> > 
> > For example, the API must describe what happens when two overlapping filters
> > (e.g. one matching an Ethernet header, another one matching an IP header)
> > match a given packet at a given priority level.
> > 
> > It is documented in section 4.1.1 (priorities) as "undefined behavior".
> > Applications remain free to do it and deal with consequences, at least they know
> > they cannot expect a consistent outcome, unless they use different priority
> > levels for both rules, see also 4.4.5 (flow rules priority).
> > 
> > > Seems the users still need to aware the some details of the HW? Do we need
> > to add the negotiation for the priority?
> > 
> > Priorities as defined in this document may not be directly mappable to HW
> > capabilities (e.g. HW does not support enough priorities, or that some corner
> > case make them not work as described), in which case the PMD may choose to
> > simulate priorities (again 4.4.5), as long as the end result follows the
> > specification.
> > 
> > So users must not be aware of some HW details, the PMD does and must
> > perform the needed workarounds to suit their expectations. Users may only be
> > impacted by errors while attempting to create rules that are either unsupported
> > or would cause them (or existing rules) to diverge from the spec.
> The problem is sometime the priority of the filters is fixed according to
> > HW's implementation. For example, on ixgbe, n-tuple has a higher
> > priority than flow director.

As a side note I did not know that N-tuple had a higher priority than flow
director on ixgbe, priorities among filter types do not seem to be
documented at all in DPDK. This is one of the reasons I think we need a
generic API to handle flow configuration.

So, today an application cannot combine N-tuple and FDIR flow rules and get
a reliable outcome, unless it is designed for specific devices with a known
behavior.

> What's the right behavior of PMD if APP want to create a flow director rule which has a higher or even equal priority than an existing n-tuple rule? Should PMD return fail? 

First remember applications only deal with the generic API, PMDs are
responsible for choosing the most appropriate HW implementation to use
according to the requested flow rules (FDIR, N-tuple or anything else).

For the specific case of FDIR vs N-tuple, if the underlying HW supports both
I do not see why the PMD would create a N-tuple rule. Doesn't FDIR support
everything N-tuple can do and much more?

Assuming such a thing happened anyway, that the PMD had to create a rule
using a high priority filter type and that the application requests the
creation of a rule that can only be done using a lower priority filter type,
but also requested a higher priority for that rule, then yes, it should
obviously fail.

That is, unless the PMD can perform some kind of workaround to have both.

> If so, do we need more fail reasons? According to this RFC, I think we need return " EEXIST: collision with an existing rule. ", but it's not very clear, APP doesn't know the problem is priority, maybe more detailed reason is helpful.

Possibly, I've defined a basic set of errors, there are quite a number of
errno values to choose from. However I think we should not define too many
values. In my opinion the basic set covers every possible failure:

- EINVAL: invalid format, rule is broken or cannot be understood by the PMD
  anyhow.

- ENOTSUP: pattern/actions look fine but something in the requested rule is
  not supported and thus cannot be applied.

- EEXIST: pattern/actions are fine and could have been applied if only some
  other rule did not prevent the PMD to do it (I see it as the closest thing
  to "ETOOBAD" which unfortunately does not exist).

- ENOMEM: like EEXIST, except it is due to the lack of resources not because
  of another rule. I wasn't sure which of ENOMEM or ENOSPC was better but
  settled on ENOMEM as it is well known. Still open to debate.

Errno values are only useful to get a rough idea of the reason, and another
mechanism is needed to pinpoint the exact problem for debugging/reporting
purposes, something like:

 enum rte_flow_error_type {
     RTE_FLOW_ERROR_TYPE_NONE,
     RTE_FLOW_ERROR_TYPE_UNKNOWN,
     RTE_FLOW_ERROR_TYPE_PRIORITY,
     RTE_FLOW_ERROR_TYPE_PATTERN,
     RTE_FLOW_ERROR_TYPE_ACTION,
 };

 struct rte_flow_error {
     enum rte_flow_error_type type;
     void *offset; /* Points to the exact pattern item or action. */
     const char *message;
 };

Then either provide an optional struct rte_flow_error pointer to
rte_flow_validate(), or a separate function (rte_flow_analyze()?), since
processing this may be quite expensive and applications may not care about
the exact reason.

What do you suggest?

> > > > Behavior
> > > > --------
> > > >
> > > > - API operations are synchronous and blocking (``EAGAIN`` cannot be
> > > >   returned).
> > > >
> > > > - There is no provision for reentrancy/multi-thread safety, although nothing
> > > >   should prevent different devices from being configured at the same
> > > >   time. PMDs may protect their control path functions accordingly.
> > > >
> > > > - Stopping the data path (TX/RX) should not be necessary when managing
> > flow
> > > >   rules. If this cannot be achieved naturally or with workarounds (such as
> > > >   temporarily replacing the burst function pointers), an appropriate error
> > > >   code must be returned (``EBUSY``).
> > > PMD cannot stop the data path without adding lock. So I think if some rules
> > cannot be applied without stopping rx/tx, PMD has to return fail.
> > > Or let the APP to stop the data path.
> > 
> > Agreed, that is the intent. If the PMD cannot touch flow rules for some reason
> > even after trying really hard, then it just returns EBUSY.
> > 
> > Perhaps we should write down that applications may get a different outcome
> > after stopping the data path if they get EBUSY?
> Agree, it's better to describe more about the APP. BTW, I checked the behavior of ixgbe/igb, I think we can add/delete filters during runtime. Hopefully we'll not hit too many EBUSY problems on other NICs :)

OK, I will add it.

> > > > - PMDs, not applications, are responsible for maintaining flow rules
> > > >   configuration when stopping and restarting a port or performing other
> > > >   actions which may affect them. They can only be destroyed explicitly.
> > > Don’t understand " They can only be destroyed explicitly."
> > 
> > This part says that as long as an application has not called
> > rte_flow_destroy() on a flow rule, it never disappears, whatever happens to the
> > port (stopped, restarted). The application is not responsible for re-creating rules
> > after that.
> > 
> > Note that according to the specification, this may translate to not being able to
> > stop a port as long as a flow rule is present, depending on how nice the PMD
> > intends to be with applications. Implementation can be done in small steps with
> > minimal amount of code on the PMD side.
> Does it mean PMD should store and maintain all the rules? Why not let rte do that? I think if PMD maintain all the rules, it means every kind of NIC should have a copy of code for the rules. But if rte do that, only one copy of code need to be maintained, right?

I've considered having rules stored in a common format understood at the RTE
level and not specific to each PMD and decided that the opaque rte_flow
pointer was a better choice for the following reasons: 

- Even though flow rules management is done in the control path, processing
  must be as fast as possible. Letting PMDs store flow rules using their own
  internal representation gives them the chance to achieve better
  performance.

- An opaque context managed by PMDs would probably have to be stored
  somewhere as well anyway.

- PMDs may not need to allocate/store anything at all if they exclusively
  rely on HW state for everything. In my opinion, the generic API has enough
  constraints for this to work and maintain consistency between flow
  rules. Note this is currently how most PMDs implement FDIR and other
  filter types.

- RTE can (and will) provide helpers to avoid most of the code redundancy,
  PMDs are free to use them or manage everything by themselves.

- Given that the opaque rte_flow pointer associated with a flow rule is to
  be stored by the application, PMDs do not even have to keep references to
  them.

- The flow rules format described in this specification (pattern / actions)
  will be used by applications directly, and will be free to arrange them in
  lists, trees or in any other way if they need to keep flow specifications
  around for further processing.

> When the port is stopped and restarted, rte can reconfigure the rules. Is the concern that PMD may adjust the sequence of the rules according to the priority, so every NIC has a different list of rules? But PMD can adjust them again when rte reconfiguring the rules.

What about PMDs able to stop and restart ports without destroying their own
flow rules? If we assume flow rules must be destroyed when stopping a port,
these PMDs are needlessly penalized with slower stop/start cycles. Think
about it assuming thousands of flow rules.

Thus from an application point of view, whatever happens when stopping and
restarting a port should not matter. If a flow rule was present before, it
must still be present afterwards. If the PMD had to destroy flow rules and
re-create them, it does not actually matter if they differ slightly at the
HW level, as long as:

- Existing opaque flow rule pointers (rte_flow) are still valid to the PMD
  and refer to the same rules.

- The overall behavior of all rules is the same.

The list of rules you think of (patterns / actions) is maintained by
applications (not RTE), and only if they need them. RTE would needlessly
duplicate this.

-- 
Adrien Mazarguil
6WIND


More information about the dev mailing list