[dpdk-dev] [RFC] Generic flow director/filtering/classification API

Adrien Mazarguil adrien.mazarguil at 6wind.com
Thu Jul 7 12:26:50 CEST 2016
Previous message: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
Next message: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi Lu Wenzhuo,

Thanks for your feedback, I'm replying below as well.

On Thu, Jul 07, 2016 at 07:14:18AM +0000, Lu, Wenzhuo wrote:
> Hi Adrien,
> I have some questions, please see inline, thanks.
> 
> > -----Original Message-----
> > From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> > Sent: Wednesday, July 6, 2016 2:17 AM
> > To: dev at dpdk.org
> > Cc: Thomas Monjalon; Zhang, Helin; Wu, Jingjing; Rasesh Mody; Ajit Khaparde;
> > Rahul Lakkireddy; Lu, Wenzhuo; Jan Medala; John Daley; Chen, Jing D; Ananyev,
> > Konstantin; Matej Vido; Alejandro Lucero; Sony Chacko; Jerin Jacob; De Lara
> > Guarch, Pablo; Olga Shern
> > Subject: [RFC] Generic flow director/filtering/classification API
> > 
> > 
> > Requirements for a new API:
> > 
> > - Flexible and extensible without causing API/ABI problems for existing
> >   applications.
> > - Should be unambiguous and easy to use.
> > - Support existing filtering features and actions listed in `Filter types`_.
> > - Support packet alteration.
> > - In case of overlapping filters, their priority should be well documented.
> Does that mean we don't guarantee the consistent of priority? The priority can be different on different NICs. So the behavior of the actions  can be different. Right?

No, the intent is precisely to define what happens in order to get a
consistent result across different devices, and document cases with
undefined behavior. There must be no room left for interpretation.

For example, the API must describe what happens when two overlapping filters
(e.g. one matching an Ethernet header, another one matching an IP header)
match a given packet at a given priority level.

It is documented in section 4.1.1 (priorities) as "undefined behavior".
Applications remain free to do it and deal with consequences, at least they
know they cannot expect a consistent outcome, unless they use different
priority levels for both rules, see also 4.4.5 (flow rules priority).

> Seems the users still need to aware the some details of the HW? Do we need to add the negotiation for the priority?

Priorities as defined in this document may not be directly mappable to HW
capabilities (e.g. HW does not support enough priorities, or that some
corner case make them not work as described), in which case the PMD may
choose to simulate priorities (again 4.4.5), as long as the end result
follows the specification.

So users must not be aware of some HW details, the PMD does and must perform
the needed workarounds to suit their expectations. Users may only be
impacted by errors while attempting to create rules that are either
unsupported or would cause them (or existing rules) to diverge from the
spec.

> > Flow rules can have several distinct actions (such as counting,
> > encapsulating, decapsulating before redirecting packets to a particular
> > queue, etc.), instead of relying on several rules to achieve this and having
> > applications deal with hardware implementation details regarding their
> > order.
> I think normally HW doesn't support several actions in one rule. If a rule has several actions, seems HW has to split it to several rules. The order can still be a problem.

Note that, except for a very small subset of pattern items and actions,
supporting multiple actions for a given rule is not mandatory, and can be
emulated as you said by having to split them into several rules each with
its own priority if possible (see 3.3 "high level design").

Also, a rule "action" as defined in this API can be just about anything, for
example combining a queue redirection with 32-bit tagging. FDIR supports
many cases of what can be described as several actions, see 5.7 "FDIR to
most item types → QUEUE, DROP, PASSTHRU".

If you were thinking about having two queue targets for a given rule, then
I agree with you - that is why a rule cannot have more than a single action
of a given type (see 4.1.5 actions), to avoid such abuse from applications.

Applications must use several pass-through rules with different priority
levels if they want to perform a given action several times on a given
packet. Again, PMDs support is not mandatory as pass-through is optional.

> > ``ETH``
> > ^^^^^^^
> > 
> > Matches an Ethernet header.
> > 
> > - ``dst``: destination MAC.
> > - ``src``: source MAC.
> > - ``type``: EtherType.
> > - ``tags``: number of 802.1Q/ad tags defined.
> > - ``tag[]``: 802.1Q/ad tag definitions, innermost first. For each one:
> > 
> >  - ``tpid``: Tag protocol identifier.
> >  - ``tci``: Tag control information.
> "ETH" means all the parameters, dst, src, type... need to be matched? The same question for IPv4, IPv6 ...

Yes, it's basically the description of all Ethernet header fields including
VLAN tags (same for other protocols). Please see the linked draft header
file which should make all of this easier to understand:

 https://raw.githubusercontent.com/6WIND/rte_flow/master/rte_flow.h

> > ``UDP``
> > ^^^^^^^
> > 
> > Matches a UDP header.
> > 
> > - ``sport``: source port.
> > - ``dport``: destination port.
> > - ``length``: UDP length.
> > - ``checksum``: UDP checksum.
> Why checksum? Do we need to filter the packets by checksum value?

Well, I've decided to include all protocol header fields for completeness
(so the ABI does not need to be broken later then they become necessary, or
require another pattern item), not that all of them make sense in a pattern.

In this specific case, all PMDs I know of must reject a pattern
specification with a nonzero mask for the checksum field, because none of
them support it.

> > ``VOID`` (action)
> > ^^^^^^^^^^^^^^^^^
> > 
> > Used as a placeholder for convenience. It is ignored and simply discarded by
> > PMDs.
> Don't understand why we need VOID. If it’s about the format. Why not guarantee it in rte layer?

I'm not sure to understand your question about rte layer, but this type is
fully managed by the PMD and is not supposed to be translated to a hardware
action.

I think it may come handy in some cases (like the VOID pattern item), so it
is defined just in case. Should be relatively trivial to support.

Applications may find a use for it when they want to statically define
templates for flow rules, when they need room for some reason.

> > Behavior
> > --------
> > 
> > - API operations are synchronous and blocking (``EAGAIN`` cannot be
> >   returned).
> > 
> > - There is no provision for reentrancy/multi-thread safety, although nothing
> >   should prevent different devices from being configured at the same
> >   time. PMDs may protect their control path functions accordingly.
> > 
> > - Stopping the data path (TX/RX) should not be necessary when managing flow
> >   rules. If this cannot be achieved naturally or with workarounds (such as
> >   temporarily replacing the burst function pointers), an appropriate error
> >   code must be returned (``EBUSY``).
> PMD cannot stop the data path without adding lock. So I think if some rules cannot be applied without stopping rx/tx, PMD has to return fail.
> Or let the APP to stop the data path.

Agreed, that is the intent. If the PMD cannot touch flow rules for some
reason even after trying really hard, then it just returns EBUSY.

Perhaps we should write down that applications may get a different outcome
after stopping the data path if they get EBUSY?

> > - PMDs, not applications, are responsible for maintaining flow rules
> >   configuration when stopping and restarting a port or performing other
> >   actions which may affect them. They can only be destroyed explicitly.
> Don’t understand " They can only be destroyed explicitly."

This part says that as long as an application has not called
rte_flow_destroy() on a flow rule, it never disappears, whatever happens to
the port (stopped, restarted). The application is not responsible for
re-creating rules after that.

Note that according to the specification, this may translate to not being
able to stop a port as long as a flow rule is present, depending on how nice
the PMD intends to be with applications. Implementation can be done in small
steps with minimal amount of code on the PMD side.

> If a new rule has conflict with an old one, what should we do? Return fail?

That should not happen. If say 100 rules have been created with various
priorities and the port is happily running with them, stopping the port may
require the PMD to destroy them, it then has to re-create all 100 of them
exactly as they were automatically when restarting the port.

If re-creating them is not possible for some reason, the port cannot be
restarted as long as rules that cannot be added back haven't been destroyed
by the application. Frankly, this should not happen.

To manage this case, I suggest preventing applications from doing things
that conflict with existing flow rules while the port is stopped (just like
when it is not stopped, as described in 5.7 "FDIR to most item types").

> > ``ANY`` pattern item
> > ~~~~~~~~~~~~~~~~~~~~
> > 
> > This pattern item stands for anything, which can be difficult to translate
> > to something hardware would understand, particularly if followed by more
> > specific types.
> > 
> > Consider the following pattern:
> > 
> > +---+--------------------------------+
> > | 0 | ETHER                          |
> > +---+--------------------------------+
> > | 1 | ANY (``min`` = 1, ``max`` = 1) |
> > +---+--------------------------------+
> > | 2 | TCP                            |
> > +---+--------------------------------+
> > 
> > Knowing that TCP does not make sense with something other than IPv4 and IPv6
> > as L3, such a pattern may be translated to two flow rules instead:
> > 
> > +---+--------------------+
> > | 0 | ETHER              |
> > +---+--------------------+
> > | 1 | IPV4 (zeroed mask) |
> > +---+--------------------+
> > | 2 | TCP                |
> > +---+--------------------+
> > 
> > +---+--------------------+
> > | 0 | ETHER              |
> > +---+--------------------+
> > | 1 | IPV6 (zeroed mask) |
> > +---+--------------------+
> > | 2 | TCP                |
> > +---+--------------------+
> > 
> > Note that as soon as a ANY rule covers several layers, this approach may
> > yield a large number of hidden flow rules. It is thus suggested to only
> > support the most common scenarios (anything as L2 and/or L3).
> I think "any" may make things confusing.  How about if the NIC doesn't support IPv6? Should we return fail for this rule?

In a sense you are right, ANY relies on HW capabilities so you cannot know
that it won't match unsupported protocols. The above example would be
somewhat useless for a conscious application which should really have
created two separate flow rules (and gotten an error on the IPv6 one).

So an ANY flow rule only able to match v4 packets won't return an error.

ANY can be useful to match outer packets when only a tunnel header and the
inner packet are meaningful to the application. HW that does not recognize
the outer packet is not able to recognize the inner one anyway.

This section only says that PMDs should do their best to make HW match what
they can when faced with ANY.

Also once again, ANY support is not mandatory.

> > Flow rules priority
> > ~~~~~~~~~~~~~~~~~~~
> > 
> > While it would naturally make sense, flow rules cannot be assumed to be
> > processed by hardware in the same order as their creation for several
> > reasons:
> > 
> > - They may be managed internally as a tree or a hash table instead of a
> >   list.
> > - Removing a flow rule before adding another one can either put the new rule
> >   at the end of the list or reuse a freed entry.
> > - Duplication may occur when packets are matched by several rules.
> > 
> > For overlapping rules (particularly in order to use the `PASSTHRU`_ action)
> > predictable behavior is only guaranteed by using different priority levels.
> > 
> > Priority levels are not necessarily implemented in hardware, or may be
> > severely limited (e.g. a single priority bit).
> > 
> > For these reasons, priority levels may be implemented purely in software by
> > PMDs.
> > 
> > - For devices expecting flow rules to be added in the correct order, PMDs
> >   may destroy and re-create existing rules after adding a new one with
> >   a higher priority.
> > 
> > - A configurable number of dummy or empty rules can be created at
> >   initialization time to save high priority slots for later.
> > 
> > - In order to save priority levels, PMDs may evaluate whether rules are
> >   likely to collide and adjust their priority accordingly.
> If there's 3 rules, r1, r2,r3. The rules say the priority is r1 > r2 > r3. If PMD can only support r1 > r3 > r2, or doesn't support r2. Should PMD apply r1 and r3 or totally not support them all?

Remember that the API lets applications create only one rule at a time. If
all 3 rules are not supported together but individually are, the answer
depends on what the application does:

1. r1 OK, r2 FAIL => application chooses to stop here, thus only r1 works as
  expected (may roll back and remove r1 as a result).

2. r1 OK, r2 FAIL, r3 OK => application chooses to ignore the fact r2 failed
  and added r3 anyway, so it should end up with r1 > r3.

Applications should do as described in 1, they need to check for errors if
they want consistency.

This document describes only the basic functions, but may be extended later
with methods to add several flow rules at once, so rules that depend on
others can be added together and a single failure is returned without the
need for a rollback at the application level.

> A generic question, is the parsing supposed to be done by rte or PMD?

Actually, a bit of both. EAL will certainly at least provide helpers to
assist PMDs. This specification defines only the public-facing API for now,
but our goal is really to have something that is not too difficult to
implement both for applications and PMDs.

These helpers can be defined later with the first implementation.

-- 
Adrien Mazarguil
6WIND
Previous message: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
Next message: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the dev mailing list