[dpdk-dev] [PATCH v4 02/25] doc: add rte_flow prog guide

Adrien Mazarguil adrien.mazarguil at 6wind.com
Tue Dec 20 19:42:19 CET 2016


This documentation is based on the latest RFC submission, subsequently
updated according to feedback from the community.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil at 6wind.com>
Acked-by: Olga Shern <olgas at mellanox.com>
---
 doc/guides/prog_guide/index.rst    |    1 +
 doc/guides/prog_guide/rte_flow.rst | 2042 +++++++++++++++++++++++++++++++
 2 files changed, 2043 insertions(+)

diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index e5a50a8..ed7f770 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -42,6 +42,7 @@ Programmer's Guide
     mempool_lib
     mbuf_lib
     poll_mode_drv
+    rte_flow
     cryptodev_lib
     link_bonding_poll_mode_drv_lib
     timer_lib
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
new file mode 100644
index 0000000..98c672e
--- /dev/null
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -0,0 +1,2042 @@
+..  BSD LICENSE
+    Copyright 2016 6WIND S.A.
+    Copyright 2016 Mellanox.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of 6WIND S.A. nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+.. _Generic_flow_API:
+
+Generic flow API (rte_flow)
+===========================
+
+Overview
+--------
+
+This API provides a generic means to configure hardware to match specific
+ingress or egress traffic, alter its fate and query related counters
+according to any number of user-defined rules.
+
+It is named *rte_flow* after the prefix used for all its symbols, and is
+defined in ``rte_flow.h``.
+
+- Matching can be performed on packet data (protocol headers, payload) and
+  properties (e.g. associated physical port, virtual device function ID).
+
+- Possible operations include dropping traffic, diverting it to specific
+  queues, to virtual/physical device functions or ports, performing tunnel
+  offloads, adding marks and so on.
+
+It is slightly higher-level than the legacy filtering framework which it
+encompasses and supersedes (including all functions and filter types) in
+order to expose a single interface with an unambiguous behavior that is
+common to all poll-mode drivers (PMDs).
+
+Several methods to migrate existing applications are described in `API
+migration`_.
+
+Flow rule
+---------
+
+Description
+~~~~~~~~~~~
+
+A flow rule is the combination of attributes with a matching pattern and a
+list of actions. Flow rules form the basis of this API.
+
+Flow rules can have several distinct actions (such as counting,
+encapsulating, decapsulating before redirecting packets to a particular
+queue, etc.), instead of relying on several rules to achieve this and having
+applications deal with hardware implementation details regarding their
+order.
+
+Support for different priority levels on a rule basis is provided, for
+example in order to force a more specific rule to come before a more generic
+one for packets matched by both. However, hardware support for more than a
+single priority level cannot be guaranteed. When supported, the number of
+available priority levels is usually low, which is why they can also be
+implemented in software by PMDs (e.g. missing priority levels may be
+emulated by reordering rules).
+
+In order to remain as hardware-agnostic as possible, by default all rules
+are considered to have the same priority, which means that the order between
+overlapping rules (when a packet is matched by several filters) is
+undefined.
+
+PMDs may refuse to create overlapping rules at a given priority level when
+they can be detected (e.g. if a pattern matches an existing filter).
+
+Thus predictable results for a given priority level can only be achieved
+with non-overlapping rules, using perfect matching on all protocol layers.
+
+Flow rules can also be grouped; the priority of a flow rule is specific to
+the group it belongs to. All flow rules in a given group are thus processed
+either before or after another group.
+
+Support for multiple actions per rule may be implemented internally on top
+of non-default hardware priorities; as a result, both features may not be
+simultaneously available to applications.
+
+Considering that allowed pattern/actions combinations cannot be known in
+advance and would result in an impractically large number of capabilities to
+expose, a method is provided to validate a given rule from the current
+device configuration state.
+
+This enables applications to check whether the rule types they need are
+supported at initialization time, before starting their data path. This
+method can be used at any time; its only requirement is that the resources
+needed by a rule exist (e.g. a target RX queue should be configured first).
+
+Each defined rule is associated with an opaque handle managed by the PMD;
+applications are responsible for keeping it. Handles can be used for queries
+and rule management, such as retrieving counters or other data and
+destroying rules.
+
+To avoid resource leaks on the PMD side, handles must be explicitly
+destroyed by the application before releasing associated resources such as
+queues and ports.
+
+The following sections cover:
+
+- **Attributes** (represented by ``struct rte_flow_attr``): properties of a
+  flow rule such as its direction (ingress or egress) and priority.
+
+- **Pattern item** (represented by ``struct rte_flow_item``): part of a
+  matching pattern that either matches specific packet data or traffic
+  properties. It can also describe properties of the pattern itself, such as
+  inverted matching.
+
+- **Matching pattern**: traffic properties to look for, a combination of any
+  number of items.
+
+- **Actions** (represented by ``struct rte_flow_action``): operations to
+  perform whenever a packet is matched by a pattern.
+
+Attributes
+~~~~~~~~~~
+
+Attribute: Group
+^^^^^^^^^^^^^^^^
+
+Flow rules can be grouped by assigning them a common group number. Lower
+values have higher priority. Group 0 has the highest priority.
+
+Although optional, applications are encouraged to group similar rules as
+much as possible to fully take advantage of hardware capabilities
+(e.g. optimized matching) and work around limitations (e.g. a single pattern
+type possibly allowed in a given group).
+
+Note that support for more than a single group is not guaranteed.
+
+Attribute: Priority
+^^^^^^^^^^^^^^^^^^^
+
+A priority level can be assigned to a flow rule. Like groups, lower values
+denote higher priority, with 0 as the maximum.
+
+A rule with priority 0 in group 8 is always matched after a rule with
+priority 8 in group 0.
+
+Group and priority levels are arbitrary and up to the application; they do
+not need to be contiguous or start from 0. However, the maximum number
+varies between devices and may be affected by existing flow rules.
+
+If a packet is matched by several rules of a given group for a given
+priority level, the outcome is undefined. It can take any path, may be
+duplicated or even cause unrecoverable errors.
+
+Note that support for more than a single priority level is not guaranteed.
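+
+The ordering described above can be sketched in a few lines of C. This is
+only an illustration: ``struct demo_attr`` and ``demo_rule_before()`` are
+hypothetical names that mirror the group and priority attributes and are
+not part of *rte_flow*.
+
+.. code-block:: c
+
+   #include <stdbool.h>
+   #include <stdint.h>
+
+   /* Hypothetical stand-in for the group and priority attributes. */
+   struct demo_attr {
+       uint32_t group;
+       uint32_t priority;
+   };
+
+   /*
+    * Return true if rule a is matched before rule b. Groups are compared
+    * first; priority only breaks ties inside a group. Equal group and
+    * priority yield false since the order is then undefined.
+    */
+   static bool
+   demo_rule_before(const struct demo_attr *a, const struct demo_attr *b)
+   {
+       if (a->group != b->group)
+           return a->group < b->group;
+       return a->priority < b->priority;
+   }
+
+With these definitions, a rule with priority 8 in group 0 comes before a
+rule with priority 0 in group 8.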
+
+Attribute: Traffic direction
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Flow rules can apply to inbound and/or outbound traffic (ingress/egress).
+
+Several pattern items and actions are valid and can be used in both
+directions. At least one direction must be specified.
+
+Specifying both directions at once for a given rule is not recommended but
+may be valid in a few cases (e.g. shared counters).
+
+Pattern item
+~~~~~~~~~~~~
+
+Pattern items fall into two categories:
+
+- Matching protocol headers and packet data (ANY, RAW, ETH, VLAN, IPV4,
+  IPV6, ICMP, UDP, TCP, SCTP, VXLAN and so on), usually associated with a
+  specification structure.
+
+- Matching meta-data or affecting pattern processing (END, VOID, INVERT, PF,
+  VF, PORT and so on), often without a specification structure.
+
+Item specification structures are used to match specific values among
+protocol fields (or item properties). For each item, the documentation
+states whether it has such a structure and, if so, its type name.
+
+Up to three structures of the same type can be set for a given item:
+
+- ``spec``: values to match (e.g. a given IPv4 address).
+
+- ``last``: upper bound for an inclusive range with corresponding fields in
+  ``spec``.
+
+- ``mask``: bit-mask applied to both ``spec`` and ``last`` whose purpose is
+  to distinguish the values to take into account and/or partially mask them
+  out (e.g. in order to match an IPv4 address prefix).
+
+Usage restrictions and expected behavior:
+
+- Setting either ``mask`` or ``last`` without ``spec`` is an error.
+
+- Field values in ``last`` which are either 0 or equal to the corresponding
+  values in ``spec`` are ignored; they do not generate a range. Nonzero
+  values lower than those in ``spec`` are not supported.
+
+- Setting ``spec`` and optionally ``last`` without ``mask`` causes the PMD
+  to only take the fields it can recognize into account. There is no error
+  checking for unsupported fields.
+
+- Not setting any of them (assuming the item type allows it) uses default
+  parameters that depend on the item type. Most of the time, particularly
+  for protocol header items, it is equivalent to providing an empty (zeroed)
+  ``mask``.
+
+- ``mask`` is a simple bit-mask applied before interpreting the contents of
+  ``spec`` and ``last``, which may yield unexpected results if not used
+  carefully. For example, if for an IPv4 address field, ``spec`` provides
+  *10.1.2.3*, ``last`` provides *10.3.4.5* and ``mask`` provides
+  *255.255.0.0*, the effective range becomes *10.1.0.0* to *10.3.255.255*.
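+
+The range arithmetic from the last example can be reproduced with a short
+sketch. It assumes addresses held as host-order 32-bit integers and a
+contiguous prefix mask; ``struct demo_range``, ``demo_ipv4()`` and
+``demo_effective_range()`` are hypothetical names, not *rte_flow* symbols,
+and PMDs are free to implement this differently.
+
+.. code-block:: c
+
+   #include <stdint.h>
+
+   /* Hypothetical helper type: inclusive range of IPv4 addresses. */
+   struct demo_range {
+       uint32_t low;
+       uint32_t high;
+   };
+
+   /* Build a host-order IPv4 address from its four bytes. */
+   static uint32_t
+   demo_ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
+   {
+       return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
+              ((uint32_t)c << 8) | (uint32_t)d;
+   }
+
+   /*
+    * Apply mask to spec and last first, then derive the matched range.
+    * A last value of 0 means no range was requested, so spec is used as
+    * the upper bound too. Bits cleared in mask may take any value.
+    */
+   static struct demo_range
+   demo_effective_range(uint32_t spec, uint32_t last, uint32_t mask)
+   {
+       struct demo_range r;
+       uint32_t upper = last ? last : spec;
+
+       r.low = spec & mask;
+       r.high = (upper & mask) | ~mask;
+       return r;
+   }
+
+Calling ``demo_effective_range()`` with *10.1.2.3*, *10.3.4.5* and
+*255.255.0.0* yields *10.1.0.0* to *10.3.255.255*, as described above.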
+
+Example of an item specification matching an Ethernet header:
+
+.. _table_rte_flow_pattern_item_example:
+
+.. table:: Ethernet item
+
+   +----------+----------+-----------------------+
+   | Field    | Subfield | Value                 |
+   +==========+==========+=======================+
+   | ``spec`` | ``src``  | ``00:00:01:02:03:04`` |
+   |          +----------+-----------------------+
+   |          | ``dst``  | ``00:00:2a:66:00:01`` |
+   |          +----------+-----------------------+
+   |          | ``type`` | ``0x22aa``            |
+   +----------+----------+-----------------------+
+   | ``last`` | unspecified                      |
+   +----------+----------+-----------------------+
+   | ``mask`` | ``src``  | ``00:00:ff:ff:ff:00`` |
+   |          +----------+-----------------------+
+   |          | ``dst``  | ``00:00:00:00:00:ff`` |
+   |          +----------+-----------------------+
+   |          | ``type`` | ``0x0000``            |
+   +----------+----------+-----------------------+
+
+Non-masked bits stand for any value (shown as ``?`` below); Ethernet headers
+with the following properties are thus matched:
+
+- ``src``: ``??:??:01:02:03:??``
+- ``dst``: ``??:??:??:??:??:01``
+- ``type``: ``0x????``
+
+Matching pattern
+~~~~~~~~~~~~~~~~
+
+A pattern is formed by stacking items starting from the lowest protocol
+layer to match. This stacking restriction does not apply to meta items, which
+can be placed anywhere in the stack without affecting the meaning of the
+resulting pattern.
+
+Patterns are terminated by END items.
+
+Examples:
+
+.. _table_rte_flow_tcpv4_as_l4:
+
+.. table:: TCPv4 as L4
+
+   +-------+----------+
+   | Index | Item     |
+   +=======+==========+
+   | 0     | Ethernet |
+   +-------+----------+
+   | 1     | IPv4     |
+   +-------+----------+
+   | 2     | TCP      |
+   +-------+----------+
+   | 3     | END      |
+   +-------+----------+
+
+|
+
+.. _table_rte_flow_tcpv6_in_vxlan:
+
+.. table:: TCPv6 in VXLAN
+
+   +-------+------------+
+   | Index | Item       |
+   +=======+============+
+   | 0     | Ethernet   |
+   +-------+------------+
+   | 1     | IPv4       |
+   +-------+------------+
+   | 2     | UDP        |
+   +-------+------------+
+   | 3     | VXLAN      |
+   +-------+------------+
+   | 4     | Ethernet   |
+   +-------+------------+
+   | 5     | IPv6       |
+   +-------+------------+
+   | 6     | TCP        |
+   +-------+------------+
+   | 7     | END        |
+   +-------+------------+
+
+|
+
+.. _table_rte_flow_tcpv4_as_l4_meta:
+
+.. table:: TCPv4 as L4 with meta items
+
+   +-------+----------+
+   | Index | Item     |
+   +=======+==========+
+   | 0     | VOID     |
+   +-------+----------+
+   | 1     | Ethernet |
+   +-------+----------+
+   | 2     | VOID     |
+   +-------+----------+
+   | 3     | IPv4     |
+   +-------+----------+
+   | 4     | TCP      |
+   +-------+----------+
+   | 5     | VOID     |
+   +-------+----------+
+   | 6     | VOID     |
+   +-------+----------+
+   | 7     | END      |
+   +-------+----------+
+
+The above example shows how meta items do not affect packet data matching
+items, as long as those remain stacked properly. The resulting matching
+pattern is identical to "TCPv4 as L4".
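+
+A short sketch of this equivalence follows. The ``demo_item`` enumeration
+and ``demo_same_pattern()`` are hypothetical stand-ins used for
+illustration only; they mimic END-terminated item lists without relying on
+the actual *rte_flow* definitions.
+
+.. code-block:: c
+
+   #include <stdbool.h>
+
+   /* Hypothetical item types; END is 0 so that it can end a list. */
+   enum demo_item {
+       DEMO_END = 0,
+       DEMO_VOID,
+       DEMO_ETH,
+       DEMO_IPV4,
+       DEMO_TCP,
+   };
+
+   /* Skip VOID items and return the next meaningful one. */
+   static const enum demo_item *
+   demo_skip_void(const enum demo_item *item)
+   {
+       while (*item == DEMO_VOID)
+           ++item;
+       return item;
+   }
+
+   /* Compare two END-terminated patterns while ignoring VOID items. */
+   static bool
+   demo_same_pattern(const enum demo_item *a, const enum demo_item *b)
+   {
+       for (;;) {
+           a = demo_skip_void(a);
+           b = demo_skip_void(b);
+           if (*a != *b)
+               return false;
+           if (*a == DEMO_END)
+               return true;
+           ++a;
+           ++b;
+       }
+   }
+
+Comparing the "TCPv4 as L4" list with its meta item variant through
+``demo_same_pattern()`` returns true.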
+
+.. _table_rte_flow_udpv6_anywhere:
+
+.. table:: UDPv6 anywhere
+
+   +-------+------+
+   | Index | Item |
+   +=======+======+
+   | 0     | IPv6 |
+   +-------+------+
+   | 1     | UDP  |
+   +-------+------+
+   | 2     | END  |
+   +-------+------+
+
+If supported by the PMD, omitting one or several protocol layers at the
+bottom of the stack as in the above example (missing an Ethernet
+specification) enables looking up anywhere in packets.
+
+It is unspecified whether the payload of supported encapsulations
+(e.g. VXLAN payload) is matched by such a pattern, which may apply to inner,
+outer or both packets.
+
+.. _table_rte_flow_invalid_l3:
+
+.. table:: Invalid, missing L3
+
+   +-------+----------+
+   | Index | Item     |
+   +=======+==========+
+   | 0     | Ethernet |
+   +-------+----------+
+   | 1     | UDP      |
+   +-------+----------+
+   | 2     | END      |
+   +-------+----------+
+
+The above pattern is invalid due to a missing L3 specification between L2
+(Ethernet) and L4 (UDP). Omitting layers is only allowed at the bottom and
+at the top of the stack.
+
+Meta item types
+~~~~~~~~~~~~~~~
+
+They match meta-data or affect pattern processing instead of matching packet
+data directly; most of them do not need a specification structure. This
+particularity allows them to be specified anywhere in the stack without
+causing any side effect.
+
+Item: ``END``
+^^^^^^^^^^^^^
+
+End marker for item lists. Prevents further processing of items, thereby
+ending the pattern.
+
+- Its numeric value is 0 for convenience.
+- PMD support is mandatory.
+- ``spec``, ``last`` and ``mask`` are ignored.
+
+.. _table_rte_flow_item_end:
+
+.. table:: END
+
+   +----------+---------+
+   | Field    | Value   |
+   +==========+=========+
+   | ``spec`` | ignored |
+   +----------+---------+
+   | ``last`` | ignored |
+   +----------+---------+
+   | ``mask`` | ignored |
+   +----------+---------+
+
+Item: ``VOID``
+^^^^^^^^^^^^^^
+
+Used as a placeholder for convenience. It is ignored and simply discarded by
+PMDs.
+
+- PMD support is mandatory.
+- ``spec``, ``last`` and ``mask`` are ignored.
+
+.. _table_rte_flow_item_void:
+
+.. table:: VOID
+
+   +----------+---------+
+   | Field    | Value   |
+   +==========+=========+
+   | ``spec`` | ignored |
+   +----------+---------+
+   | ``last`` | ignored |
+   +----------+---------+
+   | ``mask`` | ignored |
+   +----------+---------+
+
+One usage example for this type is generating rules that share a common
+prefix quickly without reallocating memory, only by updating item types:
+
+.. _table_rte_flow_item_void_example:
+
+.. table:: TCP, UDP or ICMP as L4
+
+   +-------+--------------------+
+   | Index | Item               |
+   +=======+====================+
+   | 0     | Ethernet           |
+   +-------+--------------------+
+   | 1     | IPv4               |
+   +-------+------+------+------+
+   | 2     | UDP  | VOID | VOID |
+   +-------+------+------+------+
+   | 3     | VOID | TCP  | VOID |
+   +-------+------+------+------+
+   | 4     | VOID | VOID | ICMP |
+   +-------+------+------+------+
+   | 5     | END                |
+   +-------+--------------------+
+
+Item: ``INVERT``
+^^^^^^^^^^^^^^^^
+
+Inverted matching, i.e. process packets that do not match the pattern.
+
+- ``spec``, ``last`` and ``mask`` are ignored.
+
+.. _table_rte_flow_item_invert:
+
+.. table:: INVERT
+
+   +----------+---------+
+   | Field    | Value   |
+   +==========+=========+
+   | ``spec`` | ignored |
+   +----------+---------+
+   | ``last`` | ignored |
+   +----------+---------+
+   | ``mask`` | ignored |
+   +----------+---------+
+
+Usage example, matching non-TCPv4 packets only:
+
+.. _table_rte_flow_item_invert_example:
+
+.. table:: Anything but TCPv4
+
+   +-------+----------+
+   | Index | Item     |
+   +=======+==========+
+   | 0     | INVERT   |
+   +-------+----------+
+   | 1     | Ethernet |
+   +-------+----------+
+   | 2     | IPv4     |
+   +-------+----------+
+   | 3     | TCP      |
+   +-------+----------+
+   | 4     | END      |
+   +-------+----------+
+
+Item: ``PF``
+^^^^^^^^^^^^
+
+Matches packets addressed to the physical function of the device.
+
+If the underlying device function differs from the one that would normally
+receive the matched traffic, specifying this item prevents it from reaching
+that device unless the flow rule contains an `Action: PF`_. Packets are not
+duplicated between device instances by default.
+
+- Likely to return an error or never match any traffic if applied to a VF
+  device.
+- Can be combined with any number of `Item: VF`_ to match both PF and VF
+  traffic.
+- ``spec``, ``last`` and ``mask`` must not be set.
+
+.. _table_rte_flow_item_pf:
+
+.. table:: PF
+
+   +----------+-------+
+   | Field    | Value |
+   +==========+=======+
+   | ``spec`` | unset |
+   +----------+-------+
+   | ``last`` | unset |
+   +----------+-------+
+   | ``mask`` | unset |
+   +----------+-------+
+
+Item: ``VF``
+^^^^^^^^^^^^
+
+Matches packets addressed to a virtual function ID of the device.
+
+If the underlying device function differs from the one that would normally
+receive the matched traffic, specifying this item prevents it from reaching
+that device unless the flow rule contains an `Action: VF`_. Packets are not
+duplicated between device instances by default.
+
+- Likely to return an error or never match any traffic if this causes a VF
+  device to match traffic addressed to a different VF.
+- Can be specified multiple times to match traffic addressed to several VF
+  IDs.
+- Can be combined with a PF item to match both PF and VF traffic.
+
+.. _table_rte_flow_item_vf:
+
+.. table:: VF
+
+   +----------+----------+---------------------------+
+   | Field    | Subfield | Value                     |
+   +==========+==========+===========================+
+   | ``spec`` | ``id``   | destination VF ID         |
+   +----------+----------+---------------------------+
+   | ``last`` | ``id``   | upper range value         |
+   +----------+----------+---------------------------+
+   | ``mask`` | ``id``   | zeroed to match any VF ID |
+   +----------+----------+---------------------------+
+
+Item: ``PORT``
+^^^^^^^^^^^^^^
+
+Matches packets coming from the specified physical port of the underlying
+device.
+
+The first PORT item overrides the physical port normally associated with the
+specified DPDK input port (port_id). This item can be provided several times
+to match additional physical ports.
+
+Note that physical ports are not necessarily tied to DPDK input ports
+(port_id) when those are not under DPDK control. Possible values are
+specific to each device; they are not necessarily indexed from zero and may
+not be contiguous.
+
+As a device property, the list of allowed values as well as the value
+associated with a port_id should be retrieved by other means.
+
+.. _table_rte_flow_item_port:
+
+.. table:: PORT
+
+   +----------+-----------+--------------------------------+
+   | Field    | Subfield  | Value                          |
+   +==========+===========+================================+
+   | ``spec`` | ``index`` | physical port index            |
+   +----------+-----------+--------------------------------+
+   | ``last`` | ``index`` | upper range value              |
+   +----------+-----------+--------------------------------+
+   | ``mask`` | ``index`` | zeroed to match any port index |
+   +----------+-----------+--------------------------------+
+
+Data matching item types
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Most of these are basically protocol header definitions with associated
+bit-masks. They must be specified (stacked) from lowest to highest protocol
+layer to form a matching pattern.
+
+The following list is not exhaustive; new protocols will be added in the
+future.
+
+Item: ``ANY``
+^^^^^^^^^^^^^
+
+Matches any protocol in place of the current layer. A single ANY may also
+stand for several protocol layers.
+
+This is usually specified as the first pattern item when looking for a
+protocol anywhere in a packet.
+
+.. _table_rte_flow_item_any:
+
+.. table:: ANY
+
+   +----------+----------+--------------------------------------+
+   | Field    | Subfield | Value                                |
+   +==========+==========+======================================+
+   | ``spec`` | ``num``  | number of layers covered             |
+   +----------+----------+--------------------------------------+
+   | ``last`` | ``num``  | upper range value                    |
+   +----------+----------+--------------------------------------+
+   | ``mask`` | ``num``  | zeroed to cover any number of layers |
+   +----------+----------+--------------------------------------+
+
+Example for VXLAN TCP payload matching regardless of outer L3 (IPv4 or IPv6)
+and L4 (UDP) both matched by the first ANY specification, and inner L3 (IPv4
+or IPv6) matched by the second ANY specification:
+
+.. _table_rte_flow_item_any_example:
+
+.. table:: TCP in VXLAN with wildcards
+
+   +-------+------+----------+----------+-------+
+   | Index | Item | Field    | Subfield | Value |
+   +=======+======+==========+==========+=======+
+   | 0     | Ethernet                           |
+   +-------+------+----------+----------+-------+
+   | 1     | ANY  | ``spec`` | ``num``  | 2     |
+   +-------+------+----------+----------+-------+
+   | 2     | VXLAN                              |
+   +-------+------------------------------------+
+   | 3     | Ethernet                           |
+   +-------+------+----------+----------+-------+
+   | 4     | ANY  | ``spec`` | ``num``  | 1     |
+   +-------+------+----------+----------+-------+
+   | 5     | TCP                                |
+   +-------+------------------------------------+
+   | 6     | END                                |
+   +-------+------------------------------------+
+
+Item: ``RAW``
+^^^^^^^^^^^^^
+
+Matches a byte string of a given length at a given offset.
+
+Offset is either absolute (using the start of the packet) or relative to the
+end of the previous matched item in the stack, in which case negative values
+are allowed.
+
+If search is enabled, offset is used as the starting point. The search area
+can be delimited by setting limit to a nonzero value, which is the maximum
+number of bytes after offset where the pattern may start.
+
+Matching a zero-length pattern is allowed; doing so resets the relative
+offset for subsequent items.
+
+- This type does not support ranges (``last`` field).
+
+.. _table_rte_flow_item_raw:
+
+.. table:: RAW
+
+   +----------+--------------+-------------------------------------------------+
+   | Field    | Subfield     | Value                                           |
+   +==========+==============+=================================================+
+   | ``spec`` | ``relative`` | look for pattern after the previous item        |
+   |          +--------------+-------------------------------------------------+
+   |          | ``search``   | search pattern from offset (see also ``limit``) |
+   |          +--------------+-------------------------------------------------+
+   |          | ``reserved`` | reserved, must be set to zero                   |
+   |          +--------------+-------------------------------------------------+
+   |          | ``offset``   | absolute or relative offset for ``pattern``     |
+   |          +--------------+-------------------------------------------------+
+   |          | ``limit``    | search area limit for start of ``pattern``      |
+   |          +--------------+-------------------------------------------------+
+   |          | ``length``   | ``pattern`` length                              |
+   |          +--------------+-------------------------------------------------+
+   |          | ``pattern``  | byte string to look for                         |
+   +----------+--------------+-------------------------------------------------+
+   | ``last`` | if specified, either all 0 or with the same values as ``spec`` |
+   +----------+----------------------------------------------------------------+
+   | ``mask`` | bit-mask applied to ``spec`` values with usual behavior        |
+   +----------+----------------------------------------------------------------+
+
+Example pattern looking for several strings at various offsets of a UDP
+payload, using combined RAW items:
+
+.. _table_rte_flow_item_raw_example:
+
+.. table:: UDP payload matching
+
+   +-------+------+----------+--------------+-------+
+   | Index | Item | Field    | Subfield     | Value |
+   +=======+======+==========+==============+=======+
+   | 0     | Ethernet                               |
+   +-------+----------------------------------------+
+   | 1     | IPv4                                   |
+   +-------+----------------------------------------+
+   | 2     | UDP                                    |
+   +-------+------+----------+--------------+-------+
+   | 3     | RAW  | ``spec`` | ``relative`` | 1     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``search``   | 1     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``offset``   | 10    |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``limit``    | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``length``   | 3     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``pattern``  | "foo" |
+   +-------+------+----------+--------------+-------+
+   | 4     | RAW  | ``spec`` | ``relative`` | 1     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``search``   | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``offset``   | 20    |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``limit``    | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``length``   | 3     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``pattern``  | "bar" |
+   +-------+------+----------+--------------+-------+
+   | 5     | RAW  | ``spec`` | ``relative`` | 1     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``search``   | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``offset``   | -29   |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``limit``    | 0     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``length``   | 3     |
+   |       |      |          +--------------+-------+
+   |       |      |          | ``pattern``  | "baz" |
+   +-------+------+----------+--------------+-------+
+   | 6     | END                                    |
+   +-------+----------------------------------------+
+
+This translates to:
+
+- Locate "foo" at least 10 bytes deep inside UDP payload.
+- Locate "bar" after "foo" plus 20 bytes.
+- Locate "baz" after "bar" minus 29 bytes.
+
+Such a packet may be represented as follows (not to scale)::
+
+ 0                     >= 10 B           == 20 B
+ |                  |<--------->|     |<--------->|
+ |                  |           |     |           |
+ |-----|------|-----|-----|-----|-----|-----------|-----|------|
+ | ETH | IPv4 | UDP | ... | baz | foo | ......... | bar | .... |
+ |-----|------|-----|-----|-----|-----|-----------|-----|------|
+                          |                             |
+                          |<--------------------------->|
+                                      == 29 B
+
+Note that matching subsequent pattern items would resume after "baz", not
+"bar" since matching is always performed after the previous item of the
+stack.
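+
+The walk-through above can be modelled in plain C. The sketch below is not
+how a PMD processes RAW items; ``struct demo_raw`` and ``demo_raw_match()``
+are hypothetical names, the buffer is assumed to start right after the UDP
+header, and a ``limit`` of 0 is assumed to extend the search to the end of
+the buffer.
+
+.. code-block:: c
+
+   #include <stdbool.h>
+   #include <stddef.h>
+   #include <stdint.h>
+   #include <string.h>
+
+   /* Hypothetical model of a RAW item specification. */
+   struct demo_raw {
+       bool relative;          /* offset is relative to previous item */
+       bool search;            /* search for pattern from offset */
+       int32_t offset;
+       uint16_t limit;         /* 0: no limit on the search area */
+       uint16_t length;
+       const uint8_t *pattern;
+   };
+
+   /*
+    * Try to match raw against buf. prev_end is the offset where the
+    * previous item ended. On success, store the offset following the
+    * matched bytes in *end and return true.
+    */
+   static bool
+   demo_raw_match(const uint8_t *buf, size_t len, size_t prev_end,
+                  const struct demo_raw *raw, size_t *end)
+   {
+       long start = (raw->relative ? (long)prev_end : 0) + raw->offset;
+       long stop = start;      /* last allowed starting position */
+       long pos;
+
+       if (raw->search)
+           stop = raw->limit ? start + raw->limit : (long)len;
+       for (pos = start; pos <= stop; ++pos) {
+           if (pos < 0 || (size_t)pos + raw->length > len)
+               continue;
+           if (memcmp(buf + pos, raw->pattern, raw->length) == 0) {
+               *end = (size_t)pos + raw->length;
+               return true;
+           }
+       }
+       return false;
+   }
+
+Feeding the three RAW items of the example to ``demo_raw_match()`` in
+sequence, each time passing the previous ``*end`` as ``prev_end``, leaves
+the final offset right after "baz".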
+
+Item: ``ETH``
+^^^^^^^^^^^^^
+
+Matches an Ethernet header.
+
+- ``dst``: destination MAC.
+- ``src``: source MAC.
+- ``type``: EtherType.
+
+Item: ``VLAN``
+^^^^^^^^^^^^^^
+
+Matches an 802.1Q/ad VLAN tag.
+
+- ``tpid``: tag protocol identifier.
+- ``tci``: tag control information.
+
+Item: ``IPV4``
+^^^^^^^^^^^^^^
+
+Matches an IPv4 header.
+
+Note: IPv4 options are handled by dedicated pattern items.
+
+- ``hdr``: IPv4 header definition (``rte_ip.h``).
+
+Item: ``IPV6``
+^^^^^^^^^^^^^^
+
+Matches an IPv6 header.
+
+Note: IPv6 options are handled by dedicated pattern items.
+
+- ``hdr``: IPv6 header definition (``rte_ip.h``).
+
+Item: ``ICMP``
+^^^^^^^^^^^^^^
+
+Matches an ICMP header.
+
+- ``hdr``: ICMP header definition (``rte_icmp.h``).
+
+Item: ``UDP``
+^^^^^^^^^^^^^
+
+Matches a UDP header.
+
+- ``hdr``: UDP header definition (``rte_udp.h``).
+
+Item: ``TCP``
+^^^^^^^^^^^^^
+
+Matches a TCP header.
+
+- ``hdr``: TCP header definition (``rte_tcp.h``).
+
+Item: ``SCTP``
+^^^^^^^^^^^^^^
+
+Matches an SCTP header.
+
+- ``hdr``: SCTP header definition (``rte_sctp.h``).
+
+Item: ``VXLAN``
+^^^^^^^^^^^^^^^
+
+Matches a VXLAN header (RFC 7348).
+
+- ``flags``: normally 0x08 (I flag).
+- ``rsvd0``: reserved, normally 0x000000.
+- ``vni``: VXLAN network identifier.
+- ``rsvd1``: reserved, normally 0x00.
+
+Actions
+~~~~~~~
+
+Each possible action is represented by a type. Some have associated
+configuration structures. Several actions combined in a list can be assigned
+to a flow rule. That list is not ordered.
+
+They fall into three categories:
+
+- Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
+  processing matched packets by subsequent flow rules, unless overridden
+  with PASSTHRU.
+
+- Non-terminating actions (PASSTHRU, DUP) that leave matched packets up for
+  additional processing by subsequent flow rules.
+
+- Other non-terminating meta actions that do not affect the fate of packets
+  (END, VOID, MARK, FLAG, COUNT).
+
+When several actions are combined in a flow rule, they should all have
+different types (e.g. dropping a packet twice is not possible).
+
+Only the last action of a given type is taken into account. PMDs still
+perform error checking on the entire list.
+
+Like matching patterns, action lists are terminated by END items.
+
+*Note that PASSTHRU is the only action able to override a terminating rule.*
+
+Example of action that redirects packets to queue index 10:
+
+.. _table_rte_flow_action_example:
+
+.. table:: Queue action
+
+   +-----------+-------+
+   | Field     | Value |
+   +===========+=======+
+   | ``index`` | 10    |
+   +-----------+-------+
+
+Examples of action lists follow. Their order is not significant;
+applications must consider all actions to be performed simultaneously:
+
+.. _table_rte_flow_count_and_drop:
+
+.. table:: Count and drop
+
+   +-------+--------+
+   | Index | Action |
+   +=======+========+
+   | 0     | COUNT  |
+   +-------+--------+
+   | 1     | DROP   |
+   +-------+--------+
+   | 2     | END    |
+   +-------+--------+
+
+|
+
+.. _table_rte_flow_mark_count_redirect:
+
+.. table:: Mark, count and redirect
+
+   +-------+--------+-----------+-------+
+   | Index | Action | Field     | Value |
+   +=======+========+===========+=======+
+   | 0     | MARK   | ``id``    | 0x2a  |
+   +-------+--------+-----------+-------+
+   | 1     | COUNT                      |
+   +-------+--------+-----------+-------+
+   | 2     | QUEUE  | ``index`` | 10    |
+   +-------+--------+-----------+-------+
+   | 3     | END                        |
+   +-------+----------------------------+
+
+|
+
+.. _table_rte_flow_redirect_queue_5:
+
+.. table:: Redirect to queue 5
+
+   +-------+--------+-----------+-------+
+   | Index | Action | Field     | Value |
+   +=======+========+===========+=======+
+   | 0     | DROP                       |
+   +-------+--------+-----------+-------+
+   | 1     | QUEUE  | ``index`` | 5     |
+   +-------+--------+-----------+-------+
+   | 2     | END                        |
+   +-------+----------------------------+
+
+In the above example, considering both actions are performed simultaneously,
+the end result is that only QUEUE has any effect.
+
+.. _table_rte_flow_redirect_queue_3:
+
+.. table:: Redirect to queue 3
+
+   +-------+--------+-----------+-------+
+   | Index | Action | Field     | Value |
+   +=======+========+===========+=======+
+   | 0     | QUEUE  | ``index`` | 5     |
+   +-------+--------+-----------+-------+
+   | 1     | VOID                       |
+   +-------+--------+-----------+-------+
+   | 2     | QUEUE  | ``index`` | 3     |
+   +-------+--------+-----------+-------+
+   | 3     | END                        |
+   +-------+----------------------------+
+
+As previously described, only the last action of a given type found in the
+list is taken into account. The above example also shows that VOID is
+ignored.
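+
+The rules above (VOID is skipped, the last action of a given type wins) can
+be sketched as a single pass over an END-terminated list. This is only an
+illustration: the ``sketch_`` types and function below are hypothetical
+stand-ins, not the ``rte_flow.h`` definitions, and the sketch does not model
+how a PMD arbitrates between different types such as DROP and QUEUE.
+
+.. code-block:: c
+
+   #include <stddef.h>
+
+   /* Stand-in action types for illustration, not the rte_flow.h enum. */
+   enum sketch_action_type {
+       SKETCH_ACTION_END = 0, /* END has value 0, as in the real API. */
+       SKETCH_ACTION_VOID,
+       SKETCH_ACTION_DROP,
+       SKETCH_ACTION_QUEUE,
+       SKETCH_ACTION_TYPE_MAX,
+   };
+
+   struct sketch_action {
+       enum sketch_action_type type;
+       const void *conf; /* Type-specific configuration, may be NULL. */
+   };
+
+   /*
+    * Walk an END-terminated list, skip VOID entries and remember the last
+    * action seen for each type. Returns the number of distinct types kept,
+    * or -1 if an unknown type is encountered.
+    */
+   static int
+   sketch_resolve_actions(const struct sketch_action list[],
+                          const struct sketch_action *last[])
+   {
+       int kept = 0;
+       size_t i;
+
+       for (i = 0; i != SKETCH_ACTION_TYPE_MAX; ++i)
+           last[i] = NULL;
+       for (i = 0; list[i].type != SKETCH_ACTION_END; ++i) {
+           if ((unsigned int)list[i].type >= SKETCH_ACTION_TYPE_MAX)
+               return -1;
+           if (list[i].type == SKETCH_ACTION_VOID)
+               continue;
+           if (last[list[i].type] == NULL)
+               ++kept;
+           last[list[i].type] = &list[i];
+       }
+       return kept;
+   }
+
+Given a list shaped like the last table above (QUEUE, VOID, QUEUE, END),
+such a pass keeps a single QUEUE entry, the second one. ``last`` must point
+to an array of at least ``SKETCH_ACTION_TYPE_MAX`` entries.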
+
+Action types
+~~~~~~~~~~~~
+
+Common action types are described in this section. Like pattern item types,
+this list is not exhaustive as new actions will be added in the future.
+
+Action: ``END``
+^^^^^^^^^^^^^^^
+
+End marker for action lists. Prevents further processing of actions, thereby
+ending the list.
+
+- Its numeric value is 0 for convenience.
+- PMD support is mandatory.
+- No configurable properties.
+
+.. _table_rte_flow_action_end:
+
+.. table:: END
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``VOID``
+^^^^^^^^^^^^^^^^
+
+Used as a placeholder for convenience. It is ignored and simply discarded by
+PMDs.
+
+- PMD support is mandatory.
+- No configurable properties.
+
+.. _table_rte_flow_action_void:
+
+.. table:: VOID
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``PASSTHRU``
+^^^^^^^^^^^^^^^^^^^^
+
+Leaves packets up for additional processing by subsequent flow rules. This
+is the default when a rule does not contain a terminating action, but can be
+specified to force a rule to become non-terminating.
+
+- No configurable properties.
+
+.. _table_rte_flow_action_passthru:
+
+.. table:: PASSTHRU
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Example to copy a packet to a queue and continue processing by subsequent
+flow rules:
+
+.. _table_rte_flow_action_passthru_example:
+
+.. table:: Copy to queue 8
+
+   +-------+--------+-----------+-------+
+   | Index | Action | Field     | Value |
+   +=======+========+===========+=======+
+   | 0     | PASSTHRU                   |
+   +-------+--------+-----------+-------+
+   | 1     | QUEUE  | ``index`` | 8     |
+   +-------+--------+-----------+-------+
+   | 2     | END                        |
+   +-------+----------------------------+
+
+Action: ``MARK``
+^^^^^^^^^^^^^^^^
+
+Attaches a 32 bit value to packets.
+
+This value is arbitrary and application-defined. For compatibility with FDIR
+it is returned in the ``hash.fdir.hi`` mbuf field. ``PKT_RX_FDIR_ID`` is
+also set in ``ol_flags``.
+
+.. _table_rte_flow_action_mark:
+
+.. table:: MARK
+
+   +--------+-------------------------------------+
+   | Field  | Value                               |
+   +========+=====================================+
+   | ``id`` | 32 bit value to return with packets |
+   +--------+-------------------------------------+
+
+Action: ``FLAG``
+^^^^^^^^^^^^^^^^
+
+Flag packets. Similar to `Action: MARK`_ but only affects ``ol_flags``.
+
+- No configurable properties.
+
+Note: a distinctive flag must be defined for it.
+
+.. _table_rte_flow_action_flag:
+
+.. table:: FLAG
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``QUEUE``
+^^^^^^^^^^^^^^^^^
+
+Assigns packets to a given queue index.
+
+- Terminating by default.
+
+.. _table_rte_flow_action_queue:
+
+.. table:: QUEUE
+
+   +-----------+--------------------+
+   | Field     | Value              |
+   +===========+====================+
+   | ``index`` | queue index to use |
+   +-----------+--------------------+
+
+Action: ``DROP``
+^^^^^^^^^^^^^^^^
+
+Drop packets.
+
+- No configurable properties.
+- Terminating by default.
+- PASSTHRU overrides this action if both are specified.
+
+.. _table_rte_flow_action_drop:
+
+.. table:: DROP
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``COUNT``
+^^^^^^^^^^^^^^^^^
+
+Enables counters for this rule.
+
+These counters can be retrieved and reset through ``rte_flow_query()``, see
+``struct rte_flow_query_count``.
+
+- No configurable properties.
+
+.. _table_rte_flow_action_count:
+
+.. table:: COUNT
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Query structure to retrieve and reset flow rule counters:
+
+.. _table_rte_flow_query_count:
+
+.. table:: COUNT query
+
+   +---------------+-----+-----------------------------------+
+   | Field         | I/O | Value                             |
+   +===============+=====+===================================+
+   | ``reset``     | in  | reset counter after query         |
+   +---------------+-----+-----------------------------------+
+   | ``hits_set``  | out | ``hits`` field is set             |
+   +---------------+-----+-----------------------------------+
+   | ``bytes_set`` | out | ``bytes`` field is set            |
+   +---------------+-----+-----------------------------------+
+   | ``hits``      | out | number of hits for this rule      |
+   +---------------+-----+-----------------------------------+
+   | ``bytes``     | out | number of bytes through this rule |
+   +---------------+-----+-----------------------------------+
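+
+Since ``hits`` and ``bytes`` are only meaningful when the matching ``_set``
+field is reported, a caller would typically check those before using the
+values. The sketch below illustrates this with a hypothetical
+``struct sketch_query_count`` that merely mirrors the table above; it is
+not the ``rte_flow.h`` structure and its layout is an assumption.
+
+.. code-block:: c
+
+   #include <inttypes.h>
+   #include <stddef.h>
+   #include <stdint.h>
+   #include <stdio.h>
+
+   /* Stand-in mirroring the COUNT query table (layout is assumed). */
+   struct sketch_query_count {
+       unsigned int reset:1;     /* in: reset counter after query */
+       unsigned int hits_set:1;  /* out: hits field is set */
+       unsigned int bytes_set:1; /* out: bytes field is set */
+       uint64_t hits;            /* out: number of hits */
+       uint64_t bytes;           /* out: number of bytes */
+   };
+
+   /* Format only the counters flagged as valid by the query. */
+   static int
+   sketch_count_format(char *buf, size_t len,
+                       const struct sketch_query_count *count)
+   {
+       if (count->hits_set && count->bytes_set)
+           return snprintf(buf, len,
+                           "hits=%" PRIu64 " bytes=%" PRIu64,
+                           count->hits, count->bytes);
+       if (count->hits_set)
+           return snprintf(buf, len, "hits=%" PRIu64, count->hits);
+       if (count->bytes_set)
+           return snprintf(buf, len, "bytes=%" PRIu64, count->bytes);
+       return snprintf(buf, len, "no counters reported");
+   }
+
+In an application, the structure would be filled in by ``rte_flow_query()``
+called with the COUNT action type and a pointer to the query structure as
+``data``.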
+
+Action: ``DUP``
+^^^^^^^^^^^^^^^
+
+Duplicates packets to a given queue index.
+
+This is normally combined with QUEUE; however, when used alone, it is
+actually similar to QUEUE + PASSTHRU.
+
+- Non-terminating by default.
+
+.. _table_rte_flow_action_dup:
+
+.. table:: DUP
+
+   +-----------+------------------------------------+
+   | Field     | Value                              |
+   +===========+====================================+
+   | ``index`` | queue index to duplicate packet to |
+   +-----------+------------------------------------+
+
+Action: ``RSS``
+^^^^^^^^^^^^^^^
+
+Similar to QUEUE, except RSS is additionally performed on packets to spread
+them among several queues according to the provided parameters.
+
+Note: RSS hash result is normally stored in the ``hash.rss`` mbuf field,
+however it conflicts with `Action: MARK`_ as they share the same space. When
+both actions are specified, the RSS hash is discarded and
+``PKT_RX_RSS_HASH`` is not set in ``ol_flags``. MARK has priority. The mbuf
+structure should eventually evolve to store both.
+
+- Terminating by default.
+
+.. _table_rte_flow_action_rss:
+
+.. table:: RSS
+
+   +--------------+------------------------------+
+   | Field        | Value                        |
+   +==============+==============================+
+   | ``rss_conf`` | RSS parameters               |
+   +--------------+------------------------------+
+   | ``num``      | number of entries in queue[] |
+   +--------------+------------------------------+
+   | ``queue[]``  | queue indices to use         |
+   +--------------+------------------------------+
+
+Action: ``PF``
+^^^^^^^^^^^^^^
+
+Redirects packets to the physical function (PF) of the current device.
+
+- No configurable properties.
+- Terminating by default.
+
+.. _table_rte_flow_action_pf:
+
+.. table:: PF
+
+   +---------------+
+   | Field         |
+   +===============+
+   | no properties |
+   +---------------+
+
+Action: ``VF``
+^^^^^^^^^^^^^^
+
+Redirects packets to a virtual function (VF) of the current device.
+
+Packets matched by a VF pattern item can be redirected to their original VF
+ID instead of the specified one by setting ``original``. This parameter may
+not be available and is not guaranteed to work properly if the VF part is
+matched by a prior flow rule or if packets are not addressed to a VF in the
+first place.
+
+- Terminating by default.
+
+.. _table_rte_flow_action_vf:
+
+.. table:: VF
+
+   +--------------+--------------------------------+
+   | Field        | Value                          |
+   +==============+================================+
+   | ``original`` | use original VF ID if possible |
+   +--------------+--------------------------------+
+   | ``vf``       | VF ID to redirect packets to   |
+   +--------------+--------------------------------+
+
+Negative types
+~~~~~~~~~~~~~~
+
+All specified pattern items (``enum rte_flow_item_type``) and actions
+(``enum rte_flow_action_type``) use positive identifiers.
+
+The negative space is reserved for dynamic types generated by PMDs during
+run-time. PMDs may encounter them as a result but must not accept negative
+identifiers they are not aware of.
+
+A method to generate them remains to be defined.
+
+Planned types
+~~~~~~~~~~~~~
+
+Pattern item types will be added as new protocols are implemented.
+
+Support for variable headers is planned through dedicated pattern items.
+For example, items matching specific IPv4 options or IPv6 extension headers
+would be stacked after IPv4/IPv6 items.
+
+Other action types are planned but are not defined yet. These include the
+ability to alter packet data in several ways, such as performing
+encapsulation/decapsulation of tunnel headers.
+
+Rules management
+----------------
+
+A rather simple API with few functions is provided to fully manage flow
+rules.
+
+Each created flow rule is associated with an opaque, PMD-specific handle
+pointer. The application is responsible for keeping it until the rule is
+destroyed.
+
+Flow rules are represented by ``struct rte_flow`` objects.
+
+Validation
+~~~~~~~~~~
+
+Given that expressing a definite set of device capabilities is not
+practical, a dedicated function is provided to check if a flow rule is
+supported and can be created.
+
+.. code-block:: c
+
+   int
+   rte_flow_validate(uint8_t port_id,
+                     const struct rte_flow_attr *attr,
+                     const struct rte_flow_item pattern[],
+                     const struct rte_flow_action actions[],
+                     struct rte_flow_error *error);
+
+While this function has no effect on the target device, the flow rule is
+validated against its current configuration state and the returned value
+should be considered valid by the caller for that state only.
+
+The returned value is guaranteed to remain valid only as long as no
+successful calls to ``rte_flow_create()`` or ``rte_flow_destroy()`` are made
+in the meantime and no device parameter affecting flow rules in any way is
+modified, due to possible collisions or resource limitations (although in
+such cases ``EINVAL`` should not be returned).
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``attr``: flow rule attributes.
+- ``pattern``: pattern specification (list terminated by the END pattern
+  item).
+- ``actions``: associated actions (list terminated by the END action).
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+- 0 if flow rule is valid and can be created, a negative errno value
+  otherwise (``rte_errno`` is also set). The following errors are defined:
+- ``-ENOSYS``: underlying device does not support this functionality.
+- ``-EINVAL``: unknown or invalid rule specification.
+- ``-ENOTSUP``: valid but unsupported rule specification (e.g. partial
+  bit-masks are unsupported).
+- ``-EEXIST``: collision with an existing rule.
+- ``-ENOMEM``: not enough resources.
+- ``-EBUSY``: action cannot be performed due to busy device resources, may
+  succeed if the affected queues or even the entire port are in a stopped
+  state (see ``rte_eth_dev_rx_queue_stop()`` and ``rte_eth_dev_stop()``).
+
+Creation
+~~~~~~~~
+
+Creating a flow rule is similar to validating one, except the rule is
+actually created and a handle returned.
+
+.. code-block:: c
+
+   struct rte_flow *
+   rte_flow_create(uint8_t port_id,
+                   const struct rte_flow_attr *attr,
+                   const struct rte_flow_item pattern[],
+                   const struct rte_flow_action actions[],
+                   struct rte_flow_error *error);
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``attr``: flow rule attributes.
+- ``pattern``: pattern specification (list terminated by the END pattern
+  item).
+- ``actions``: associated actions (list terminated by the END action).
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+A valid handle in case of success, NULL otherwise and ``rte_errno`` is set
+to the positive version of one of the error codes defined for
+``rte_flow_validate()``.
+
+Destruction
+~~~~~~~~~~~
+
+Flow rule destruction is not automatic, and a queue or a port should not be
+released while flow rules are still attached to it. Applications must
+destroy these rules before releasing resources.
+
+.. code-block:: c
+
+   int
+   rte_flow_destroy(uint8_t port_id,
+                    struct rte_flow *flow,
+                    struct rte_flow_error *error);
+
+
+Failure to destroy a flow rule handle may occur when other flow rules depend
+on it, and destroying it would result in an inconsistent state.
+
+This function is only guaranteed to succeed if handles are destroyed in
+reverse order of their creation.
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``flow``: flow rule handle to destroy.
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+- 0 on success, a negative errno value otherwise and ``rte_errno`` is set.
+
+Flush
+~~~~~
+
+Convenience function to destroy all flow rule handles associated with a
+port. They are released as with successive calls to ``rte_flow_destroy()``.
+
+.. code-block:: c
+
+   int
+   rte_flow_flush(uint8_t port_id,
+                  struct rte_flow_error *error);
+
+In the unlikely event of failure, handles are still considered destroyed and
+no longer valid but the port must be assumed to be in an inconsistent state.
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+- 0 on success, a negative errno value otherwise and ``rte_errno`` is set.
+
+Query
+~~~~~
+
+Query an existing flow rule.
+
+This function allows retrieving flow-specific data such as counters. Data
+is gathered by special actions which must be present in the flow rule
+definition.
+
+.. code-block:: c
+
+   int
+   rte_flow_query(uint8_t port_id,
+                  struct rte_flow *flow,
+                  enum rte_flow_action_type action,
+                  void *data,
+                  struct rte_flow_error *error);
+
+Arguments:
+
+- ``port_id``: port identifier of Ethernet device.
+- ``flow``: flow rule handle to query.
+- ``action``: action type to query.
+- ``data``: pointer to storage for the associated query data type.
+- ``error``: perform verbose error reporting if not NULL. PMDs initialize
+  this structure in case of error only.
+
+Return values:
+
+- 0 on success, a negative errno value otherwise and ``rte_errno`` is set.
+
+Verbose error reporting
+-----------------------
+
+The defined *errno* values may not be accurate enough for users or
+application developers who want to investigate issues related to flow rules
+management. A dedicated error object is defined for this purpose:
+
+.. code-block:: c
+
+   enum rte_flow_error_type {
+       RTE_FLOW_ERROR_TYPE_NONE, /**< No error. */
+       RTE_FLOW_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+       RTE_FLOW_ERROR_TYPE_HANDLE, /**< Flow rule (handle). */
+       RTE_FLOW_ERROR_TYPE_ATTR_GROUP, /**< Group field. */
+       RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY, /**< Priority field. */
+       RTE_FLOW_ERROR_TYPE_ATTR_INGRESS, /**< Ingress field. */
+       RTE_FLOW_ERROR_TYPE_ATTR_EGRESS, /**< Egress field. */
+       RTE_FLOW_ERROR_TYPE_ATTR, /**< Attributes structure. */
+       RTE_FLOW_ERROR_TYPE_ITEM_NUM, /**< Pattern length. */
+       RTE_FLOW_ERROR_TYPE_ITEM, /**< Specific pattern item. */
+       RTE_FLOW_ERROR_TYPE_ACTION_NUM, /**< Number of actions. */
+       RTE_FLOW_ERROR_TYPE_ACTION, /**< Specific action. */
+   };
+
+   struct rte_flow_error {
+       enum rte_flow_error_type type; /**< Cause field and error types. */
+       const void *cause; /**< Object responsible for the error. */
+       const char *message; /**< Human-readable error message. */
+   };
+
+Error type ``RTE_FLOW_ERROR_TYPE_NONE`` stands for no error, in which case
+remaining fields can be ignored. Other error types describe the type of the
+object pointed to by ``cause``.
+
+If non-NULL, ``cause`` points to the object responsible for the error. For a
+flow rule, this may be a pattern item or an individual action.
+
+If non-NULL, ``message`` provides a human-readable error message.
+
+This object is normally allocated by applications and set by PMDs in case of
+error. The message points to a constant string which does not need to be
+freed by the application; however, its pointer can be considered valid only
+as long as its associated DPDK port remains configured. Closing the
+underlying device or unloading the PMD invalidates it.
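+
+As an illustration of how an application might consume this object, the
+sketch below turns it into a one-line diagnostic. The two ``sketch_``
+helpers are hypothetical and not part of the API. The enum and structure
+are repeated from the definitions above only so that the example compiles
+on its own; applications would normally get them from ``rte_flow.h``.
+
+.. code-block:: c
+
+   #include <stddef.h>
+   #include <stdio.h>
+
+   /* Repeated from above so that the sketch is self-contained. */
+   enum rte_flow_error_type {
+       RTE_FLOW_ERROR_TYPE_NONE,
+       RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+       RTE_FLOW_ERROR_TYPE_HANDLE,
+       RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+       RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
+       RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
+       RTE_FLOW_ERROR_TYPE_ATTR_EGRESS,
+       RTE_FLOW_ERROR_TYPE_ATTR,
+       RTE_FLOW_ERROR_TYPE_ITEM_NUM,
+       RTE_FLOW_ERROR_TYPE_ITEM,
+       RTE_FLOW_ERROR_TYPE_ACTION_NUM,
+       RTE_FLOW_ERROR_TYPE_ACTION,
+   };
+
+   struct rte_flow_error {
+       enum rte_flow_error_type type;
+       const void *cause;
+       const char *message;
+   };
+
+   /* Hypothetical helper: name the kind of object "cause" refers to. */
+   static const char *
+   sketch_error_type_str(enum rte_flow_error_type type)
+   {
+       static const char *const str[] = {
+           [RTE_FLOW_ERROR_TYPE_NONE] = "no error",
+           [RTE_FLOW_ERROR_TYPE_UNSPECIFIED] = "cause unspecified",
+           [RTE_FLOW_ERROR_TYPE_HANDLE] = "flow rule (handle)",
+           [RTE_FLOW_ERROR_TYPE_ATTR_GROUP] = "group field",
+           [RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY] = "priority field",
+           [RTE_FLOW_ERROR_TYPE_ATTR_INGRESS] = "ingress field",
+           [RTE_FLOW_ERROR_TYPE_ATTR_EGRESS] = "egress field",
+           [RTE_FLOW_ERROR_TYPE_ATTR] = "attributes structure",
+           [RTE_FLOW_ERROR_TYPE_ITEM_NUM] = "pattern length",
+           [RTE_FLOW_ERROR_TYPE_ITEM] = "specific pattern item",
+           [RTE_FLOW_ERROR_TYPE_ACTION_NUM] = "number of actions",
+           [RTE_FLOW_ERROR_TYPE_ACTION] = "specific action",
+       };
+
+       if ((size_t)type >= sizeof(str) / sizeof(str[0]))
+           return "unknown error type";
+       return str[type];
+   }
+
+   /* Hypothetical helper: format an error, tolerating NULL fields. */
+   static int
+   sketch_error_format(char *buf, size_t len,
+                       const struct rte_flow_error *error)
+   {
+       if (error == NULL || error->type == RTE_FLOW_ERROR_TYPE_NONE)
+           return snprintf(buf, len, "no error");
+       return snprintf(buf, len, "%s: %s",
+                       sketch_error_type_str(error->type),
+                       error->message != NULL ?
+                       error->message : "(no stated reason)");
+   }
+
+Because PMDs initialize the structure in case of error only, a caller
+would normally inspect it only after a function has reported a failure.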
+
+Caveats
+-------
+
+- DPDK does not keep track of flow rules definitions or flow rule objects
+  automatically. Applications may keep track of the former and must keep
+  track of the latter. PMDs may also do it for internal needs, however this
+  must not be relied on by applications.
+
+- Flow rules are not maintained between successive port initializations. An
+  application exiting without releasing them and restarting must re-create
+  them from scratch.
+
+- API operations are synchronous and blocking (``EAGAIN`` cannot be
+  returned).
+
+- There is no provision for reentrancy/multi-thread safety, although nothing
+  should prevent different devices from being configured at the same
+  time. PMDs may protect their control path functions accordingly.
+
+- Stopping the data path (TX/RX) should not be necessary when managing flow
+  rules. If this cannot be achieved naturally or with workarounds (such as
+  temporarily replacing the burst function pointers), an appropriate error
+  code must be returned (``EBUSY``).
+
+- PMDs, not applications, are responsible for maintaining flow rules
+  configuration when stopping and restarting a port or performing other
+  actions which may affect them. They can only be destroyed explicitly by
+  applications.
+
+For devices exposing multiple ports sharing global settings affected by flow
+rules:
+
+- All ports under DPDK control must behave consistently. PMDs are
+  responsible for making sure that existing flow rules on a port are not
+  affected by other ports.
+
+- Ports not under DPDK control (unaffected or handled by other applications)
+  are the user's responsibility. They may affect existing flow rules and
+  cause undefined behavior. PMDs aware of this may prevent flow rules
+  creation altogether in such cases.
+
+PMD interface
+-------------
+
+The PMD interface is defined in ``rte_flow_driver.h``. It is not subject to
+API/ABI versioning constraints as it is not exposed to applications and may
+evolve independently.
+
+It is currently implemented on top of the legacy filtering framework through
+filter type *RTE_ETH_FILTER_GENERIC* that accepts the single operation
+*RTE_ETH_FILTER_GET* to return PMD-specific *rte_flow* callbacks wrapped
+inside ``struct rte_flow_ops``.
+
+This overhead is temporarily necessary in order to keep compatibility with
+the legacy filtering framework, which should eventually disappear.
+
+- PMD callbacks implement exactly the interface described in `Rules
+  management`_, except for the port ID argument which has already been
+  converted to a pointer to the underlying ``struct rte_eth_dev``.
+
+- Public API functions do not process flow rules definitions at all before
+  calling PMD functions (no basic error checking, no validation
+  whatsoever). They only make sure these callbacks are non-NULL or return
+  the ``ENOSYS`` (function not supported) error.
+
+This interface additionally defines the following helper functions:
+
+- ``rte_flow_ops_get()``: get generic flow operations structure from a
+  port.
+
+- ``rte_flow_error_set()``: initialize generic flow error structure.
+
+More will be added over time.
+
+Device compatibility
+--------------------
+
+No known implementation supports all the described features.
+
+Unsupported features or combinations are not expected to be fully emulated
+in software by PMDs for performance reasons. Partially supported features
+may be completed in software as long as hardware performs most of the work
+(such as queue redirection and packet recognition).
+
+However PMDs are expected to do their best to satisfy application requests
+by working around hardware limitations as long as doing so does not affect
+the behavior of existing flow rules.
+
+The following sections provide a few examples of such cases and describe how
+PMDs should handle them. They are based on limitations built into the
+previous APIs.
+
+Global bit-masks
+~~~~~~~~~~~~~~~~
+
+Each flow rule comes with its own, per-layer bit-masks, while hardware may
+support only a single, device-wide bit-mask for a given layer type, so that
+two IPv4 rules cannot use different bit-masks.
+
+The expected behavior in this case is that PMDs automatically configure
+global bit-masks according to the needs of the first flow rule created.
+
+Subsequent rules are allowed only if their bit-masks match those; the
+``EEXIST`` error code should be returned otherwise.
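+
+A minimal sketch of how a PMD might track such a device-wide bit-mask
+follows. All names are hypothetical, the mask is reduced to a single 32 bit
+value for brevity, and releasing the mask once the last rule using it is
+destroyed is an assumption of this example rather than a requirement stated
+above.
+
+.. code-block:: c
+
+   #include <errno.h>
+   #include <stdint.h>
+
+   /* Hypothetical per-device state for one layer type. */
+   struct sketch_global_mask {
+       uint32_t mask;     /* Bit-mask currently programmed in hardware. */
+       unsigned int refs; /* Number of flow rules relying on it. */
+   };
+
+   /* Called when a rule is created; returns 0 or a negative errno. */
+   static int
+   sketch_global_mask_acquire(struct sketch_global_mask *gm,
+                              uint32_t rule_mask)
+   {
+       if (gm->refs == 0)
+           gm->mask = rule_mask; /* First rule configures the device. */
+       else if (gm->mask != rule_mask)
+           return -EEXIST; /* Collides with the mask already in use. */
+       ++gm->refs;
+       return 0;
+   }
+
+   /* Called when a rule is destroyed (assumed behavior, see above). */
+   static void
+   sketch_global_mask_release(struct sketch_global_mask *gm)
+   {
+       if (gm->refs != 0)
+           --gm->refs;
+   }
+
+With this scheme, a second rule using a different mask is refused for as
+long as at least one rule created with the first mask remains.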
+
+Unsupported layer types
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Many protocols can be simulated by crafting patterns with the `Item: RAW`_
+type.
+
+PMDs can rely on this capability to simulate support for protocols with
+headers not directly recognized by hardware.
+
+``ANY`` pattern item
+~~~~~~~~~~~~~~~~~~~~
+
+This pattern item stands for anything, which can be difficult to translate
+to something hardware would understand, particularly if followed by more
+specific types.
+
+Consider the following pattern:
+
+.. _table_rte_flow_unsupported_any:
+
+.. table:: Pattern with ANY as L3
+
+   +-------+-----------------------+
+   | Index | Item                  |
+   +=======+=======================+
+   | 0     | ETH                   |
+   +-------+-----+---------+-------+
+   | 1     | ANY | ``num`` | ``1`` |
+   +-------+-----+---------+-------+
+   | 2     | TCP                   |
+   +-------+-----------------------+
+   | 3     | END                   |
+   +-------+-----------------------+
+
+Knowing that TCP does not make sense with something other than IPv4 and IPv6
+as L3, such a pattern may be translated to two flow rules instead:
+
+.. _table_rte_flow_unsupported_any_ipv4:
+
+.. table:: ANY replaced with IPV4
+
+   +-------+--------------------+
+   | Index | Item               |
+   +=======+====================+
+   | 0     | ETH                |
+   +-------+--------------------+
+   | 1     | IPV4 (zeroed mask) |
+   +-------+--------------------+
+   | 2     | TCP                |
+   +-------+--------------------+
+   | 3     | END                |
+   +-------+--------------------+
+
+|
+
+.. _table_rte_flow_unsupported_any_ipv6:
+
+.. table:: ANY replaced with IPV6
+
+   +-------+--------------------+
+   | Index | Item               |
+   +=======+====================+
+   | 0     | ETH                |
+   +-------+--------------------+
+   | 1     | IPV6 (zeroed mask) |
+   +-------+--------------------+
+   | 2     | TCP                |
+   +-------+--------------------+
+   | 3     | END                |
+   +-------+--------------------+
+
+Note that as soon as an ANY rule covers several layers, this approach may
+yield a large number of hidden flow rules. It is thus suggested to only
+support the most common scenarios (anything as L2 and/or L3).
+
+Unsupported actions
+~~~~~~~~~~~~~~~~~~~
+
+- When combined with `Action: QUEUE`_, packet counting (`Action: COUNT`_)
+  and tagging (`Action: MARK`_ or `Action: FLAG`_) may be implemented in
+  software as long as the target queue is used by a single rule.
+
+- A rule specifying both `Action: DUP`_ + `Action: QUEUE`_ may be translated
+  to two hidden rules combining `Action: QUEUE`_ and `Action: PASSTHRU`_.
+
+- When a single target queue is provided, `Action: RSS`_ can also be
+  implemented through `Action: QUEUE`_.
+
+Flow rules priority
+~~~~~~~~~~~~~~~~~~~
+
+While it would naturally make sense, flow rules cannot be assumed to be
+processed by hardware in the same order as their creation for several
+reasons:
+
+- They may be managed internally as a tree or a hash table instead of a
+  list.
+- Removing a flow rule before adding another one can either put the new rule
+  at the end of the list or reuse a freed entry.
+- Duplication may occur when packets are matched by several rules.
+
+For overlapping rules (particularly in order to use `Action: PASSTHRU`_)
+predictable behavior is only guaranteed by using different priority levels.
+
+Priority levels are not necessarily implemented in hardware, or may be
+severely limited (e.g. a single priority bit).
+
+For these reasons, priority levels may be implemented purely in software by
+PMDs.
+
+- For devices expecting flow rules to be added in the correct order, PMDs
+  may destroy and re-create existing rules after adding a new one with
+  a higher priority.
+
+- A configurable number of dummy or empty rules can be created at
+  initialization time to save high priority slots for later.
+
+- In order to save priority levels, PMDs may evaluate whether rules are
+  likely to collide and adjust their priority accordingly.
+
+Future evolutions
+-----------------
+
+- A device profile selection function which could be used to force a
+  permanent profile instead of relying on its automatic configuration based
+  on existing flow rules.
+
+- A method to optimize *rte_flow* rules with specific pattern items and
+  action types generated on the fly by PMDs. DPDK should assign negative
+  numbers to these in order to not collide with the existing types. See
+  `Negative types`_.
+
+- Adding specific egress pattern items and actions as described in
+  `Attribute: Traffic direction`_.
+
+- Optional software fallback when PMDs are unable to handle requested flow
+  rules so applications do not have to implement their own.
+
+API migration
+-------------
+
+Exhaustive list of deprecated filter types (normally prefixed with
+*RTE_ETH_FILTER_*) found in ``rte_eth_ctrl.h`` and methods to convert them
+to *rte_flow* rules.
+
+``MACVLAN`` to ``ETH`` → ``VF``, ``PF``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*MACVLAN* can be translated to a basic `Item: ETH`_ flow rule with a
+terminating `Action: VF`_ or `Action: PF`_.
+
+.. _table_rte_flow_migration_macvlan:
+
+.. table:: MACVLAN conversion
+
+   +--------------------------+---------+
+   | Pattern                  | Actions |
+   +===+=====+==========+=====+=========+
+   | 0 | ETH | ``spec`` | any | VF,     |
+   |   |     +----------+-----+ PF      |
+   |   |     | ``last`` | N/A |         |
+   |   |     +----------+-----+         |
+   |   |     | ``mask`` | any |         |
+   +---+-----+----------+-----+---------+
+   | 1 | END                  | END     |
+   +---+----------------------+---------+
+
+``ETHERTYPE`` to ``ETH`` → ``QUEUE``, ``DROP``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*ETHERTYPE* is basically an `Item: ETH`_ flow rule with a terminating
+`Action: QUEUE`_ or `Action: DROP`_.
+
+.. _table_rte_flow_migration_ethertype:
+
+.. table:: ETHERTYPE conversion
+
+   +--------------------------+---------+
+   | Pattern                  | Actions |
+   +===+=====+==========+=====+=========+
+   | 0 | ETH | ``spec`` | any | QUEUE,  |
+   |   |     +----------+-----+ DROP    |
+   |   |     | ``last`` | N/A |         |
+   |   |     +----------+-----+         |
+   |   |     | ``mask`` | any |         |
+   +---+-----+----------+-----+---------+
+   | 1 | END                  | END     |
+   +---+----------------------+---------+
+
+``FLEXIBLE`` to ``RAW`` → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*FLEXIBLE* can be translated to one `Item: RAW`_ pattern with a terminating
+`Action: QUEUE`_ and a defined priority level.
+
+.. _table_rte_flow_migration_flexible:
+
+.. table:: FLEXIBLE conversion
+
+   +--------------------------+---------+
+   | Pattern                  | Actions |
+   +===+=====+==========+=====+=========+
+   | 0 | RAW | ``spec`` | any | QUEUE   |
+   |   |     +----------+-----+         |
+   |   |     | ``last`` | N/A |         |
+   |   |     +----------+-----+         |
+   |   |     | ``mask`` | any |         |
+   +---+-----+----------+-----+---------+
+   | 1 | END                  | END     |
+   +---+----------------------+---------+
+
+``SYN`` to ``TCP`` → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*SYN* is an `Item: TCP`_ rule with only the ``syn`` bit enabled and masked,
+and a terminating `Action: QUEUE`_.
+
+Priority level can be set to simulate the high priority bit.
+
+.. _table_rte_flow_migration_syn:
+
+.. table:: SYN conversion
+
+   +-----------------------------------+---------+
+   | Pattern                           | Actions |
+   +===+======+==========+=============+=========+
+   | 0 | ETH  | ``spec`` | unset       | QUEUE   |
+   |   |      +----------+-------------+         |
+   |   |      | ``last`` | unset       |         |
+   |   |      +----------+-------------+         |
+   |   |      | ``mask`` | unset       |         |
+   +---+------+----------+-------------+---------+
+   | 1 | IPV4 | ``spec`` | unset       | END     |
+   |   |      +----------+-------------+         |
+   |   |      | ``last`` | unset       |         |
+   |   |      +----------+-------------+         |
+   |   |      | ``mask`` | unset       |         |
+   +---+------+----------+---------+---+         |
+   | 2 | TCP  | ``spec`` | ``syn`` | 1 |         |
+   |   |      +----------+---------+---+         |
+   |   |      | ``mask`` | ``syn`` | 1 |         |
+   +---+------+----------+---------+---+         |
+   | 3 | END                           |         |
+   +---+-------------------------------+---------+
+
+``NTUPLE`` to ``IPV4``, ``TCP``, ``UDP`` → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*NTUPLE* is similar to specifying an empty L2, `Item: IPV4`_ as L3 with
+`Item: TCP`_ or `Item: UDP`_ as L4 and a terminating `Action: QUEUE`_.
+
+A priority level can be specified as well.
+
+.. _table_rte_flow_migration_ntuple:
+
+.. table:: NTUPLE conversion
+
+   +-----------------------------+---------+
+   | Pattern                     | Actions |
+   +===+======+==========+=======+=========+
+   | 0 | ETH  | ``spec`` | unset | QUEUE   |
+   |   |      +----------+-------+         |
+   |   |      | ``last`` | unset |         |
+   |   |      +----------+-------+         |
+   |   |      | ``mask`` | unset |         |
+   +---+------+----------+-------+---------+
+   | 1 | IPV4 | ``spec`` | any   | END     |
+   |   |      +----------+-------+         |
+   |   |      | ``last`` | unset |         |
+   |   |      +----------+-------+         |
+   |   |      | ``mask`` | any   |         |
+   +---+------+----------+-------+         |
+   | 2 | TCP, | ``spec`` | any   |         |
+   |   | UDP  +----------+-------+         |
+   |   |      | ``last`` | unset |         |
+   |   |      +----------+-------+         |
+   |   |      | ``mask`` | any   |         |
+   +---+------+----------+-------+         |
+   | 3 | END                     |         |
+   +---+-------------------------+---------+
+
+``TUNNEL`` to ``ETH``, ``IPV4``, ``IPV6``, ``VXLAN`` (or other) → ``QUEUE``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*TUNNEL* matches common IPv4 and IPv6 L3/L4-based tunnel types.
+
+In the following table, `Item: ANY`_ is used to cover the optional L4.
+
+.. _table_rte_flow_migration_tunnel:
+
+.. table:: TUNNEL conversion
+
+   +-------------------------------------------------------+---------+
+   | Pattern                                               | Actions |
+   +===+==========================+==========+=============+=========+
+   | 0 | ETH                      | ``spec`` | any         | QUEUE   |
+   |   |                          +----------+-------------+         |
+   |   |                          | ``last`` | unset       |         |
+   |   |                          +----------+-------------+         |
+   |   |                          | ``mask`` | any         |         |
+   +---+--------------------------+----------+-------------+---------+
+   | 1 | IPV4, IPV6               | ``spec`` | any         | END     |
+   |   |                          +----------+-------------+         |
+   |   |                          | ``last`` | unset       |         |
+   |   |                          +----------+-------------+         |
+   |   |                          | ``mask`` | any         |         |
+   +---+--------------------------+----------+-------------+         |
+   | 2 | ANY                      | ``spec`` | any         |         |
+   |   |                          +----------+-------------+         |
+   |   |                          | ``last`` | unset       |         |
+   |   |                          +----------+---------+---+         |
+   |   |                          | ``mask`` | ``num`` | 0 |         |
+   +---+--------------------------+----------+---------+---+         |
+   | 3 | VXLAN, GENEVE, TEREDO,   | ``spec`` | any         |         |
+   |   | NVGRE, GRE, ...          +----------+-------------+         |
+   |   |                          | ``last`` | unset       |         |
+   |   |                          +----------+-------------+         |
+   |   |                          | ``mask`` | any         |         |
+   +---+--------------------------+----------+-------------+         |
+   | 4 | END                                               |         |
+   +---+---------------------------------------------------+---------+
+
+``FDIR`` to most item types → ``QUEUE``, ``DROP``, ``PASSTHRU``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*FDIR* is more complex than any other type; there are several methods to
+emulate its functionality. It is summarized for the most part in the table
+below.
+
+A few features are intentionally not supported:
+
+- The ability to configure the matching input set and masks for the entire
+  device. PMDs should take care of this automatically according to the
+  requested flow rules.
+
+  For example, if a device supports only one bit-mask per protocol type,
+  source/destination IPv4 address bit-masks can be made immutable by the
+  first created rule. Subsequent IPv4 or TCPv4 rules can only be created if
+  they are compatible.
+
+  Note that only protocol bit-masks affected by existing flow rules are
+  immutable, others can be changed later. They become mutable again after
+  the related flow rules are destroyed.
+
+- Returning four or eight bytes of matched data when using flex bytes
+  filtering. Although a specific action could implement it, it conflicts
+  with the much more useful 32-bit tagging on devices that support it.
+
+- Side effects on RSS processing of the entire device. Flow rules that
+  conflict with the current device configuration should not be
+  allowed. Similarly, device configuration should not be allowed when it
+  affects existing flow rules.
+
+- Device modes of operation. "none" is unsupported since filtering cannot be
+  disabled as long as a flow rule is present.
+
+- "MAC VLAN" or "tunnel" perfect matching modes should be automatically set
+  according to the created flow rules.
+
+- Signature mode of operation is not defined but could be handled through a
+  specific item type if needed.
+
+.. _table_rte_flow_migration_fdir:
+
+.. table:: FDIR conversion
+
+   +---------------------------------+------------+
+   | Pattern                         | Actions    |
+   +===+============+==========+=====+============+
+   | 0 | ETH,       | ``spec`` | any | QUEUE,     |
+   |   | RAW        +----------+-----+ DROP,      |
+   |   |            | ``last`` | N/A | PASSTHRU   |
+   |   |            +----------+-----+            |
+   |   |            | ``mask`` | any |            |
+   +---+------------+----------+-----+------------+
+   | 1 | IPV4,      | ``spec`` | any | MARK       |
+   |   | IPV6       +----------+-----+            |
+   |   |            | ``last`` | N/A |            |
+   |   |            +----------+-----+            |
+   |   |            | ``mask`` | any |            |
+   +---+------------+----------+-----+------------+
+   | 2 | TCP,       | ``spec`` | any | END        |
+   |   | UDP,       +----------+-----+            |
+   |   | SCTP       | ``last`` | N/A |            |
+   |   |            +----------+-----+            |
+   |   |            | ``mask`` | any |            |
+   +---+------------+----------+-----+            |
+   | 3 | VF,        | ``spec`` | any |            |
+   |   | PF         +----------+-----+            |
+   |   | (optional) | ``last`` | N/A |            |
+   |   |            +----------+-----+            |
+   |   |            | ``mask`` | any |            |
+   +---+------------+----------+-----+            |
+   | 4 | END                         |            |
+   +---+-----------------------------+------------+
+
+``HASH``
+~~~~~~~~
+
+There is no counterpart to this filter type because it translates to a
+global device setting instead of a pattern item. Device settings are
+automatically set according to the created flow rules.
+
+``L2_TUNNEL`` to ``VOID`` → ``VXLAN`` (or others)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All packets are matched. This type alters incoming packets to encapsulate
+them in a chosen tunnel type, and optionally redirects them to a VF.
+
+The destination pool for tag based forwarding can be emulated with other
+flow rules using `Action: DUP`_.
+
+.. _table_rte_flow_migration_l2tunnel:
+
+.. table:: L2_TUNNEL conversion
+
+   +---------------------------+------------+
+   | Pattern                   | Actions    |
+   +===+======+==========+=====+============+
+   | 0 | VOID | ``spec`` | N/A | VXLAN,     |
+   |   |      |          |     | GENEVE,    |
+   |   |      |          |     | ...        |
+   |   |      +----------+-----+            |
+   |   |      | ``last`` | N/A |            |
+   |   |      +----------+-----+            |
+   |   |      | ``mask`` | N/A |            |
+   |   |      |          |     |            |
+   +---+------+----------+-----+------------+
+   | 1 | END                   | VF         |
+   |   |                       | (optional) |
+   +---+                       +------------+
+   | 2 |                       | END        |
+   +---+-----------------------+------------+
-- 
2.1.4


