[dpdk-dev] [RFC] - Offloading tunnel ports

Oz Shlomo ozsh at mellanox.com
Thu Jul 2 13:43:06 CEST 2020



On 7/2/2020 2:34 PM, Sriharsha Basavapatna wrote:
> On Tue, Jun 9, 2020 at 8:37 PM Oz Shlomo <ozsh at mellanox.com> wrote:
>>
>> The rte_flow API provides the building blocks for vendor agnostic flow
>> classification offloads.  The rte_flow match and action primitives are fine
>> grained, giving DPDK applications the flexibility to offload network
>> stacks and complex pipelines.
>>
>> Applications wishing to offload complex data structures (e.g. tunnel virtual
>> ports) are required to use the rte_flow primitives, such as group, meta, mark,
>> tag and others to model their high level objects.
>>
>> The hardware model design for high level software objects is not trivial.
>> Furthermore, an optimal design is often vendor specific.
>>
>> The goal of this RFC is to provide applications with a hardware offload
>> model for common high level software objects that is optimal with regard
>> to the underlying hardware.
>>
>> Tunnel ports are the first of such objects.
>>
>> Tunnel ports
>> ------------
>> Ingress processing of tunneled traffic requires the classification
>> of the tunnel type followed by a decap action.
>>
>> In software, once a packet is decapsulated the in_port field is changed
>> to a virtual port representing the tunnel type. The outer header fields
>> are stored as packet metadata members and may be matched by subsequent
>> flows.
>>
>> Openvswitch, for example, uses two flows:
>> 1. A classification flow that sets the virtual port representing the tunnel type.
>> For example: match on udp port 4789, actions=tnl_pop(vxlan_vport)
>> 2. A steering flow matching on the outer and inner headers.
>> For example: match on in_port=vxlan_vport and outer/inner headers, actions=forward to port X
>> The benefits of multi-flow tables are described in [1].
> 
> You probably forgot to add a link to this reference [1]? I couldn't
> find it in this email.
> 
> Thanks,
> -Harsha

Right, sorry about that. Here is the reference:
[1] - https://www.opennetworking.org/wp-content/uploads/2014/10/TR_Multiple_Flow_Tables_and_TTPs.pdf

>>
>> Offloading tunnel ports
>> -----------------------
>> Tunnel ports introduce a new stateless field that can be matched on.
>> Currently the rte_flow library provides an API to encap, decap and match
>> on tunnel headers. However, there is no rte_flow primitive to set and
>> match tunnel virtual ports.
>>
>> There are several possible hardware models for offloading virtual tunnel port
>> flows including, but not limited to, the following:
>> 1. Setting the virtual port on a hw register using the rte_flow_action_mark/
>> rte_flow_action_set_tag/rte_flow_action_set_meta actions.
>> 2. Mapping a virtual port to an rte_flow group
>> 3. Avoiding the need to match on transient objects by merging multi-table
>> flows to a single rte_flow rule.
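As a concrete illustration of approach (2), a driver could dedicate a contiguous range of rte_flow groups to tunnel virtual ports. The sketch below is hypothetical: the base value and helper names are illustrative, not part of any proposed API. Under this scheme, setting vport V becomes a jump to group BASE + V, and matching on vport V means installing the rule in that group.

```c
#include <stdint.h>

/* Hypothetical mapping for approach (2): reserve a contiguous range of
 * rte_flow groups for tunnel virtual ports.  "Set vport V" is realized
 * as a jump action to group TUNNEL_GROUP_BASE + V, and "match on
 * vport V" is realized by installing the rule in that group. */
#define TUNNEL_GROUP_BASE 1000u

/* Translate a virtual tunnel port id to its dedicated group id. */
static uint32_t
vport_to_group(uint32_t vport)
{
	return TUNNEL_GROUP_BASE + vport;
}

/* Recover the virtual port id from a group id in the reserved range. */
static uint32_t
group_to_vport(uint32_t group)
{
	return group - TUNNEL_GROUP_BASE;
}
```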
>>
>> Every approach has its pros and cons.
>> The preferred approach should take into account the entire system architecture
>> and is very often vendor specific.
>>
>> The proposed rte_flow_tunnel_set helper function (drafted below) is designed
>> to provide a common, vendor agnostic, API for setting the virtual port value.
>> The helper API enables PMD implementations to return a vendor specific
>> combination of rte_flow actions realizing the vendor's hardware model for
>> setting a tunnel port. Applications may append the list of actions returned
>> from the helper function when creating an rte_flow rule in hardware.
>>
>> Similarly, the rte_flow_tunnel_match helper (drafted below) allows
>> multiple hardware implementations to return a list of rte_flow items.
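To illustrate the concatenation step, an application could splice the helper's output in front of its own END-terminated action list before calling rte_flow_create. The sketch below uses local stand-in types; flow_action and concat_actions are illustrative, not DPDK definitions.

```c
#include <stdint.h>

/* Illustrative stand-ins for the rte_flow action types; the real
 * definitions live in rte_flow.h. */
enum flow_action_type {
	ACTION_TYPE_END = 0,
	ACTION_TYPE_VXLAN_DECAP,
	ACTION_TYPE_JUMP,
};

struct flow_action {
	enum flow_action_type type;
	const void *conf;
};

/* Concatenate the PMD-returned tunnel-set actions (pmd, pmd_n entries,
 * no END terminator) with the application's own END-terminated action
 * list (app) into out.  Returns the total number of entries written,
 * including the final END. */
static uint32_t
concat_actions(const struct flow_action *pmd, uint32_t pmd_n,
	       const struct flow_action *app, struct flow_action *out)
{
	uint32_t i, n = 0;

	for (i = 0; i < pmd_n; i++)
		out[n++] = pmd[i];
	for (i = 0; app[i].type != ACTION_TYPE_END; i++)
		out[n++] = app[i];
	out[n++] = (struct flow_action){ .type = ACTION_TYPE_END };
	return n;
}
```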
>>
>> Miss handling
>> -------------
>> Packets going through multiple rte_flow groups are exposed to hw misses due to
>> partial packet processing. In such cases, the software should continue the
>> packet's processing from the point where the hardware missed.
>>
>> We propose a generic rte_flow_restore structure providing the state that was
>> stored in hardware when the packet missed.
>>
>> Currently, the structure will provide the tunnel state of the packet that
>> missed, namely:
>> 1. The group id that missed
>> 2. The tunnel port that missed
>> 3. Tunnel information that was stored in memory (due to decap action).
>> In the future, we may add additional fields as more state may be stored in
>> the device memory (e.g. ct_state).
>>
>> Applications may query the state via a new rte_flow_get_restore_info(mbuf) API,
>> thus allowing a vendor specific implementation.
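To illustrate how the restore information could be consumed on the software miss path, the sketch below stubs a reduced restore-info structure with flag names mirroring the draft; the stub types and helper names are illustrative, not the proposed definitions.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative stand-ins mirroring the drafted rte_flow_restore_info. */
#define RESTORE_INFO_TUNNEL   (1ULL << 0)
#define RESTORE_INFO_GROUP_ID (1ULL << 2)

struct restore_info {
	uint64_t flags;
	uint32_t group_id;
};

/* Decide where software processing should resume for a packet that
 * missed in hardware: at the group the hardware reached, if recorded,
 * otherwise from the start of the pipeline (group 0). */
static uint32_t
resume_group(const struct restore_info *info)
{
	if (info->flags & RESTORE_INFO_GROUP_ID)
		return info->group_id;
	return 0;
}

/* True when the hardware recorded tunnel state for this packet, i.e.
 * the tunnel fields in the restore info are valid and the outer header
 * must be taken from there rather than from the packet. */
static bool
tunnel_from_restore(const struct restore_info *info)
{
	return (info->flags & RESTORE_INFO_TUNNEL) != 0;
}
```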
>>
>> The API draft is provided below.
>>
>> ---
>> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
>> index b0e4199192..49c871fc46 100644
>> --- a/lib/librte_ethdev/rte_flow.h
>> +++ b/lib/librte_ethdev/rte_flow.h
>> @@ -3324,6 +3324,193 @@ int
>>    rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
>>                          uint32_t nb_contexts, struct rte_flow_error *error);
>>
>> +/* Tunnel information. */
>> +__rte_experimental
>> +struct rte_flow_ip_tunnel_key {
>> +       rte_be64_t tun_id; /**< Tunnel identification. */
>> +       union {
>> +               struct {
>> +                       rte_be32_t src; /**< IPv4 source address. */
>> +                       rte_be32_t dst; /**< IPv4 destination address. */
>> +               } ipv4;
>> +               struct {
>> +                       uint8_t src[16]; /**< IPv6 source address. */
>> +                       uint8_t dst[16]; /**< IPv6 destination address. */
>> +               } ipv6;
>> +       } u;
>> +       bool       is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
>> +       rte_be16_t tun_flags; /**< Tunnel flags. */
>> +       uint8_t    tos; /**< TOS for IPv4, TC for IPv6. */
>> +       uint8_t    ttl; /**< TTL for IPv4, HL for IPv6. */
>> +       rte_be32_t label; /**< Flow Label for IPv6. */
>> +       rte_be16_t tp_src; /**< Tunnel port source. */
>> +       rte_be16_t tp_dst; /**< Tunnel port destination. */
>> +};
>> +
>> +
>> +/* Tunnel has a type and the key information. */
>> +__rte_experimental
>> +struct rte_flow_tunnel {
>> +       /** Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
>> +         * RTE_FLOW_ITEM_TYPE_NVGRE etc. */
>> +       enum rte_flow_item_type         type;
>> +       struct rte_flow_ip_tunnel_key   tun_info; /**< Tunnel key info. */
>> +};
>> +
>> +/**
>> + * Indicate that the packet has a tunnel.
>> + */
>> +#define RTE_FLOW_RESTORE_INFO_TUNNEL  (1ULL << 0)
>> +
>> +/**
>> + * Indicate that the packet has a non-decapsulated tunnel header.
>> + */
>> +#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED  (1ULL << 1)
>> +
>> +/**
>> + * Indicate that the packet has a group_id.
>> + */
>> +#define RTE_FLOW_RESTORE_INFO_GROUP_ID  (1ULL << 2)
>> +
>> +/**
>> + * Restore information structure to communicate the current packet processing
>> + * state when some of the processing pipeline is done in hardware and should
>> + * continue in software.
>> + */
>> +__rte_experimental
>> +struct rte_flow_restore_info {
>> +       /** Bitwise flags (RTE_FLOW_RESTORE_INFO_*) indicating the validity
>> +         * of other fields in struct rte_flow_restore_info.
>> +         */
>> +       uint64_t flags;
>> +       uint32_t group_id; /**< Group ID. */
>> +       struct rte_flow_tunnel tunnel; /**< Tunnel information. */
>> +};
>> +
>> +/**
>> + * Allocate an array of actions to be used in rte_flow_create, to implement
>> + * tunnel-set for the given tunnel.
>> + * Sample usage:
>> + *   actions vxlan_decap / tunnel_set(tunnel properties) / jump group 0 / end
>> + *
>> + * @param port_id
>> + *   Port identifier of Ethernet device.
>> + * @param[in] tunnel
>> + *   Tunnel properties.
>> + * @param[out] actions
>> + *   Array of actions to be allocated by the PMD. This array should be
>> + *   concatenated with the actions array provided to rte_flow_create.
>> + * @param[out] num_of_actions
>> + *   Number of actions allocated.
>> + * @param[out] error
>> + *   Perform verbose error reporting if not NULL. PMDs initialize this
>> + *   structure in case of error only.
>> + *
>> + * @return
>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
>> + */
>> +__rte_experimental
>> +int
>> +rte_flow_tunnel_set(uint16_t port_id,
>> +                   struct rte_flow_tunnel *tunnel,
>> +                   struct rte_flow_action **actions,
>> +                   uint32_t *num_of_actions,
>> +                   struct rte_flow_error *error);
>> +
>> +/**
>> + * Allocate an array of items to be used in rte_flow_create, to implement
>> + * tunnel-match for the given tunnel.
>> + * Sample usage:
>> + *   pattern tunnel-match(tunnel properties) / outer-header-matches /
>> + *           inner-header-matches / end
>> + *
>> + * @param port_id
>> + *   Port identifier of Ethernet device.
>> + * @param[in] tunnel
>> + *   Tunnel properties.
>> + * @param[out] items
>> + *   Array of items to be allocated by the PMD. This array should be
>> + *   concatenated with the items array provided to rte_flow_create.
>> + * @param[out] num_of_items
>> + *   Number of items allocated.
>> + * @param[out] error
>> + *   Perform verbose error reporting if not NULL. PMDs initialize this
>> + *   structure in case of error only.
>> + *
>> + * @return
>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
>> + */
>> +__rte_experimental
>> +int
>> +rte_flow_tunnel_match(uint16_t port_id,
>> +                     struct rte_flow_tunnel *tunnel,
>> +                     struct rte_flow_item **items,
>> +                     uint32_t *num_of_items,
>> +                     struct rte_flow_error *error);
>> +
>> +/**
>> + * Populate the current packet processing state, if it exists, for the given mbuf.
>> + *
>> + * @param port_id
>> + *   Port identifier of Ethernet device.
>> + * @param[in] m
>> + *   Mbuf struct.
>> + * @param[out] info
>> + *   Restore information. Upon success contains the HW state.
>> + * @param[out] error
>> + *   Perform verbose error reporting if not NULL. PMDs initialize this
>> + *   structure in case of error only.
>> + *
>> + * @return
>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
>> + */
>> +__rte_experimental
>> +int
>> +rte_flow_get_restore_info(uint16_t port_id,
>> +                         struct rte_mbuf *m,
>> +                         struct rte_flow_restore_info *info,
>> +                         struct rte_flow_error *error);
>> +
>> +/**
>> + * Release the action array as allocated by rte_flow_tunnel_set.
>> + *
>> + * @param port_id
>> + *   Port identifier of Ethernet device.
>> + * @param[in] actions
>> + *   Array of actions to be released.
>> + * @param[out] error
>> + *   Perform verbose error reporting if not NULL. PMDs initialize this
>> + *   structure in case of error only.
>> + *
>> + * @return
>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
>> + */
>> +__rte_experimental
>> +int
>> +rte_flow_action_release(uint16_t port_id,
>> +                       struct rte_flow_action *actions,
>> +                       struct rte_flow_error *error);
>> +
>> +/**
>> + * Release the item array as allocated by rte_flow_tunnel_match.
>> + *
>> + * @param port_id
>> + *   Port identifier of Ethernet device.
>> + * @param[in] items
>> + *   Array of items to be released.
>> + * @param[out] error
>> + *   Perform verbose error reporting if not NULL. PMDs initialize this
>> + *   structure in case of error only.
>> + *
>> + * @return
>> + *   0 on success, a negative errno value otherwise and rte_errno is set.
>> + */
>> +__rte_experimental
>> +int
>> +rte_flow_item_release(uint16_t port_id,
>> +                     struct rte_flow_item *items,
>> +                     struct rte_flow_error *error);
>> +
>>    #ifdef __cplusplus
>>    }
>>    #endif

