[dpdk-dev] [PATCH 2/2] ethdev: tunnel offload model
Andrew Rybchenko
arybchenko at solarflare.com
Sun Jul 5 16:50:49 CEST 2020
Hi Gregory,
I'm sorry for a review with far too many questions and no
suggestions on how to answer them. Please see below.
On 6/25/20 7:03 PM, Gregory Etelson wrote:
> From: Eli Britstein <elibr at mellanox.com>
>
> Hardware vendors implement tunneled traffic offload techniques
> differently. Although RTE flow API provides tools capable to offload
> all sorts of network stacks, software application must reference this
> hardware differences in flow rules compilation. As the result tunneled
> traffic flow rules that utilize hardware capabilities can be different
> for the same traffic.
>
> Tunnel port offload proposed in [1] provides software application with
> unified rules model for tunneled traffic regardless underlying
> hardware.
> - The model introduces a concept of a virtual tunnel port (VTP).
> - The model uses VTP to offload ingress tunneled network traffic
> with RTE flow rules.
> - The model is implemented as set of helper functions. Each PMD
> implements VTP offload according to underlying hardware offload
> capabilities. Applications must query PMD for VTP flow
> items / actions before using in creation of a VTP flow rule.
>
> The model components:
> - Virtual Tunnel Port (VTP) is a stateless software object that
> describes tunneled network traffic. VTP object usually contains
> descriptions of outer headers, tunnel headers and inner headers.
> - Tunnel Steering flow Rule (TSR) detects tunneled packets and
> delegates them to tunnel processing infrastructure, implemented
> in PMD for optimal hardware utilization, for further processing.
> - Tunnel Matching flow Rule (TMR) verifies packet configuration and
> runs offload actions in case of a match.
>
> Application actions:
> 1 Initialize VTP object according to tunnel
> network parameters.
> 2 Create TSR flow rule:
> 2.1 Query PMD for VTP actions: application can query for VTP actions
> more than once
> int
> rte_flow_tunnel_decap_set(uint16_t port_id,
> struct rte_flow_tunnel *tunnel,
> struct rte_flow_action **pmd_actions,
> uint32_t *num_of_pmd_actions,
> struct rte_flow_error *error);
>
> 2.2 Integrate PMD actions into TSR actions list.
> 2.3 Create TSR flow rule:
> flow create <port> group 0
> match {tunnel items} / end
> actions {PMD actions} / {App actions} / end
>
> 3 Create TMR flow rule:
> 3.1 Query PMD for VTP items: application can query for VTP items
> more than once
> int
> rte_flow_tunnel_match(uint16_t port_id,
> struct rte_flow_tunnel *tunnel,
> struct rte_flow_item **pmd_items,
> uint32_t *num_of_pmd_items,
> struct rte_flow_error *error);
>
> 3.2 Integrate PMD items into TMR items list:
> 3.3 Create TMR flow rule
> flow create <port> group 0
> match {PMD items} / {APP items} / end
> actions {offload actions} / end
>
> The model provides helper function call to restore packets that miss
> tunnel TMR rules to its original state:
> int
> rte_flow_get_restore_info(uint16_t port_id,
> struct rte_mbuf *mbuf,
> struct rte_flow_restore_info *info,
> struct rte_flow_error *error);
>
> rte_tunnel object filled by the call inside
> rte_flow_restore_info *info parameter can be used by the application
> to create new TMR rule for that tunnel.
>
> The model requirements:
> Software application must initialize
> rte_tunnel object with tunnel parameters before calling
> rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
>
> PMD actions array obtained in rte_flow_tunnel_decap_set() must be
> released by application with rte_flow_action_release() call.
> Application can release the actionsfter TSR rule was created.
>
> PMD items array obtained with rte_flow_tunnel_match() must be released
> by application with rte_flow_item_release() call. Application can
> release the items after rule was created. However, if the application
> needs to create additional TMR rule for the same tunnel it will need
> to obtain PMD items again.
>
> Application cannot destroy rte_tunnel object before it releases all
> PMD actions & PMD items referencing that tunnel.
>
> [1] https://mails.dpdk.org/archives/dev/2020-June/169656.html
>
> Signed-off-by: Eli Britstein <elibr at mellanox.com>
> Acked-by: Ori Kam <orika at mellanox.com>
> ---
> doc/guides/prog_guide/rte_flow.rst | 105 ++++++++++++
> lib/librte_ethdev/rte_ethdev_version.map | 5 +
> lib/librte_ethdev/rte_flow.c | 112 +++++++++++++
> lib/librte_ethdev/rte_flow.h | 196 +++++++++++++++++++++++
> lib/librte_ethdev/rte_flow_driver.h | 32 ++++
> 5 files changed, 450 insertions(+)
>
> diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> index d5dd18ce99..cfd98c2e7d 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -3010,6 +3010,111 @@ operations include:
> - Duplication of a complete flow rule description.
> - Pattern item or action name retrieval.
>
> +Tunneled traffic offload
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Provide software application with unified rules model for tunneled traffic
> +regardless underlying hardware.
> +
> + - The model introduces a concept of a virtual tunnel port (VTP).
It looks like a purely abstract concept for now, since it
is not mentioned or referenced in the header file. That makes
it hard to match the description with the API.
> + - The model uses VTP to offload ingress tunneled network traffic
> + with RTE flow rules.
> + - The model is implemented as set of helper functions. Each PMD
> + implements VTP offload according to underlying hardware offload
> + capabilities. Applications must query PMD for VTP flow
> + items / actions before using in creation of a VTP flow rule.
It looks to me like "creation of a VTP flow rule" is not
covered yet. The flow rule examples mention it in the pattern
and actions, but there are no corresponding pattern items or
actions. Maybe I simply misunderstand the idea.
> +
> +The model components:
> +
> +- Virtual Tunnel Port (VTP) is a stateless software object that
> + describes tunneled network traffic. VTP object usually contains
> + descriptions of outer headers, tunnel headers and inner headers.
Are inner headers really a part of the tunnel description?
> +- Tunnel Steering flow Rule (TSR) detects tunneled packets and
> + delegates them to tunnel processing infrastructure, implemented
> + in PMD for optimal hardware utilization, for further processing.
> +- Tunnel Matching flow Rule (TMR) verifies packet configuration and
> + runs offload actions in case of a match.
Is it for fully offloaded tunnels with encap/decap, or for all
tunnels (detected but only partially offloaded, e.g. checksumming)?
> +
> +Application actions:
> +
> +1 Initialize VTP object according to tunnel network parameters.
I.e. fill in 'struct rte_flow_tunnel'. Is that correct?
> +
> +2 Create TSR flow rule.
> +
> +2.1 Query PMD for VTP actions. Application can query for VTP actions more than once.
> +
> + .. code-block:: c
> +
> + int
> + rte_flow_tunnel_decap_set(uint16_t port_id,
> + struct rte_flow_tunnel *tunnel,
> + struct rte_flow_action **pmd_actions,
> + uint32_t *num_of_pmd_actions,
> + struct rte_flow_error *error);
> +
> +2.2 Integrate PMD actions into TSR actions list.
> +
> +2.3 Create TSR flow rule.
> +
> + .. code-block:: console
> +
> + flow create <port> group 0 match {tunnel items} / end actions {PMD actions} / {App actions} / end
Are application actions strictly required?
If not, it is better to make that clear.
Do the tunnel items here correlate somehow with the tunnel
specification in 'struct rte_flow_tunnel'?
Are they obtained using rte_flow_tunnel_match()?
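For reference, steps 2.2 and 2.3 could look roughly like the minimal sketch below, following the order shown in the console example: PMD actions first, then the (optional) application actions, then END. It assumes the PMD array is counted and not END-terminated, which the patch does not state. The `sketch_*` types and action kinds are simplified, hypothetical stand-ins for `struct rte_flow_action`, not the DPDK definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct rte_flow_action (not the DPDK definition). */
enum sketch_action_type {
	SKETCH_ACTION_END = 0,   /* terminator, like RTE_FLOW_ACTION_TYPE_END */
	SKETCH_ACTION_PMD_DECAP, /* hypothetical PMD-provided action */
	SKETCH_ACTION_JUMP,      /* hypothetical application action */
};

struct sketch_action {
	enum sketch_action_type type;
	const void *conf;
};

/*
 * Build the TSR action list as {PMD actions} / {App actions} / END.
 * pmd_actions is a counted array; app_actions is END-terminated or NULL
 * (no application actions). Returns the number of entries written,
 * including END, or -1 if out[] has no room for them.
 */
static int
sketch_build_tsr_actions(const struct sketch_action *pmd_actions,
			 uint32_t num_pmd,
			 const struct sketch_action *app_actions,
			 struct sketch_action *out, size_t cap)
{
	size_t n = 0;
	uint32_t i;

	assert(out != NULL);
	assert(pmd_actions != NULL || num_pmd == 0);
	for (i = 0; i < num_pmd; i++) {
		if (n == cap)
			return -1;
		out[n++] = pmd_actions[i];
	}
	for (; app_actions != NULL &&
	       app_actions->type != SKETCH_ACTION_END; app_actions++) {
		if (n == cap)
			return -1;
		out[n++] = *app_actions;
	}
	if (n == cap)
		return -1;
	out[n].type = SKETCH_ACTION_END;
	out[n].conf = NULL;
	return (int)(n + 1);
}
```

Passing NULL for the application actions is how the sketch models "no App actions"; whether that is allowed is the open question above.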
> +
> +3 Create TMR flow rule.
> +
> +3.1 Query PMD for VTP items. Application can query for VTP items more than once.
> +
> + .. code-block:: c
> +
> + int
> + rte_flow_tunnel_match(uint16_t port_id,
> + struct rte_flow_tunnel *tunnel,
> + struct rte_flow_item **pmd_items,
> + uint32_t *num_of_pmd_items,
> + struct rte_flow_error *error);
> +
> +3.2 Integrate PMD items into TMR items list.
> +
> +3.3 Create TMR flow rule.
> +
> + .. code-block:: console
> +
> + flow create <port> group 0 match {PMD items} / {APP items} / end actions {offload actions} / end
> +
> +The model provides helper function call to restore packets that miss
> +tunnel TMR rules to its original state:
> +
> +.. code-block:: c
> +
> + int
> + rte_flow_get_restore_info(uint16_t port_id,
> + struct rte_mbuf *mbuf,
> + struct rte_flow_restore_info *info,
> + struct rte_flow_error *error);
> +
> +rte_tunnel object filled by the call inside
> +``rte_flow_restore_info *info parameter`` can be used by the application
> +to create new TMR rule for that tunnel.
I think an example, e.g. for a VXLAN over IPv4 tunnel
with some concrete parameters, would be very useful here
for understanding. Could it be annotated with a description
of the transformations applied to a packet at the different
stages of processing (including a restore example)?
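As a starting point for such an example, here is a minimal sketch that fills a local mirror of the proposed `struct rte_flow_ip_tunnel_key` for a VXLAN over IPv4 tunnel. The concrete values (VNI 100, outer addresses 192.0.2.1 -> 192.0.2.2, UDP destination port 4789) are illustrative. Storing the 24-bit VNI in the low bits of the big-endian `tun_id` is an assumption, since the patch does not define how `tun_id` is encoded; the `sketch_*` names are hypothetical and `htonl()`/`htons()` come from POSIX `<arpa/inet.h>`.

```c
#include <arpa/inet.h> /* htonl(), htons(), ntohs() */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the proposed struct (not the DPDK header);
 * rte_be*_t fields are modelled as integers in network byte order. */
struct sketch_ip_tunnel_key {
	uint64_t tun_id;
	union {
		struct { uint32_t src_addr, dst_addr; } ipv4;
		struct { uint8_t src_addr[16], dst_addr[16]; } ipv6;
	} u;
	bool is_ipv6;
	uint16_t tun_flags;
	uint8_t tos;
	uint8_t ttl;
	uint32_t label;
	uint16_t tp_src;
	uint16_t tp_dst;
};

/* Store a host-order value with big-endian byte layout on any host. */
static uint64_t
sketch_cpu_to_be64(uint64_t v)
{
	uint8_t b[8];
	uint64_t out;
	int i;

	for (i = 0; i < 8; i++)
		b[i] = (uint8_t)(v >> (56 - 8 * i));
	memcpy(&out, b, sizeof(out));
	return out;
}

/* Hypothetical helper: describe a VXLAN over IPv4 tunnel.
 * Addresses and port are given in host byte order. */
static void
sketch_fill_vxlan_ipv4(struct sketch_ip_tunnel_key *key, uint32_t vni,
		       uint32_t outer_src, uint32_t outer_dst,
		       uint16_t udp_dst)
{
	assert(vni <= 0xFFFFFF); /* the VXLAN VNI is 24 bits wide */
	memset(key, 0, sizeof(*key));
	key->tun_id = sketch_cpu_to_be64(vni); /* assumption: VNI in low bits */
	key->is_ipv6 = false;
	key->u.ipv4.src_addr = htonl(outer_src);
	key->u.ipv4.dst_addr = htonl(outer_dst);
	key->tp_dst = htons(udp_dst);
	/* tp_src, tun_flags, tos, ttl and label stay 0: the patch does not
	 * say whether 0 means "any", which is one of the open questions. */
}
```

Usage: `sketch_fill_vxlan_ipv4(&key, 100, 0xC0000201u, 0xC0000202u, 4789);` describes VNI 100 from 192.0.2.1 to 192.0.2.2 on UDP port 4789.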
> +
> +The model requirements:
> +
> +Software application must initialize
> +rte_tunnel object with tunnel parameters before calling
> +rte_flow_tunnel_decap_set() & rte_flow_tunnel_match().
> +
> +PMD actions array obtained in rte_flow_tunnel_decap_set() must be
> +released by application with rte_flow_action_release() call.
> +Application can release the actionsfter TSR rule was created.
actionsfter ?
> +
> +PMD items array obtained with rte_flow_tunnel_match() must be released
> +by application with rte_flow_item_release() call. Application can
> +release the items after rule was created. However, if the application
> +needs to create additional TMR rule for the same tunnel it will need
> +to obtain PMD items again.
> +
> +Application cannot destroy rte_tunnel object before it releases all
> +PMD actions & PMD items referencing that tunnel.
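The lifetime rules above amount to reference counting on the tunnel object. A minimal sketch, assuming the application tracks outstanding PMD arrays itself; the PMD calls are replaced by plain allocations, and all `sketch_*` names are hypothetical, not part of the proposed API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical application-side view of a tunnel description. */
struct sketch_tunnel {
	uint32_t outstanding; /* PMD action/item arrays not yet released */
};

/* Stand-in for an array returned by rte_flow_tunnel_decap_set() or
 * rte_flow_tunnel_match(): it references the tunnel until released. */
struct sketch_pmd_array {
	struct sketch_tunnel *tunnel;
	uint32_t num;
};

static struct sketch_pmd_array *
sketch_get_pmd_array(struct sketch_tunnel *tunnel, uint32_t num)
{
	struct sketch_pmd_array *arr = malloc(sizeof(*arr));

	if (arr == NULL)
		return NULL;
	arr->tunnel = tunnel;
	arr->num = num;
	tunnel->outstanding++;
	return arr;
}

static void
sketch_release_pmd_array(struct sketch_pmd_array *arr)
{
	assert(arr->tunnel->outstanding > 0);
	arr->tunnel->outstanding--;
	free(arr);
}

/* The tunnel object may be destroyed only when nothing references it. */
static bool
sketch_tunnel_can_destroy(const struct sketch_tunnel *tunnel)
{
	return tunnel->outstanding == 0;
}
```

Creating a second TMR rule for the same tunnel would, per the text, need a fresh `sketch_get_pmd_array()` call, which bumps the count again.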
> +
> Caveats
> -------
>
[snip]
> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
> index b0e4199192..1374b6e5a7 100644
> --- a/lib/librte_ethdev/rte_flow.h
> +++ b/lib/librte_ethdev/rte_flow.h
> @@ -3324,6 +3324,202 @@ int
> rte_flow_get_aged_flows(uint16_t port_id, void **contexts,
> uint32_t nb_contexts, struct rte_flow_error *error);
>
> +/* Tunnel information. */
> +__rte_experimental
> +struct rte_flow_ip_tunnel_key {
> + rte_be64_t tun_id; /**< Tunnel identification. */
What is it? Why is it big-endian? Why is it in the IP tunnel
key, i.e. why is it not in a generic structure?
> + union {
> + struct {
> + rte_be32_t src_addr; /**< IPv4 source address. */
> + rte_be32_t dst_addr; /**< IPv4 destination address. */
> + } ipv4;
> + struct {
> + uint8_t src_addr[16]; /**< IPv6 source address. */
> + uint8_t dst_addr[16]; /**< IPv6 destination address. */
> + } ipv6;
> + } u;
> + bool is_ipv6; /**< True for valid IPv6 fields. Otherwise IPv4. */
> + rte_be16_t tun_flags; /**< Tunnel flags. */
Which flags? Where are these flags defined?
Why is it big-endian?
> + uint8_t tos; /**< TOS for IPv4, TC for IPv6. */
> + uint8_t ttl; /**< TTL for IPv4, HL for IPv6. */
If the fields are combined, I'd stick to IPv6 terminology,
since it is a bit better thought out (especially given the
current tendencies in (re)naming in software).
> + rte_be32_t label; /**< Flow Label for IPv6. */
What about IPv6 tunnels with extension headers? How would it be extended?
> + rte_be16_t tp_src; /**< Tunnel port source. */
> + rte_be16_t tp_dst; /**< Tunnel port destination. */
What about IP-in-IP tunnels? Is it applicable?
> +};
> +
> +
> +/* Tunnel has a type and the key information. */
> +__rte_experimental
> +struct rte_flow_tunnel {
> + /**
> + * Tunnel type, for example RTE_FLOW_ITEM_TYPE_VXLAN,
> + * RTE_FLOW_ITEM_TYPE_NVGRE etc.
> + */
> + enum rte_flow_item_type type;
> + struct rte_flow_ip_tunnel_key tun_info; /**< Tunnel key info. */
How can it be extended for non-IP tunnels, e.g. MPLS?
Or for tunnels with more protocols, e.g. MPLS-over-UDP?
> +};
> +
> +/**
> + * Indicate that the packet has a tunnel.
> + */
> +#define RTE_FLOW_RESTORE_INFO_TUNNEL (1ULL << 0)
> +
> +/**
> + * Indicate that the packet has a non decapsulated tunnel header.
> + */
> +#define RTE_FLOW_RESTORE_INFO_ENCAPSULATED (1ULL << 1)
> +
> +/**
> + * Indicate that the packet has a group_id.
> + */
> +#define RTE_FLOW_RESTORE_INFO_GROUP_ID (1ULL << 2)
> +
> +/**
> + * Restore information structure to communicate the current packet processing
> + * state when some of the processing pipeline is done in hardware and should
> + * continue in software.
> + */
> +__rte_experimental
> +struct rte_flow_restore_info {
> + /**
> + * Bitwise flags (RTE_FLOW_RESTORE_INFO_*) to indicate validation of
> + * other fields in struct rte_flow_restore_info.
> + */
> + uint64_t flags;
> + uint32_t group_id; /**< Group ID. */
What is the group ID here?
> + struct rte_flow_tunnel tunnel; /**< Tunnel information. */
> +};
> +
> +/**
> + * Allocate an array of actions to be used in rte_flow_create, to implement
> + * tunnel-decap-set for the given tunnel.
> + * Sample usage:
> + * actions vxlan_decap / tunnel-decap-set(tunnel properties) /
> + * jump group 0 / end
Why is a jump to a group used in the example above? Is it mandatory?
> + *
> + * @param port_id
> + * Port identifier of Ethernet device.
> + * @param[in] tunnel
> + * Tunnel properties.
> + * @param[out] actions
> + * Array of actions to be allocated by the PMD. This array should be
> + * concatenated with the actions array provided to rte_flow_create.
Please, specify concatenation order explicitly.
> + * @param[out] num_of_actions
> + * Number of actions allocated.
> + * @param[out] error
> + * Perform verbose error reporting if not NULL. PMDs initialize this
> + * structure in case of error only.
> + *
> + * @return
> + * 0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_decap_set(uint16_t port_id,
> + struct rte_flow_tunnel *tunnel,
> + struct rte_flow_action **actions,
> + uint32_t *num_of_actions,
Why does the approach to specifying actions differ here,
i.e. an array of a given size vs. an END-terminated array?
Must the actions array be END-terminated here?
There must be a strong reason for it, and it should be
explained.
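To illustrate why the two conventions look redundant: an END-terminated array already carries its length, which a trivial walk recovers, as in this minimal sketch with a simplified stand-in type (not the DPDK definition). A separate count only adds information if the PMD-provided array is not END-terminated, and that is exactly what would need documenting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct rte_flow_action (not the DPDK definition). */
struct sketch_action {
	int type;         /* 0 plays the role of RTE_FLOW_ACTION_TYPE_END */
	const void *conf;
};

#define SKETCH_ACTION_END 0

/* Number of entries before the END terminator. */
static uint32_t
sketch_count_actions(const struct sketch_action *actions)
{
	uint32_t n = 0;

	assert(actions != NULL);
	while (actions[n].type != SKETCH_ACTION_END)
		n++;
	return n;
}
```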
> + struct rte_flow_error *error);
> +
> +/**
> + * Allocate an array of items to be used in rte_flow_create, to implement
> + * tunnel-match for the given tunnel.
> + * Sample usage:
> + * pattern tunnel-match(tunnel properties) / outer-header-matches /
> + * inner-header-matches / end
> + *
> + * @param port_id
> + * Port identifier of Ethernet device.
> + * @param[in] tunnel
> + * Tunnel properties.
> + * @param[out] items
> + * Array of items to be allocated by the PMD. This array should be
> + * concatenated with the items array provided to rte_flow_create.
The concatenation order/rules should be described, as well as
which entity does the concatenation, since it is an output.
Is the application allowed to refine PMD rules in its own
rule specification?
> + * @param[out] num_of_items
> + * Number of items allocated.
> + * @param[out] error
> + * Perform verbose error reporting if not NULL. PMDs initialize this
> + * structure in case of error only.
> + *
> + * @return
> + * 0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_match(uint16_t port_id,
> + struct rte_flow_tunnel *tunnel,
> + struct rte_flow_item **items,
> + uint32_t *num_of_items,
Same as above for actions.
> + struct rte_flow_error *error);
> +
> +/**
> + * Populate the current packet processing state, if exists, for the given mbuf.
> + *
> + * @param port_id
> + * Port identifier of Ethernet device.
> + * @param[in] m
> + * Mbuf struct.
> + * @param[out] info
> + * Restore information. Upon success contains the HW state.
> + * @param[out] error
> + * Perform verbose error reporting if not NULL. PMDs initialize this
> + * structure in case of error only.
> + *
> + * @return
> + * 0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_get_restore_info(uint16_t port_id,
> + struct rte_mbuf *m,
> + struct rte_flow_restore_info *info,
Does it suggest making a copy of the restore info for each
mbuf? That sounds very expensive. Could you share your thoughts
about it?
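To put a number on the copy cost, here is a minimal sketch using local mirrors of the proposed structures. Modelling `rte_be*_t` as plain integers and the enum as `int` are assumptions about representation. On x86-64 with GCC or Clang the mirror of `struct rte_flow_restore_info` is 80 bytes (32 of them are the IPv6 address union), so filling it for every packet of a 32-packet burst would copy 2560 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the proposed structures (not the DPDK header). */
struct sketch_ip_tunnel_key {
	uint64_t tun_id;
	union {
		struct { uint32_t src_addr, dst_addr; } ipv4;
		struct { uint8_t src_addr[16], dst_addr[16]; } ipv6;
	} u;
	bool is_ipv6;
	uint16_t tun_flags;
	uint8_t tos;
	uint8_t ttl;
	uint32_t label;
	uint16_t tp_src;
	uint16_t tp_dst;
};

struct sketch_tunnel {
	int type; /* enum rte_flow_item_type in the patch */
	struct sketch_ip_tunnel_key tun_info;
};

struct sketch_restore_info {
	uint64_t flags;
	uint32_t group_id;
	struct sketch_tunnel tunnel;
};

/* Bytes copied if restore info is filled for each of n_pkts mbufs. */
static size_t
sketch_restore_bytes(size_t n_pkts)
{
	assert(n_pkts <= SIZE_MAX / sizeof(struct sketch_restore_info));
	return n_pkts * sizeof(struct sketch_restore_info);
}
```

The exact size depends on the ABI (e.g. 32-bit x86 aligns `uint64_t` differently), so the 80-byte figure is specific to common 64-bit targets.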
> + struct rte_flow_error *error);
> +
> +/**
> + * Release the action array as allocated by rte_flow_tunnel_decap_set.
> + *
> + * @param port_id
> + * Port identifier of Ethernet device.
> + * @param[in] actions
> + * Array of actions to be released.
> + * @param[in] num_of_actions
> + * Number of elements in actions array.
> + * @param[out] error
> + * Perform verbose error reporting if not NULL. PMDs initialize this
> + * structure in case of error only.
> + *
> + * @return
> + * 0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_action_decap_release(uint16_t port_id,
> + struct rte_flow_action *actions,
> + uint32_t num_of_actions,
Same question as above regarding the approach to specifying
actions and items.
> + struct rte_flow_error *error);
> +
> +/**
> + * Release the item array as allocated by rte_flow_tunnel_match.
> + *
> + * @param port_id
> + * Port identifier of Ethernet device.
> + * @param[in] items
> + * Array of items to be released.
> + * @param[in] num_of_items
> + * Number of elements in item array.
> + * @param[out] error
> + * Perform verbose error reporting if not NULL. PMDs initialize this
> + * structure in case of error only.
> + *
> + * @return
> + * 0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +__rte_experimental
> +int
> +rte_flow_tunnel_item_release(uint16_t port_id,
> + struct rte_flow_item *items,
> + uint32_t num_of_items,
Same question as above regarding the approach to specifying
actions and items.
> + struct rte_flow_error *error);
> #ifdef __cplusplus
> }
> #endif
[snip]
Andrew.
(Right now it is hard to fully imagine how to deal with it,
and it looks like a shim to a vendor-specific API. Maybe I'm
wrong. Hopefully the next version will include a PMD implementation
example, which will shed a bit more light on it.)