[dpdk-dev] [PATCH v4] ethdev: extend flow metadata

Andrew Rybchenko arybchenko at solarflare.com
Wed Oct 30 10:20:17 CET 2019


On 10/30/19 11:59 AM, Slava Ovsiienko wrote:
>> -----Original Message-----
>> From: Andrew Rybchenko <arybchenko at solarflare.com>
>> Sent: Wednesday, October 30, 2019 9:35
>> To: Slava Ovsiienko <viacheslavo at mellanox.com>; Thomas Monjalon
>> <thomas at monjalon.net>; olivier.matz at 6wind.com
>> Cc: dev at dpdk.org; Matan Azrad <matan at mellanox.com>; Ori Kam
>> <orika at mellanox.com>; Yongseok Koh <yskoh at mellanox.com>
>> Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
>>
>> @Olivier, please, take a look at the end of the mail.
>>
>> On 10/29/19 8:19 PM, Slava Ovsiienko wrote:
>>> Hi, Andrew
>>>
>>> Thank you for the review.
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <arybchenko at solarflare.com>
>>>> Sent: Tuesday, October 29, 2019 18:22
>>>> To: Slava Ovsiienko <viacheslavo at mellanox.com>; dev at dpdk.org
>>>> Cc: Thomas Monjalon <thomas at monjalon.net>; Matan Azrad
>>>> <matan at mellanox.com>; olivier.matz at 6wind.com; Ori Kam
>>>> <orika at mellanox.com>; Yongseok Koh <yskoh at mellanox.com>
>>>> Subject: Re: [dpdk-dev] [PATCH v4] ethdev: extend flow metadata
>>>>
>>>> On 10/27/19 9:40 PM, Viacheslav Ovsiienko wrote:
>>>>> Currently, metadata can be set on egress path via mbuf tx_metadata
>>>>> field with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_META
>>>> matches metadata.
>>>>> This patch extends the metadata feature usability.
>>>>>
>>>>> 1) RTE_FLOW_ACTION_TYPE_SET_META
>>>>>
>>>>> When supporting multiple tables, Tx metadata can also be set by a
>>>>> rule and matched by another rule. This new action allows metadata to
>>>>> be set as a result of flow match.
>>>>>
>>>>> 2) Metadata on ingress
>>>>>
>>>>> There's also need to support metadata on ingress. Metadata can be
>>>>> set by SET_META action and matched by META item like Tx. The final
>>>>> value set by the action will be delivered to application via
>>>>> metadata dynamic field of mbuf which can be accessed by
>>>> RTE_FLOW_DYNF_METADATA().
>>>>> PKT_RX_DYNF_METADATA flag will be set along with the data.
>>>>>
>>>>> The mbuf dynamic field must be registered by calling
>>>>> rte_flow_dynf_metadata_register() prior to use SET_META action.
>>>>>
>>>>> The availability of dynamic mbuf metadata field can be checked with
>>>>> rte_flow_dynf_metadata_avail() routine.
>>>>>
>>>>> For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
>>>>> propagated to the other path depending on hardware capability.
>>>>>
>>>>> Signed-off-by: Yongseok Koh <yskoh at mellanox.com>
>>>>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo at mellanox.com>
>>>> Above explanations lack information about "meta" vs "mark" which may
>>>> be set on Rx as well and delivered in other mbuf field.
>>>> It should be explained by one more field is required and rules defined.
>>> There is some story about metadata features.
>>> Initially, there were proposed two metadata related actions:
>>>
>>> - RTE_FLOW_ACTION_TYPE_FLAG
>>> - RTE_FLOW_ACTION_TYPE_MARK
>>>
>>> These actions set the special flag in the packet metadata, MARK action
>>> stores some specified value in the metadata storage, and, on the
>>> packet receiving PMD puts the flag and value to the mbuf and
>>> applications can see the packet was threated inside flow engine
>>> according to the appropriate RTE flow(s). MARK and FLAG are like some
>>> kind of gateway to transfer some per-packet information from the flow
>> engine to the application via receiving datapath.
>>>   From the datapath point of view, the MARK and FLAG are related to the
>> receiving side only.
>>> It would useful to have the same gateway on the transmitting side and
>>> there was the feature of type RTE_FLOW_ITEM_TYPE_META was
>> proposed.
>>> The application can fill the field in mbuf and this value will be transferred to
>> some field in the packet metadata inside the flow engine.
>>> It did not matter whether these metadata fields are shared because of
>>> MARK and META items belonged to different domains (receiving and
>> transmitting) and could be vendor-specific.
>>> So far, so good, DPDK proposes some entities to control metadata
>>> inside the flow engine and gateways to exchange these values on a per-
>> packet basis via datapaths.
>>> As we can see, the MARK and META means are not symmetric, there is
>>> absent action which would allow us to set META value on the transmitting
>> path. So, the action of type:
>>> - RTE_FLOW_ACTION_TYPE_SET_META is proposed.
>>>
>>> The next, applications raise the new requirements for packet metadata.
>>> The flow engines are getting more complex, internal switches are
>>> introduced, multiple ports might be supported within the same flow
>>> engine namespace. From the DPDK points of view, it means the packets
>>> might be sent on one eth_dev port and received on the other one, and the
>> packet path inside the flow engine entirely belongs to the same hardware
>> device. The simplest example is SR-IOV with PF, VFs and the representors.
>>> And there is a brilliant opportunity to provide some out-of-band channel to
>> transfer some extra data
>>>    from one port to another one, besides the packet data itself.
>>>
>>>
>>>> Above explanations lack information about "meta" vs "mark" which may
>>>> be set on Rx as well and delivered in other mbuf field.
>>>> It should be explained by one more field is required and rules defined.
>>>> Otherwise we can endup in half PMDs supporting mark only, half PMDs
>>>> supporting meta only and applications in an interesting situation to
>>>> make a choice which one to use.
>>> There is no "mark" vs "meta". MARK and META means are kept for
>>> compatibility issues and legacy part works exactly as before. The
>>> trials (with flow_validate)  is supposed to check whether PMD supports
>>> MARK or META feature on appropriate domain. It depends on PMD
>>> implementation, configuration and underlaying HW/FW/kernel capabilities
>> and should be resolved in runtime.
>>
>> The trials a way, but very tricky way. My imagination draws me pictures how
>> an application code could look like in attempt to use either mark or meta for
>> Rx only and these pictures are not nice.
> Agree, trials is not the best way.
> For now there is no application trying to choose mark or meta, because these ones
> belonged to other domains, and extension is newly introduced.
> So, the applications using mark will continue use mark, the same is regarding meta.
> The new application definitely will ask for both mark and of them,
> we have the requirements from customers  - "give us as many through bits as you can".
> This new application just may refuse to work if metadata features are not detected,
> because relays on it strongly.
> BTW, trials are not so complex: rte_flow_validate(mark), rte_flow_validate_metadata()
> and that's it.

In assumption that pattern is supported and fate action used together 
with mark
is supported as well. Not that easy, but OK.

>> May be it will look acceptable when mark becomes a dynamic since usage of
>> either one or another dynamic field is definitely easier than usage of either
>> fixed or dynamic field.
> At least in PMD datapath it does not look very ugly.
>> May be dynamic field for mark at fixed offset should be introduced in the
>> release or the nearest future? It will allow to preserve ABI up to 20.11 and
>> provide future proof API.
>> The trick is to register dynamic meta field at fixed offset at start of a day to
>> be sure that it is guaranteed to succeed.
>> It sounds like it is a transition mechanism from fixed to dynamic fields.
>>
>> [snip]
>>
>>>>> diff --git a/lib/librte_ethdev/rte_flow.h
>>>>> b/lib/librte_ethdev/rte_flow.h index 4fee105..b821557 100644
>>>>> --- a/lib/librte_ethdev/rte_flow.h
>>>>> +++ b/lib/librte_ethdev/rte_flow.h
>>>>> @@ -28,6 +28,8 @@
>>>>>     #include <rte_byteorder.h>
>>>>>     #include <rte_esp.h>
>>>>>     #include <rte_higig.h>
>>>>> +#include <rte_mbuf.h>
>>>>> +#include <rte_mbuf_dyn.h>
>>>>>
>>>>>     #ifdef __cplusplus
>>>>>     extern "C" {
>>>>> @@ -418,7 +420,8 @@ enum rte_flow_item_type {
>>>>>     	/**
>>>>>     	 * [META]
>>>>>     	 *
>>>>> -	 * Matches a metadata value specified in mbuf metadata field.
>>>>> +	 * Matches a metadata value.
>>>>> +	 *
>>>>>     	 * See struct rte_flow_item_meta.
>>>>>     	 */
>>>>>     	RTE_FLOW_ITEM_TYPE_META,
>>>>> @@ -1263,9 +1266,17 @@ struct rte_flow_item_icmp6_nd_opt_tla_eth
>> {
>>>>>     #endif
>>>>>
>>>>>     /**
>>>>> - * RTE_FLOW_ITEM_TYPE_META.
>>>>> + * @warning
>>>>> + * @b EXPERIMENTAL: this structure may change without prior notice
>>>> Is it allowed to make experimental back?
>>> I think we should remove EXPERIMENTAL here. We do not introduce new
>>> feature, but just extend the apply area.
>> Agreed.
>>
>>>>>      *
>>>>> - * Matches a specified metadata value.
>>>>> + * RTE_FLOW_ITEM_TYPE_META
>>>>> + *
>>>>> + * Matches a specified metadata value. On egress, metadata can be
>>>>> + set either by
>>>>> + * mbuf tx_metadata field with PKT_TX_METADATA flag or
>>>>> + * RTE_FLOW_ACTION_TYPE_SET_META. On ingress,
>>>>> + RTE_FLOW_ACTION_TYPE_SET_META sets
>>>>> + * metadata for a packet and the metadata will be reported via mbuf
>>>>> + metadata
>>>>> + * dynamic field with PKT_RX_DYNF_METADATA flag. The dynamic
>> mbuf
>>>>> + field must be
>>>>> + * registered in advance by rte_flow_dynf_metadata_register().
>>>>>      */
>>>>>     struct rte_flow_item_meta {
>>>>>     	rte_be32_t data;
>>>> [snip]
>>>>
>>>>> @@ -2429,6 +2447,55 @@ struct rte_flow_action_set_mac {
>>>>>     	uint8_t mac_addr[RTE_ETHER_ADDR_LEN];
>>>>>     };
>>>>>
>>>>> +/**
>>>>> + * @warning
>>>>> + * @b EXPERIMENTAL: this structure may change without prior notice
>>>>> + *
>>>>> + * RTE_FLOW_ACTION_TYPE_SET_META
>>>>> + *
>>>>> + * Set metadata. Metadata set by mbuf tx_metadata field with
>>>>> + * PKT_TX_METADATA flag on egress will be overridden by this action.
>>>>> +On
>>>>> + * ingress, the metadata will be carried by mbuf metadata dynamic
>>>>> +field
>>>>> + * with PKT_RX_DYNF_METADATA flag if set.  The dynamic mbuf field
>>>>> +must be
>>>>> + * registered in advance by rte_flow_dynf_metadata_register().
>>>>> + *
>>>>> + * Altering partial bits is supported with mask. For bits which
>>>>> +have never
>>>>> + * been set, unpredictable value will be seen depending on driver
>>>>> + * implementation. For loopback/hairpin packet, metadata set on
>>>>> +Rx/Tx may
>>>>> + * or may not be propagated to the other path depending on HW
>>>> capability.
>>>>> + *
>>>>> + * RTE_FLOW_ITEM_TYPE_META matches metadata.
>>>>> + */
>>>>> +struct rte_flow_action_set_meta {
>>>>> +	rte_be32_t data;
>>>>> +	rte_be32_t mask;
>>>> As I understand tx_metadata is host endian. Just double-checking.
>>>> Is a new dynamic field host endian or big endian?
>>>> I definitely would like to see motivation in comments why data/mask
>>>> are big- endian here.
>>> metadata is opaque value, endianness does not matter, there are no
>>> some special motivations for choosing endiannes. rte_flow_item_meta()
>>> structure provides data with rte_be32_t type, so meta related action does
>> the same.
>>
>> Endianness of meta in mbuf and flow API should match and it must be
> Yes, and they match (for meta), both are rte_be32_t. OK, will emphasize it in docs.
>
>> documented. Endianness is important if a HW supports less bits since it
>> makes a hit for application to use LSB first if the bit space is sufficient.
>> mark is defined as host-endian (uint32_t) and I think meta should be the
>> same. Otherwise it complicates even more either mark or meta usage as
>> discussed above .
> mark is mark, meta is meta, these ones are not interrelated (despite they
> are becoming similar). And there is no choice between mark and meta.
> It is not obvious we should have the same endianness for both.
> There are some applications using this legacy features, so there might be compatibility
> issues either.

There are few reasons above to make meta host endian:
- match mark endianness (explained above, I still think that my reasons 
vaild)
- make it easier for application to use it without endianness conversion
   if a sequence number is used to allocate metas (similar to mark in OVS)
   and bit space is less than 32-bits

I see no single reason to keep it big-endian except a reason to keep it.

Since tx_metadata is going away and metadata was used for Tx only
before it is a good moment to fix it.

>> Yes, I think that rte_flow_item_meta should be fixed since both mark and
>> tx_metadata are host-endian.
>>
>> (it says nothing about HW interface which is vendor specific and vendor
>> PMDs should care about it)
> Handling this in PMD might introduce the extra endianness conversion in datapath,
> impacting performance. Not nice, IMO.

These concerns should not affect external interface since
it could be different for different HW vendors.



More information about the dev mailing list