[dpdk-dev] [RFC 1/3] ethdev: extend flow metadata

Andrew Rybchenko arybchenko at solarflare.com
Wed Jun 19 11:05:50 CEST 2019

On 11.06.2019 3:06, Yongseok Koh wrote:
> On Mon, Jun 10, 2019 at 10:20:28AM +0300, Andrew Rybchenko wrote:
>> On 6/10/19 6:19 AM, Wang, Haiyue wrote:
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Andrew Rybchenko
>>>> Sent: Sunday, June 9, 2019 22:24
>>>> To: Yongseok Koh <yskoh at mellanox.com>; shahafs at mellanox.com; thomas at monjalon.net; Yigit, Ferruh
>>>> <ferruh.yigit at intel.com>; adrien.mazarguil at 6wind.com; olivier.matz at 6wind.com
>>>> Cc: dev at dpdk.org
>>>> Subject: Re: [dpdk-dev] [RFC 1/3] ethdev: extend flow metadata
>>>> On 6/4/19 12:32 AM, Yongseok Koh wrote:
>>>>> Currently, metadata can be set on the egress path via the mbuf tx_metadata
>>>>> field with the PKT_TX_METADATA flag, and RTE_FLOW_ITEM_TYPE_RX_META matches
>>>>> metadata. This patch extends the usability.
>>>>> 1) RTE_FLOW_ACTION_TYPE_SET_META
>>>>> When supporting multiple tables, Tx metadata can also be set by a rule and
>>>>> matched by another rule. This new action allows metadata to be set as a
>>>>> result of a flow match.
>>>>> 2) Metadata on ingress
>>>>> There's also need to support metadata on packet Rx. Metadata can be set by
>>>>> SET_META action and matched by META item like Tx. The final value set by
>>>>> the action will be delivered to application via mbuf metadata field with
>>>>> PKT_RX_METADATA ol_flag.
>>>>> For this purpose, mbuf->tx_metadata is moved as a separate new field and
>>>>> renamed to 'metadata' to support both Rx and Tx metadata.
>>>>> For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
>>>>> propagated to the other path depending on HW capability.
>>>>> Signed-off-by: Yongseok Koh <yskoh at mellanox.com>
>>>> There is a mark on Rx which is delivered to application in hash.fdir.hi.
>>>> Why do we need one more 32-bit value set by NIC and delivered to
>>>> application?
>>>> What is the difference between MARK and META on Rx?
>>>> When should an application use MARK and when META?
>>>> Are there cases when both could be necessary?
>>> In my understanding, MARK is an FDIR-related thing, while META seems to be
>>> NIC-specific. And we also need this kind of specific data field to export
>>> the NIC's data to the application.
>> I think it is better to avoid NIC vendor-specifics in the motivation. I
>> understand that it exists for you, but I think it is better to look at it
>> from the rte_flow API definition point of view: both are 32-bit (except for
>> endianness, and I'm not sure that I understand why meta is defined as
>> big-endian, since it is not a value coming from or going to the network in a
>> packet; I'm sorry that I missed it on review at the time), both may be set
>> using an action on Rx, and both may be matched using a pattern item.
> Yes, MARK and META have the same characteristics on the Rx path. Let me
> clarify why I picked this way.
> What if device has more bits to deliver to host? Currently, only 32-bit data can
> be delivered to user via MARK ID. Now we have more requests from users (OVS
> connection tracking) that want to see more information generated during flow
> match from the device. Let's say it is 64 bits; it may contain intermediate
> match results to keep track of a multi-table match, or the address of a
> callback function to call, and so on. I thought about extending the current MARK to 64-bit
> but I knew that we couldn't make more room in the first cacheline of mbuf where
> every vendor has their critical interest. And the FDIR has been there for a long
> time and has lots of use-cases in DPDK (not easy to break). This is why I'm
> suggesting to obtain another 32 bits in the second cacheline of the structure.
> Also, I thought about another scenario as well. Even though we have the MARK
> item introduced lately, it isn't used by any PMD at all for now, meaning it might not
> be match-able on a certain device. What if there are two types of registers on
> Rx, one match-able and the other not? A PMD can use META for the match-able
> register while MARK is used for the non-match-able register, without supporting
> item match. If MARK simply becomes 64-bit just because it has the same
> characteristic in terms of rte_flow, only one of such registers can be used as
> we can't say that only part of the bits are match-able on the item. Instead of
> extending MARK to 64 bits, I thought it would be better to give more
> flexibility by bundling it with Tx metadata, which can be set via the mbuf.

Thanks a lot for the explanations. If this approach is finally approved, the
priority between META and MARK should be defined. I.e. if only one is
supported, or only one may be matched, it must be MARK. Otherwise, it will be
too complicated for applications to find out which one to use.
Are there any limitations on the usage of MARK or META in transfer rules?
There is a lot of documentation work to do in this area to make it usable.

> The actual issue we have may be how to make it scalable. What if there's a
> need to carry more data from the device? Well, IIRC, Olivier once suggested
> putting a pointer (like mbuf->userdata) to extend the mbuf struct beyond two
> cachelines. But we still have some space left at the end.
>>>> Moreover, the third patch adds 32-bit tags which are not delivered to
>>>> application. May be META/MARK should be simply a kind of TAG (e.g. with
>>>> index 0 or marked using additional attribute) which is delivered to
>>>> application?
> Yes, TAG is a kind of transient device-internal data which isn't delivered to
> host. It would be a design choice. I could define all these kinds as an array of
> MARK IDs having different attributes - some are exportable/match-able and others
> are not, which sounds quite complex. As rte_flow doesn't have a direct way to
> check device capability (user has to call a series of validate functions
> instead), I thought defining TAG would be better.
>>>> (It is either an API breakage (if tx_metadata is removed) or an ABI
>>>> breakage if metadata and tx_metadata share a new location after shinfo.)
> Fortunately, mlx5 is the only entity which uses tx_metadata so far.

As I understand it, this is still a breakage.

>>> Make use of udata64 to export NIC metadata to application ?
>>> 	RTE_STD_C11
>>> 	union {
>>> 		void *userdata;   /**< Can be used for external metadata */
>>> 		uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
>>> 		uint64_t rx_metadata;
>>> 	};
>> As I understand it, this does not work for Tx, and I'm not sure that it is
>> a good idea to have different locations for Tx and Rx.
>> The RFC adds it at the end of the mbuf, but that was rejected before since
>> it eats space in the mbuf structure (CC Konstantin).
> Yep, I was in that discussion. IIRC, the reason wasn't that it ate space but
> that it could recycle unused space on the Tx path. We still have 16B after
> shinfo and I'm not sure how many bytes we should reserve. I think reserving
> space for one pointer would be fine.

I have no strong opinion.


> Thanks,
> Yongseok
>> There is a long discussion on the topic before [1], [2], [3] and [4].
>> Andrew.
>> [1] http://mails.dpdk.org/archives/dev/2018-August/109660.html
>> [2] http://mails.dpdk.org/archives/dev/2018-September/111771.html
>> [3] http://mails.dpdk.org/archives/dev/2018-October/114559.html
>> [4] http://mails.dpdk.org/archives/dev/2018-October/115469.html
