[dpdk-dev] [RFC 1/3] ethdev: extend flow metadata

Yongseok Koh yskoh at mellanox.com
Tue Jun 11 02:06:43 CEST 2019

On Mon, Jun 10, 2019 at 10:20:28AM +0300, Andrew Rybchenko wrote:
> On 6/10/19 6:19 AM, Wang, Haiyue wrote:
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Andrew Rybchenko
> > > Sent: Sunday, June 9, 2019 22:24
> > > To: Yongseok Koh <yskoh at mellanox.com>; shahafs at mellanox.com; thomas at monjalon.net; Yigit, Ferruh
> > > <ferruh.yigit at intel.com>; adrien.mazarguil at 6wind.com; olivier.matz at 6wind.com
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC 1/3] ethdev: extend flow metadata
> > > 
> > > On 6/4/19 12:32 AM, Yongseok Koh wrote:
> > > > > Currently, metadata can be set on egress path via mbuf tx_metadata field
> > > > with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_RX_META matches metadata.
> > > > 
> > > > This patch extends the usability.
> > > > > 1) RTE_FLOW_ACTION_TYPE_SET_META
> > > > > 
> > > > When supporting multiple tables, Tx metadata can also be set by a rule and
> > > > matched by another rule. This new action allows metadata to be set as a
> > > > result of flow match.
> > > > 
> > > > 2) Metadata on ingress
> > > > 
> > > > > There's also a need to support metadata on packet Rx. Metadata can be set by
> > > > SET_META action and matched by META item like Tx. The final value set by
> > > > the action will be delivered to application via mbuf metadata field with
> > > > PKT_RX_METADATA ol_flag.
> > > > 
> > > > For this purpose, mbuf->tx_metadata is moved as a separate new field and
> > > > renamed to 'metadata' to support both Rx and Tx metadata.
> > > > 
> > > > For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
> > > > propagated to the other path depending on HW capability.
> > > > 
> > > > Signed-off-by: Yongseok Koh <yskoh at mellanox.com>
> > > There is a mark on Rx which is delivered to application in hash.fdir.hi.
> > > Why do we need one more 32-bit value set by NIC and delivered to
> > > application?
> > > What is the difference between MARK and META on Rx?
> > > When application should use MARK and when META?
> > > Are there cases when both could be necessary?
> > > 
> > In my understanding, MARK is FDIR related thing, META seems to be NIC
> > specific. And we also need this kind of specific data field to export
> > NIC's data to application.
> I think it is better to avoid NIC vendor-specifics in motivation. I understand
> that it exists for you, but I think it is better to look at it from RTE flow
> definition point of view: both are 32-bit (except endianness and I'm not sure
> that I understand why meta is defined as big-endian since it is not a value
> coming from or going to network in a packet, I'm sorry that I've missed it
> on review that time), both may be set using action on Rx, both may be
> matched using pattern item.

Yes, MARK and META have the same characteristics on the Rx path. Let me clarify
why I chose this approach.

What if the device has more bits to deliver to the host? Currently, only 32 bits
of data can be delivered to the user via the MARK ID. Now we have more requests
from users (OVS connection tracking) who want to see more information generated
by the device during flow match. Let's say it is 64 bits, and it may contain
intermediate match results to keep track of a multi-table match, the address of
a callback function to call, and so on. I thought about extending the current
MARK to 64 bits, but I knew that we couldn't make more room in the first
cacheline of the mbuf, where every vendor has a critical interest. Also, FDIR
has been there for a long time and has lots of use cases in DPDK (not easy to
break). This is why I'm suggesting obtaining another 32 bits in the second
cacheline of the structure.
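
To make the layout argument concrete, here is a minimal sketch. It assumes
64-byte cachelines and uses a hypothetical, heavily simplified struct
(sketch_mbuf, with placeholder arrays) that is NOT the real rte_mbuf layout.
The helper that joins the two 32-bit values into one 64-bit word is also only
an illustration of how an application might consume them, not part of the RFC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64 /* assumed cacheline size */

/* Hypothetical stand-in for an mbuf; NOT the real rte_mbuf layout. */
struct sketch_mbuf {
	/* First cacheline: hot Rx fields, no room left to grow. */
	uint8_t rx_hot[SKETCH_CACHE_LINE - sizeof(uint32_t)];
	uint32_t mark;         /* plays the role of hash.fdir.hi (MARK ID) */
	/* Second cacheline: placeholder for the existing fields there. */
	uint8_t tx_fields[48];
	uint32_t metadata;     /* the additional 32 bits proposed above */
};

/* The existing 32-bit mark stays in the first cacheline... */
_Static_assert(offsetof(struct sketch_mbuf, mark) < SKETCH_CACHE_LINE,
	       "mark must sit in the first cacheline");
/* ...while the new field lands in the second one. */
_Static_assert(offsetof(struct sketch_mbuf, metadata) >= SKETCH_CACHE_LINE,
	       "metadata must sit in the second cacheline");

/* Illustrative only: view MARK and META together as one 64-bit value. */
static inline uint64_t sketch_rx_info(const struct sketch_mbuf *m)
{
	return ((uint64_t)m->metadata << 32) | m->mark;
}
```

With this arrangement the first cacheline is untouched, and reading the extra
32 bits costs an access to the second cacheline only for applications that
actually want them.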

Also, I thought about another scenario. Even though the MARK item was
introduced lately, it isn't used by any PMD at all for now, meaning it might not
be match-able on a certain device. What if there are two types of registers on
Rx and one is match-able and the other isn't? A PMD can use META for the
match-able register while MARK is used for the non-match-able register without
supporting item match. If MARK simply becomes 64-bit just because it has the
same characteristics in terms of rte_flow, only one of such registers can be
used, as we can't say that only part of the bits are match-able on the item.
Instead of extending MARK to 64 bits, I thought it would be better to give more
flexibility by bundling it with Tx metadata, which can be set via the mbuf.

The actual issue may be how to make this scalable. What if there's a need to
carry even more data from the device? Well, IIRC, Olivier once suggested putting
a pointer (like mbuf->userdata) to extend the mbuf struct beyond two cachelines.
But we still have some space left at the end.

> > > Moreover, the third patch adds 32-bit tags which are not delivered to
> > > application. Maybe META/MARK should be simply a kind of TAG (e.g. with
> > > index 0 or marked using additional attribute) which is delivered to
> > > application?

Yes, TAG is a kind of transient device-internal data which isn't delivered to
the host. It is a design choice. I could define all these kinds as an array of
MARK IDs with different attributes (some exportable/match-able and others not),
but that sounds quite complex. As rte_flow doesn't have a direct way to check
device capability (the user has to call a series of validate functions
instead), I thought defining TAG would be better.
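
To show what "a series of validate functions" amounts to, here is a rough,
self-contained sketch. Everything in it is hypothetical: the sketch_feature
enum, the sketch_validate_fn callback and the example device are stand-ins.
In a real application each probe would be a trial rule passed to
rte_flow_validate(); the callback here only mimics its "0 means accepted,
negative means rejected" convention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature identifiers, used only in this sketch. */
enum sketch_feature {
	SKETCH_ITEM_MARK,       /* match on MARK */
	SKETCH_ITEM_META,       /* match on META */
	SKETCH_ACTION_SET_META, /* set META from a rule */
	SKETCH_ACTION_SET_TAG,  /* set a device-internal TAG */
	SKETCH_FEATURE_MAX
};

/*
 * Stand-in for validating a trial rule that uses one feature:
 * returns 0 if the device would accept it, a negative value otherwise.
 */
typedef int (*sketch_validate_fn)(enum sketch_feature f);

/* Probe every feature in turn and collect the accepted ones in a bitmask. */
static unsigned int sketch_probe(sketch_validate_fn validate)
{
	unsigned int mask = 0;
	int f;

	for (f = 0; f < SKETCH_FEATURE_MAX; f++)
		if (validate((enum sketch_feature)f) == 0)
			mask |= 1u << f;
	return mask;
}

/* Example device: MARK is deliverable but not match-able; the rest work. */
static int sketch_example_validate(enum sketch_feature f)
{
	return f == SKETCH_ITEM_MARK ? -1 : 0;
}
```

The point of the sketch is that the application learns capabilities one trial
at a time; with per-ID attributes on an array of MARK IDs, the number of
combinations to probe would grow accordingly.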

> > > (It is either API breakage (if tx_metadata is removed) or ABI breakage
> > > if metadata and tx_metadata will share new location after shinfo).

Fortunately, mlx5 is the only entity which uses tx_metadata so far.

> > Make use of udata64 to export NIC metadata to application ?
> > 	RTE_STD_C11
> > 	union {
> > 		void *userdata;   /**< Can be used for external metadata */
> > 		uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
> > 		uint64_t rx_metadata;
> > 	};
> As I understand it does not work for Tx and I'm not sure that it is
> a good idea to have different locations for Tx and Rx.
> RFC adds it at the end of mbuf, but it was rejected before since
> it eats space in mbuf structure (CC Konstantin).

Yep, I was in the discussion. IIRC, the reason wasn't that it ate space but
that it could recycle unused space on the Tx path. We still have 16B after
shinfo and I'm not sure how many bytes we should reserve. I think reserving
space for one pointer would be fine.


> There is a long discussion on the topic before [1], [2], [3] and [4].
> Andrew.
> [1] http://mails.dpdk.org/archives/dev/2018-August/109660.html
> [2] http://mails.dpdk.org/archives/dev/2018-September/111771.html
> [3] http://mails.dpdk.org/archives/dev/2018-October/114559.html
> [4] http://mails.dpdk.org/archives/dev/2018-October/115469.html
