[dpdk-dev] [dpdk-dev v2 1/2] ethdev: add new tunnel type for ecpri

Zhang, Qi Z qi.z.zhang at intel.com
Mon Jan 11 15:02:43 CET 2021



> -----Original Message-----
> From: Thomas Monjalon <thomas at monjalon.net>
> Sent: Monday, January 11, 2021 7:38 PM
> To: Yigit, Ferruh <ferruh.yigit at intel.com>; Guo, Jia <jia.guo at intel.com>; Zhang,
> Qi Z <qi.z.zhang at intel.com>
> Cc: Andrew Rybchenko <andrew.rybchenko at oktetlabs.ru>; Ori Kam
> <orika at nvidia.com>; Wu, Jingjing <jingjing.wu at intel.com>; Yang, Qiming
> <qiming.yang at intel.com>; Wang, Haiyue <haiyue.wang at intel.com>;
> dev at dpdk.org; Gregory Etelson <getelson at nvidia.com>;
> maxime.coquelin at redhat.com; jerinj at marvell.com;
> ajit.khaparde at broadcom.com
> Subject: Re: [dpdk-dev] [dpdk-dev v2 1/2] ethdev: add new tunnel type for ecpri
> 
> 11/01/2021 12:26, Zhang, Qi Z:
> > From: Thomas Monjalon <thomas at monjalon.net>
> > > 10/01/2021 11:46, Ori Kam:
> > > > From: Zhang, Qi Z <qi.z.zhang at intel.com>
> > > > > From: Thomas Monjalon <thomas at monjalon.net>
> > > > > > 08/01/2021 10:29, Andrew Rybchenko:
> > > > > > > On 1/8/21 11:57 AM, Ferruh Yigit wrote:
> > > > > > > > On 1/8/2021 1:41 AM, Zhang, Qi Z wrote:
> > > > > > > >> From: Thomas Monjalon <thomas at monjalon.net>
> > > > > > > >>> 07/01/2021 16:24, Zhang, Qi Z:
> > > > > > > >>>> From: Thomas Monjalon <thomas at monjalon.net>
> > > > > > > >>>>> 07/01/2021 13:47, Zhang, Qi Z:
> > > > > > > >>>>>> From: Thomas Monjalon <thomas at monjalon.net>
> > > > > > > >>>>>>> 07/01/2021 10:32, Guo, Jia:
> > > > > > > >>>>>>>> From: Thomas Monjalon <thomas at monjalon.net>
> > > > > > > >>>>>>>>> 24/12/2020 07:59, Jeff Guo:
> > > > > > > >>>>>>>>>> --- a/lib/librte_ethdev/rte_ethdev.h
> > > > > > > >>>>>>>>>> +++ b/lib/librte_ethdev/rte_ethdev.h
> > > > > > > >>>>>>>>>> @@ -1219,6 +1219,7 @@ enum rte_eth_tunnel_type {
> > > > > > > >>>>>>>>>>       RTE_TUNNEL_TYPE_IP_IN_GRE,
> > > > > > > >>>>>>>>>>       RTE_L2_TUNNEL_TYPE_E_TAG,
> > > > > > > >>>>>>>>>>       RTE_TUNNEL_TYPE_VXLAN_GPE,
> > > > > > > >>>>>>>>>> +    RTE_TUNNEL_TYPE_ECPRI,
> > > > > > > >>>>>>>>>>       RTE_TUNNEL_TYPE_MAX,
> > > > > > > >>>>>>>>>>   };
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> We tried to remove all these legacy API in DPDK 20.11.
> > > > > > > >>>>>>>>> Andrew decided to not remove this one because it
> > > > > > > >>>>>>>>> is not yet completely replaced by rte_flow in all drivers.
> > > > > > > >>>>>>>>> However, I am against continuing to update this API.
> > > > > > > >>>>>>>>> The opposite work should be done: migrate to rte_flow.
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> Agree but seems that the legacy api and driver
> > > > > > > >>>>>>>> legacy implementation still keep in this release,
> > > > > > > >>>>>>>> and there is no a general way to replace the legacy by
> rte_flow right now.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> I think rte_flow is a complete replacement with more
> features.
> > > > > > > >>>>>>
> > > > > > > >>>>>> Thomas, I may not agree with this.
> > > > > > > >>>>>>
> > > > > > > >>>>>> Actually the "enum rte_eth_tunnel_type" is used by
> > > > > > > >>>>>> rte_eth_dev_udp_tunnel_port_add A packet with
> > > > > > > >>>>>> specific dst udp port will be recognized as a
> > > > > > > >>>>>> specific tunnel packet type
> > > (e.g.
> > > > > > > >>>>>> vxlan, vxlan-gpe,
> > > > > > > >>>>> ecpri...) In Intel NIC, the API actually changes the
> > > > > > > >>>>> configuration of the packet parser in HW but not add a
> > > > > > > >>>>> filter rule and I guess all other devices may enable
> > > > > > > >>>>> it in a similar
> > > way.
> > > > > > > >>>>>> so naturally it should be a device (port) level
> > > > > > > >>>>>> configuration but not a rte_flow
> > > > > > > >>>>> rule for match, encap, decap...
> > > > > > > >>>>>
> > > > > > > >>>>> I don't understand how it helps to identify an UDP
> > > > > > > >>>>> port if there is no rule for this tunnel.
> > > > > > > >>>>> What is the usage?
> > > > > > > >>>>
> > > > > > > >>>> Yes, in general It is a rule, it matches a udp packet's
> > > > > > > >>>> dst port and the action is
> > > > > > > >>> "now the packet is identified as vxlan packet" then all
> > > > > > > >>> other rte_flow rules that match for a vlxan as pattern
> > > > > > > >>> will take
> > > effect.
> > > > > > > >>> but somehow, I think they are not rules in the same
> > > > > > > >>> domain, just like we have dedicate API for mac/vlan
> > > > > > > >>> filter, we'd better have a dedicate API for this also. (
> > > > > > > >>> RFC for Vxlan explains why we need this.
> > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%
> > > > > 2Fto
> > > > > ols.ietf
> > > > > .org%2Fhtml%2Frfc7348&data=04%7C01%7Corika%40nvidia.com
> %7C
> > > 46b2
> > > > >
> > >
> d8f48944422f0d9008d8b43a2293%7C43083d15727340c1b7db39efd9ccc17a
> > > %7
> > > > >
> > >
> C0%7C0%7C637457509081543237%7CUnknown%7CTWFpbGZsb3d8eyJWIjoi
> > > MC
> > > > >
> > >
> 4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&a
> > > > >
> > >
> mp;sdata=RYWFMjuxkcUZ982kK2s44tBAjf%2FTkDyaa7VEybCtxOo%3D&r
> > > es
> > > > > erved=0).
> > > > > > > >>>>
> > > > > > > >>>> "Destination Port: IANA has assigned the value 4789 for
> > > > > > > >>>> the VXLAN UDP port, and this value SHOULD be used by
> > > > > > > >>>> default as the destination UDP port.  Some early
> > > > > > > >>>> implementations of VXLAN have used other values for the
> > > > > > > >>>> destination port.  To enable interoperability with
> > > > > > > >>>> these implementations, the destination port
> > > > > > SHOULD be configurable."
> > > > > > > >>>
> > > > > > > >>> Yes the port number is free.
> > > > > > > >>> But isn't it more natural to specify this port number as
> > > > > > > >>> part of the rte_flow rule?
> > > > > > > >>
> > > > > > > >> I think if we have a rte_flow action type that can be
> > > > > > > >> used to set a packet's tunnel type xxx, like below #flow
> > > > > > > >> create eth/ipv4/udp port is 4789/... action
> > > > > > > >> set_tunnel_type VxLAN / end then we may replace it with
> > > > > > > >> rte_flow, but I'm not sure if it's necessary, please share if you have
> a better idea.
> > > > > >
> > > > > > Of course we can specify the UDP port in rte_flow rule.
> > > > > > Please check rte_flow_item_udp.
> > > > > > That's a basic of rte_flow.
> > > > >
> > > > > Its not about the pattern match, it's about the action, what we
> > > > > need is a rte_flow action to "define a packet's tunnel type", but we don't
> have.
> > >
> > > A packet type alone is meaningless.
> > > It is always associated to an action, this is what rte_flow does.
> >
> > As I mentioned in previous, this is a device (port) level configuration, so it can
> only be configured by a PF driver or a privileged VF base on our security model.
> > A typical usage in a NFV environment could be:
> >
> > 1. A privileged VF (e.g. ice_dcf PMD) use rte_eth_dev_udp_tunnel_port_add
> to create tunnel port for eCPRI, them this will impact on all VFs in the same PF.
> > 2. A normal VF driver can create rte_flow rule that match specific patch for
> queue steering or apply RSS for eCPRI packets, but it DON'T have the
> permission to define the tunnel port.
> 
> Whaooh! A normal Intel VF is not allowed to match the tunnel it wants if not
> enabled by a priviledged VF?

> I would say it is a HW design flaw, but that's not the question.	

Why you think this is a design flaw? in real case, is it a typical requirement that different VF need different tunnel port for eCPRI (or VxLan) on the same PF? 
I believe it's not necessary to make it as a per VF resource in most cases, and I will be surprise if a driver that allow any VF to change the share resource without any privilege control.

Btw I guess mlx NIC has more flexible way to handle ecpri tunnel, just curious how it works, what's the expected result of below rules?

1. create flow eth / ipv4 / udp dst is 1234 / ecpri msgtype is 0 / ... to queue 0
2. create flow eth / ipv4 / udp dst is 5678 / ecrpi msgtype is 1 / ... to queue 1.

So both 1234 and 5678 will be regarded as an ECPRI packet? Or only the first one will work?
does dst udp port is always needed if an ecpri pattern is involved?


> 
> > So it does not help to have a rte_flow that match dst udp port as
> > tunnel while have an action in the same rule.
> 
> Now I understand you are truly looking for a device configuration.
> But as it looks more as a HW limitation, I don't think it should be part of ethdev.
> The fact is that it is already part of ethdev...
> 
> I don't know what to say. I need to digest how Intel limitation is "still" trying to
> drive some parts of the "generic" API.
> 
> 
> > > > > #flow create eth/ipv4/udp port is 4789/... action
> > > > > set_tunnel_type VxLAN
> > > > >
> > > > > I think rte_eth_dev_udp_tunnel_port_add does this job well
> > > > > already, if we plan to move it to rte_flow, at least we need a replacement
> solution.
> > >
> > > The documentation does not say why it is useful.
> > > With rte_flow you don't need it because a flow is specified with its action.
> > >
> > >
> > > > Let me see if I understand it correctly.
> > > > In your case, the issue is that you need to configure the HW to
> > > > parse the
> > > packet correctly right?
> > > > It is not about the matching it is about the configuration of the
> > > > HW, you wish to tell the HW that the packet should be parsed by
> > > > different means
> > > correct?
> > > >
> > > > If this is the case it sounds to me that you should use rte_flow
> > > > and if the user adds the following rule:
> > > > #flow create pattern eth / ivp4 / udp port is 4789 / .. action .....
> > > > You simply need to configure your HW to support the ecpri configuration.
> > > >
> > > >
> > > > > >
> > > > > >
> > > > > > > > Isn't this more a device configuration than filtering, not
> > > > > > > > sure about using rte_flow for this.
> > > > > > >
> > > > > > > +1
> > > > > >
> > > > > > A device configuration? No, setting an UDP port is a stack
> configuration.
> > > > > >
> > > > > >
> > > > > > > >> BTW, are we going to move all other filter like mac ,
> > > > > > > >> VLAN filter/strip/insert into rte_flow finally?
> > > > > >
> > > > > > Yes I think it should be the direction.
> > > > > > All of this can be achieved with rte_flow.
> > > > > >
> > > > > >
> > > > > > > >> if that's the plan, though I don't have much inputs for
> > > > > > > >> this right now, but I think we may not need to prevent
> > > > > > > >> new features be added based on current API if it does not
> > > > > > > >> introduce more complexity and not break anything.
> > > > > >
> > > > > > If we continue updating old API, we are just forking ourself:
> > > > > > having 2 APIs for the same feature is a non-sense.
> > > > > > We must remove APIs which are superseded by rte_flow.
> 
> 



More information about the dev mailing list