[dpdk-dev] [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions

Ori Kam orika at mellanox.com
Thu Oct 11 15:55:24 CEST 2018


Hi Adrian,

> -----Original Message-----
> From: Adrien Mazarguil <adrien.mazarguil at 6wind.com>
> Sent: Thursday, October 11, 2018 4:12 PM
> To: Ori Kam <orika at mellanox.com>
> Cc: Andrew Rybchenko <arybchenko at solarflare.com>; Ferruh Yigit
> <ferruh.yigit at intel.com>; stephen at networkplumber.org; Declan Doherty
> <declan.doherty at intel.com>; dev at dpdk.org; Dekel Peled
> <dekelp at mellanox.com>; Thomas Monjalon <thomas at monjalon.net>; Nélio
> Laranjeiro <nelio.laranjeiro at 6wind.com>; Yongseok Koh
> <yskoh at mellanox.com>; Shahaf Shuler <shahafs at mellanox.com>
> Subject: Re: [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation
> actions
> 
> Hey Ori,
> 
> (removing most of the discussion, I'll only reply to the summary)
> 
> On Thu, Oct 11, 2018 at 08:48:05AM +0000, Ori Kam wrote:
> > Hi Adrian,
> >
> > Thanks for your comments please see my answer below and inline.
> >
> > Due to a very short time limit and the fact that we have more than
> > 4 patches that are based on this we need to close it fast.
> >
> > As I can see there are number of options:
> > * the old approach that neither of us like. And which mean that for
> >    every tunnel we create a new command.
> 
> Just to be sure, you mean that for each new tunnel *type* a new rte_flow
> action *type* must be added to DPDK right? Because the above reads like with
> your proposal, a single flow rule can manage any number of TEPs and flow
> rule creation for subsequent tunnels can be somehow bypassed.
> 
> One flow *rule* is still needed per TEP or did I miss something?
> 
Yes you are correct one rte_action per tunnel *type* must be added to the DPDK.
Sorry I'm not sure what TEP is.

> > * My proposed suggestion as is. Which is easier for at least number of
> application
> >    to implement and faster in most cases.
> > * My suggestion with different name, but then we need to find also a name
> >    for the decap and also a name for decap_l3. This approach is also
> problematic
> >    since we have 2 API that are doing the same thig. For example in test-pmd
> encap
> >    vxlan in which API shell we use?
> 
> Since you're doing this for MPLSoUDP and MPLSoGRE, you could leave
> VXLAN/NVGRE encap as is, especially since (AFAIK) there are series still
> relying on their API floating on the ML.
> 
I don't care about leaving also the old approach.
I even got Acked for removing it 😊
The only issue is the duplication API.
for example what will be the test-pmd encap vxlan ?
If you agree that test PMD will be converted to the new one,
I don't care about leaving the old one.


> > * Combine between my suggestion and the current one by replacing the raw
> >    buffer with list of items. Less code duplication easier on the validation ( that
> >    don't think we need to validate the encap data) but we loss insertion rate.
> 
> Already suggested in the past [1], this led to VXLAN and NVGRE encap as we
> know them.
> 
> > * your suggestion of  list of action that each action is one item. Main problem
> >    is speed.  Complexity form the application side and time to implement.
> 
> Speed matters a lot to me also (go figure) but I still doubt this approach
> is measurably faster. On the usability side, compared to one action per
> protocol layer which better fits the rte_flow model, I'm also not
> convinced.
> 
> If we put aside usability and performance on which we'll never agree, there
> is still one outstanding issue: the lack of mask. Users cannot tell which
> fields are relevant and to be kept as is, and which are not.
> 
😊

> How do applications know what blanks are filled in by HW? How do PMDs know
> what applications expect? There's a risk of sending incomplete or malformed
> packets depending on the implementation.
> 
> One may expect PMDs and HW to just "do the sensible thing" but some
> applications won't know that some fields are not offloaded and will be
> emitted with an unexpected value, while others will attempt to force a
> normally offloaded field to some specific value and expect it to leave
> unmodified. This cannot be predicted by the PMD, something is needed.
> 
> Assuming you add a mask pointer to address this, generic encap should be
> functionally complete but not all that different from what we currently have
> for VXLAN/NVGRE and from Declan's earlier proposal for generic encap [1];
> PMD must parse the buffer (using a proper packet parser with your approach),
> collect relevant fields, see if anything's unsupported while doing so before
> proceeding with the flow rule.
> 
There is no value for mask, since the NIC treats the packet as a normal packet
this means that the NIC operation is dependent on the offloads configured.
If for example the user configured VXLAN outer checksum then the NIC will modify this
field. In any case all values must be given since the PMD cannot guess and insert them.
From the NIC the packet after the adding the buffer must be a valid packet.

> Anyway, if you add that mask and rename these actions (since they should work
> with pretty much anything, not necessarily tunnels, i.e. lazy applications
> could ask HW to prepend missing Ethernet headers to pure IP traffic), they
> can make sense. How about labeling this "raw" encap/decap?
> 
I like it.

>  RTE_FLOW_ACTION_TYPE_RAW_(ENCAP|DECAP)
> 
>  struct rte_flow_action_raw_encap {
enum tunnel_type type; /**< VXLAN/ MPLSoGRE... / UNKNOWN. */
Then NICs can prase the packet even faster. And decide if it is supported.
Uint8_t remove_l2; /**< Marks if the L2 should be removed before the encap. */
>      uint8_t *data; /**< Encapsulation data. */
>      uint8_t *preserve; /**< Bit-mask of @p data to preserve on output. */
Remove the preserve. Since like I said there is no meaning to this. 
>      size_t size; /**< Size of @p data and @p preserve. */
>  };
> 
> I guess decap could use the same object. Since there is no way to define a
> sensible default behavior that works across multiple vendors when "preserve"
> is not provided, I think this field cannot be NULL.
> 
Like I said lets remove it.

> As for "L3 decap", well, can't one just provide a separate encap action?
> I mean a raw decap action, followed by another action doing raw encap of the
> intended L2? A separate set of actions seems unnecessary for that.

Agree we will have RTE_FLOW_ACTION_TYPE_RAW_DECAP
Which decapsulate the outer headers, and then we will give the encap command with 
the L2 header.

> 
> [1] "[PATCH v3 2/4] ethdev: Add tunnel encap/decap actions"
> 
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d
> pdk.org%2Farchives%2Fdev%2F2018-
> April%2F095733.html&data=02%7C01%7Corika%40mellanox.com%7C1a1
> eb443d3de47d01fce08d62f7b3960%7Ca652971c7d2e4d9ba6a4d149256f461b%
> 7C0%7C0%7C636748603638728232&sdata=hqge3zUCH%2FqIuqPwGPeSkY
> miqaaOvc3xvJ4zaRz3JNg%3D&reserved=0
> 
> --
> Adrien Mazarguil
> 6WIND

Best,
Ori


More information about the dev mailing list