[dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine

Ananyev, Konstantin konstantin.ananyev at intel.com
Tue Jan 20 18:23:48 CET 2015


Hi Olivier,

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Tuesday, January 20, 2015 12:39 PM
> To: Ananyev, Konstantin; Liu, Jijiang
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine
> 
> Hi,
> 
> On 01/20/2015 02:12 AM, Ananyev, Konstantin wrote:
> >>>> I think a good definition would
> >>>>     be:
> >>>>
> >>>>     Packet is IPv4. This flag must be set when using any offload
> >>>>     feature (TSO, L3 or L4 checksum) to tell the NIC that the packet
> >>>>     is an IPv4 packet.
> >>>>
> >>>>     That's why I added PKT_TX_IPV4 in the examples.
> >>>
> >>> I suppose we discussed it several times: both ways are possible.
> >>>  From PMD perspective - treating PKT_TX_IPV4 and PKT_TX_IP_CSUM
> >>> As mutually exclusive seems a bit more plausible.
> >>>  From the upper layer - my understanding, that it is doesn't really matter.
> >>> I thought we had an agreement about it in 1.8, no?
> >>
> >> Indeed, this was already discussed, but there was a lot of pressure
> >> for 1.8.0 to push something, even not perfect. The fog around comments
> >> shows that the API was not very clearly defined for 1.8.0. If you read
> >> the comments of the API, it is impossible to understand when the
> >> PKT_TX_IPV4 or PKT_TX_IP_CSUM flags must be set. I would even say
> >> more: the only place where the comments bring a valuable information
> >> (L4 checksum and TSO) describe the case where PKT_TX_IPV4 and
> >> PKT_TX_IP_CSUM are not exclusive...
> >>
> >> So I will fix that in my coming patch series. Just for information,
> >> I'm pretty sure that having PKT_TX_IPV4 and PKT_TX_IP_CSUM as not
> >> exclusive flag would not require any change anywhere in the PMDs (even
> >> in i40e).
> >
> > Right now - no.
> > Though as I said from PMD perspective having them exclusive is a bit preferable.
> > Again, I don't see any big difference from upper layer code.
> 
> Sure, it does not make a big difference in terms of code. But
> in terms of API, the naming of the flag is coherent to what it is
> used for. And it's easier to find a simple definition, like:
> 
>   * Packet is IPv4. This flag must be set when using any offload feature
>   * (TSO, L3 or L4 checksum) to tell the NIC that the packet is an IPv4
>   * packet.

Ok, and what's wrong with:
"Packet is IPv4. This flag must be set when using any offload feature
(TSO, L3 or L4 checksum) to tell the NIC that the packet is an IPv4
packet and no HW offload for IPv4 header checksum calculation is required"
?

> 
> >> On the contrary, making them exclusive would require to
> >> change the ixgbe TSO code because we check.
> >
> > Hmm, so you are saying there is a bug somewhere  in ixbe_rxtx.c?
> > What particular place you are talking about?
> 
> Sorry, I spoke too fast. In TSO code, we check PKT_TX_IP_CKSUM (and not
> PKT_TX_IPV4 as I thought), so it would work for both methods without
> patching the code.
> 
> In this case, it means that both approach would not require to
> modify the code.

Ok.

> 
> >>>>     *Problem 3*: without using the word "fortville", it is difficult
> >>>>     to understand the goal of the flag PKT_TX_UDP_TUNNEL_PKT. Indeed,
> >>>>     once PKT_TX_OUTER_IPV4/6 is set, it looks obvious that it's a
> >>>>     tunnel packet. I suggest to remove the PKT_TX_UDP_TUNNEL_PKT
> >>>>     flag. In linux, the driver doesn't care about the tunnel type,
> >>>>     it always set I40E_TXD_CTX_UDP_TUNNELING for all encapsulations [6].
> >>>
> >>> It might be obvious that it is a tunnel packet from PKT_TX_OUTER_* is set,
> >>> but it is not obvious what type of tunnelling it would be.
> >>> FVL HW supports HW TX offloads for different type of tunnelling and
> >>> requires that SW provide information about tunnelling type.
> >>>  From i40e datasheet:
> >>> L4TUNT L4 Tunneling Type (Teredo / GRE header / VXLAN header) indication:
> >>> 00b - No UDP / GRE tunneling (field must be set to zero if EIPT equals to zero)
> >>> 01b - UDP tunneling header (any UDP tunneling, VXLAN and Geneve).
> >>> 10b - GRE tunneling header
> >>> As we do plan to support other than UDP tunnelling types, I suppose we'll need to keep
> >>> PKT_TX_UDP_TUNNEL_PKT flag.
> >>
> >> As I've said: in linux, the driver doesn't care about the tunnel type,
> >> it always set I40E_TXD_CTX_UDP_TUNNELING for all encapsulations.
> >
> > Ok, and why it should be our problem?
> > We have a lot of things done in a different manner then linux/freebsd kernel drivers,
> > Why now it became a problem?
> 
> If linux doesn't need an equivalent flag for doing the same thing,
> it probably means we don't need it either.

Probably yes .... Or probably not.
Why do we need to guess what was the intention of guys who wrote that part of linux driver?
BTW, the macro for GRE is here:
find lib/librte_pmd_i40e/i40e -type f | xargs grep TUN | grep TXD
lib/librte_pmd_i40e/i40e/i40e_type.h:#define I40E_TXD_CTX_UDP_TUNNELING (0x1ULL << I40E_TXD_CTX_QW0_NATT_SHIFT)
lib/librte_pmd_i40e/i40e/i40e_type.h:#define I40E_TXD_CTX_GRE_TUNNELING (0x2ULL << I40E_TXD_CTX_QW0_NATT_SHIFT)

Though it not used (yet?) by some reason. 

> 
> In a performance-oriented software like dpdk, having a flag that we
> don't know what the hardware does with, that is not needed in other
> drivers of the same harware, that makes the API harder to understand
> could be a problem.

Here is a HW spec, that says what values have to be setup for L4TUNT.
Yes, I am not sure why they need to distinguish between VXLAN/GRE tunnelling.
Though, I suppose that wouldn't eliminate the requirement.
But for same, there is no good explanation why FVL HW need to know that it is IPv4 or IPv6 packet,
in the case when only L4 checksum offload is required (IIPT field).
Niantic, as I remember, is able to work ok without that requirement.   
Though, we still have to set it up.
 
> Another argument: if we can remove this flag, it would make the
> testpmd commands reworkd proposed by Jijiang much more easy to
> understand: only a new "csum parse-tunnel on|off" would be required,
> and it can be explained in a few words.

Well, from my point - testpmd commands that Jijiang proposed are perfectly clear and understandable. 
Another thing, as I remember, our primary concern should be public API, no testpmd.

> 
> I'll try to do some tests on a fortville NIC if I can find one. I'm
> curious to see if we can transmit any encapsulation packet (ip in ip,
> ip in gre, eth in gre, eth in vxlan, or even a proprietary tunnel).

Ok cool.

> 
> We should avoid the need to specify the tunnel type in the OUTER
> checksum API if we can, else it would limit us to specific
> supported protocols.

>From the FVL spec it is required by HW, it is not what we introducing on our own.
Spec stays explicitly that L4TUNT (L4 tunneling type) has to be setup for tunnelling packets.
Again from the spec, there are 3 different values it can take.
If you have an idea how to pass that information to  PMD without using flags, sure we can consider it.

> 
> >>>> I think the following cases should be *forbidden by the API*:
> >>>>
> >>>> case 9) calculate checksum of in_ip and in_tcp  (was case B.1 in [1])
> >>>>
> >>>>     mb->outer_l2_len = len(out_eth)
> >>>>     mb->outer_l3_len = len(out_ip)
> >>>>     mb->l2_len = len(out_udp + vxlan + in_eth)
> >>>>     mb->l3_len = len(out_ip)
> >>>>     mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_UDP_TUNNEL_PKT | \
> >>>>       PKT_TX_IP_CSUM | PKT_TX_UDP_CKSUM;
> >>>>     set out_ip checksum to 0 in the packet
> >>>>     set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()
> >>>>
> >>>>     If we remove the flag PKT_TX_UDP_TUNNEL_PKT, this cannot be
> >>>>     supported, but there is no reason to support it as there is
> >>>>     already one way to do the same.
> >>>>
> >>>>     I think the driver should not even look at mb->outer_l2_len
> >>>>     and mb->outer_l3_len if no flag PKT_TX_OUTER_* is set.
> >>>
> >>> Why it should be forbidden?
> >>> I admit it might be a bit slower than case 4),
> >>> but I think absolutely legal way to setup HW offloads for inner L3/L4.
> >>> As I said we need a PKT_TX_UDP_TUNNEL_PKT anyway, so I suppose
> >>> PKT_TX_*_TUNNEL_PKT should be an indication is it a tunnel packet or not.
> >>> PKT_TX_OUTER_* flags indicate does outer cksum offload is required or not.
> >>
> >> I don't understand. The result in terms of hardware is exactly the
> >> same than case 4). Why should we have 2 different ways for doing the
> >> same thing?
> >
> > If HW supports that capability, why should we forbid it?
> > Let user to choose himself what way to use.
> > FVL spec lists it as a valid approach.
> 
> It is not a hardware feature.

It is.

> Case 4) and case 9) would fill the hardware registers exactly the same.

No, they wouldn't.
Please read corresponding section of FVL spec and i40e_rxtx.c
For case 4) we only need to setup TDD (TX data descriptor) with the following values:
IIPT, IPLEN, L4T, L4LEN 
For case 9) we need to setup both TDD and TCD (TX context descriptor) with the following values:
TDD: IIPT, IPLEN, L4T, L4LEN
TCD: EIPT, EIPLEN,  L4TUNT, L4TUNLEN 

> To me, it's just an API question.

No, it is not.

I still don't understand why you are so eager to 'forbid' it.
Yes we support it for FVL, but no one forces you to use it. 

> 
> > As one of possible use-cases:  HW VLAN tags insertion for both inner and outer packets.
> > FVL can do that, though as I know our PMD doesn't implement it yet.
> > For that, we'll need to specify at least:
> > outer_l2_len, outer_l3_len, l2_len.
> > While PKT_TX_OUTER_* might stay cleared.
> 
> If a VLAN flag has to be inserted in outer header, a new flag
> PKT_TX_OUTER_INSERT_VLAN would be added. So my specification
> would still be correct:
> 
>    The driver should look at mb->outer_lX_len only if a
>    PKT_TX_OUTER_* flag is present.
> 

Introducing PKT_TX_OUTER_INSERT_VLAN is ok.
Though I still think we'll need TX_*_TUNNEL flags and no need to 'forbid' case 9).
BTW, as I can see linux i40e driver for tunnelling packets uses case 9), not case 4), right?
Konstantin

> >> This is really confusing for an API. Moreover, you said
> >> it: it is slower that case 4).
> >
> > I don't know would be slower then 4) or not for sure.
> > That's my guess, based on the fact that for 9) we need to fill 2 descriptors, while fro 4) - only 1.
> > Though I didn't measure the difference.
> > That's actually one more reason why to allow and support it -
> > so people can make sure that on FVL both ways work as expected and measure the difference.
> >
> > Konstantin
> 
> 
> Regards,
> Olivier



More information about the dev mailing list