[dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields

Ananyev, Konstantin konstantin.ananyev at intel.com
Thu Nov 27 15:56:25 CET 2014



> 
> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Thursday, November 27, 2014 6:00 PM
> To: Liu, Jijiang; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
> 
> Hi Jijiang,
> 
> Please see some comments below.
> 
> On 11/27/2014 09:18 AM, Jijiang Liu wrote:
> > In place of removing the PKT_TX_VXLAN_CKSUM, we introduce 2 new flags: PKT_TX_OUT_IP_CKSUM, PKT_TX_UDP_TUNNEL_PKT, and a new field: l4_tun_len.
> > Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field.
> >
> > PKT_TX_OUT_IP_CKSUM: is not used for non-tunnelling packet;hardware outer checksum for tunnelling packet.
> > PKT_TX_UDP_TUNNEL_PKT: is used to tell PMD that the transmit packet is a UDP tunneling packet.
> > l4_tun_len: for VXLAN packet, it should be udp header length plus VXLAN header length.
> >
> > Signed-off-by: Jijiang Liu <jijiang.liu at intel.com>
> > ---
> >   lib/librte_mbuf/rte_mbuf.c |    2 +-
> >   lib/librte_mbuf/rte_mbuf.h |   23 ++++++++++++++---------
> >   2 files changed, 15 insertions(+), 10 deletions(-)
> >
> > diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > index 87c2963..e89c310 100644
> > --- a/lib/librte_mbuf/rte_mbuf.c
> > +++ b/lib/librte_mbuf/rte_mbuf.c
> > @@ -240,7 +240,7 @@ const char *rte_get_tx_ol_flag_name(uint64_t mask)
> >   	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
> >   	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
> >   	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
> > -	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
> > +	case PKT_TX_UDP_TUNNEL_PKT: return "PKT_TX_UDP_TUNNEL_PKT";
> >   	case PKT_TX_TCP_SEG: return "PKT_TX_TCP_SEG";
> >   	default: return NULL;
> 
> As I said as a reply to the cover letter, I suggest to use PKT_TX_OUT_UDP_CKSUM instead of PKT_TX_UDP_TUNNEL_PKT.

HW doesn't support outer L4 checksum offload.
But to calculate inner checksums correctly, it needs a hint from SW about L4 Tunneling Type.
Currently the following values are recognised by HW:

L4 Tunneling Type (Teredo / GRE header / VXLAN header) indication:
00b - No UDP / GRE tunneling (field must be set to zero if EIPT equals to zero)
01b - UDP tunneling header (any UDP tunneling, VXLAN and Geneve).
10b - GRE tunneling header
Else - reserved
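
To make the hint concrete, here is a rough sketch of how a PMD might turn the proposed flag into that 2-bit field. The enum and helper names are made up for illustration and are not taken from any driver; only the bit encodings (from the list above) and the flag value (from the patch) come from this thread. GRE is left out of the helper because no TX flag for it is proposed here.

```c
#include <stdint.h>

/* Flag value as proposed in the patch under review. */
#define PKT_TX_UDP_TUNNEL_PKT (1ULL << 50)

/* Hypothetical names; the encodings follow the 2-bit field listed above. */
enum l4_tun_type {
	L4_TUN_NONE = 0x0, /* 00b: no UDP / GRE tunneling */
	L4_TUN_UDP  = 0x1, /* 01b: any UDP tunneling (VXLAN, Geneve) */
	L4_TUN_GRE  = 0x2  /* 10b: GRE tunneling header */
};

/* Hypothetical helper: derive the HW hint from the mbuf TX offload flags. */
static inline enum l4_tun_type
l4_tun_type_from_flags(uint64_t ol_flags)
{
	return (ol_flags & PKT_TX_UDP_TUNNEL_PKT) ? L4_TUN_UDP : L4_TUN_NONE;
}
```

The point is that the flag only describes the packet type to HW; it does not request any outer UDP checksum.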

You can check yourself:
http://www.intel.com/content/www/us/en/embedded/products/networking/xl710-10-40-controller-datasheet.html
Sections 8.4.2.2.1 and 8.4.4.2

> 
> Also, the PKT_TX_OUT_IP_CKSUM case is missing here.
> 
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index 367fc56..48cd8e1 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -99,10 +99,9 @@ extern "C" {
> >   #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 header. */
> >   #define PKT_RX_FDIR_ID       (1ULL << 13) /**< FD id reported if FDIR match. */
> >   #define PKT_RX_FDIR_FLX      (1ULL << 14) /**< Flexible bytes reported if FDIR match. */
> > -/* add new RX flags here */
> >
> 
> We should probably not remove this line.
> 
> 
> >   /* add new TX flags here */
> > -#define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
> > +#define PKT_TX_UDP_TUNNEL_PKT (1ULL << 50) /**< TX packet is an UDP tunneling packet */
> >   #define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to timestamp. */
> >
> >   /**
> > @@ -125,13 +124,20 @@ extern "C" {
> >   #define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
> >   #define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
> >
> > +#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
> > +
> >   /** Tell the NIC it's an IPv4 packet. Required for L4 checksum offload or TSO. */
> > -#define PKT_TX_IPV4          PKT_RX_IPV4_HDR
> > +#define PKT_TX_IPV4          (1ULL << 56)
> >
> >   /** Tell the NIC it's an IPv6 packet. Required for L4 checksum offload or TSO. */
> > -#define PKT_TX_IPV6          PKT_RX_IPV6_HDR
> > +#define PKT_TX_IPV6          (1ULL << 57)
> 
> The description in comment does not match the description in the cover letter.
> 
> Also, I think replacing PKT_RX_IPV[46]_HDR by the value may go in another commit.
> 
> 
> > -#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
> > +/** Outer IP cksum of TX pkt. computed by NIC for tunneling packet */
> > +#define PKT_TX_OUTER_IP_CKSUM   (1ULL << 58)
> > +#define PKT_TX_OUTER_IPV4_CSUM  PKT_TX_OUTER_IP_CKSUM /**< Alias of PKT_TX_OUTER_IP_CKSUM. */
> 
> Why do we need an alias?
> 
> By the way, I think the alias of PKT_TX_IP_CKSUM is also unneeded and can be removed. But it's not the topic of your series.
> 
> Also, the name PKT_TX_OUTER_IP_CKSUM does not match the name in the cover letter and commit logs.
> 
> 
> > +
> > +/** Tell the NIC it's an outer IPv6 packet for tunneling packet.*/
> > +#define PKT_TX_OUTER_IPV6    (1ULL << 59)
> >
> 
> This flag is not in the cover letter or commit log. What is its purpose?


My bad, I forgot that for the outer IP we will also need to specify its type.
So it is the same story here as for the inner IP.
In total, we might need 3 flags for the outer IP:

/* Tells HW that outer IP is IPV4 and checksum for it should be calculated by HW. */
PKT_TX_OUTER_IP_CKSUM

/* Tells HW that outer IP is IPV4 and checksum for it should not be calculated by HW. */
PKT_TX_OUTER_IPV4

/* Tells HW that outer IP is IPV6. */
PKT_TX_OUTER_IPV6
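
As a rough sketch of how an application might pick among these three flags: the helper name below is made up, and the bit chosen for PKT_TX_OUTER_IPV4 is only a placeholder, since that flag is not in the patch. The other two values are the ones from the patch.

```c
#include <stdint.h>

#define PKT_TX_OUTER_IP_CKSUM (1ULL << 58) /* value from the patch */
#define PKT_TX_OUTER_IPV6     (1ULL << 59) /* value from the patch */
#define PKT_TX_OUTER_IPV4     (1ULL << 60) /* ASSUMPTION: placeholder bit, not in the patch */

/*
 * Hypothetical helper: choose exactly one outer-IP flag for a tunnelled packet.
 * outer_is_ipv6: non-zero if the outer header is IPv6.
 * hw_cksum: non-zero if HW should compute the outer IPv4 header checksum.
 */
static inline uint64_t
outer_ip_tx_flag(int outer_is_ipv6, int hw_cksum)
{
	if (outer_is_ipv6)
		return PKT_TX_OUTER_IPV6; /* IPv6 has no header checksum to offload */
	return hw_cksum ? PKT_TX_OUTER_IP_CKSUM : PKT_TX_OUTER_IPV4;
}
```

The result would be OR-ed into ol_flags together with the inner-packet flags.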

> 
> 
> >   /**
> >    * TCP segmentation offload. To enable this offload feature for a
> > @@ -266,10 +272,9 @@ struct rte_mbuf {
> >   			uint64_t tso_segsz:16; /**< TCP TSO segment size */
> >
> >   			/* fields for TX offloading of tunnels */
> > -			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
> > -			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
> > -
> > -			/* uint64_t unused:8; */
> > +			uint64_t outer_l3_len:9; /**< outer L3 (IP) Hdr Length. */
> > +			uint64_t outer_l2_len:7; /**< outer L2 (MAC) Hdr Length. */
> > +			uint64_t l4_tun_len:8; /**< L4 tunnelling header length */
> >   		};
> >   	};
> >   } __rte_cache_aligned;
> >
> 
> About l4_tun_len, I have another comment I forgot to add in the cover letter. Can we remove it and include its length in outer_l2_len
> instead? For instance, replace:
> 
>       mb->l2_len =  eth_hdr_in;
>       mb->l3_len = ipv4_hdr_in;
>       mb->outer_l2_len = eth_hdr_out;
>       mb->outer_l3_len = ipv4_hdr_out;
>       mb->l4tun_len = vxlan_hdr;
>       mb->ol_flags |= PKT_TX_OUT_IP_CKSUM  | PKT_TX_UDP_TUNNEL |
>         PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
> 
> by:
> 
>       mb->l2_len =  eth_hdr_in;
>       mb->l3_len = ipv4_hdr_in;
>       mb->outer_l2_len = eth_hdr_out + vxlan_hdr;
>       mb->outer_l3_len = ipv4_hdr_out;
>       mb->ol_flags |= PKT_TX_OUT_IP_CKSUM  | PKT_TX_UDP_TUNNEL |
>         PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
> 
> I think it won't bother the driver, and it's coherent with case B.2 of your cover letter.

You probably meant:
mb->l2_len =  eth_hdr_in + vxlan_hdr;
?
Yes, I think it could be done that way too.
Though I still prefer to keep l4tun_len: it makes things a bit cleaner (at least to me).
After all, we do have space for it in mbuf's tx_offload.
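
To compare the two layouts side by side, here is a minimal sketch. The struct is a hypothetical stand-in for the relevant tx_offload bit-fields (the widths of l2_len and l3_len are assumed; the others are from the patch), and both helpers are made up for illustration. The example lengths assume Ethernet + IPv4 + UDP/VXLAN.

```c
#include <stdint.h>

/* Hypothetical stand-in for the tx_offload bit-fields discussed above. */
struct tx_lens {
	uint64_t l2_len:7;       /* inner L2 header length (width assumed) */
	uint64_t l3_len:9;       /* inner L3 header length (width assumed) */
	uint64_t outer_l3_len:9; /* outer L3 header length */
	uint64_t outer_l2_len:7; /* outer L2 header length */
	uint64_t l4_tun_len:8;   /* L4 tunnelling header length */
};

/*
 * fold_tunnel == 0: keep the tunnel header (UDP + VXLAN) in l4_tun_len.
 * fold_tunnel != 0: no separate field, add it to the inner l2_len instead.
 */
static inline struct tx_lens
fill_lens(uint16_t eth_out, uint16_t ip_out, uint16_t tun,
	  uint16_t eth_in, uint16_t ip_in, int fold_tunnel)
{
	struct tx_lens l = {0};

	l.outer_l2_len = eth_out;
	l.outer_l3_len = ip_out;
	l.l4_tun_len = fold_tunnel ? 0 : tun;
	l.l2_len = fold_tunnel ? (eth_in + tun) : eth_in;
	l.l3_len = ip_in;
	return l;
}

/* Offset of the inner L3 header from the start of the packet. */
static inline uint32_t
inner_l3_offset(const struct tx_lens *l)
{
	return (uint32_t)(l->outer_l2_len + l->outer_l3_len +
			  l->l4_tun_len + l->l2_len);
}
```

Both variants place the inner IP header at the same offset, so the driver can program HW either way; the separate field just keeps each length tied to one header.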
Konstantin

> 
> Regards,
> Olivier


