[dpdk-dev] [PATCH v3 1/3] gso: support UDP/IPv4 fragmentation

Hu, Jiayu jiayu.hu at intel.com
Wed Jun 27 04:28:44 CEST 2018


Hi Ophir,

Replies are inline.

> -----Original Message-----
> From: Ophir Munk [mailto:ophirmu at mellanox.com]
> Sent: Wednesday, June 27, 2018 7:59 AM
> To: Hu, Jiayu <jiayu.hu at intel.com>; dev at dpdk.org
> Cc: Wang, Xiao W <xiao.w.wang at intel.com>; Ananyev, Konstantin
> <konstantin.ananyev at intel.com>; Zhang, Yuwei1
> <yuwei1.zhang at intel.com>; Iremonger, Bernard
> <bernard.iremonger at intel.com>; Thomas Monjalon
> <thomas at monjalon.net>
> Subject: RE: [dpdk-dev] [PATCH v3 1/3] gso: support UDP/IPv4
> fragmentation
> 
> Hi,
> Please find some comments below.
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jiayu Hu
> > Sent: Friday, June 22, 2018 8:54 AM
> > To: dev at dpdk.org
> > Cc: xiao.w.wang at intel.com; konstantin.ananyev at intel.com;
> > yuwei1.zhang at intel.com; bernard.iremonger at intel.com; Thomas
> Monjalon
> > <thomas at monjalon.net>; Jiayu Hu <jiayu.hu at intel.com>
> > Subject: [dpdk-dev] [PATCH v3 1/3] gso: support UDP/IPv4 fragmentation
> >
> > This patch adds GSO support for UDP/IPv4 packets. Supported packets
> may
> > include a single VLAN tag. UDP/IPv4 GSO doesn't check if input packets
> have
> > correct checksums, and doesn't update checksums for output packets (the
> > responsibility for this lies with the application).
> > Additionally, UDP/IPv4 GSO doesn't process IP fragmented packets.
> >
> > UDP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indrect
> > MBUF, to organize an output packet. The direct MBUF stores the packet
> > header, while the indirect mbuf simply points to a location within the
> original
> > packet's payload. Consequently, use of UDP GSO requires multi-segment
> > MBUF support in the TX functions of the NIC driver.
> >
> > If a packet is GSO'd, UDP/IPv4 GSO reduces its MBUF refcnt by 1. As a
> result,
> > when all of its GSOed segments are freed, the packet is freed
> automatically.
> >
> > Signed-off-by: Jiayu Hu <jiayu.hu at intel.com>
> > ---
> >  lib/librte_gso/Makefile     |  1 +
> >  lib/librte_gso/gso_common.h |  3 ++
> >  lib/librte_gso/gso_udp4.c   | 81
> > +++++++++++++++++++++++++++++++++++++++++++++
> >  lib/librte_gso/gso_udp4.h   | 42 +++++++++++++++++++++++
> >  lib/librte_gso/meson.build  |  2 +-
> >  lib/librte_gso/rte_gso.c    | 24 +++++++++++---
> >  lib/librte_gso/rte_gso.h    |  6 +++-
> >  7 files changed, 152 insertions(+), 7 deletions(-)  create mode 100644
> > lib/librte_gso/gso_udp4.c  create mode 100644 lib/librte_gso/gso_udp4.h
> >
> > diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile index
> > 3648ec0..1fac53a 100644
> > --- a/lib/librte_gso/Makefile
> > +++ b/lib/librte_gso/Makefile
> > @@ -19,6 +19,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_common.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tcp4.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tunnel_tcp4.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_udp4.c
> >
> >  # install this header file
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h diff --git
> > a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h index
> > 5ca5974..6cd764f 100644
> > --- a/lib/librte_gso/gso_common.h
> > +++ b/lib/librte_gso/gso_common.h
> > @@ -31,6 +31,9 @@
> >  		(PKT_TX_TCP_SEG | PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 |
> \
> >  		 PKT_TX_TUNNEL_GRE))
> >
> > +#define IS_IPV4_UDP(flag) (((flag) & (PKT_TX_UDP_SEG | PKT_TX_IPV4))
> ==
> > \
> > +		(PKT_TX_UDP_SEG | PKT_TX_IPV4))
> > +
> >  /**
> >   * Internal function which updates the UDP header of a packet, following
> >   * segmentation. This is required to update the header's datagram length
> > field.
> > diff --git a/lib/librte_gso/gso_udp4.c b/lib/librte_gso/gso_udp4.c new file
> > mode 100644 index 0000000..927dee1
> > --- /dev/null
> > +++ b/lib/librte_gso/gso_udp4.c
> 
> File gso_upd4.c could be very similar to file gso_tcp4.c and that would
> avoid code duplication.
> In a unified file you could use a tcp vs. udp flag to distinguish between them
> when necessary.
> The files are short (~75 lines) so it is not a critical issue.

The function gso_tcp4_segment and gso_udp4_segment have different prototype,
and their GSO rules are different. They are two different basic GSO types.

The GSO library gives each GSO type (TCP, tunnel) an internal .c and .h file, and their name
represents their content. The rte_gso_segment calls different function according to the GSO type.
This style is very clear for developers or users to understand. In addition, as you said, the code
is short. So I think it's better to keep this style for UDP/IPv4 GSO.

> 
> > @@ -0,0 +1,81 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2018 Intel Corporation
> > + */
> > +
> > +#include "gso_common.h"
> > +#include "gso_udp4.h"
> > +
> > +#define IPV4_HDR_MF_BIT (1U << 13)
> > +
> > +static inline void
> > +update_ipv4_udp_headers(struct rte_mbuf *pkt, struct rte_mbuf **segs,
> > +		uint16_t nb_segs)
> > +{
> > +	struct ipv4_hdr *ipv4_hdr;
> > +	uint16_t frag_offset = 0, is_mf;
> > +	uint16_t l2_hdrlen = pkt->l2_len, l3_hdrlen = pkt->l3_len;
> > +	uint16_t tail_idx = nb_segs - 1, length, i;
> > +
> > +	/*
> > +	 * Update IP header fields for output segments. Specifically,
> > +	 * keep the same IP id, update fragment offset and total
> > +	 * length.
> > +	 */
> > +	for (i = 0; i < nb_segs; i++) {
> > +		ipv4_hdr = rte_pktmbuf_mtod_offset(segs[i], struct
> ipv4_hdr
> > *,
> > +			l2_hdrlen);
> > +		length = segs[i]->pkt_len - l2_hdrlen;
> > +		ipv4_hdr->total_length = rte_cpu_to_be_16(length);
> > +
> > +		is_mf = i < tail_idx ? IPV4_HDR_MF_BIT : 0;
> > +		ipv4_hdr->fragment_offset =
> > +			rte_cpu_to_be_16(frag_offset | is_mf);
> > +		frag_offset += ((length - l3_hdrlen) >> 3);
> > +	}
> > +}
> > +
> > +int
> > +gso_udp4_segment(struct rte_mbuf *pkt,
> > +		uint16_t gso_size,
> > +		struct rte_mempool *direct_pool,
> > +		struct rte_mempool *indirect_pool,
> > +		struct rte_mbuf **pkts_out,
> > +		uint16_t nb_pkts_out)
> > +{
> > +	struct ipv4_hdr *ipv4_hdr;
> > +	uint16_t pyld_unit_size, hdr_offset;
> > +	uint16_t frag_off;
> > +	int ret;
> > +
> > +	/* Don't process the fragmented packet */
> > +	ipv4_hdr = rte_pktmbuf_mtod_offset(pkt, struct ipv4_hdr *,
> > +			pkt->l2_len);
> > +	frag_off = rte_be_to_cpu_16(ipv4_hdr->fragment_offset);
> > +	if (unlikely(IS_FRAGMENTED(frag_off))) {
> > +		pkts_out[0] = pkt;
> > +		return 1;
> > +	}
> > +
> > +	/*
> > +	 * UDP fragmentation is the same as IP fragmentation.
> > +	 * Except the first one, other output packets just have l2
> > +	 * and l3 headers.
> > +	 */
> > +	hdr_offset = pkt->l2_len + pkt->l3_len;
> > +
> > +	/* Don't process the packet without data. */
> > +	if (unlikely(hdr_offset + pkt->l4_len >= pkt->pkt_len)) {
> > +		pkts_out[0] = pkt;
> > +		return 1;
> > +	}
> > +
> > +	pyld_unit_size = gso_size - hdr_offset;
> > +
> > +	/* Segment the payload */
> > +	ret = gso_do_segment(pkt, hdr_offset, pyld_unit_size, direct_pool,
> > +			indirect_pool, pkts_out, nb_pkts_out);
> > +	if (ret > 1)
> > +		update_ipv4_udp_headers(pkt, pkts_out, ret);
> > +
> > +	return ret;
> > +}
> > diff --git a/lib/librte_gso/gso_udp4.h b/lib/librte_gso/gso_udp4.h new file
> > mode 100644 index 0000000..b2a2908
> 
> File gso_upd4.h is almost identical to file gso_tcp4.h so both files (although
> short ~40 lines) could have been unified into one file.

Ditto.

> 
> > --- /dev/null
> > +++ b/lib/librte_gso/gso_udp4.h
> > @@ -0,0 +1,42 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2018 Intel Corporation
> > + */
> > +
> > +#ifndef _GSO_UDP4_H_
> > +#define _GSO_UDP4_H_
> > +
> > +#include <stdint.h>
> > +#include <rte_mbuf.h>
> > +
> > +/**
> > + * Segment an UDP/IPv4 packet. This function doesn't check if the input
> > + * packet has correct checksums, and doesn't update checksums for
> > +output
> > + * GSO segments. Furthermore, it doesn't process IP fragment packets.
> > + *
> > + * @param pkt
> > + *  The packet mbuf to segment.
> > + * @param gso_size
> > + *  The max length of a GSO segment, measured in bytes.
> > + * @param direct_pool
> > + *  MBUF pool used for allocating direct buffers for output segments.
> > + * @param indirect_pool
> > + *  MBUF pool used for allocating indirect buffers for output segments.
> > + * @param pkts_out
> > + *  Pointer array used to store the MBUF addresses of output GSO
> > + *  segments, when the function succeeds. If the memory space in
> > + *  pkts_out is insufficient, it fails and returns -EINVAL.
> > + * @param nb_pkts_out
> > + *  The max number of items that 'pkts_out' can keep.
> > + *
> > + * @return
> > + *   - The number of GSO segments filled in pkts_out on success.
> > + *   - Return -ENOMEM if run out of memory in MBUF pools.
> > + *   - Return -EINVAL for invalid parameters.
> > + */
> > +int gso_udp4_segment(struct rte_mbuf *pkt,
> > +		uint16_t gso_size,
> > +		struct rte_mempool *direct_pool,
> > +		struct rte_mempool *indirect_pool,
> > +		struct rte_mbuf **pkts_out,
> > +		uint16_t nb_pkts_out);
> > +#endif
> > diff --git a/lib/librte_gso/meson.build b/lib/librte_gso/meson.build index
> > 056534f..ad8dd85 100644
> > --- a/lib/librte_gso/meson.build
> > +++ b/lib/librte_gso/meson.build
> > @@ -1,7 +1,7 @@
> >  # SPDX-License-Identifier: BSD-3-Clause  # Copyright(c) 2017 Intel
> > Corporation
> >
> > -sources = files('gso_common.c', 'gso_tcp4.c',
> > +sources = files('gso_common.c', 'gso_tcp4.c', 'gso_udp4.c',
> >   		'gso_tunnel_tcp4.c', 'rte_gso.c')
> >  headers = files('rte_gso.h')
> >  deps += ['ethdev']
> > diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c index
> > a44e3d4..751b5b6 100644
> > --- a/lib/librte_gso/rte_gso.c
> > +++ b/lib/librte_gso/rte_gso.c
> > @@ -11,6 +11,17 @@
> >  #include "gso_common.h"
> >  #include "gso_tcp4.h"
> >  #include "gso_tunnel_tcp4.h"
> > +#include "gso_udp4.h"
> > +
> > +#define ILLEGAL_UDP_GSO_CTX(ctx) \
> > +	((((ctx)->gso_types & DEV_TX_OFFLOAD_UDP_TSO) == 0) || \
> > +	 (ctx)->gso_size < RTE_GSO_UDP_SEG_SIZE_MIN)
> > +
> > +#define ILLEGAL_TCP_GSO_CTX(ctx) \
> > +	((((ctx)->gso_types & (DEV_TX_OFFLOAD_TCP_TSO | \
> > +		DEV_TX_OFFLOAD_VXLAN_TNL_TSO | \
> > +		DEV_TX_OFFLOAD_GRE_TNL_TSO)) == 0) || \
> > +		(ctx)->gso_size < RTE_GSO_SEG_SIZE_MIN)
> 
> Can you please explain why it is correct that the min len for VXLAN_TNL or
> GRE_TNL is that of TCP MIN size (RTE_GSO_SEG_SIZE_MIN)

The logic here is a little complicated. First, we have two GSO types, i.e. TCP and UDP,
and they have different requirements for min lengths and gso_types flags. In the code,
we use ILLEGAL_UDP_GSO_CTX() and ILLEGAL_TCP_GSO_CTX() to check if the input packet
doesn't meet the requirements. Rte_gso_segment() starts to process the input packet only
when it meets UDP or TCP requirement. In other words, rte_gso_segment() stops if the
packet doesn't meet both requirements at the same time. This is why I use
" ILLEGAL_UDP_GSO_CTX(gso_ctx) && ILLEGAL_TCP_GSO_CTX(gso_ctx))" as the exit
condition.

RTE_GSO_SEG_SIZE_MIN is not used to decide if input tunnel packets meet length
requirement, but just checks a min length. In fact, we cannot decide a real correct min
length for a packet type, since it may have vlan header or not. The original GSO code
leverages this macro for tunnel packets to do a minimal check, so I think we can keep it
here.

> 
> >
> 
> To make the macros above and their usage below clearer:
> 
> 1. I would change the || with &&. and == with !=
> 
> #define ILLEGAL_UDP_GSO_CTX(ctx) \
> 	((((ctx)->gso_types & DEV_TX_OFFLOAD_UDP_TSO) != 0) && \
> 	 (ctx)->gso_size < RTE_GSO_UDP_SEG_SIZE_MIN)

This macro doesn't check all conditions for illegal UDP GSO. For example,
if we input a UDP/IPv4 packet with setting gso_size to 1500 and without setting
DEV_TX_OFFLOAD_UDP_TSO in gso_types, this macro returns 0, which means
it's a legal UDP GSO packet.

If you mean we'd better use LEGAL rather than ILLEGAL as the check in the code,
the exit condition should be: 

#define LEGAL_UDP_GSO_CTX(ctx) \
	((((ctx)->gso_types & DEV_TX_OFFLOAD_UDP_TSO) != 0) && \
	(ctx)->gso_size >= RTE_GSO_UDP_SEG_SIZE_MIN)

if (~(LEGAL_UDP_GSO_CTX(..) || LEGAL_TCP_GSO_CTX(..)))
	return -EINVAL;

But in fact, it's the same as current implementation. I can add some
explanations to the code for users to better understand the logic, if you
think it's OK.

> 
> #define ILLEGAL_TCP_GSO_CTX(ctx) \
> 	((((ctx)->gso_types & (DEV_TX_OFFLOAD_TCP_TSO | \
> 		DEV_TX_OFFLOAD_VXLAN_TNL_TSO | \
> 		DEV_TX_OFFLOAD_GRE_TNL_TSO)) != 0) && \
> 		(ctx)->gso_size < RTE_GSO_SEG_SIZE_MIN)
> 

Ditto.

> 2. Then later I would change the && with ||
> 
> Changing original:
> 			(ILLEGAL_UDP_GSO_CTX(gso_ctx) &&
> 			 ILLEGAL_TCP_GSO_CTX(gso_ctx)))
> 
> With this:
> 			ILLEGAL_UDP_GSO_CTX(gso_ctx) ||
> 			 ILLEGAL_TCP_GSO_CTX(gso_ctx))

Ditto.

> 
> 
> >  int
> >  rte_gso_segment(struct rte_mbuf *pkt,
> > @@ -27,14 +38,12 @@ rte_gso_segment(struct rte_mbuf *pkt,
> >
> >  	if (pkt == NULL || pkts_out == NULL || gso_ctx == NULL ||
> >  			nb_pkts_out < 1 ||
> > -			gso_ctx->gso_size < RTE_GSO_SEG_SIZE_MIN ||
> > -			((gso_ctx->gso_types &
> > (DEV_TX_OFFLOAD_TCP_TSO |
> > -			DEV_TX_OFFLOAD_VXLAN_TNL_TSO |
> > -			DEV_TX_OFFLOAD_GRE_TNL_TSO)) == 0))
> > +			(ILLEGAL_UDP_GSO_CTX(gso_ctx) &&
> > +			 ILLEGAL_TCP_GSO_CTX(gso_ctx)))
> >  		return -EINVAL;
> >
> >  	if (gso_ctx->gso_size >= pkt->pkt_len) {
> > -		pkt->ol_flags &= (~PKT_TX_TCP_SEG);
> > +		pkt->ol_flags &= (~(PKT_TX_TCP_SEG | PKT_TX_UDP_SEG));
> >  		pkts_out[0] = pkt;
> >  		return 1;
> >  	}
> > @@ -59,6 +68,11 @@ rte_gso_segment(struct rte_mbuf *pkt,
> >  		ret = gso_tcp4_segment(pkt, gso_size, ipid_delta,
> >  				direct_pool, indirect_pool,
> >  				pkts_out, nb_pkts_out);
> > +	} else if (IS_IPV4_UDP(pkt->ol_flags) &&
> > +			(gso_ctx->gso_types &
> > DEV_TX_OFFLOAD_UDP_TSO)) {
> > +		pkt->ol_flags &= (~PKT_TX_UDP_SEG);
> > +		ret = gso_udp4_segment(pkt, gso_size, direct_pool,
> > +				indirect_pool, pkts_out, nb_pkts_out);
> >  	} else {
> >  		/* unsupported packet, skip */
> >  		pkts_out[0] = pkt;
> > diff --git a/lib/librte_gso/rte_gso.h b/lib/librte_gso/rte_gso.h index
> > f4abd61..a626a11 100644
> > --- a/lib/librte_gso/rte_gso.h
> > +++ b/lib/librte_gso/rte_gso.h
> > @@ -17,10 +17,14 @@ extern "C" {
> >  #include <stdint.h>
> >  #include <rte_mbuf.h>
> >
> > -/* Minimum GSO segment size. */
> > +/* Minimum GSO segment size for TCP based packets. */
> >  #define RTE_GSO_SEG_SIZE_MIN (sizeof(struct ether_hdr) + \
> >  		sizeof(struct ipv4_hdr) + sizeof(struct tcp_hdr) + 1)
> 
> RTE_GSO_SEG_SIZE_MIN is actually TCP min size. Can you name this macro
> as
> RTE_GSO_TCP_SEG_SIZE_MIN (symmetrically to the UDP macro below)?
> 

Yes, you are right. The name is not good. But I don't know if changing name
will introduce ABI change, so I select this name as a workaround.

Thanks,
Jiayu

> >
> > +/* Minimum GSO segment size for UDP based packets. */ #define
> > +RTE_GSO_UDP_SEG_SIZE_MIN (sizeof(struct ether_hdr) + \
> > +		sizeof(struct ipv4_hdr) + sizeof(struct udp_hdr) + 1)
> > +
> >  /* GSO flags for rte_gso_ctx. */
> >  #define RTE_GSO_FLAG_IPID_FIXED (1ULL << 0)  /**< Use fixed IP ids for
> > output GSO segments. Setting
> > --
> > 2.7.4



More information about the dev mailing list