[dpdk-dev] [PATCH v3 1/3] gso: support UDP/IPv4 fragmentation
    Hu, Jiayu 
    jiayu.hu at intel.com
       
    Wed Jun 27 07:05:58 CEST 2018
    
    
  
> -----Original Message-----
> From: Hu, Jiayu
> Sent: Wednesday, June 27, 2018 10:29 AM
> To: Ophir Munk <ophirmu at mellanox.com>; dev at dpdk.org
> Cc: Wang, Xiao W <xiao.w.wang at intel.com>; Ananyev, Konstantin
> <konstantin.ananyev at intel.com>; Zhang, Yuwei1
> <yuwei1.zhang at intel.com>; Iremonger, Bernard
> <bernard.iremonger at intel.com>; Thomas Monjalon
> <thomas at monjalon.net>
> Subject: RE: [dpdk-dev] [PATCH v3 1/3] gso: support UDP/IPv4
> fragmentation
> 
> Hi Ophir,
> 
> Replies are inline.
> 
> > -----Original Message-----
> > From: Ophir Munk [mailto:ophirmu at mellanox.com]
> > Sent: Wednesday, June 27, 2018 7:59 AM
> > To: Hu, Jiayu <jiayu.hu at intel.com>; dev at dpdk.org
> > Cc: Wang, Xiao W <xiao.w.wang at intel.com>; Ananyev, Konstantin
> > <konstantin.ananyev at intel.com>; Zhang, Yuwei1
> > <yuwei1.zhang at intel.com>; Iremonger, Bernard
> > <bernard.iremonger at intel.com>; Thomas Monjalon
> > <thomas at monjalon.net>
> > Subject: RE: [dpdk-dev] [PATCH v3 1/3] gso: support UDP/IPv4
> > fragmentation
> >
> > Hi,
> > Please find some comments below.
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jiayu Hu
> > > Sent: Friday, June 22, 2018 8:54 AM
> > > To: dev at dpdk.org
> > > Cc: xiao.w.wang at intel.com; konstantin.ananyev at intel.com;
> > > yuwei1.zhang at intel.com; bernard.iremonger at intel.com; Thomas
> > Monjalon
> > > <thomas at monjalon.net>; Jiayu Hu <jiayu.hu at intel.com>
> > > Subject: [dpdk-dev] [PATCH v3 1/3] gso: support UDP/IPv4
> fragmentation
> > >
> > > This patch adds GSO support for UDP/IPv4 packets. Supported packets
> > may
> > > include a single VLAN tag. UDP/IPv4 GSO doesn't check if input packets
> > have
> > > correct checksums, and doesn't update checksums for output packets
> (the
> > > responsibility for this lies with the application).
> > > Additionally, UDP/IPv4 GSO doesn't process IP fragmented packets.
> > >
> > > UDP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one
> indrect
> > > MBUF, to organize an output packet. The direct MBUF stores the packet
> > > header, while the indirect mbuf simply points to a location within the
> > original
> > > packet's payload. Consequently, use of UDP GSO requires multi-segment
> > > MBUF support in the TX functions of the NIC driver.
> > >
> > > If a packet is GSO'd, UDP/IPv4 GSO reduces its MBUF refcnt by 1. As a
> > result,
> > > when all of its GSOed segments are freed, the packet is freed
> > automatically.
> > >
> > > Signed-off-by: Jiayu Hu <jiayu.hu at intel.com>
> > > ---
> > >  lib/librte_gso/Makefile     |  1 +
> > >  lib/librte_gso/gso_common.h |  3 ++
> > >  lib/librte_gso/gso_udp4.c   | 81
> > > +++++++++++++++++++++++++++++++++++++++++++++
> > >  lib/librte_gso/gso_udp4.h   | 42 +++++++++++++++++++++++
> > >  lib/librte_gso/meson.build  |  2 +-
> > >  lib/librte_gso/rte_gso.c    | 24 +++++++++++---
> > >  lib/librte_gso/rte_gso.h    |  6 +++-
> > >  7 files changed, 152 insertions(+), 7 deletions(-)  create mode 100644
> > > lib/librte_gso/gso_udp4.c  create mode 100644
> lib/librte_gso/gso_udp4.h
> > >
> > > diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile index
> > > 3648ec0..1fac53a 100644
> > > --- a/lib/librte_gso/Makefile
> > > +++ b/lib/librte_gso/Makefile
> > > @@ -19,6 +19,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c
> > >  SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_common.c
> > >  SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tcp4.c
> > >  SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tunnel_tcp4.c
> > > +SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_udp4.c
> > >
> > >  # install this header file
> > >  SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h diff --git
> > > a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h index
> > > 5ca5974..6cd764f 100644
> > > --- a/lib/librte_gso/gso_common.h
> > > +++ b/lib/librte_gso/gso_common.h
> > > @@ -31,6 +31,9 @@
> > >  		(PKT_TX_TCP_SEG | PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 |
> > \
> > >  		 PKT_TX_TUNNEL_GRE))
> > >
> > > +#define IS_IPV4_UDP(flag) (((flag) & (PKT_TX_UDP_SEG | PKT_TX_IPV4))
> > ==
> > > \
> > > +		(PKT_TX_UDP_SEG | PKT_TX_IPV4))
> > > +
> > >  /**
> > >   * Internal function which updates the UDP header of a packet,
> following
> > >   * segmentation. This is required to update the header's datagram
> length
> > > field.
> > > diff --git a/lib/librte_gso/gso_udp4.c b/lib/librte_gso/gso_udp4.c new
> file
> > > mode 100644 index 0000000..927dee1
> > > --- /dev/null
> > > +++ b/lib/librte_gso/gso_udp4.c
> >
> > File gso_upd4.c could be very similar to file gso_tcp4.c and that would
> > avoid code duplication.
> > In a unified file you could use a tcp vs. udp flag to distinguish between
> them
> > when necessary.
> > The files are short (~75 lines) so it is not a critical issue.
> 
> The function gso_tcp4_segment and gso_udp4_segment have different
> prototype,
> and their GSO rules are different. They are two different basic GSO types.
> 
> The GSO library gives each GSO type (TCP, tunnel) an internal .c and .h file,
> and their name
> represents their content. The rte_gso_segment calls different function
> according to the GSO type.
> This style is very clear for developers or users to understand. In addition, as
> you said, the code
> is short. So I think it's better to keep this style for UDP/IPv4 GSO.
> 
> >
> > > @@ -0,0 +1,81 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright(c) 2018 Intel Corporation
> > > + */
> > > +
> > > +#include "gso_common.h"
> > > +#include "gso_udp4.h"
> > > +
> > > +#define IPV4_HDR_MF_BIT (1U << 13)
> > > +
> > > +static inline void
> > > +update_ipv4_udp_headers(struct rte_mbuf *pkt, struct rte_mbuf
> **segs,
> > > +		uint16_t nb_segs)
> > > +{
> > > +	struct ipv4_hdr *ipv4_hdr;
> > > +	uint16_t frag_offset = 0, is_mf;
> > > +	uint16_t l2_hdrlen = pkt->l2_len, l3_hdrlen = pkt->l3_len;
> > > +	uint16_t tail_idx = nb_segs - 1, length, i;
> > > +
> > > +	/*
> > > +	 * Update IP header fields for output segments. Specifically,
> > > +	 * keep the same IP id, update fragment offset and total
> > > +	 * length.
> > > +	 */
> > > +	for (i = 0; i < nb_segs; i++) {
> > > +		ipv4_hdr = rte_pktmbuf_mtod_offset(segs[i], struct
> > ipv4_hdr
> > > *,
> > > +			l2_hdrlen);
> > > +		length = segs[i]->pkt_len - l2_hdrlen;
> > > +		ipv4_hdr->total_length = rte_cpu_to_be_16(length);
> > > +
> > > +		is_mf = i < tail_idx ? IPV4_HDR_MF_BIT : 0;
> > > +		ipv4_hdr->fragment_offset =
> > > +			rte_cpu_to_be_16(frag_offset | is_mf);
> > > +		frag_offset += ((length - l3_hdrlen) >> 3);
> > > +	}
> > > +}
> > > +
> > > +int
> > > +gso_udp4_segment(struct rte_mbuf *pkt,
> > > +		uint16_t gso_size,
> > > +		struct rte_mempool *direct_pool,
> > > +		struct rte_mempool *indirect_pool,
> > > +		struct rte_mbuf **pkts_out,
> > > +		uint16_t nb_pkts_out)
> > > +{
> > > +	struct ipv4_hdr *ipv4_hdr;
> > > +	uint16_t pyld_unit_size, hdr_offset;
> > > +	uint16_t frag_off;
> > > +	int ret;
> > > +
> > > +	/* Don't process the fragmented packet */
> > > +	ipv4_hdr = rte_pktmbuf_mtod_offset(pkt, struct ipv4_hdr *,
> > > +			pkt->l2_len);
> > > +	frag_off = rte_be_to_cpu_16(ipv4_hdr->fragment_offset);
> > > +	if (unlikely(IS_FRAGMENTED(frag_off))) {
> > > +		pkts_out[0] = pkt;
> > > +		return 1;
> > > +	}
> > > +
> > > +	/*
> > > +	 * UDP fragmentation is the same as IP fragmentation.
> > > +	 * Except the first one, other output packets just have l2
> > > +	 * and l3 headers.
> > > +	 */
> > > +	hdr_offset = pkt->l2_len + pkt->l3_len;
> > > +
> > > +	/* Don't process the packet without data. */
> > > +	if (unlikely(hdr_offset + pkt->l4_len >= pkt->pkt_len)) {
> > > +		pkts_out[0] = pkt;
> > > +		return 1;
> > > +	}
> > > +
> > > +	pyld_unit_size = gso_size - hdr_offset;
> > > +
> > > +	/* Segment the payload */
> > > +	ret = gso_do_segment(pkt, hdr_offset, pyld_unit_size, direct_pool,
> > > +			indirect_pool, pkts_out, nb_pkts_out);
> > > +	if (ret > 1)
> > > +		update_ipv4_udp_headers(pkt, pkts_out, ret);
> > > +
> > > +	return ret;
> > > +}
> > > diff --git a/lib/librte_gso/gso_udp4.h b/lib/librte_gso/gso_udp4.h new
> file
> > > mode 100644 index 0000000..b2a2908
> >
> > File gso_upd4.h is almost identical to file gso_tcp4.h so both files
> (although
> > short ~40 lines) could have been unified into one file.
> 
> Ditto.
> 
> >
> > > --- /dev/null
> > > +++ b/lib/librte_gso/gso_udp4.h
> > > @@ -0,0 +1,42 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright(c) 2018 Intel Corporation
> > > + */
> > > +
> > > +#ifndef _GSO_UDP4_H_
> > > +#define _GSO_UDP4_H_
> > > +
> > > +#include <stdint.h>
> > > +#include <rte_mbuf.h>
> > > +
> > > +/**
> > > + * Segment an UDP/IPv4 packet. This function doesn't check if the
> input
> > > + * packet has correct checksums, and doesn't update checksums for
> > > +output
> > > + * GSO segments. Furthermore, it doesn't process IP fragment packets.
> > > + *
> > > + * @param pkt
> > > + *  The packet mbuf to segment.
> > > + * @param gso_size
> > > + *  The max length of a GSO segment, measured in bytes.
> > > + * @param direct_pool
> > > + *  MBUF pool used for allocating direct buffers for output segments.
> > > + * @param indirect_pool
> > > + *  MBUF pool used for allocating indirect buffers for output segments.
> > > + * @param pkts_out
> > > + *  Pointer array used to store the MBUF addresses of output GSO
> > > + *  segments, when the function succeeds. If the memory space in
> > > + *  pkts_out is insufficient, it fails and returns -EINVAL.
> > > + * @param nb_pkts_out
> > > + *  The max number of items that 'pkts_out' can keep.
> > > + *
> > > + * @return
> > > + *   - The number of GSO segments filled in pkts_out on success.
> > > + *   - Return -ENOMEM if run out of memory in MBUF pools.
> > > + *   - Return -EINVAL for invalid parameters.
> > > + */
> > > +int gso_udp4_segment(struct rte_mbuf *pkt,
> > > +		uint16_t gso_size,
> > > +		struct rte_mempool *direct_pool,
> > > +		struct rte_mempool *indirect_pool,
> > > +		struct rte_mbuf **pkts_out,
> > > +		uint16_t nb_pkts_out);
> > > +#endif
> > > diff --git a/lib/librte_gso/meson.build b/lib/librte_gso/meson.build
> index
> > > 056534f..ad8dd85 100644
> > > --- a/lib/librte_gso/meson.build
> > > +++ b/lib/librte_gso/meson.build
> > > @@ -1,7 +1,7 @@
> > >  # SPDX-License-Identifier: BSD-3-Clause  # Copyright(c) 2017 Intel
> > > Corporation
> > >
> > > -sources = files('gso_common.c', 'gso_tcp4.c',
> > > +sources = files('gso_common.c', 'gso_tcp4.c', 'gso_udp4.c',
> > >   		'gso_tunnel_tcp4.c', 'rte_gso.c')
> > >  headers = files('rte_gso.h')
> > >  deps += ['ethdev']
> > > diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c index
> > > a44e3d4..751b5b6 100644
> > > --- a/lib/librte_gso/rte_gso.c
> > > +++ b/lib/librte_gso/rte_gso.c
> > > @@ -11,6 +11,17 @@
> > >  #include "gso_common.h"
> > >  #include "gso_tcp4.h"
> > >  #include "gso_tunnel_tcp4.h"
> > > +#include "gso_udp4.h"
> > > +
> > > +#define ILLEGAL_UDP_GSO_CTX(ctx) \
> > > +	((((ctx)->gso_types & DEV_TX_OFFLOAD_UDP_TSO) == 0) || \
> > > +	 (ctx)->gso_size < RTE_GSO_UDP_SEG_SIZE_MIN)
> > > +
> > > +#define ILLEGAL_TCP_GSO_CTX(ctx) \
> > > +	((((ctx)->gso_types & (DEV_TX_OFFLOAD_TCP_TSO | \
> > > +		DEV_TX_OFFLOAD_VXLAN_TNL_TSO | \
> > > +		DEV_TX_OFFLOAD_GRE_TNL_TSO)) == 0) || \
> > > +		(ctx)->gso_size < RTE_GSO_SEG_SIZE_MIN)
> >
> > Can you please explain why it is correct that the min len for VXLAN_TNL
> or
> > GRE_TNL is that of TCP MIN size (RTE_GSO_SEG_SIZE_MIN)
> 
> The logic here is a little complicated. First, we have two GSO types, i.e. TCP
> and UDP,
> and they have different requirements for min lengths and gso_types flags.
> In the code,
> we use ILLEGAL_UDP_GSO_CTX() and ILLEGAL_TCP_GSO_CTX() to check if
> the input packet
> doesn't meet the requirements. Rte_gso_segment() starts to process the
> input packet only
> when it meets UDP or TCP requirement. In other words, rte_gso_segment()
> stops if the
> packet doesn't meet both requirements at the same time. This is why I use
> " ILLEGAL_UDP_GSO_CTX(gso_ctx) && ILLEGAL_TCP_GSO_CTX(gso_ctx))"
> as the exit
> condition.
> 
> RTE_GSO_SEG_SIZE_MIN is not used to decide if input tunnel packets meet
> length
> requirement, but just checks a min length. In fact, we cannot decide a real
> correct min
> length for a packet type, since it may have vlan header or not. The original
> GSO code
> leverages this macro for tunnel packets to do a minimal check, so I think we
> can keep it
> here.
> 
> >
> > >
> >
> > To make the macros above and their usage below clearer:
> >
> > 1. I would change the || with &&. and == with !=
> >
> > #define ILLEGAL_UDP_GSO_CTX(ctx) \
> > 	((((ctx)->gso_types & DEV_TX_OFFLOAD_UDP_TSO) != 0) && \
> > 	 (ctx)->gso_size < RTE_GSO_UDP_SEG_SIZE_MIN)
> 
> This macro doesn't check all conditions for illegal UDP GSO. For example,
> if we input a UDP/IPv4 packet with setting gso_size to 1500 and without
> setting
> DEV_TX_OFFLOAD_UDP_TSO in gso_types, this macro returns 0, which
> means
> it's a legal UDP GSO packet.
> 
> If you mean we'd better use LEGAL rather than ILLEGAL as the check in the
> code,
> the exit condition should be:
> 
> #define LEGAL_UDP_GSO_CTX(ctx) \
> 	((((ctx)->gso_types & DEV_TX_OFFLOAD_UDP_TSO) != 0) && \
> 	(ctx)->gso_size >= RTE_GSO_UDP_SEG_SIZE_MIN)
> 
> if (~(LEGAL_UDP_GSO_CTX(..) || LEGAL_TCP_GSO_CTX(..)))
> 	return -EINVAL;
> 
> But in fact, it's the same as current implementation. I can add some
> explanations to the code for users to better understand the logic, if you
> think it's OK.
> 
> >
> > #define ILLEGAL_TCP_GSO_CTX(ctx) \
> > 	((((ctx)->gso_types & (DEV_TX_OFFLOAD_TCP_TSO | \
> > 		DEV_TX_OFFLOAD_VXLAN_TNL_TSO | \
> > 		DEV_TX_OFFLOAD_GRE_TNL_TSO)) != 0) && \
> > 		(ctx)->gso_size < RTE_GSO_SEG_SIZE_MIN)
> >
> 
> Ditto.
> 
> > 2. Then later I would change the && with ||
> >
> > Changing original:
> > 			(ILLEGAL_UDP_GSO_CTX(gso_ctx) &&
> > 			 ILLEGAL_TCP_GSO_CTX(gso_ctx)))
> >
> > With this:
> > 			ILLEGAL_UDP_GSO_CTX(gso_ctx) ||
> > 			 ILLEGAL_TCP_GSO_CTX(gso_ctx))
> 
> Ditto.
> 
> >
> >
> > >  int
> > >  rte_gso_segment(struct rte_mbuf *pkt,
> > > @@ -27,14 +38,12 @@ rte_gso_segment(struct rte_mbuf *pkt,
> > >
> > >  	if (pkt == NULL || pkts_out == NULL || gso_ctx == NULL ||
> > >  			nb_pkts_out < 1 ||
> > > -			gso_ctx->gso_size < RTE_GSO_SEG_SIZE_MIN ||
> > > -			((gso_ctx->gso_types &
> > > (DEV_TX_OFFLOAD_TCP_TSO |
> > > -			DEV_TX_OFFLOAD_VXLAN_TNL_TSO |
> > > -			DEV_TX_OFFLOAD_GRE_TNL_TSO)) == 0))
> > > +			(ILLEGAL_UDP_GSO_CTX(gso_ctx) &&
> > > +			 ILLEGAL_TCP_GSO_CTX(gso_ctx)))
> > >  		return -EINVAL;
> > >
> > >  	if (gso_ctx->gso_size >= pkt->pkt_len) {
> > > -		pkt->ol_flags &= (~PKT_TX_TCP_SEG);
> > > +		pkt->ol_flags &= (~(PKT_TX_TCP_SEG | PKT_TX_UDP_SEG));
> > >  		pkts_out[0] = pkt;
> > >  		return 1;
> > >  	}
> > > @@ -59,6 +68,11 @@ rte_gso_segment(struct rte_mbuf *pkt,
> > >  		ret = gso_tcp4_segment(pkt, gso_size, ipid_delta,
> > >  				direct_pool, indirect_pool,
> > >  				pkts_out, nb_pkts_out);
> > > +	} else if (IS_IPV4_UDP(pkt->ol_flags) &&
> > > +			(gso_ctx->gso_types &
> > > DEV_TX_OFFLOAD_UDP_TSO)) {
> > > +		pkt->ol_flags &= (~PKT_TX_UDP_SEG);
> > > +		ret = gso_udp4_segment(pkt, gso_size, direct_pool,
> > > +				indirect_pool, pkts_out, nb_pkts_out);
> > >  	} else {
> > >  		/* unsupported packet, skip */
> > >  		pkts_out[0] = pkt;
> > > diff --git a/lib/librte_gso/rte_gso.h b/lib/librte_gso/rte_gso.h index
> > > f4abd61..a626a11 100644
> > > --- a/lib/librte_gso/rte_gso.h
> > > +++ b/lib/librte_gso/rte_gso.h
> > > @@ -17,10 +17,14 @@ extern "C" {
> > >  #include <stdint.h>
> > >  #include <rte_mbuf.h>
> > >
> > > -/* Minimum GSO segment size. */
> > > +/* Minimum GSO segment size for TCP based packets. */
> > >  #define RTE_GSO_SEG_SIZE_MIN (sizeof(struct ether_hdr) + \
> > >  		sizeof(struct ipv4_hdr) + sizeof(struct tcp_hdr) + 1)
> >
> > RTE_GSO_SEG_SIZE_MIN is actually TCP min size. Can you name this
> macro
> > as
> > RTE_GSO_TCP_SEG_SIZE_MIN (symmetrically to the UDP macro below)?
> >
> 
> Yes, you are right. The name is not good. But I don't know if changing name
> will introduce ABI change, so I select this name as a workaround.
Sorry, it will not cause ABI change but will influence the applications
which have already used the GSO library, if we rename RTE_GSO_SEG_SIZE_MIN.
This is because this macro is an external macro. From my perspective, I prefer to
 introduce a new macro for UDP based GSO, instead of renaming it.
Jiayu
> 
> Thanks,
> Jiayu
> 
> > >
> > > +/* Minimum GSO segment size for UDP based packets. */ #define
> > > +RTE_GSO_UDP_SEG_SIZE_MIN (sizeof(struct ether_hdr) + \
> > > +		sizeof(struct ipv4_hdr) + sizeof(struct udp_hdr) + 1)
> > > +
> > >  /* GSO flags for rte_gso_ctx. */
> > >  #define RTE_GSO_FLAG_IPID_FIXED (1ULL << 0)  /**< Use fixed IP ids
> for
> > > output GSO segments. Setting
> > > --
> > > 2.7.4
    
    
More information about the dev
mailing list