[dpdk-dev] [PATCH v10 2/3] lib/gro: add TCP/IPv4 GRO support
    Hu, Jiayu 
    jiayu.hu at intel.com
       
    Mon Jul  3 07:13:58 CEST 2017
    
    
  
Hi Jianfeng,
> -----Original Message-----
> From: Tan, Jianfeng
> Sent: Sunday, July 2, 2017 6:20 PM
> To: Hu, Jiayu <jiayu.hu at intel.com>; dev at dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev at intel.com>;
> stephen at networkplumber.org; yliu at fridaylinux.org; Wu, Jingjing
> <jingjing.wu at intel.com>; Yao, Lei A <lei.a.yao at intel.com>; Bie, Tiwei
> <tiwei.bie at intel.com>
> Subject: Re: [PATCH v10 2/3] lib/gro: add TCP/IPv4 GRO support
> 
> 
> 
> On 7/1/2017 7:08 PM, Jiayu Hu wrote:
> > In this patch, we introduce five APIs to support TCP/IPv4 GRO.
> > - gro_tcp4_tbl_create: create a TCP/IPv4 reassembly table, which is used
> >      to merge packets.
> > - gro_tcp4_tbl_destroy: free memory space of a TCP/IPv4 reassembly table.
> > - gro_tcp4_tbl_timeout_flush: flush timeout packets from a TCP/IPv4
> >      reassembly table.
> > - gro_tcp4_tbl_item_num: return the number of packets in a TCP/IPv4
> >      reassembly table.
> > - gro_tcp4_reassemble: reassemble an inputted TCP/IPv4 packet.
> >
> > TCP/IPv4 GRO API assumes all inputted packets are with correct IPv4
> > and TCP checksums. And TCP/IPv4 GRO API doesn't update IPv4 and TCP
> > checksums for merged packets. If inputted packets are IP fragmented,
> > TCP/IPv4 GRO API assumes they are complete packets (i.e. with L4
> > headers).
> >
> > In TCP/IPv4 GRO, we use a table structure, called TCP/IPv4 reassembly
> > table, to reassemble packets. A TCP/IPv4 reassembly table includes a key
> > array and a item array, where the key array keeps the criteria to merge
> > packets and the item array keeps packet information.
> >
> > One key in the key array points to an item group, which consists of
> > packets which have the same criteria value. If two packets are able to
> > merge, they must be in the same item group. Each key in the key array
> > includes two parts:
> > - criteria: the criteria of merging packets. If two packets can be
> >      merged, they must have the same criteria value.
> > - start_index: the index of the first incoming packet of the item group.
> >
> > Each element in the item array keeps the information of one packet. It
> > mainly includes two parts:
> > - pkt: packet address
> > - next_pkt_index: the index of the next packet in the same item group.
> >      All packets in the same item group are chained by next_pkt_index.
> >      With next_pkt_index, we can locate all packets in the same item
> >      group one by one.
> >
> > To process an incoming packet needs three steps:
> > a. check if the packet should be processed. Packets with one of the
> >      following properties won't be processed:
> > 	- SYN, FIN, RST or PSH bit is set;
> > 	- packet payload length is 0.
> > b. traverse the key array to find a key which has the same criteria
> >      value with the incoming packet. If find, goto step c. Otherwise,
> >      insert a new key and insert the packet into the item array.
> > c. locate the first packet in the item group via the start_index in the
> >      key. Then traverse all packets in the item group via next_pkt_index.
> >      If find one packet which can merge with the incoming one, merge them
> >      together. If can't find, insert the packet into this item group.
> >
> > Signed-off-by: Jiayu Hu <jiayu.hu at intel.com>
> > ---
> >   doc/guides/rel_notes/release_17_08.rst |   7 +
> >   lib/librte_gro/Makefile                |   1 +
> >   lib/librte_gro/rte_gro.c               | 123 ++++++++-
> >   lib/librte_gro/rte_gro.h               |  10 +-
> >   lib/librte_gro/rte_gro_tcp4.c          | 439
> +++++++++++++++++++++++++++++++++
> >   lib/librte_gro/rte_gro_tcp4.h          | 172 +++++++++++++
> >   6 files changed, 736 insertions(+), 16 deletions(-)
> >   create mode 100644 lib/librte_gro/rte_gro_tcp4.c
> >   create mode 100644 lib/librte_gro/rte_gro_tcp4.h
> >
> > diff --git a/doc/guides/rel_notes/release_17_08.rst
> b/doc/guides/rel_notes/release_17_08.rst
> > index 842f46f..f067247 100644
> > --- a/doc/guides/rel_notes/release_17_08.rst
> > +++ b/doc/guides/rel_notes/release_17_08.rst
> > @@ -75,6 +75,13 @@ New Features
> >
> >     Added support for firmwares with multiple Ethernet ports per physical
> port.
> >
> > +* **Add Generic Receive Offload API support.**
> > +
> > +  Generic Receive Offload (GRO) API supports to reassemble TCP/IPv4
> > +  packets. GRO API assumes all inputted packets are with correct
> > +  checksums. GRO API doesn't update checksums for merged packets. If
> > +  inputted packets are IP fragmented, GRO API assumes they are complete
> > +  packets (i.e. with L4 headers).
> >
> >   Resolved Issues
> >   ---------------
> > diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> > index 7e0f128..43e276e 100644
> > --- a/lib/librte_gro/Makefile
> > +++ b/lib/librte_gro/Makefile
> > @@ -43,6 +43,7 @@ LIBABIVER := 1
> >
> >   # source files
> >   SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp4.c
> >
> >   # install this header file
> >   SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> > index 648835b..a4641f9 100644
> > --- a/lib/librte_gro/rte_gro.c
> > +++ b/lib/librte_gro/rte_gro.c
> > @@ -32,8 +32,11 @@
> >
> >   #include <rte_malloc.h>
> >   #include <rte_mbuf.h>
> > +#include <rte_cycles.h>
> > +#include <rte_ethdev.h>
> >
> >   #include "rte_gro.h"
> > +#include "rte_gro_tcp4.h"
> >
> >   typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
> >   		uint16_t max_flow_num,
> > @@ -41,9 +44,12 @@ typedef void *(*gro_tbl_create_fn)(uint16_t
> socket_id,
> >   typedef void (*gro_tbl_destroy_fn)(void *tbl);
> >   typedef uint32_t (*gro_tbl_item_num_fn)(void *tbl);
> >
> > -static gro_tbl_create_fn
> tbl_create_functions[RTE_GRO_TYPE_MAX_NUM];
> > -static gro_tbl_destroy_fn
> tbl_destroy_functions[RTE_GRO_TYPE_MAX_NUM];
> > -static gro_tbl_item_num_fn
> tbl_item_num_functions[RTE_GRO_TYPE_MAX_NUM];
> > +static gro_tbl_create_fn
> tbl_create_functions[RTE_GRO_TYPE_MAX_NUM] = {
> > +	gro_tcp4_tbl_create, NULL};
> > +static gro_tbl_destroy_fn
> tbl_destroy_functions[RTE_GRO_TYPE_MAX_NUM] = {
> > +	gro_tcp4_tbl_destroy, NULL};
> > +static gro_tbl_item_num_fn tbl_item_num_functions[
> > +	RTE_GRO_TYPE_MAX_NUM] = {gro_tcp4_tbl_item_num, NULL};
> >
> >   /**
> >    * GRO table, which is used to merge packets. It keeps many reassembly
> > @@ -130,27 +136,118 @@ void rte_gro_tbl_destroy(void *tbl)
> >   }
> >
> >   uint16_t
> > -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> > +rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> >   		uint16_t nb_pkts,
> > -		const struct rte_gro_param *param __rte_unused)
> > +		const struct rte_gro_param *param)
> >   {
> > -	return nb_pkts;
> > +	uint16_t i;
> > +	uint16_t nb_after_gro = nb_pkts;
> > +	uint32_t item_num;
> > +
> > +	/* allocate a reassembly table for TCP/IPv4 GRO */
> > +	struct gro_tcp4_tbl tcp_tbl;
> > +	struct gro_tcp4_key tcp_keys[RTE_GRO_MAX_BURST_ITEM_NUM]
> = {0};
> > +	struct gro_tcp4_item tcp_items[RTE_GRO_MAX_BURST_ITEM_NUM]
> = {0};
> > +
> > +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> > +	uint16_t unprocess_num = 0;
> > +	int32_t ret;
> > +	uint64_t current_time;
> > +
> > +	if ((param->desired_gro_types & RTE_GRO_TCP_IPV4) == 0)
> > +		return nb_pkts;
> > +
> > +	/* get the actual number of items */
> > +	item_num = RTE_MIN(nb_pkts, (param->max_flow_num *
> > +			param->max_item_per_flow));
> > +	item_num = RTE_MIN(item_num,
> RTE_GRO_MAX_BURST_ITEM_NUM);
> > +
> > +	tcp_tbl.keys = tcp_keys;
> > +	tcp_tbl.items = tcp_items;
> > +	tcp_tbl.key_num = 0;
> > +	tcp_tbl.item_num = 0;
> > +	tcp_tbl.max_key_num = item_num;
> > +	tcp_tbl.max_item_num = item_num;
> > +
> > +	current_time = rte_rdtsc();
> > +
> > +	for (i = 0; i < nb_pkts; i++) {
> > +		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
> > +				(pkts[i]->packet_type &
> RTE_PTYPE_L4_TCP)) {
> > +			ret = gro_tcp4_reassemble(pkts[i],
> > +					&tcp_tbl,
> > +					param->max_packet_size,
> > +					current_time);
> > +			if (ret > 0)
> > +				/* merge successfully */
> > +				nb_after_gro--;
> > +			else if (ret < 0)
> > +				unprocess_pkts[unprocess_num++] =
> > +					pkts[i];
> > +		} else
> > +			unprocess_pkts[unprocess_num++] =
> > +				pkts[i];
> > +	}
> > +
> > +	/* re-arrange GROed packets */
> > +	if (nb_after_gro < nb_pkts) {
> > +		i = gro_tcp4_tbl_timeout_flush(&tcp_tbl, 0,
> > +				pkts, nb_pkts);
> > +		if (unprocess_num > 0)
> > +			memcpy(&pkts[i], unprocess_pkts,
> > +					sizeof(struct rte_mbuf *) *
> > +					unprocess_num);
> > +	}
> > +	return nb_after_gro;
> >   }
> >
> >   uint16_t
> > -rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
> > +rte_gro_reassemble(struct rte_mbuf **pkts,
> >   		uint16_t nb_pkts,
> > -		void *tbl __rte_unused)
> > +		void *tbl)
> >   {
> > -	return nb_pkts;
> > +	uint16_t i, unprocess_num = 0;
> > +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> > +	struct gro_tbl *gro_tbl = (struct gro_tbl *)tbl;
> > +	uint64_t current_time;
> > +
> > +	if ((gro_tbl->desired_gro_types & RTE_GRO_TCP_IPV4) == 0)
> > +		return nb_pkts;
> > +
> > +	current_time = rte_rdtsc();
> > +	for (i = 0; i < nb_pkts; i++) {
> > +		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
> > +				(pkts[i]->packet_type &
> RTE_PTYPE_L4_TCP)) {
> > +			if (gro_tcp4_reassemble(pkts[i],
> > +						gro_tbl-
> >tbls[RTE_GRO_TCP_IPV4_INDEX],
> > +						gro_tbl->max_packet_size,
> > +						current_time) < 0)
> > +				unprocess_pkts[unprocess_num++] = pkts[i];
> > +		} else
> > +			unprocess_pkts[unprocess_num++] = pkts[i];
> > +	}
> > +	if (unprocess_num > 0)
> > +		memcpy(pkts, unprocess_pkts,
> > +				sizeof(struct rte_mbuf *) * unprocess_num);
> > +
> > +	return unprocess_num;
> >   }
> >
> >   uint16_t
> > -rte_gro_timeout_flush(void *tbl __rte_unused,
> > -		uint64_t desired_gro_types __rte_unused,
> > -		struct rte_mbuf **out __rte_unused,
> > -		uint16_t max_nb_out __rte_unused)
> > +rte_gro_timeout_flush(void *tbl,
> > +		uint64_t desired_gro_types,
> > +		struct rte_mbuf **out,
> > +		uint16_t max_nb_out)
> >   {
> > +	struct gro_tbl *gro_tbl = (struct gro_tbl *)tbl;
> > +
> > +	desired_gro_types = desired_gro_types &
> > +		gro_tbl->desired_gro_types;
> > +	if (desired_gro_types & RTE_GRO_TCP_IPV4)
> > +		return gro_tcp4_tbl_timeout_flush(
> > +				gro_tbl->tbls[RTE_GRO_TCP_IPV4_INDEX],
> > +				gro_tbl->max_timeout_cycles,
> > +				out, max_nb_out);
> >   	return 0;
> >   }
> >
> > diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> > index 02c9113..68d2fc6 100644
> > --- a/lib/librte_gro/rte_gro.h
> > +++ b/lib/librte_gro/rte_gro.h
> > @@ -45,7 +45,11 @@ extern "C" {
> >
> >   /* max number of supported GRO types */
> >   #define RTE_GRO_TYPE_MAX_NUM 64
> > -#define RTE_GRO_TYPE_SUPPORT_NUM 0	/**< current supported
> GRO num */
> > +#define RTE_GRO_TYPE_SUPPORT_NUM 1	/**< current supported
> GRO num */
> > +
> > +/* TCP/IPv4 GRO flag */
> > +#define RTE_GRO_TCP_IPV4_INDEX 0
> > +#define RTE_GRO_TCP_IPV4 (1ULL << RTE_GRO_TCP_IPV4_INDEX)
> >
> >
> >   struct rte_gro_param {
> > @@ -118,14 +122,14 @@ uint16_t rte_gro_reassemble_burst(struct
> rte_mbuf **pkts,
> >    *
> >    * @param pkts
> >    *  packet to reassemble. Besides, after this function finishes, it
> > - *  keeps the unprocessed packets (i.e. without data or unsupported
> > + *  keeps the unprocessed packets (e.g. without data or unsupported
> >    *  GRO types).
> >    * @param nb_pkts
> >    *  the number of packets to reassemble.
> >    * @param tbl
> >    *  a pointer points to a GRO table.
> >    * @return
> > - *  return the number of unprocessed packets (i.e. without data or
> > + *  return the number of unprocessed packets (e.g. without data or
> >    *  unsupported GRO types). If all packets are processed (merged or
> >    *  inserted into the table), return 0.
> >    */
> > diff --git a/lib/librte_gro/rte_gro_tcp4.c b/lib/librte_gro/rte_gro_tcp4.c
> > new file mode 100644
> > index 0000000..8f2aa86
> > --- /dev/null
> > +++ b/lib/librte_gro/rte_gro_tcp4.c
> > @@ -0,0 +1,439 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
> NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
> FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
> BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
> LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
> TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
> OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> DAMAGE.
> > + */
> > +
> > +#include <rte_malloc.h>
> > +#include <rte_mbuf.h>
> > +#include <rte_cycles.h>
> > +
> > +#include <rte_ethdev.h>
> > +#include <rte_ip.h>
> > +#include <rte_tcp.h>
> > +
> > +#include "rte_gro_tcp4.h"
> > +
> > +void *gro_tcp4_tbl_create(uint16_t socket_id,
> > +		uint16_t max_flow_num,
> > +		uint16_t max_item_per_flow)
> > +{
> > +	size_t size;
> > +	uint32_t entries_num;
> > +	struct gro_tcp4_tbl *tbl;
> > +
> > +	entries_num = max_flow_num * max_item_per_flow;
> > +	entries_num = entries_num > GRO_TCP4_TBL_MAX_ITEM_NUM ?
> > +		GRO_TCP4_TBL_MAX_ITEM_NUM : entries_num;
> > +
> > +	if (entries_num == 0)
> > +		return NULL;
> > +
> > +	tbl = (struct gro_tcp4_tbl *)rte_zmalloc_socket(
> > +			__func__,
> > +			sizeof(struct gro_tcp4_tbl),
> > +			RTE_CACHE_LINE_SIZE,
> > +			socket_id);
> > +	if (tbl == NULL)
> > +		return NULL;
> > +
> > +	size = sizeof(struct gro_tcp4_item) * entries_num;
> > +	tbl->items = (struct gro_tcp4_item *)rte_zmalloc_socket(
> > +			__func__,
> > +			size,
> > +			RTE_CACHE_LINE_SIZE,
> > +			socket_id);
> > +	if (tbl->items == NULL) {
> > +		rte_free(tbl);
> > +		return NULL;
> > +	}
> > +	tbl->max_item_num = entries_num;
> > +
> > +	size = sizeof(struct gro_tcp4_key) * entries_num;
> > +	tbl->keys = (struct gro_tcp4_key *)rte_zmalloc_socket(
> > +			__func__,
> > +			size, RTE_CACHE_LINE_SIZE,
> > +			socket_id);
> > +	if (tbl->keys == NULL) {
> > +		rte_free(tbl->items);
> > +		rte_free(tbl);
> > +		return NULL;
> > +	}
> > +	tbl->max_key_num = entries_num;
> > +	return tbl;
> > +}
> > +
> > +void gro_tcp4_tbl_destroy(void *tbl)
> > +{
> > +	struct gro_tcp4_tbl *tcp_tbl = (struct gro_tcp4_tbl *)tbl;
> > +
> > +	if (tcp_tbl) {
> > +		rte_free(tcp_tbl->items);
> > +		rte_free(tcp_tbl->keys);
> > +	}
> > +	rte_free(tcp_tbl);
> > +}
> > +
> > +static struct rte_mbuf *get_mbuf_lastseg(struct rte_mbuf *pkt)
> > +{
> > +	struct rte_mbuf *lastseg = pkt;
> > +
> > +	while (lastseg->next)
> > +		lastseg = lastseg->next;
> > +
> > +	return lastseg;
> > +}
> > +
> > +/**
> > + * merge two TCP/IPv4 packets without updating checksums.
> > + */
> > +static int
> > +merge_two_tcp4_packets(struct gro_tcp4_item *item_src,
> > +		struct rte_mbuf *pkt,
> > +		uint32_t max_packet_size,
> > +		int cmp)
> > +{
> > +	struct rte_mbuf *pkt_head, *pkt_tail, *lastseg;
> > +	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr0;
> > +	uint16_t tcp_dl1;
> > +
> > +	if (cmp > 0) {
> > +		/* append the new packet into tail */
> > +		pkt_head = item_src->pkt;
> > +		pkt_tail = pkt;
> > +	} else {
> > +		/* append the new packet into head */
> 
> Typo: append -> prepend
Thanks, I will modify it.
> 
> > +		pkt_head = pkt;
> > +		pkt_tail = item_src->pkt;
> > +	}
> > +
> > +	/* parse the tail packet */
> > +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_tail,
> > +				char *) + pkt_tail->l2_len);
> > +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) -
> > +		pkt_tail->l3_len - pkt_tail->l4_len;
> > +
> > +	if (pkt_head->pkt_len + tcp_dl1 > max_packet_size)
> > +		return -1;
> > +
> > +	/* remove packet header for the tail packet */
> > +	rte_pktmbuf_adj(pkt_tail, pkt_tail->l2_len +
> > +			pkt_tail->l3_len +
> > +			pkt_tail->l4_len);
> > +
> > +	if (cmp > 0) {
> > +		/* chain the new packet to the tail of the original packet */
> > +		item_src->lastseg->next = pkt_tail;
> > +		/* update the lastseg for the item */
> > +		item_src->lastseg = get_mbuf_lastseg(pkt_tail);
> > +	} else {
> > +		/* chain the original packet to the tail of the new packet */
> > +		lastseg = get_mbuf_lastseg(pkt_head);
> > +		lastseg->next = pkt_tail;
> > +		/* update the item */
> > +		item_src->pkt = pkt_head;
> > +	}
> > +
> > +	/* parse the head packet */
> > +	ipv4_hdr0 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_head,
> > +				char *) + pkt_head->l2_len);
> > +
> > +	/* update IP header for the head packet */
> > +	ipv4_hdr0->total_length = rte_cpu_to_be_16(
> > +			rte_be_to_cpu_16(
> > +				ipv4_hdr0->total_length)
> > +			+ tcp_dl1);
> > +	ipv4_hdr0->packet_id = ipv4_hdr1->packet_id;
> 
> Why do we bother to change id field in IP header? I think it's not
> necessary.
Currently, when merge packets, we check if IP ID is consecutive. If two
packets (p1 and p2) are merged and we don't update the IP ID for the
merged packet p1-p2, the IP ID of p1-p2 is the IP ID of p1. When p3
comes, we can't merge it with p1-p2, since their IP IDs are not consecutive.
So I update IP ID when two packets are merged. If we don't update IP ID,
that means we shouldn't check if IP ID is consecutive when merge packets.
But Konstantin suggests me to check the IP ID. Maybe we need to confirm
with him again.
> 
> > +
> > +	/* update mbuf metadata for the merged packet */
> > +	pkt_head->nb_segs += pkt_tail->nb_segs;
> > +	pkt_head->pkt_len += pkt_tail->pkt_len;
> > +	return 1;
> > +}
> > +
> > +static int
> > +check_seq_option(struct rte_mbuf *pkt,
> > +		struct tcp_hdr *tcp_hdr,
> > +		uint16_t ip_id,
> > +		uint16_t tcp_hl,
> > +		uint16_t tcp_dl)
> > +{
> > +	struct ipv4_hdr *ipv4_hdr0;
> > +	struct tcp_hdr *tcp_hdr0;
> > +	uint16_t tcp_hl0, tcp_dl0, ip_id0;
> > +	uint32_t sent_seq0, sent_seq;
> > +	uint16_t len;
> > +
> > +	ipv4_hdr0 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
> > +				char *) + pkt->l2_len);
> > +	tcp_hdr0 = (struct tcp_hdr *)((char *)ipv4_hdr0 + pkt->l3_len);
> > +
> > +	ip_id0 = rte_be_to_cpu_16(ipv4_hdr0->packet_id);
> > +	tcp_hl0 = pkt->l4_len;
> > +	tcp_dl0 = rte_be_to_cpu_16(ipv4_hdr0->total_length) -
> > +		pkt->l3_len - tcp_hl0;
> > +
> > +	/* check if TCP option fields equal. If not, return 0. */
> > +	len = RTE_MAX(tcp_hl, tcp_hl0) - sizeof(struct tcp_hdr);
> > +	if ((tcp_hl != tcp_hl0) || ((len > 0) &&
> > +				(memcmp(tcp_hdr + 1,
> > +						tcp_hdr0 + 1,
> > +						len) != 0)))
> > +		return 0;
> > +
> > +	/* check if the two packets are neighbors */
> > +	sent_seq0 = rte_be_to_cpu_32(tcp_hdr0->sent_seq);
> > +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> > +
> > +	if ((sent_seq == (sent_seq0 + tcp_dl0)) &&
> > +			(ip_id == (ip_id0 + 1)))
> > +		/* append the new packet into tail */
> > +		return 1;
> > +	else if (((sent_seq + tcp_dl) == sent_seq0) &&
> > +		((ip_id + 1) == ip_id0))
> > +		/* append the new packet into head */
> > +		return -1;
> > +	else
> > +		return 0;
> > +}
> > +
> > +static uint32_t
> > +find_an_empty_item(struct gro_tcp4_tbl *tbl)
> > +{
> > +	uint32_t i;
> > +
> > +	for (i = 0; i < tbl->max_item_num; i++)
> > +		if (tbl->items[i].pkt == NULL)
> > +			return i;
> > +	return INVALID_ARRAY_INDEX;
> > +}
> > +
> > +static uint32_t
> > +find_an_empty_key(struct gro_tcp4_tbl *tbl)
> > +{
> > +	uint32_t i;
> > +
> > +	for (i = 0; i < tbl->max_key_num; i++)
> > +		if (tbl->keys[i].is_valid == 0)
> > +			return i;
> > +	return INVALID_ARRAY_INDEX;
> > +}
> > +
> > +int32_t
> > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > +		struct gro_tcp4_tbl *tbl,
> > +		uint32_t max_packet_size,
> > +		uint64_t start_time)
> > +{
> > +	struct ether_hdr *eth_hdr;
> > +	struct ipv4_hdr *ipv4_hdr;
> > +	struct tcp_hdr *tcp_hdr;
> > +	uint16_t tcp_dl, ip_id;
> > +
> > +	struct tcp4_key key;
> > +	uint32_t cur_idx, prev_idx, item_idx;
> > +	uint32_t i, key_idx;
> > +	int cmp;
> > +
> > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > +	ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
> > +
> > +	/* check if the packet should be processed */
> > +	if (pkt->l3_len < sizeof(struct ipv4_hdr))
> > +		return -1;
> 
> Unnecessary precheck as you already checked it's a valid IPv4 + TCP
> packet in the upper gro framework.
Thanks, I will remove it.
> 
> > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
> > +	/* if SYN, FIN, RST, 	or URG is set, return immediately */
> > +	if ((tcp_hdr->tcp_flags & (~((uint8_t)TCP_ACK_FLAG))) ||
> > +			((tcp_hdr->tcp_flags & TCP_ACK_FLAG) == 0))
> 
> This can be simplified to: if (tcp_hdr->tcp_flags != TCP_ACK_FLAG)
> 
> And in the above comment, we state that CWR and ECE is also not in the
> scope.
Thanks, I will change it.
> 
> > +		return -1;
> > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len
> > +		- pkt->l4_len;
> > +	if (tcp_dl == 0)
> > +		return -1;
> > +
> > +	ip_id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
> > +
> > +	/* find a key and traverse all packets in its item group */
> > +	key.eth_saddr = eth_hdr->s_addr;
> > +	key.eth_daddr = eth_hdr->d_addr;
> > +	key.ip_src_addr = rte_be_to_cpu_32(ipv4_hdr->src_addr);
> > +	key.ip_dst_addr = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> > +	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
> > +	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
> > +	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
> 
> Why do we bother to care about the endian? We only use them for equality
> comparison.
Oh yes, it's unnecessary to change the endian here. I will modify it.
> 
> > +
> > +	for (i = 0; i < tbl->max_key_num; i++) {
> > +		/* search for a key */
> > +		if ((tbl->keys[i].is_valid == 0) ||
> > +				(memcmp(&(tbl->keys[i].key), &key,
> > +						sizeof(struct tcp4_key)) !=
> 0))
> > +			continue;
> 
> Please put below code out of the for loop to reduce indent. And further,
> better to keep this flow identification into an inline function as I
> think finally we will optimize this when there are lots of flows.
> 
> As Stephen suggested about the memset, I think it also applies to memcpy
> and memcmp here and anywhere for the key.
Thanks, I will use an inline function to compare key.
> 
> > +
> > +		cur_idx = tbl->keys[i].start_index;
> > +		prev_idx = cur_idx;
> > +		while (cur_idx != INVALID_ARRAY_INDEX) {
> 
> do ... while {} can help to reduce one comparison.
Thanks, I will change it.
> 
> > +			cmp = check_seq_option(tbl->items[cur_idx].pkt,
> > +					tcp_hdr,
> 
> 
> > +					ip_id,
> > +					pkt->l4_len,
> > +					tcp_dl);
> > +			if (cmp != 0) {
> > +				if (merge_two_tcp4_packets(
> > +							&(tbl-
> >items[cur_idx]),
> > +							pkt,
> > +							max_packet_size,
> > +							cmp) > 0)
> > +					return 1;
> > +				/**
> > +				 * fail to merge two packets since
> > +				 * it's beyond the max packet length.
> > +				 * Insert it into the item group.
> > +				 */
> > +				item_idx = find_an_empty_item(tbl);
> > +				if (item_idx == INVALID_ARRAY_INDEX)
> > +					return -1;
> > +				tbl->items[prev_idx].next_pkt_idx =
> item_idx;
> > +				tbl->items[item_idx].pkt = pkt;
> > +				tbl->items[item_idx].lastseg =
> > +					get_mbuf_lastseg(pkt);
> > +				tbl->items[item_idx].next_pkt_idx =
> > +					INVALID_ARRAY_INDEX;
> > +				tbl->items[item_idx].start_time = start_time;
> > +				tbl->item_num++;
> 
> Above code piece will be duplicated in the following. Please abstract
> them into a function named, for example, gro_tcp4_add_new_item().
Thanks, I will modify it.
> 
> > +				return 0;
> > +			}
> > +			prev_idx = cur_idx;
> > +			cur_idx = tbl->items[cur_idx].next_pkt_idx;
> > +		}
> > +		/**
> > +		 * find a corresponding item group but fails to find
> > +		 * one packet to merge. Insert it into this item group.
> > +		 */
> > +		item_idx = find_an_empty_item(tbl);
> > +		if (item_idx == INVALID_ARRAY_INDEX)
> > +			return -1;
> > +		tbl->items[prev_idx].next_pkt_idx = item_idx;
> > +		tbl->items[item_idx].pkt = pkt;
> > +		tbl->items[item_idx].lastseg =
> > +			get_mbuf_lastseg(pkt);
> > +		tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> > +		tbl->items[item_idx].start_time = start_time;
> > +		tbl->item_num++;
> > +		return 0;
> > +	}
> > +
> > +	/**
> > +	 * merge fail as the given packet has
> > +	 * a new key. So insert a new key.
> > +	 */
> > +	item_idx = find_an_empty_item(tbl);
> > +	key_idx = find_an_empty_key(tbl);
> > +	/**
> > +	 * if current key or item number is greater than the max
> > +	 * value, don't insert the packet into the table and return
> > +	 * immediately.
> > +	 */
> > +	if (item_idx == INVALID_ARRAY_INDEX ||
> > +			key_idx == INVALID_ARRAY_INDEX)
> > +		return -1;
> > +	tbl->items[item_idx].pkt = pkt;
> > +	tbl->items[item_idx].lastseg = get_mbuf_lastseg(pkt);
> > +	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> > +	tbl->items[item_idx].start_time = start_time;
> > +	tbl->item_num++;
> > +
> > +	memcpy(&(tbl->keys[key_idx].key),
> > +			&key, sizeof(struct tcp4_key));
> > +	tbl->keys[key_idx].start_index = item_idx;
> > +	tbl->keys[key_idx].is_valid = 1;
> > +	tbl->key_num++;
> > +
> > +	return 0;
> > +}
> > +
> > +uint16_t
> > +gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
> > +		uint64_t timeout_cycles,
> > +		struct rte_mbuf **out,
> > +		uint16_t nb_out)
> > +{
> > +	uint16_t k = 0;
> > +	uint32_t i, j;
> > +	uint64_t current_time;
> > +
> > +	current_time = rte_rdtsc();
> > +	for (i = 0; i < tbl->max_key_num; i++) {
> > +		/* all keys have been checked, return immediately */
> > +		if (tbl->key_num == 0)
> > +			return k;
> > +
> > +		if (tbl->keys[i].is_valid == 0)
> > +			continue;
> > +
> > +		j = tbl->keys[i].start_index;
> > +		while (j != INVALID_ARRAY_INDEX) {
> > +			if (current_time - tbl->items[j].start_time >=
> > +					timeout_cycles) {
> > +				out[k++] = tbl->items[j].pkt;
> > +				tbl->items[j].pkt = NULL;
> > +				tbl->item_num--;
> > +				j = tbl->items[j].next_pkt_idx;
> > +
> > +				/**
> > +				 * delete the key as all of
> > +				 * its packets are flushed.
> > +				 */
> > +				if (j == INVALID_ARRAY_INDEX) {
> > +					tbl->keys[i].is_valid = 0;
> > +					tbl->key_num--;
> > +				} else
> > +					/* update start_index of the key */
> > +					tbl->keys[i].start_index = j;
> > +
> > +				if (k == nb_out)
> > +					return k;
> > +			} else
> > +				/**
> > +				 * left packets of this key won't be
> > +				 * timeout, so go to check other keys.
> > +				 */
> > +				break;
> > +		}
> > +	}
> > +	return k;
> > +}
> > +
> > +uint32_t gro_tcp4_tbl_item_num(void *tbl)
> > +{
> > +	struct gro_tcp4_tbl *gro_tbl = (struct gro_tcp4_tbl *)tbl;
> > +
> > +	if (gro_tbl)
> > +		return gro_tbl->item_num;
> > +	return 0;
> > +}
> > diff --git a/lib/librte_gro/rte_gro_tcp4.h b/lib/librte_gro/rte_gro_tcp4.h
> > new file mode 100644
> > index 0000000..27856f8
> > --- /dev/null
> > +++ b/lib/librte_gro/rte_gro_tcp4.h
> > @@ -0,0 +1,172 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
> NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
> FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
> BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
> LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
> TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
> OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> DAMAGE.
> > + */
> > +
> > +#ifndef _RTE_GRO_TCP4_H_
> > +#define _RTE_GRO_TCP4_H_
> > +
> > +#define INVALID_ARRAY_INDEX 0xffffffffUL
> > +#define GRO_TCP4_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
> > +
> > +/* criteria of mergeing packets */
> > +struct tcp4_key {
> > +	struct ether_addr eth_saddr;
> > +	struct ether_addr eth_daddr;
> 
> Why we need to keep ether addr in the key? Suppose we are running a
> bonding port combined with phy port 0 and phy port1. Then no matter
> which dst mac addr is used, these packets belongs to the same flow.
> 
Currently, we can't handle this case. And as I know, linux also checks
ethernet addresses. You mean we need to handle this case?
> > +	uint32_t ip_src_addr;
> > +	uint32_t ip_dst_addr;
> > +
> > +	uint32_t recv_ack;	/**< acknowledgment sequence number. */
> > +	uint16_t src_port;
> > +	uint16_t dst_port;
> > +};
> > +
> > +struct gro_tcp4_key {
> > +	struct tcp4_key key;
> > +	uint32_t start_index;	/**< the first packet index of the flow */
> > +	uint8_t is_valid;
> > +};
> > +
> > +struct gro_tcp4_item {
> > +	struct rte_mbuf *pkt;	/**< packet address. */
> > +	struct rte_mbuf *lastseg;	/**< last segment of the packet */
> > +	/* the time when the packet in added into the table */
> > +	uint64_t start_time;
> 
> I think the comment should be the time of the eldest packet in this
> item. And an explicit unit statement would be useful for understanding.
Thanks, I will modify it.
> 
> > +	uint32_t next_pkt_idx;	/**< next packet index. */
> 
> The comment is not good enough. We use this field to chain all packets
> belonging to the same flow but not mergeable in sequence.
Thanks, I will change the comment.
> 
> What's more, if we change this field to pointer, we can make this
> structure universal for all gro engines, for example:
> 
> struct gro_item {
>      struct gro_item *next;
>      struct rte_mbuf *first_seg;
>      struct rte_mbuf *last_seg;
>      uint64_t start_time;
> };
> 
> And I think if we store two more fields, seq_begin, and seq_end, in the
> item, we can avoid touching any mbufs' metadata.
Yes, thanks.
> 
> > +};
> > +
> > +/**
> > + * TCP/IPv4 reassembly table.
> > + */
> > +struct gro_tcp4_tbl {
> > +	struct gro_tcp4_item *items;	/**< item array */
> > +	struct gro_tcp4_key *keys;	/**< key array */
> > +	uint32_t item_num;	/**< current item number */
> > +	uint32_t key_num;	/**< current key num */
> > +	uint32_t max_item_num;	/**< item array size */
> > +	uint32_t max_key_num;	/**< key array size */
> > +};
> 
> This structure could be universal for all gro engines;
> 
> struct gro_tbl {
>      void *items; /* will use different initializer for  different gro
> engines */
>      void *keys; /* will use different intializer for different gro
> engines */
>      uint32_t item_num;
>      ...
> }
But I still think it's not a good idea to define a same table structure for
all GRO types, since we only have TCP/IPv4 GRO now and other GRO
types may use different reassembly algorithm.
> 
> > +
> > +/**
> > + * This function creates a TCP/IPv4 reassembly table.
> > + *
> > + * @param socket_id
> > + *  socket index where the Ethernet port connects to.
> > + * @param max_flow_num
> > + *  the maximum number of flows in the TCP/IPv4 GRO table
> > + * @param max_item_per_flow
> > + *  the maximum packet number per flow.
> > + * @return
> > + *  if create successfully, return a pointer which points to the
> > + *  created TCP/IPv4 GRO table. Otherwise, return NULL.
> > + */
> > +void *gro_tcp4_tbl_create(uint16_t socket_id,
> > +		uint16_t max_flow_num,
> > +		uint16_t max_item_per_flow);
> 
> Fix the alignment.
Thanks, I will modify it.
> 
> > +
> > +/**
> > + * This function destroys a TCP/IPv4 reassembly table.
> > + * @param tbl
> > + *  a pointer points to the TCP/IPv4 reassembly table.
> > + */
> > +void gro_tcp4_tbl_destroy(void *tbl);
> > +
> > +/**
> > + * This function searches for a packet in the TCP/IPv4 reassembly table
> > + * to merge with the inputted one. To merge two packets is to chain them
> > + * together and update packet headers. Packets, whose SYN, FIN, RST,
> PSH
> > + * or URG bit is set, are returned immediately. Packets which only have
> > + * packet headers (i.e. without data) are also returned immediately.
> > + * Otherwise, the packet is either merged, or inserted into the table.
> > + * Besides, if there is no available space to insert the packet, this
> > + * function returns immediately too.
> > + *
> > + * This function assumes the inputted packet is with correct IPv4 and
> > + * TCP checksums. And if two packets are merged, it won't re-calculate
> > + * IPv4 and TCP checksums. Besides, if the inputted packet is IP
> > + * fragmented, it assumes the packet is complete (with TCP header).
> > + *
> > + * @param pkt
> > + *  packet to reassemble.
> > + * @param tbl
> > + *  a pointer that points to a TCP/IPv4 reassembly table.
> > + * @param max_packet_size
> > + *  max packet length after merged
> > + * @start_time
> > + *  the start time that the packet is inserted into the table
> > + * @return
> > + *  if the packet doesn't have data, or SYN, FIN, RST, PSH or URG bit is
> > + *  set, or there is no available space in the table to insert a new
> > + *  item or a new key, return a negative value. If the packet is merged
> > + *  successfully, return an positive value. If the packet is inserted
> > + *  into the table, return 0.
> > + */
> > +int32_t
> > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > +		struct gro_tcp4_tbl *tbl,
> > +		uint32_t max_packet_size,
> > +		uint64_t start_time);
> 
> Ditto.
> 
> > +
> > +/**
> > + * This function flushes timeout packets in a TCP/IPv4 reassembly table
> > + * to applications, and without updating checksums for merged packets.
> > + * The max number of flushed timeout packets is the element number of
> > + * the array which is used to keep flushed packets.
> > + *
> > + * @param tbl
> > + *  a pointer that points to a TCP GRO table.
> > + * @param timeout_cycles
> > + *  the maximum time that packets can stay in the table.
> > + * @param out
> > + *  pointer array which is used to keep flushed packets.
> > + * @param nb_out
> > + *  the element number of out. It's also the max number of timeout
> > + *  packets that can be flushed finally.
> > + * @return
> > + *  the number of packets that are returned.
> > + */
> > +uint16_t
> > +gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
> > +		uint64_t timeout_cycles,
> > +		struct rte_mbuf **out,
> > +		uint16_t nb_out);
> 
> Ditto.
> 
> > +
> > +/**
> > + * This function returns the number of the packets in a TCP/IPv4
> > + * reassembly table.
> > + *
> > + * @param tbl
> > + *  pointer points to a TCP/IPv4 reassembly table.
> > + * @return
> > + *  the number of packets in the table
> > + */
> > +uint32_t
> > +gro_tcp4_tbl_item_num(void *tbl);
> > +#endif
    
    
More information about the dev
mailing list