[dpdk-dev] [PATCH v11 2/3] lib/gro: add TCP/IPv4 GRO support

Tan, Jianfeng jianfeng.tan at intel.com
Fri Jul 7 08:55:28 CEST 2017



On 7/5/2017 12:08 PM, Jiayu Hu wrote:
> In this patch, we introduce five APIs to support TCP/IPv4 GRO.
> - gro_tcp4_tbl_create: create a TCP/IPv4 reassembly table, which is used
>      to merge packets.
> - gro_tcp4_tbl_destroy: free memory space of a TCP/IPv4 reassembly table.
> - gro_tcp4_tbl_timeout_flush: flush timeout packets from a TCP/IPv4
>      reassembly table.
> - gro_tcp4_tbl_get_count: return the number of packets in a TCP/IPv4
>      reassembly table.
> - gro_tcp4_reassemble: reassemble an inputted TCP/IPv4 packet.
>
> TCP/IPv4 GRO API assumes all inputted packets are with correct IPv4
> and TCP checksums. And TCP/IPv4 GRO API doesn't update IPv4 and TCP
> checksums for merged packets. If inputted packets are IP fragmented,
> TCP/IPv4 GRO API assumes they are complete packets (i.e. with L4
> headers).
>
> In TCP/IPv4 GRO, we use a table structure, called TCP/IPv4 reassembly
> table, to reassemble packets. A TCP/IPv4 reassembly table includes a key
> array and an item array, where the key array keeps the criteria to merge
> packets and the item array keeps packet information.
>
> One key in the key array points to an item group, which consists of
> packets which have the same criteria value. If two packets are able to
> merge, they must be in the same item group. Each key in the key array
> includes two parts:
> - criteria: the criteria of merging packets. If two packets can be
>      merged, they must have the same criteria value.
> - start_index: the index of the first incoming packet of the item group.
>
> Each element in the item array keeps the information of one packet. It
> mainly includes three parts:
> - firstseg: the address of the first segment of the packet
> - lastseg: the address of the last segment of the packet
> - next_pkt_index: the index of the next packet in the same item group.
>      All packets in the same item group are chained by next_pkt_index.
>      With next_pkt_index, we can locate all packets in the same item
>      group one by one.
>
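
To make the chaining described above concrete, here is a minimal sketch of walking one item group via next_pkt_idx. The toy_item and toy_key types and count_group() are hypothetical simplifications, not the structures from the patch; only the idea of a key pointing at the head of an index-linked chain is taken from the description.

```c
#include <stdint.h>

#define INVALID_ARRAY_INDEX 0xffffffffUL

/* Hypothetical, trimmed-down item: only the chaining field is kept. */
struct toy_item {
	uint32_t next_pkt_idx; /* next packet in the same item group */
};

/* Hypothetical, trimmed-down key: the merge criteria are omitted. */
struct toy_key {
	uint32_t start_index; /* first packet of the item group */
};

/* Count the packets of one item group by following next_pkt_idx. */
static uint32_t
count_group(const struct toy_item *items, const struct toy_key *key)
{
	uint32_t n = 0;
	uint32_t idx = key->start_index;

	while (idx != INVALID_ARRAY_INDEX) {
		n++;
		idx = items[idx].next_pkt_idx;
	}
	return n;
}
```

Using array indexes rather than pointers for the chain keeps the items in one flat allocation, which is presumably why the table is laid out this way.
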
> Processing an incoming packet takes three steps:
> a. check if the packet should be processed. Packets with one of the
>      following properties won't be processed:
> 	- FIN, SYN, RST, URG, PSH, ECE or CWR bit is set;
> 	- packet payload length is 0.
> b. traverse the key array to find a key which has the same criteria
>      value as the incoming packet. If one is found, go to step c.
>      Otherwise, insert a new key and insert the packet into the item array.
> c. locate the first packet in the item group via the start_index in the
>      key. Then traverse all packets in the item group via next_pkt_index.
>      If a packet is found which can merge with the incoming one, merge
>      them together. Otherwise, insert the packet into this item group.
>
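
Step a can be sketched as a small predicate. This is an illustration only: is_gro_candidate() is a hypothetical helper, and TCP_ACK_FLAG is redefined locally with the value of the ACK bit so the snippet stands alone. Note that an exact comparison against the ACK flag also skips segments with no flag set at all, which is slightly stricter than the list in step a.

```c
#include <stdint.h>

#define TCP_ACK_FLAG 0x10 /* ACK bit of the TCP flags byte */

/*
 * Return 1 if a packet is a candidate for merging, 0 otherwise.
 * tcp_flags is the flags byte of the TCP header; payload_len is the
 * TCP payload length in bytes.
 */
static int
is_gro_candidate(uint8_t tcp_flags, uint16_t payload_len)
{
	/* only pure ACK segments pass; any other flag combination is skipped */
	if (tcp_flags != TCP_ACK_FLAG)
		return 0;
	/* segments without payload are skipped */
	return payload_len != 0;
}
```
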
> Signed-off-by: Jiayu Hu <jiayu.hu at intel.com>
> ---
>   doc/guides/rel_notes/release_17_08.rst |   7 +
>   lib/librte_gro/Makefile                |   1 +
>   lib/librte_gro/gro_tcp4.c              | 493 +++++++++++++++++++++++++++++++++
>   lib/librte_gro/gro_tcp4.h              | 206 ++++++++++++++
>   lib/librte_gro/rte_gro.c               | 121 +++++++-
>   lib/librte_gro/rte_gro.h               |   5 +-
>   6 files changed, 819 insertions(+), 14 deletions(-)
>   create mode 100644 lib/librte_gro/gro_tcp4.c
>   create mode 100644 lib/librte_gro/gro_tcp4.h
>
> diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
> index 842f46f..f067247 100644
> --- a/doc/guides/rel_notes/release_17_08.rst
> +++ b/doc/guides/rel_notes/release_17_08.rst
> @@ -75,6 +75,13 @@ New Features
>   
>     Added support for firmwares with multiple Ethernet ports per physical port.
>   
> +* **Add Generic Receive Offload API support.**
> +
> +  Generic Receive Offload (GRO) API supports to reassemble TCP/IPv4
> +  packets. GRO API assumes all inputted packets are with correct
> +  checksums. GRO API doesn't update checksums for merged packets. If
> +  inputted packets are IP fragmented, GRO API assumes they are complete
> +  packets (i.e. with L4 headers).
>   
>   Resolved Issues
>   ---------------
> diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> index 7e0f128..747eeec 100644
> --- a/lib/librte_gro/Makefile
> +++ b/lib/librte_gro/Makefile
> @@ -43,6 +43,7 @@ LIBABIVER := 1
>   
>   # source files
>   SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += gro_tcp4.c
>   
>   # install this header file
>   SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> diff --git a/lib/librte_gro/gro_tcp4.c b/lib/librte_gro/gro_tcp4.c
> new file mode 100644
> index 0000000..703282d
> --- /dev/null
> +++ b/lib/librte_gro/gro_tcp4.c
> @@ -0,0 +1,493 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <rte_malloc.h>
> +#include <rte_mbuf.h>
> +#include <rte_cycles.h>
> +#include <rte_ethdev.h>
> +#include <rte_ip.h>
> +#include <rte_tcp.h>
> +
> +#include "gro_tcp4.h"
> +
> +void *
> +gro_tcp4_tbl_create(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow)
> +{
> +	struct gro_tcp4_tbl *tbl;
> +	size_t size;
> +	uint32_t entries_num;
> +
> +	entries_num = max_flow_num * max_item_per_flow;
> +	entries_num = entries_num > GRO_TCP4_TBL_MAX_ITEM_NUM ?
> +		GRO_TCP4_TBL_MAX_ITEM_NUM : entries_num;

As I commented before, this check is not useful:
entries_num is a uint32_t built from two uint16_t values, so it can never
be greater than (UINT32_MAX - 1).
Plus, we cannot allocate a block of memory as large as
sizeof(struct gro_tcp4_item) * UINT32_MAX anyway.
If we really need a check, please make the limit smaller. Considering
that each item represents a flow to some extent, I think we can limit it
to 1M flows for now.

(Sorry, I should have made this comment at the definition of
GRO_TCP4_TBL_MAX_ITEM_NUM.)
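
A minimal sketch of such a bound, assuming the 1M limit suggested above. The helper name tcp4_tbl_entries() is hypothetical, and the macro is redefined here only for illustration. The cast is there because two uint16_t operands would otherwise be promoted to int before the multiplication, which can overflow for large inputs.

```c
#include <stdint.h>

/* assumed cap of 1M items, as suggested in the review comment */
#define GRO_TCP4_TBL_MAX_ITEM_NUM (1024UL * 1024UL)

/* Compute the table size, clamped to GRO_TCP4_TBL_MAX_ITEM_NUM. */
static uint32_t
tcp4_tbl_entries(uint16_t max_flow_num, uint16_t max_item_per_flow)
{
	/* widen first so the product is computed in unsigned 32-bit */
	uint32_t entries_num = (uint32_t)max_flow_num * max_item_per_flow;

	if (entries_num > GRO_TCP4_TBL_MAX_ITEM_NUM)
		entries_num = GRO_TCP4_TBL_MAX_ITEM_NUM;
	return entries_num;
}
```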


> +
> +	if (entries_num == 0)
> +		return NULL;
> +
> +	tbl = rte_zmalloc_socket(__func__,
> +			sizeof(struct gro_tcp4_tbl),
> +			RTE_CACHE_LINE_SIZE,
> +			socket_id);
> +	if (tbl == NULL)
> +		return NULL;
> +
> +	size = sizeof(struct gro_tcp4_item) * entries_num;
> +	tbl->items = rte_zmalloc_socket(__func__,
> +			size,
> +			RTE_CACHE_LINE_SIZE,
> +			socket_id);
> +	if (tbl->items == NULL) {
> +		rte_free(tbl);
> +		return NULL;
> +	}
> +	tbl->max_item_num = entries_num;
> +
> +	size = sizeof(struct gro_tcp4_key) * entries_num;
> +	tbl->keys = rte_zmalloc_socket(__func__,
> +			size,
> +			RTE_CACHE_LINE_SIZE,
> +			socket_id);
> +	if (tbl->keys == NULL) {
> +		rte_free(tbl->items);
> +		rte_free(tbl);
> +		return NULL;
> +	}
> +	tbl->max_key_num = entries_num;
> +
> +	return tbl;
> +}
> +
> +void
> +gro_tcp4_tbl_destroy(void *tbl)
> +{
> +	struct gro_tcp4_tbl *tcp_tbl = tbl;
> +
> +	if (tcp_tbl) {
> +		rte_free(tcp_tbl->items);
> +		rte_free(tcp_tbl->keys);
> +	}
> +	rte_free(tcp_tbl);
> +}
> +
> +/*
> + * merge two TCP/IPv4 packets without updating checksums.
> + * If cmp is larger than 0, append the new packet to the
> + * original packet. Otherwise, pre-pend the new packet to
> + * the original packet.
> + */
> +static inline int
> +merge_two_tcp4_packets(struct gro_tcp4_item *item_src,
> +		struct rte_mbuf *pkt,
> +		uint16_t ip_id,
> +		uint32_t sent_seq,
> +		int cmp)
> +{
> +	struct rte_mbuf *pkt_head, *pkt_tail, *lastseg;
> +	uint16_t tcp_dl1;

There is no tcp_dl2, and for readability we should not abbreviate
"data length" to "dl"; just rename this to tcp_datalen.

> +
> +	if (cmp > 0) {
> +		pkt_head = item_src->firstseg;
> +		pkt_tail = pkt;
> +	} else {
> +		pkt_head = pkt;
> +		pkt_tail = item_src->firstseg;
> +	}
> +
> +	/* check if the packet length will be beyond the max value */
> +	tcp_dl1 = pkt_tail->pkt_len - pkt_tail->l2_len -
> +		pkt_tail->l3_len - pkt_tail->l4_len;
> +	if (pkt_head->pkt_len - pkt_head->l2_len + tcp_dl1 >
> +			TCP4_MAX_L3_LENGTH)
> +		return -1;
> +
> +	/* remove packet header for the tail packet */
> +	rte_pktmbuf_adj(pkt_tail,
> +			pkt_tail->l2_len +
> +			pkt_tail->l3_len +
> +			pkt_tail->l4_len);
> +
> +	/* chain two packets together */
> +	if (cmp > 0) {
> +		item_src->lastseg->next = pkt;
> +		item_src->lastseg = rte_pktmbuf_lastseg(pkt);
> +		/* update IP ID to the larger value */
> +		item_src->ip_id = ip_id;
> +	} else {
> +		lastseg = rte_pktmbuf_lastseg(pkt);
> +		lastseg->next = item_src->firstseg;
> +		item_src->firstseg = pkt;
> +		/* update sent_seq to the smaller value */
> +		item_src->sent_seq = sent_seq;
> +	}
> +	item_src->nb_merged++;
> +
> +	/* update mbuf metadata for the merged packet */
> +	pkt_head->nb_segs += pkt_tail->nb_segs;
> +	pkt_head->pkt_len += pkt_tail->pkt_len;
> +
> +	return 1;
> +}
> +
> +static inline int
> +check_seq_option(struct gro_tcp4_item *item,
> +		struct tcp_hdr *tcp_hdr,
> +		uint16_t tcp_hl,
> +		uint16_t tcp_dl,
> +		uint16_t ip_id,
> +		uint32_t sent_seq)
> +{
> +	struct rte_mbuf *pkt0 = item->firstseg;
> +	struct ipv4_hdr *ipv4_hdr0;
> +	struct tcp_hdr *tcp_hdr0;
> +	uint16_t tcp_hl0, tcp_dl0;
> +	uint16_t len;
> +
> +	ipv4_hdr0 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt0, char *) +
> +			pkt0->l2_len);
> +	tcp_hdr0 = (struct tcp_hdr *)((char *)ipv4_hdr0 + pkt0->l3_len);
> +	tcp_hl0 = pkt0->l4_len;
> +
> +	/* check if TCP option fields equal. If not, return 0. */
> +	len = RTE_MAX(tcp_hl, tcp_hl0) - sizeof(struct tcp_hdr);
> +	if ((tcp_hl != tcp_hl0) ||
> +			((len > 0) && (memcmp(tcp_hdr + 1,
> +					tcp_hdr0 + 1,
> +					len) != 0)))
> +		return 0;
> +
> +	/* check if the two packets are neighbors */
> +	tcp_dl0 = pkt0->pkt_len - pkt0->l2_len - pkt0->l3_len - tcp_hl0;
> +	if ((sent_seq == (item->sent_seq + tcp_dl0)) &&
> +			(ip_id == (item->ip_id + 1)))
> +		/* append the new packet */
> +		return 1;
> +	else if (((sent_seq + tcp_dl) == item->sent_seq) &&
> +			((ip_id + item->nb_merged) == item->ip_id))
> +		/* pre-pend the new packet */
> +		return -1;
> +	else
> +		return 0;
> +}
> +
> +static inline uint32_t
> +find_an_empty_item(struct gro_tcp4_tbl *tbl)
> +{
> +	uint32_t i;
> +
> +	for (i = 0; i < tbl->max_item_num; i++)
> +		if (tbl->items[i].firstseg == NULL)
> +			return i;
> +	return INVALID_ARRAY_INDEX;
> +}
> +
> +static inline uint32_t
> +find_an_empty_key(struct gro_tcp4_tbl *tbl)
> +{
> +	uint32_t i;
> +
> +	for (i = 0; i < tbl->max_key_num; i++)
> +		if (tbl->keys[i].is_valid == 0)
> +			return i;
> +	return INVALID_ARRAY_INDEX;
> +}
> +
> +static inline uint32_t
> +insert_new_item(struct gro_tcp4_tbl *tbl,
> +		struct rte_mbuf *pkt,
> +		uint16_t ip_id,
> +		uint32_t sent_seq,
> +		uint32_t prev_idx,
> +		uint64_t start_time)
> +{
> +	uint32_t item_idx;
> +
> +	item_idx = find_an_empty_item(tbl);
> +	if (item_idx == INVALID_ARRAY_INDEX)
> +		return INVALID_ARRAY_INDEX;
> +
> +	tbl->items[item_idx].firstseg = pkt;
> +	tbl->items[item_idx].lastseg = rte_pktmbuf_lastseg(pkt);
> +	tbl->items[item_idx].start_time = start_time;
> +	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> +	tbl->items[item_idx].sent_seq = sent_seq;
> +	tbl->items[item_idx].ip_id = ip_id;
> +	tbl->items[item_idx].nb_merged = 1;
> +	tbl->item_num++;
> +
> +	/* if the previous packet exists, chain the new one with it */
> +	if (prev_idx != INVALID_ARRAY_INDEX)
> +		tbl->items[prev_idx].next_pkt_idx = item_idx;
> +
> +	return item_idx;
> +}
> +
> +static inline uint32_t
> +delete_item(struct gro_tcp4_tbl *tbl, uint32_t item_idx)
> +{
> +	uint32_t next_idx = tbl->items[item_idx].next_pkt_idx;
> +
> +	/* set NULL to firstseg to indicate it's an empty item */
> +	tbl->items[item_idx].firstseg = NULL;
> +	tbl->item_num--;
> +
> +	return next_idx;
> +}
> +
> +static inline uint32_t
> +insert_new_key(struct gro_tcp4_tbl *tbl,
> +		struct tcp4_key *key_src,
> +		uint32_t item_idx)
> +{
> +	struct tcp4_key *key_dst;
> +	uint32_t key_idx;
> +
> +	key_idx = find_an_empty_key(tbl);
> +	if (key_idx == INVALID_ARRAY_INDEX)
> +		return INVALID_ARRAY_INDEX;
> +
> +	key_dst = &(tbl->keys[key_idx].key);
> +
> +	ether_addr_copy(&(key_src->eth_saddr), &(key_dst->eth_saddr));
> +	ether_addr_copy(&(key_src->eth_daddr), &(key_dst->eth_daddr));
> +	key_dst->ip_src_addr = key_src->ip_src_addr;
> +	key_dst->ip_dst_addr = key_src->ip_dst_addr;
> +	key_dst->recv_ack = key_src->recv_ack;
> +	key_dst->src_port = key_src->src_port;
> +	key_dst->dst_port = key_src->dst_port;
> +
> +	tbl->keys[key_idx].start_index = item_idx;
> +	tbl->keys[key_idx].is_valid = 1;
> +	tbl->key_num++;
> +
> +	return key_idx;
> +}
> +
> +static inline int
> +compare_key(struct tcp4_key k1, struct tcp4_key k2)
> +{
> +	uint16_t *c1, *c2;
> +
> +	c1 = (uint16_t *)&(k1.eth_saddr);
> +	c2 = (uint16_t *)&(k2.eth_saddr);
> +	if ((c1[0] != c2[0]) || (c1[1] != c2[1]) || (c1[2] != c2[2]))
> +		return -1;
> +	c1 = (uint16_t *)&(k1.eth_daddr);
> +	c2 = (uint16_t *)&(k2.eth_daddr);
> +	if ((c1[0] != c2[0]) || (c1[1] != c2[1]) || (c1[2] != c2[2]))
> +		return -1;
> +	if ((k1.ip_src_addr != k2.ip_src_addr) ||
> +			(k1.ip_dst_addr != k2.ip_dst_addr) ||
> +			(k1.recv_ack != k2.recv_ack) ||
> +			(k1.src_port != k2.src_port) ||
> +			(k1.dst_port != k2.dst_port))
> +		return -1;
> +
> +	return 0;
> +}

The above function can be written more cleanly. Note that the return
convention changes: non-zero now means the keys match.

static inline int
is_same_key(struct tcp4_key k1, struct tcp4_key k2)
{
	if (is_same_ether_addr(&k1.eth_saddr, &k2.eth_saddr) == 0)
		return 0;

	if (is_same_ether_addr(&k1.eth_daddr, &k2.eth_daddr) == 0)
		return 0;

	return ((k1.ip_src_addr == k2.ip_src_addr) &&
		(k1.ip_dst_addr == k2.ip_dst_addr) &&
		(k1.recv_ack == k2.recv_ack) &&
		(k1.src_port == k2.src_port) &&
		(k1.dst_port == k2.dst_port));
}
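
For anyone who wants to try the boolean-style comparison outside of DPDK, here is a self-contained sketch. The toy_* types and functions are hypothetical stand-ins for struct ether_addr, struct tcp4_key and is_same_ether_addr(); passing the keys by pointer instead of by value is a variation on the suggestion, not something the patch does.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct ether_addr (6 address bytes). */
struct toy_ether_addr {
	uint8_t addr_bytes[6];
};

/* Hypothetical stand-in for struct tcp4_key with the same fields. */
struct toy_tcp4_key {
	struct toy_ether_addr eth_saddr;
	struct toy_ether_addr eth_daddr;
	uint32_t ip_src_addr;
	uint32_t ip_dst_addr;
	uint32_t recv_ack;
	uint16_t src_port;
	uint16_t dst_port;
};

/* Local replacement for is_same_ether_addr(): 1 if equal, 0 otherwise. */
static int
toy_same_ether_addr(const struct toy_ether_addr *a,
		const struct toy_ether_addr *b)
{
	return memcmp(a->addr_bytes, b->addr_bytes,
			sizeof(a->addr_bytes)) == 0;
}

/* Boolean-style comparison: non-zero means the keys match. */
static int
toy_is_same_key(const struct toy_tcp4_key *k1,
		const struct toy_tcp4_key *k2)
{
	if (!toy_same_ether_addr(&k1->eth_saddr, &k2->eth_saddr))
		return 0;
	if (!toy_same_ether_addr(&k1->eth_daddr, &k2->eth_daddr))
		return 0;

	return (k1->ip_src_addr == k2->ip_src_addr) &&
		(k1->ip_dst_addr == k2->ip_dst_addr) &&
		(k1->recv_ack == k2->recv_ack) &&
		(k1->src_port == k2->src_port) &&
		(k1->dst_port == k2->dst_port);
}
```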

> +
> +/*
> + * update packet length and IP ID for the flushed packet.
> + */
> +static inline void
> +update_packet_header(struct gro_tcp4_item *item)
> +{
> +	struct ipv4_hdr *ipv4_hdr;
> +	struct rte_mbuf *pkt = item->firstseg;
> +
> +	ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
> +			pkt->l2_len);
> +	ipv4_hdr->total_length = rte_cpu_to_be_16(pkt->pkt_len -
> +			pkt->l2_len);
> +	ipv4_hdr->packet_id = rte_cpu_to_be_16(item->ip_id);
> +}
> +
> +int32_t
> +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> +		struct gro_tcp4_tbl *tbl,
> +		uint64_t start_time)
> +{
> +	struct ether_hdr *eth_hdr;
> +	struct ipv4_hdr *ipv4_hdr;
> +	struct tcp_hdr *tcp_hdr;
> +	uint32_t sent_seq;
> +	uint16_t tcp_dl, ip_id;
> +
> +	struct tcp4_key key;
> +	uint32_t cur_idx, prev_idx, item_idx;
> +	uint32_t i;
> +	int cmp;
> +
> +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> +	ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
> +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
> +
> +	/*
> +	 * if FIN, SYN, RST, PSH, URG, ECE or CWR is set, return immediately.
> +	 */
> +	if (tcp_hdr->tcp_flags != TCP_ACK_FLAG)
> +		return -1;
> +	/* if payload length is 0, return immediately */
> +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len -
> +		pkt->l4_len;
> +	if (tcp_dl == 0)
> +		return -1;
> +
> +	ip_id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
> +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> +
> +	ether_addr_copy(&(eth_hdr->s_addr), &(key.eth_saddr));
> +	ether_addr_copy(&(eth_hdr->d_addr), &(key.eth_daddr));
> +	key.ip_src_addr = ipv4_hdr->src_addr;
> +	key.ip_dst_addr = ipv4_hdr->dst_addr;
> +	key.src_port = tcp_hdr->src_port;
> +	key.dst_port = tcp_hdr->dst_port;
> +	key.recv_ack = tcp_hdr->recv_ack;
> +
> +	/* search for a key */
> +	for (i = 0; i < tbl->max_key_num; i++) {
> +		if ((tbl->keys[i].is_valid == 1) &&
> +				(compare_key(tbl->keys[i].key, key) == 0))
> +			break;

This can be simplified as:
	for (i = 0; i < tbl->max_key_num; i++)
		if (tbl->keys[i].is_valid &&
				is_same_key(tbl->keys[i].key, key))
			break;

> +	}
> +
> +	/* can't find a key, so insert a new key and a new item. */
> +	if (i == tbl->max_key_num) {
> +		item_idx = insert_new_item(tbl, pkt, ip_id, sent_seq,
> +				INVALID_ARRAY_INDEX, start_time);
> +		if (item_idx == INVALID_ARRAY_INDEX)
> +			return -1;
> +		if (insert_new_key(tbl, &key, item_idx) ==
> +				INVALID_ARRAY_INDEX) {
> +			/* fail to insert a new key, delete the inserted item */
> +			delete_item(tbl, item_idx);
> +			return -1;
> +		}
> +		return 0;
> +	}
> +
> +	/* traverse all packets in the item group to find one to merge */
> +	cur_idx = tbl->keys[i].start_index;
> +	prev_idx = cur_idx;
> +	do {
> +		cmp = check_seq_option(&(tbl->items[cur_idx]), tcp_hdr,
> +				pkt->l4_len, tcp_dl, ip_id, sent_seq);
> +		if (cmp != 0) {
> +			if (merge_two_tcp4_packets(&(tbl->items[cur_idx]), pkt,
> +						ip_id, sent_seq, cmp) > 0)
> +				return 1;
> +			/*
> +			 * fail to merge two packets since the packet length
> +			 * will be greater than the max value. So insert the
> +			 * packet into the item group.
> +			 */
> +			if (insert_new_item(tbl, pkt, ip_id, sent_seq, prev_idx,
> +						start_time) == INVALID_ARRAY_INDEX)
> +				return -1;
> +			return 0;
> +		}
> +		prev_idx = cur_idx;
> +		cur_idx = tbl->items[cur_idx].next_pkt_idx;
> +	} while (cur_idx != INVALID_ARRAY_INDEX);
> +
> +	/*
> +	 * can't find a packet in the item group to merge,
> +	 * so insert the packet into the item group.
> +	 */
> +	if (insert_new_item(tbl, pkt, ip_id, sent_seq, prev_idx,
> +				start_time) == INVALID_ARRAY_INDEX)
> +		return -1;
> +
> +	return 0;
> +}
> +
> +uint16_t
> +gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
> +		uint64_t timeout_cycles,
> +		struct rte_mbuf **out,
> +		uint16_t nb_out)
> +{
> +	uint16_t k = 0;
> +	uint32_t i, j;
> +	uint64_t current_time;
> +
> +	current_time = rte_rdtsc();
> +
> +	for (i = 0; i < tbl->max_key_num; i++) {
> +		/* all keys have been checked, return immediately */
> +		if (tbl->key_num == 0)
> +			return k;
> +
> +		if (tbl->keys[i].is_valid == 0)
> +			continue;
> +
> +		j = tbl->keys[i].start_index;
> +		do {
> +			if ((current_time - tbl->items[j].start_time) >=
> +					timeout_cycles) {
> +				out[k++] = tbl->items[j].firstseg;
> +				update_packet_header(&(tbl->items[j]));
> +				/* delete the item and get the next packet index */
> +				j = delete_item(tbl, j);
> +
> +				/* delete the key as all of packets are flushed */
> +				if (j == INVALID_ARRAY_INDEX) {
> +					tbl->keys[i].is_valid = 0;
> +					tbl->key_num--;
> +				} else
> +					/* update start_index of the key */
> +					tbl->keys[i].start_index = j;
> +
> +				if (k == nb_out)
> +					return k;
> +			} else
> +				/*
> +				 * left packets of this key won't be timeout, so go to
> +				 * check other keys.
> +				 */
> +				break;
> +		} while (j != INVALID_ARRAY_INDEX);
> +	}
> +	return k;
> +}
> +
> +uint32_t
> +gro_tcp4_tbl_get_count(void *tbl)
> +{
> +	struct gro_tcp4_tbl *gro_tbl = tbl;
> +
> +	if (gro_tbl)
> +		return gro_tbl->item_num;
> +
> +	return 0;
> +}
> diff --git a/lib/librte_gro/gro_tcp4.h b/lib/librte_gro/gro_tcp4.h
> new file mode 100644
> index 0000000..4a57451
> --- /dev/null
> +++ b/lib/librte_gro/gro_tcp4.h
> @@ -0,0 +1,206 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _GRO_TCP4_H_
> +#define _GRO_TCP4_H_
> +
> +#define INVALID_ARRAY_INDEX 0xffffffffUL
> +#define GRO_TCP4_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
> +
> +/*
> + * the max L3 length of a TCP/IPv4 packet. The L3 length
> + * is the sum of ipv4 header, tcp header and L4 payload.
> + */
> +#define TCP4_MAX_L3_LENGTH UINT16_MAX
> +
> +/* criteria of merging packets */
> +struct tcp4_key {
> +	struct ether_addr eth_saddr;
> +	struct ether_addr eth_daddr;
> +	uint32_t ip_src_addr;
> +	uint32_t ip_dst_addr;
> +
> +	uint32_t recv_ack;
> +	uint16_t src_port;
> +	uint16_t dst_port;
> +};
> +
> +struct gro_tcp4_key {
> +	struct tcp4_key key;
> +	/* the index of the first packet in the item group */
> +	uint32_t start_index;
> +	uint8_t is_valid;
> +};
> +
> +struct gro_tcp4_item {
> +	/*
> +	 * first segment of the packet. If the value
> +	 * is NULL, it means the item is empty.
> +	 */
> +	struct rte_mbuf *firstseg;
> +	/* last segment of the packet */
> +	struct rte_mbuf *lastseg;
> +	/*
> +	 * the time when the first packet is inserted
> +	 * into the table. If a packet in the table is
> +	 * merged with an incoming packet, this value
> +	 * won't be updated. We set this value only
> +	 * when the first packet is inserted into the
> +	 * table.
> +	 */
> +	uint64_t start_time;
> +	/*
> +	 * we use next_pkt_idx to chain the packets that
> +	 * have same key value but can't be merged together.
> +	 */
> +	uint32_t next_pkt_idx;
> +	/* the sequence number of the packet */
> +	uint32_t sent_seq;
> +	/* the IP ID of the packet */
> +	uint16_t ip_id;
> +	/* the number of merged packets */
> +	uint16_t nb_merged;
> +};
> +
> +/*
> + * TCP/IPv4 reassembly table structure.
> + */
> +struct gro_tcp4_tbl {
> +	/* item array */
> +	struct gro_tcp4_item *items;
> +	/* key array */
> +	struct gro_tcp4_key *keys;
> +	/* current item number */
> +	uint32_t item_num;
> +	/* current key num */
> +	uint32_t key_num;
> +	/* item array size */
> +	uint32_t max_item_num;
> +	/* key array size */
> +	uint32_t max_key_num;
> +};
> +
> +/**
> + * This function creates a TCP/IPv4 reassembly table.
> + *
> + * @param socket_id
> + *  socket index for allocating TCP/IPv4 reassembly table
> + * @param max_flow_num
> + *  the maximum number of flows in the TCP/IPv4 GRO table
> + * @param max_item_per_flow
> + *  the maximum packet number per flow.
> + *
> + * @return
> + *  if create successfully, return a pointer which points to the
> + *  created TCP/IPv4 GRO table. Otherwise, return NULL.
> + */
> +void *gro_tcp4_tbl_create(uint16_t socket_id,
> +		uint16_t max_flow_num,
> +		uint16_t max_item_per_flow);
> +
> +/**
> + * This function destroys a TCP/IPv4 reassembly table.
> + *
> + * @param tbl
> + *  a pointer points to the TCP/IPv4 reassembly table.
> + */
> +void gro_tcp4_tbl_destroy(void *tbl);
> +
> +/**
> + * This function searches for a packet in the TCP/IPv4 reassembly table
> + * to merge with the inputted one. To merge two packets is to chain them
> + * together and update packet headers. Packets, whose SYN, FIN, RST, PSH
> + * CWR, ECE or URG bit is set, are returned immediately. Packets which
> + * only have packet headers (i.e. without data) are also returned
> + * immediately. Otherwise, the packet is either merged, or inserted into
> + * the table. Besides, if there is no available space to insert the
> + * packet, this function returns immediately too.
> + *
> + * This function assumes the inputted packet is with correct IPv4 and
> + * TCP checksums. And if two packets are merged, it won't re-calculate
> + * IPv4 and TCP checksums. Besides, if the inputted packet is IP
> + * fragmented, it assumes the packet is complete (with TCP header).
> + *
> + * @param pkt
> + *  packet to reassemble.
> + * @param tbl
> + *  a pointer that points to a TCP/IPv4 reassembly table.
> + * @param start_time
> + *  the start time that the packet is inserted into the table
> + *
> + * @return
> + *  if the packet doesn't have data, or SYN, FIN, RST, PSH, CWR, ECE
> + *  or URG bit is set, or there is no available space in the table to
> + *  insert a new item or a new key, return a negative value. If the
> + *  packet is merged successfully, return a positive value. If the
> + *  packet is inserted into the table, return 0.
> + */
> +int32_t gro_tcp4_reassemble(struct rte_mbuf *pkt,
> +		struct gro_tcp4_tbl *tbl,
> +		uint64_t start_time);
> +
> +/**
> + * This function flushes timeout packets in a TCP/IPv4 reassembly table
> + * to applications, and without updating checksums for merged packets.
> + * The max number of flushed timeout packets is the element number of
> + * the array which is used to keep flushed packets.
> + *
> + * @param tbl
> + *  a pointer that points to a TCP GRO table.
> + * @param timeout_cycles
> + *  the maximum time that packets can stay in the table.
> + * @param out
> + *  pointer array which is used to keep flushed packets.
> + * @param nb_out
> + *  the element number of out. It's also the max number of timeout
> + *  packets that can be flushed finally.
> + *
> + * @return
> + *  the number of packets that are returned.
> + */
> +uint16_t gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
> +		uint64_t timeout_cycles,
> +		struct rte_mbuf **out,
> +		uint16_t nb_out);
> +
> +/**
> + * This function returns the number of the packets in a TCP/IPv4
> + * reassembly table.
> + *
> + * @param tbl
> + *  pointer points to a TCP/IPv4 reassembly table.
> + *
> + * @return
> + *  the number of packets in the table
> + */
> +uint32_t gro_tcp4_tbl_get_count(void *tbl);
> +#endif
> diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> index 24e5f2b..7488845 100644
> --- a/lib/librte_gro/rte_gro.c
> +++ b/lib/librte_gro/rte_gro.c
> @@ -32,8 +32,11 @@
>   
>   #include <rte_malloc.h>
>   #include <rte_mbuf.h>
> +#include <rte_cycles.h>
> +#include <rte_ethdev.h>
>   
>   #include "rte_gro.h"
> +#include "gro_tcp4.h"
>   
>   typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
>   		uint16_t max_flow_num,
> @@ -41,9 +44,12 @@ typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
>   typedef void (*gro_tbl_destroy_fn)(void *tbl);
>   typedef uint32_t (*gro_tbl_get_count_fn)(void *tbl);
>   
> -static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM];
> -static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM];
> -static gro_tbl_get_count_fn tbl_get_count_fn[RTE_GRO_TYPE_MAX_NUM];
> +static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM] = {
> +		gro_tcp4_tbl_create, NULL};
> +static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM] = {
> +			gro_tcp4_tbl_destroy, NULL};
> +static gro_tbl_get_count_fn tbl_get_count_fn[RTE_GRO_TYPE_MAX_NUM] = {
> +			gro_tcp4_tbl_get_count, NULL};
>   
>   /*
>    * GRO context structure, which is used to merge packets. It keeps
> @@ -124,27 +130,116 @@ rte_gro_ctx_destroy(void *ctx)
>   }
>   
>   uint16_t
> -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> +rte_gro_reassemble_burst(struct rte_mbuf **pkts,
>   		uint16_t nb_pkts,
> -		const struct rte_gro_param *param __rte_unused)
> +		const struct rte_gro_param *param)
>   {
> -	return nb_pkts;
> +	uint16_t i;
> +	uint16_t nb_after_gro = nb_pkts;
> +	uint32_t item_num;
> +
> +	/* allocate a reassembly table for TCP/IPv4 GRO */
> +	struct gro_tcp4_tbl tcp_tbl;
> +	struct gro_tcp4_key tcp_keys[RTE_GRO_MAX_BURST_ITEM_NUM] = {0};
> +	struct gro_tcp4_item tcp_items[RTE_GRO_MAX_BURST_ITEM_NUM] = {0};
> +
> +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> +	uint16_t unprocess_num = 0;
> +	int32_t ret;
> +	uint64_t current_time;
> +
> +	if ((param->gro_types & RTE_GRO_TCP_IPV4) == 0)
> +		return nb_pkts;
> +
> +	/* get the actual number of packets */
> +	item_num = RTE_MIN(nb_pkts, (param->max_flow_num *
> +			param->max_item_per_flow));
> +	item_num = RTE_MIN(item_num, RTE_GRO_MAX_BURST_ITEM_NUM);
> +
> +	tcp_tbl.keys = tcp_keys;
> +	tcp_tbl.items = tcp_items;
> +	tcp_tbl.key_num = 0;
> +	tcp_tbl.item_num = 0;
> +	tcp_tbl.max_key_num = item_num;
> +	tcp_tbl.max_item_num = item_num;
> +
> +	current_time = rte_rdtsc();
> +
> +	for (i = 0; i < nb_pkts; i++) {
> +		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
> +				(pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {

Keep one style to check the ptypes: either use the macros or just
compare the bits, like:

(pkt->packet_type & (RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP)) ==
		(RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP)
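
A standalone sketch of the bit-comparison style follows. The TOY_PTYPE_* values are placeholders chosen for illustration and toy_is_tcp4() is a hypothetical helper; real code would use the RTE_PTYPE_* macros from the DPDK headers. The outer parentheses around the masking are required, since == binds more tightly than & in C.

```c
#include <stdint.h>

/*
 * Placeholder bit values for illustration only; real code would use
 * the RTE_PTYPE_* macros from the DPDK headers instead.
 */
#define TOY_PTYPE_L3_IPV4 0x00000010u
#define TOY_PTYPE_L4_TCP  0x00000100u
#define TOY_PTYPE_TCP4    (TOY_PTYPE_L3_IPV4 | TOY_PTYPE_L4_TCP)

/* Return 1 if both the IPv4 and the TCP bits are set in packet_type. */
static int
toy_is_tcp4(uint32_t packet_type)
{
	/* mask first, then compare: == binds tighter than & in C */
	return (packet_type & TOY_PTYPE_TCP4) == TOY_PTYPE_TCP4;
}
```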

> +			ret = gro_tcp4_reassemble(pkts[i],
> +					&tcp_tbl,
> +					current_time);
> +			if (ret > 0)
> +				/* merge successfully */
> +				nb_after_gro--;
> +			else if (ret < 0)
> +				unprocess_pkts[unprocess_num++] = pkts[i];
> +		} else
> +			unprocess_pkts[unprocess_num++] = pkts[i];
> +	}
> +
> +	/* re-arrange GROed packets */
> +	if (nb_after_gro < nb_pkts) {
> +		i = gro_tcp4_tbl_timeout_flush(&tcp_tbl, 0, pkts, nb_pkts);
> +		if (unprocess_num > 0) {
> +			memcpy(&pkts[i], unprocess_pkts,
> +					sizeof(struct rte_mbuf *) * unprocess_num);
> +		}
> +	}
> +
> +	return nb_after_gro;
>   }
>   
>   uint16_t
> -rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
> +rte_gro_reassemble(struct rte_mbuf **pkts,
>   		uint16_t nb_pkts,
> -		void *ctx __rte_unused)
> +		void *ctx)
>   {
> -	return nb_pkts;
> +	uint16_t i, unprocess_num = 0;
> +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> +	struct gro_ctx *gro_ctx = ctx;
> +	uint64_t current_time;
> +
> +	if ((gro_ctx->gro_types & RTE_GRO_TCP_IPV4) == 0)
> +		return nb_pkts;
> +
> +	current_time = rte_rdtsc();
> +
> +	for (i = 0; i < nb_pkts; i++) {
> +		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
> +				(pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {
> +			if (gro_tcp4_reassemble(pkts[i],
> +						gro_ctx->tbls[RTE_GRO_TCP_IPV4_INDEX],
> +						current_time) < 0)
> +				unprocess_pkts[unprocess_num++] = pkts[i];
> +		} else
> +			unprocess_pkts[unprocess_num++] = pkts[i];
> +	}
> +	if (unprocess_num > 0) {
> +		memcpy(pkts, unprocess_pkts,
> +				sizeof(struct rte_mbuf *) * unprocess_num);
> +	}
> +
> +	return unprocess_num;
>   }
>   
>   uint16_t
> -rte_gro_timeout_flush(void *ctx __rte_unused,
> -		uint64_t gro_types __rte_unused,
> -		struct rte_mbuf **out __rte_unused,
> -		uint16_t max_nb_out __rte_unused)
> +rte_gro_timeout_flush(void *ctx,
> +		uint64_t gro_types,
> +		struct rte_mbuf **out,
> +		uint16_t max_nb_out)
>   {
> +	struct gro_ctx *gro_ctx = ctx;
> +
> +	gro_types = gro_types & gro_ctx->gro_types;
> +	if (gro_types & RTE_GRO_TCP_IPV4) {
> +		return gro_tcp4_tbl_timeout_flush(
> +				gro_ctx->tbls[RTE_GRO_TCP_IPV4_INDEX],
> +				gro_ctx->max_timeout_cycles,
> +				out, max_nb_out);
> +	}
>   	return 0;
>   }
>   
> diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> index 54a6e82..c2140e6 100644
> --- a/lib/librte_gro/rte_gro.h
> +++ b/lib/librte_gro/rte_gro.h
> @@ -45,8 +45,11 @@ extern "C" {
>   /**< max number of supported GRO types */
>   #define RTE_GRO_TYPE_MAX_NUM 64
>   /**< current supported GRO num */
> -#define RTE_GRO_TYPE_SUPPORT_NUM 0
> +#define RTE_GRO_TYPE_SUPPORT_NUM 1
>   
> +/**< TCP/IPv4 GRO flag */
> +#define RTE_GRO_TCP_IPV4_INDEX 0
> +#define RTE_GRO_TCP_IPV4 (1ULL << RTE_GRO_TCP_IPV4_INDEX)
>   
>   struct rte_gro_param {
>   	/**< desired GRO types */


