[dpdk-dev] [PATCH v7 0/3] Support TCP/IPv4 GRO in DPDK

Jiayu Hu jiayu.hu at intel.com
Mon Jun 26 08:43:47 CEST 2017

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overhead. It gains performance
by reassembling small packets into large ones. Therefore, we propose to
support GRO in DPDK.
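
The core merge decision can be sketched in a few lines of C. This is a minimal, hypothetical sketch, not the librte_gro code: `struct gro_seg`, `gro_same_flow()` and `gro_try_merge()` are illustrative names. A real TCP/IPv4 GRO implementation also has to check TCP flags, options and IP IDs, and it has to chain the payload buffers. The sketch assumes two segments may be merged only when they belong to the same flow and the second one starts exactly where the first one's payload ends.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a TCP/IPv4 segment (not a DPDK type). */
struct gro_seg {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
	uint32_t seq;          /* TCP sequence number of the first payload byte */
	uint32_t payload_len;  /* TCP payload length in bytes */
};

/* Two segments belong to the same flow if their 4-tuples match. */
static bool
gro_same_flow(const struct gro_seg *a, const struct gro_seg *b)
{
	return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/*
 * Append segment b to segment a if b directly follows a in sequence
 * space and the merged payload stays within max_len bytes.
 * Only lengths are tracked here; a real implementation would also
 * chain the packet buffers and rewrite the IP/TCP headers.
 * Returns true if b was merged into a.
 */
static bool
gro_try_merge(struct gro_seg *a, const struct gro_seg *b, uint32_t max_len)
{
	if (!gro_same_flow(a, b))
		return false;
	/* Unsigned arithmetic wraps modulo 2^32, like TCP sequence numbers. */
	if ((uint32_t)(a->seq + a->payload_len) != b->seq)
		return false;
	/* Written this way to avoid overflowing the sum of the two lengths. */
	if (a->payload_len > max_len || b->payload_len > max_len - a->payload_len)
		return false;
	a->payload_len += b->payload_len;
	return true;
}
```

For example, given a 100-byte segment at sequence number 1000, a same-flow segment starting at 1100 is merged into it, while a same-flow segment starting at 5000 is not, because it does not directly follow the first one.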

To give applications more flexibility, DPDK GRO is implemented as a
user library. Applications explicitly call the GRO library to merge
small packets into large ones. DPDK GRO provides two reassembly modes:
lightweight mode and heavyweight mode. Applications that want to merge
packets in a simple way can select the lightweight mode API;
applications that need more fine-grained control can select the
heavyweight mode API.
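
One plausible reading of the difference between the two modes, sketched below under assumptions: lightweight mode merges only the packets of the burst passed to a single call, whereas heavyweight mode keeps partially merged packets in a table across calls and releases them once they have waited long enough (the change log mentions `rte_gro_timeout_flush`). The table layout and names below (`struct gro_held`, `struct gro_tbl`, `gro_tbl_insert()`, `gro_tbl_timeout_flush()`) are hypothetical and are not the librte_gro API. Time is an abstract tick counter supplied by the caller.

```c
#include <stddef.h>
#include <stdint.h>

#define GRO_TBL_SIZE 64  /* hypothetical fixed table capacity */

/* Hypothetical entry for a packet held back for further merging. */
struct gro_held {
	int valid;
	uint32_t flow_key;    /* hypothetical precomputed flow identifier */
	uint32_t len;         /* bytes merged so far */
	uint64_t start_time;  /* tick when the first segment was stored */
};

struct gro_tbl {
	struct gro_held items[GRO_TBL_SIZE];
};

/* Store a new packet in the first free slot; returns 0, or -1 if full. */
static int
gro_tbl_insert(struct gro_tbl *t, uint32_t flow_key, uint32_t len,
	       uint64_t now)
{
	for (size_t i = 0; i < GRO_TBL_SIZE; i++) {
		if (!t->items[i].valid) {
			t->items[i] = (struct gro_held){ 1, flow_key, len, now };
			return 0;
		}
	}
	return -1;
}

/*
 * Move every packet held for at least `timeout` ticks into out[]
 * (up to max_out entries) and free its slot. Returns the number flushed.
 * Younger packets stay in the table so later bursts can still merge
 * into them.
 */
static size_t
gro_tbl_timeout_flush(struct gro_tbl *t, uint64_t now, uint64_t timeout,
		      struct gro_held *out, size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i < GRO_TBL_SIZE && n < max_out; i++) {
		struct gro_held *p = &t->items[i];

		if (p->valid && now - p->start_time >= timeout) {
			out[n++] = *p;
			p->valid = 0;
		}
	}
	return n;
}
```

In this sketch a caller would run the flush once per polling-loop iteration and transmit whatever it returns. The timeout trades added latency for more merging opportunities, which is the kind of fine-grained control a simple burst-only API cannot offer.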

This patchset adds TCP/IPv4 GRO support to DPDK. The first patch
provides a GRO API framework, the second patch adds TCP/IPv4 GRO
support, and the last patch enables TCP/IPv4 GRO in testpmd.

We performed many iperf tests to measure the performance gains from
DPDK GRO. The test environment is:
a. two 25Gbps physical ports (p0 and p1) are linked together. Assign p0
	to one network namespace and assign p1 to DPDK;
b. enable TSO for p0. Run the iperf client on p0;
c. launch testpmd with p1 and a vhost-user port, and run it in csum
	forwarding mode. Select TCP HW checksum calculation for the
	vhost-user port in the csum forwarding engine. For better
	performance, also select IPv4 and TCP HW checksum calculation
	for p1;
d. launch a VM with one CPU core and a virtio-net port. The VM OS is
	Ubuntu 16.04, whose virtio-net driver supports GRO. Enable RX csum
	offloading and mrg_rxbuf for the VM. The iperf server runs in the
	VM;
e. to run the iperf tests, we must prevent the csum forwarding engine
	from forcibly changing the packets' MAC addresses, so in our tests
	we comment that code out (lines 701-704 in csumonly.c).

In each test, we run iperf with the following three configurations:
	- single flow and single TCP client thread
	- multiple flows and single TCP client thread
	- single flow and parallel TCP client threads

We run the above iperf tests in three scenarios:
	s1: disabling kernel GRO and enabling DPDK GRO
	s2: disabling kernel GRO and disabling DPDK GRO
	s3: enabling kernel GRO and disabling DPDK GRO
Comparing the throughput of s1 with s2 shows the performance gain from
DPDK GRO. Comparing the throughput of s1 with s3 shows how DPDK GRO
performs relative to kernel GRO.

Test results:
	- DPDK GRO throughput is almost 2 times the throughput with
		neither DPDK GRO nor kernel GRO (s1 vs. s2);
	- DPDK GRO throughput is almost 1.2 times the throughput of
		kernel GRO (s1 vs. s3).

Change log
- add a macro 'GRO_MAX_BURST_ITEM_NUM' to avoid stack overflow in
- change macro name (_NB to _NUM)
- add '#ifdef __cplusplus ...' in rte_gro.h
- avoid checksum validation and calculation
- enable processing of IP-fragmented packets
- add a command in testpmd
- update documents
- modify rte_gro_timeout_flush and rte_gro_reassemble_burst
- rename variables
- fix some bugs
- fix coding style issues
- implement DPDK GRO as an application-used library
- introduce lightweight and heavyweight working modes to enable
	fine-grained controls to applications
- replace cuckoo hash tables with simpler table structure
- fix compilation issues
- provide generic reassembly function
- implement GRO as a device ability:
	add APIs for devices to support GRO;
	add APIs for applications to enable/disable GRO
- update testpmd example

Jiayu Hu (3):
  lib: add Generic Receive Offload API framework
  lib/gro: add TCP/IPv4 GRO support
  app/testpmd: enable TCP/IPv4 GRO

 app/test-pmd/cmdline.c                      | 125 +++++++++
 app/test-pmd/config.c                       |  37 +++
 app/test-pmd/csumonly.c                     |   5 +
 app/test-pmd/testpmd.c                      |   3 +
 app/test-pmd/testpmd.h                      |  11 +
 config/common_base                          |   5 +
 doc/guides/rel_notes/release_17_08.rst      |   7 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  34 +++
 lib/Makefile                                |   2 +
 lib/librte_gro/Makefile                     |  51 ++++
 lib/librte_gro/rte_gro.c                    | 218 +++++++++++++++
 lib/librte_gro/rte_gro.h                    | 209 +++++++++++++++
 lib/librte_gro/rte_gro_tcp.c                | 394 ++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro_tcp.h                | 191 ++++++++++++++
 lib/librte_gro/rte_gro_version.map          |  12 +
 mk/rte.app.mk                               |   1 +
 16 files changed, 1305 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_tcp.c
 create mode 100644 lib/librte_gro/rte_gro_tcp.h
 create mode 100644 lib/librte_gro/rte_gro_version.map


