[dpdk-dev] [PATCH] examples/ioat: create sample app on ioat driver usage

Bruce Richardson bruce.richardson at intel.com
Thu Sep 12 11:52:51 CEST 2019


On Mon, Sep 09, 2019 at 10:29:38AM +0200, Marcin Baran wrote:
> From: Pawel Modrak <pawelx.modrak at intel.com>
> 
> A new sample app demonstrating use of driver for CBDMA.
> The app receives packets, performs software or hardware
> copy, changes packets' MAC addresses (if enabled) and
> forwards them. The patch includes sample application
> as well as it's guide.
> 
> Signed-off-by: Pawel Modrak <pawelx.modrak at intel.com>
> Signed-off-by: Marcin Baran <marcinx.baran at intel.com>
> ---

Thanks, Pawel and Marcin. Some comments on doc and code inline below.

>  doc/guides/sample_app_ug/index.rst |    1 +
>  doc/guides/sample_app_ug/intro.rst |    4 +
>  doc/guides/sample_app_ug/ioat.rst  |  691 +++++++++++++++++++
>  examples/Makefile                  |    3 +
>  examples/ioat/Makefile             |   54 ++
>  examples/ioat/ioatfwd.c            | 1010 ++++++++++++++++++++++++++++
>  examples/ioat/meson.build          |   13 +
>  examples/meson.build               |    1 +
>  8 files changed, 1777 insertions(+)
>  create mode 100644 doc/guides/sample_app_ug/ioat.rst
>  create mode 100644 examples/ioat/Makefile
>  create mode 100644 examples/ioat/ioatfwd.c
>  create mode 100644 examples/ioat/meson.build
> 
> diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst
> index f23f8f59e..a6a1d9e7a 100644
> --- a/doc/guides/sample_app_ug/index.rst
> +++ b/doc/guides/sample_app_ug/index.rst
> @@ -23,6 +23,7 @@ Sample Applications User Guides
>      ip_reassembly
>      kernel_nic_interface
>      keep_alive
> +    ioat
>      l2_forward_crypto
>      l2_forward_job_stats
>      l2_forward_real_virtual
> diff --git a/doc/guides/sample_app_ug/intro.rst b/doc/guides/sample_app_ug/intro.rst
> index 90704194a..74462312f 100644
> --- a/doc/guides/sample_app_ug/intro.rst
> +++ b/doc/guides/sample_app_ug/intro.rst
> @@ -91,6 +91,10 @@ examples are highlighted below.
>    forwarding, or ``l3fwd`` application does forwarding based on Internet
>    Protocol, IPv4 or IPv6 like a simple router.
>  
> +* :doc:`Hardware packet copying<ioat>`: The Hardware packet copying,
> +  or ``ioatfwd`` application demonstrates how to use IOAT rawdev driver for
> +  copying packets between two threads.
> +
>  * :doc:`Packet Distributor<dist_app>`: The Packet Distributor
>    demonstrates how to distribute packets arriving on an Rx port to different
>    cores for processing and transmission.
> diff --git a/doc/guides/sample_app_ug/ioat.rst b/doc/guides/sample_app_ug/ioat.rst
> new file mode 100644
> index 000000000..378d70b81
> --- /dev/null
> +++ b/doc/guides/sample_app_ug/ioat.rst
> @@ -0,0 +1,691 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2019 Intel Corporation.
> +
> +Sample Application of packet copying using Intel\|reg| QuickData Technology
> +============================================================================

You need an escaped space before the |reg| substitution, otherwise it does
not get replaced with the symbol. It should be "Intel\ |reg|".

> +
> +Overview
> +--------
> +
> +This sample is intended as a demonstration of the basic components of a DPDK
> +forwarding application and example of how to use IOAT driver API to make
> +packets copies.
> +
> +Also while forwarding, the MAC addresses are affected as follows:
> +
> +*   The source MAC address is replaced by the TX port MAC address
> +
> +*   The destination MAC address is replaced by  02:00:00:00:00:TX_PORT_ID
> +
> +This application can be used to compare performance of using software packet
> +copy with copy done using a DMA device for different sizes of packets.
> +The example will print out statistics each second. The stats shows
> +received/send packets and packets dropped or failed to copy.
> +
> +Compiling the Application
> +-------------------------
> +
> +To compile the sample application see :doc:`compiling`.
> +
> +The application is located in the ``ioat`` sub-directory.
> +
> +
> +Running the Application
> +-----------------------
> +
> +In order to run the hardware copy application, the copying device
> +needs to be bound to user-space IO driver.
> +
> +Refer to the *IOAT Rawdev Driver for Intel\ |reg| QuickData Technology*
> +guide for information on using the driver.
> +
> +The application requires a number of command line options:
> +
> +.. code-block:: console
> +
> +    ./build/ioatfwd [EAL options] -- -p MASK [-C CT] [--[no-]mac-updating]

I think the app uses lower case "c" rather than upper case, as called out
below. Since the "CT" value can only be one of two possibilities, I think
you should explicitly include them, e.g. "[-c <sw|rawdev>]". "rawdev" is
also a rather long name for this value; why not just call the two options
sw and hw?
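
For illustration only, the value parsing could look something like the
sketch below. The enum, its member names and parse_copy_mode() are
hypothetical, not taken from the patch, and it assumes the values are
renamed to "sw" and "hw" as suggested above.

```c
#include <string.h>

/* Hypothetical names for illustration; the patch uses different ones. */
enum copy_mode {
	COPY_MODE_SW,
	COPY_MODE_HW,
	COPY_MODE_INVALID
};

/* Map the argument of "-c" to a copy mode, assuming "sw" and "hw". */
static enum copy_mode
parse_copy_mode(const char *arg)
{
	if (strcmp(arg, "sw") == 0)
		return COPY_MODE_SW;
	if (strcmp(arg, "hw") == 0)
		return COPY_MODE_HW;
	return COPY_MODE_INVALID;
}
```

Anything other than the two listed values comes back as invalid, so the
usage text can be printed in one place.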

> +
> +where,
> +
> +*   p MASK: A hexadecimal bitmask of the ports to configure
> +
> +*   c CT: Performed packet copy type: software (sw) or hardware using
> +    DMA (rawdev)
> +
> +*   s RS: size of IOAT rawdev ring for hardware copy mode or rte_ring for
> +    software copy mode
> +

The "-s" parameter is missing from the usage summary above.

> +*   --[no-]mac-updating: Whether MAC address of packets should be changed
> +    or not
> +
> +The application can be launched in 2 different configurations:
> +
> +*   Performing software packet copying
> +
> +*   Performing hardware packet copying

Two thoughts here:
a) is this not obvious from the parameter list?
b) are there not more than two configurations, given that you can have:
  * sw copy with mac updating
  * sw copy without mac updating
  * etc.
and that is not including the possible port-mask, ring size and single-core
vs two-core configurations.

> +
> +Each port needs 2 lcores: one of them receives incoming traffic and makes
> +a copy of each packet. The second lcore then updates MAC address and sends
> +the copy. For each configuration an additional lcore is needed since
> +master lcore in use which is responsible for configuration, statistics
> +printing and safe deinitialization of all ports and devices.
> +

I believe the app also supports running with 1 or 2 cores total, right?

> +The application can use a maximum of 8 ports.

Why this limitation?

> +
> +To run the application in a Linux environment with 3 lcores (one of them
> +is master lcore), 1 port (port 0), software copying and MAC updating issue
> +the command:
> +
> +.. code-block:: console
> +
> +    $ ./build/ioatfwd -l 0-2 -n 2 -- -p 0x1 --mac-updating -c sw
> +
> +To run the application in a Linux environment with 5 lcores (one of them
> +is master lcore), 2 ports (ports 0 and 1), hardware copying and no MAC
> +updating issue the command:
> +
> +.. code-block:: console
> +
> +    $ ./build/ioatfwd -l 0-4 -n 1 -- -p 0x3 --no-mac-updating -c rawdev
> +
> +Refer to the *DPDK Getting Started Guide* for general information on
> +running applications and the Environment Abstraction Layer (EAL) options.
> +

<snip>

> diff --git a/examples/ioat/ioatfwd.c b/examples/ioat/ioatfwd.c
> new file mode 100644
> index 000000000..8463d82f3
> --- /dev/null
> +++ b/examples/ioat/ioatfwd.c
> @@ -0,0 +1,1010 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2019 Intel Corporation
> + */
> +
> +#include <stdint.h>
> +#include <getopt.h>
> +#include <signal.h>
> +#include <stdbool.h>
> +#include <unistd.h>
> +
> +#include <rte_malloc.h>
> +#include <rte_ethdev.h>
> +#include <rte_rawdev.h>
> +#include <rte_ioat_rawdev.h>
> +
> +/* size of ring used for software copying between rx and tx. */
> +#define RTE_LOGTYPE_IOAT RTE_LOGTYPE_USER1
> +#define MAX_PKT_BURST 32

Seems a low max. I assume this is actually the default burst size?

> +#define MEMPOOL_CACHE_SIZE 512
> +#define MIN_POOL_SIZE 65536U
> +#define CMD_LINE_OPT_MAC_UPDATING "mac-updating"
> +#define CMD_LINE_OPT_NO_MAC_UPDATING "no-mac-updating"
> +#define CMD_LINE_OPT_PORTMASK "portmask"
> +#define CMD_LINE_OPT_NB_QUEUE "nb-queue"
> +#define CMD_LINE_OPT_COPY_TYPE "copy-type"
> +#define CMD_LINE_OPT_RING_SIZE "ring-size"
> +
> +/* configurable number of RX/TX ring descriptors */
> +#define RX_DEFAULT_RINGSIZE 1024
> +#define TX_DEFAULT_RINGSIZE 1024
> +
> +/* max number of RX queues per port */
> +#define MAX_RX_QUEUES_COUNT 8
> +
> +struct rxtx_port_config {
> +	/* common config */
> +	uint16_t rxtx_port;
> +	uint16_t nb_queues;
> +	/* for software copy mode */
> +	struct rte_ring *rx_to_tx_ring;
> +	/* for IOAT rawdev copy mode */
> +	uint16_t ioat_ids[MAX_RX_QUEUES_COUNT];
> +};
> +
> +struct rxtx_transmission_config {
> +	struct rxtx_port_config ports[RTE_MAX_ETHPORTS];
> +	uint16_t nb_ports;
> +	uint16_t nb_lcores;
> +};
> +
> +/* per-port statistics struct */
> +struct ioat_port_statistics {
> +	uint64_t rx[RTE_MAX_ETHPORTS];
> +	uint64_t tx[RTE_MAX_ETHPORTS];
> +	uint64_t tx_dropped[RTE_MAX_ETHPORTS];
> +	uint64_t copy_dropped[RTE_MAX_ETHPORTS];
> +};
> +struct ioat_port_statistics port_statistics;
> +
> +struct total_statistics {
> +	uint64_t total_packets_dropped;
> +	uint64_t total_packets_tx;
> +	uint64_t total_packets_rx;
> +	uint64_t total_successful_enqueues;
> +	uint64_t total_failed_enqueues;
> +};
> +
> +typedef enum copy_mode_t {
> +#define COPY_MODE_SW "sw"
> +	COPY_MODE_SW_NUM,
> +#define COPY_MODE_IOAT "rawdev"
> +	COPY_MODE_IOAT_NUM,
> +	COPY_MODE_INVALID_NUM,
> +	COPY_MODE_SIZE_NUM = COPY_MODE_INVALID_NUM
> +} copy_mode_t;
> +
> +/* mask of enabled ports */
> +static uint32_t ioat_enabled_port_mask;
> +
> +/* number of RX queues per port */
> +static uint16_t nb_queues = 1;
> +
> +/* MAC updating enabled by default. */
> +static int mac_updating = 1;
> +
> +/* hardare copy mode enabled by default. */
> +static copy_mode_t copy_mode = COPY_MODE_IOAT_NUM;
> +
> +/* size of IOAT rawdev ring for hardware copy mode or
> + * rte_ring for software copy mode
> + */
> +static unsigned short ring_size = 2048;
> +
> +/* global transmission config */
> +struct rxtx_transmission_config cfg;
> +
> +/* configurable number of RX/TX ring descriptors */
> +static uint16_t nb_rxd = RX_DEFAULT_RINGSIZE;
> +static uint16_t nb_txd = TX_DEFAULT_RINGSIZE;
> +
> +static volatile bool force_quit;
> +
> +/* ethernet addresses of ports */
> +static struct rte_ether_addr ioat_ports_eth_addr[RTE_MAX_ETHPORTS];
> +
> +static struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
> +struct rte_mempool *ioat_pktmbuf_pool;
> +
> +/* Print out statistics for one port. */
> +static void
> +print_port_stats(uint16_t port_id)
> +{
> +	printf("\nStatistics for port %u ------------------------------"
> +		"\nPackets sent: %34"PRIu64
> +		"\nPackets received: %30"PRIu64
> +		"\nPackets dropped on tx: %25"PRIu64
> +		"\nPackets dropped on copy: %23"PRIu64,
> +		port_id,
> +		port_statistics.tx[port_id],
> +		port_statistics.rx[port_id],
> +		port_statistics.tx_dropped[port_id],
> +		port_statistics.copy_dropped[port_id]);
> +}
> +
> +/* Print out statistics for one IOAT rawdev device. */
> +static void
> +print_rawdev_stats(uint32_t dev_id, uint64_t *xstats,
> +	uint16_t nb_xstats, struct rte_rawdev_xstats_name *names_xstats)
> +{
> +	uint16_t i;
> +
> +	printf("\nIOAT channel %u", dev_id);
> +	for (i = 0; i < nb_xstats; i++)
> +		if (strstr(names_xstats[i].name, "enqueues"))
> +			printf("\n\t %s: %*"PRIu64,
> +				names_xstats[i].name,
> +				(int)(37 - strlen(names_xstats[i].name)),
> +				xstats[i]);
> +}
> +
> +static void
> +print_total_stats(struct total_statistics *ts)
> +{
> +	printf("\nAggregate statistics ==============================="
> +		"\nTotal packets sent: %28"PRIu64
> +		"\nTotal packets received: %24"PRIu64
> +		"\nTotal packets dropped: %25"PRIu64,
> +		ts->total_packets_tx,
> +		ts->total_packets_rx,
> +		ts->total_packets_dropped);
> +
> +	if (copy_mode == COPY_MODE_IOAT_NUM) {
> +		printf("\nTotal IOAT successful enqueues: %16"PRIu64
> +			"\nTotal IOAT failed enqueues: %20"PRIu64,
> +			ts->total_successful_enqueues,
> +			ts->total_failed_enqueues);
> +	}
> +
> +	printf("\n====================================================\n");
> +}
> +

For these stats, it would be nice to have deltas, i.e. pps, rather than (or
as well as) the raw packet count numbers. Since your main stats loop below
has a "sleep(1)" at the start, just computing the deltas should give a good
enough PPS value.
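
As a rough sketch of the delta approach (the struct and function names here
are made up for illustration, and it assumes each counter is sampled once
per pass of the stats loop, i.e. roughly once a second):

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch. */
struct rate_tracker {
	uint64_t prev;	/* counter value seen at the previous sample */
};

/*
 * Return how much the counter grew since the last call and remember
 * the new total for the next one.
 */
static uint64_t
rate_update(struct rate_tracker *rt, uint64_t total)
{
	uint64_t delta = total - rt->prev;

	rt->prev = total;
	return delta;
}
```

Calling rate_update() once per loop iteration for each counter would then
give an approximate packets-per-second figure to print next to the totals.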

> +/* Print out statistics on packets dropped. */
> +static void
> +print_stats(char *prgname)
> +{
> +	struct total_statistics ts;
> +	uint32_t i, port_id, dev_id;
> +	struct rte_rawdev_xstats_name *names_xstats;
> +	uint64_t *xstats;
> +	unsigned int *ids_xstats;
> +	unsigned int nb_xstats, id_fail_enq, id_succ_enq;
> +	char status_string[120]; /* to print at the top of the output */
> +	int status_strlen;
> +
> +
> +	const char clr[] = { 27, '[', '2', 'J', '\0' };
> +	const char topLeft[] = { 27, '[', '1', ';', '1', 'H', '\0' };
> +
> +	status_strlen = snprintf(status_string, sizeof(status_string),
> +		"%s, ", prgname);
> +	status_strlen += snprintf(status_string + status_strlen,
> +		sizeof(status_string) - status_strlen,
> +		"Worker Threads = %d, ",
> +		rte_lcore_count() > 2 ? 2 : 1);
> +	status_strlen += snprintf(status_string + status_strlen,
> +		sizeof(status_string) - status_strlen,
> +		"Copy Mode = %s,\n", copy_mode == COPY_MODE_SW_NUM ?
> +		COPY_MODE_SW : COPY_MODE_IOAT);
> +	status_strlen += snprintf(status_string + status_strlen,
> +		sizeof(status_string) - status_strlen,
> +		"Updating MAC = %s, ", mac_updating ?
> +		"enabled" : "disabled");
> +	status_strlen += snprintf(status_string + status_strlen,
> +		sizeof(status_string) - status_strlen,
> +		"Rx Queues = %d, ", nb_queues);
> +	status_strlen += snprintf(status_string + status_strlen,
> +		sizeof(status_string) - status_strlen,
> +		"Ring Size = %d\n", ring_size);
> +
> +	/* Allocate memory for xstats names and values */
> +	nb_xstats = rte_rawdev_xstats_names_get(
> +			cfg.ports[0].ioat_ids[0], NULL, 0);
> +
> +	names_xstats = malloc(sizeof(*names_xstats) * nb_xstats);
> +	if (names_xstats == NULL) {
> +		rte_exit(EXIT_FAILURE,
> +			"Error allocating xstat names memory\n");
> +	}
> +	rte_rawdev_xstats_names_get(cfg.ports[0].ioat_ids[0],
> +			names_xstats, nb_xstats);
> +
> +	ids_xstats = malloc(sizeof(*ids_xstats) * nb_xstats);
> +	if (ids_xstats == NULL) {
> +		rte_exit(EXIT_FAILURE,
> +			"Error allocating xstat ids_xstats memory\n");
> +	}
> +
> +	for (i = 0; i < nb_xstats; i++)
> +		ids_xstats[i] = i;
> +
> +	xstats = malloc(sizeof(*xstats) * nb_xstats);
> +	if (xstats == NULL) {
> +		rte_exit(EXIT_FAILURE,
> +			"Error allocating xstat memory\n");
> +	}
> +
> +	/* Get failed/successful enqueues stats index */
> +	id_fail_enq = id_succ_enq = nb_xstats;
> +	for (i = 0; i < nb_xstats; i++) {
> +		if (!strcmp(names_xstats[i].name, "failed_enqueues"))
> +			id_fail_enq = i;
> +		else if (!strcmp(names_xstats[i].name, "successful_enqueues"))
> +			id_succ_enq = i;
> +		if (id_fail_enq < nb_xstats && id_succ_enq < nb_xstats)
> +			break;
> +	}
> +	if (id_fail_enq == nb_xstats || id_succ_enq == nb_xstats) {
> +		rte_exit(EXIT_FAILURE,
> +			"Error getting failed/successful enqueues stats index\n");
> +	}
> +
> +	while (!force_quit) {
> +		/* Sleep for 1 second each round - init sleep allows reading
> +		 * messages from app startup.
> +		 */
> +		sleep(1);
> +
> +		/* Clear screen and move to top left */
> +		printf("%s%s", clr, topLeft);
> +
> +		memset(&ts, 0, sizeof(struct total_statistics));
> +
> +		printf("%s", status_string);
> +
> +		for (i = 0; i < cfg.nb_ports; i++) {
> +			port_id = cfg.ports[i].rxtx_port;
> +			print_port_stats(port_id);
> +
> +			ts.total_packets_dropped +=
> +				port_statistics.tx_dropped[port_id]
> +				+ port_statistics.copy_dropped[port_id];
> +			ts.total_packets_tx += port_statistics.tx[port_id];
> +			ts.total_packets_rx += port_statistics.rx[port_id];
> +
> +			if (copy_mode == COPY_MODE_IOAT_NUM) {
> +				uint32_t j;
> +
> +				for (j = 0; j < cfg.ports[i].nb_queues; j++) {
> +					dev_id = cfg.ports[i].ioat_ids[j];
> +					rte_rawdev_xstats_get(dev_id,
> +						ids_xstats, xstats, nb_xstats);
> +
> +					print_rawdev_stats(dev_id, xstats,
> +						nb_xstats, names_xstats);
> +
> +					ts.total_successful_enqueues +=
> +						xstats[id_succ_enq];
> +					ts.total_failed_enqueues +=
> +						xstats[id_fail_enq];
> +				}
> +			}
> +		}
> +		printf("\n");
> +
> +		print_total_stats(&ts);
> +	}
> +
> +	free(names_xstats);
> +	free(xstats);
> +	free(ids_xstats);
> +}
<snip>

