[dpdk-dev] [PATCH] examples/ioat: create sample app on ioat driver usage
Bruce Richardson
bruce.richardson at intel.com
Thu Sep 12 11:52:51 CEST 2019
On Mon, Sep 09, 2019 at 10:29:38AM +0200, Marcin Baran wrote:
> From: Pawel Modrak <pawelx.modrak at intel.com>
>
> A new sample app demonstrating use of driver for CBDMA.
> The app receives packets, performs software or hardware
> copy, changes packets' MAC addresses (if enabled) and
> forwards them. The patch includes sample application
> as well as it's guide.
>
> Signed-off-by: Pawel Modrak <pawelx.modrak at intel.com>
> Signed-off-by: Marcin Baran <marcinx.baran at intel.com>
> ---
Thanks, Pawel and Marcin. Some comments on doc and code inline below.
> doc/guides/sample_app_ug/index.rst | 1 +
> doc/guides/sample_app_ug/intro.rst | 4 +
> doc/guides/sample_app_ug/ioat.rst | 691 +++++++++++++++++++
> examples/Makefile | 3 +
> examples/ioat/Makefile | 54 ++
> examples/ioat/ioatfwd.c | 1010 ++++++++++++++++++++++++++++
> examples/ioat/meson.build | 13 +
> examples/meson.build | 1 +
> 8 files changed, 1777 insertions(+)
> create mode 100644 doc/guides/sample_app_ug/ioat.rst
> create mode 100644 examples/ioat/Makefile
> create mode 100644 examples/ioat/ioatfwd.c
> create mode 100644 examples/ioat/meson.build
>
> diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst
> index f23f8f59e..a6a1d9e7a 100644
> --- a/doc/guides/sample_app_ug/index.rst
> +++ b/doc/guides/sample_app_ug/index.rst
> @@ -23,6 +23,7 @@ Sample Applications User Guides
> ip_reassembly
> kernel_nic_interface
> keep_alive
> + ioat
> l2_forward_crypto
> l2_forward_job_stats
> l2_forward_real_virtual
> diff --git a/doc/guides/sample_app_ug/intro.rst b/doc/guides/sample_app_ug/intro.rst
> index 90704194a..74462312f 100644
> --- a/doc/guides/sample_app_ug/intro.rst
> +++ b/doc/guides/sample_app_ug/intro.rst
> @@ -91,6 +91,10 @@ examples are highlighted below.
> forwarding, or ``l3fwd`` application does forwarding based on Internet
> Protocol, IPv4 or IPv6 like a simple router.
>
> +* :doc:`Hardware packet copying<ioat>`: The Hardware packet copying,
> + or ``ioatfwd`` application demonstrates how to use IOAT rawdev driver for
> + copying packets between two threads.
> +
> * :doc:`Packet Distributor<dist_app>`: The Packet Distributor
> demonstrates how to distribute packets arriving on an Rx port to different
> cores for processing and transmission.
> diff --git a/doc/guides/sample_app_ug/ioat.rst b/doc/guides/sample_app_ug/ioat.rst
> new file mode 100644
> index 000000000..378d70b81
> --- /dev/null
> +++ b/doc/guides/sample_app_ug/ioat.rst
> @@ -0,0 +1,691 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> + Copyright(c) 2019 Intel Corporation.
> +
> +Sample Application of packet copying using Intel\|reg| QuickData Technology
> +============================================================================
You need a space before the |reg| substitution, otherwise it doesn't get
replaced with the symbol. It should be "Intel\ |reg|".
> +
> +Overview
> +--------
> +
> +This sample is intended as a demonstration of the basic components of a DPDK
> +forwarding application and example of how to use IOAT driver API to make
> +packets copies.
> +
> +Also while forwarding, the MAC addresses are affected as follows:
> +
> +* The source MAC address is replaced by the TX port MAC address
> +
> +* The destination MAC address is replaced by 02:00:00:00:00:TX_PORT_ID
> +
> +This application can be used to compare performance of using software packet
> +copy with copy done using a DMA device for different sizes of packets.
> +The example will print out statistics each second. The stats shows
> +received/send packets and packets dropped or failed to copy.
> +
> +Compiling the Application
> +-------------------------
> +
> +To compile the sample application see :doc:`compiling`.
> +
> +The application is located in the ``ioat`` sub-directory.
> +
> +
> +Running the Application
> +-----------------------
> +
> +In order to run the hardware copy application, the copying device
> +needs to be bound to user-space IO driver.
> +
> +Refer to the *IOAT Rawdev Driver for Intel\ |reg| QuickData Technology*
> +guide for information on using the driver.
> +
> +The application requires a number of command line options:
> +
> +.. code-block:: console
> +
> + ./build/ioatfwd [EAL options] -- -p MASK [-C CT] [--[no-]mac-updating]
I think the app uses lower case "c" rather than upper case, as called out
below. Since the "CT" value can only be one of two possibilities, I think
you should explicitly include them, e.g. "[-c <sw|rawdev>]". "rawdev" is
also a rather long name for this value; why not just call them "sw" and
"hw"?
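As a rough sketch of that suggestion, assuming the two values become "sw"
and "hw" (the enum and function names here are hypothetical, not taken from
the patch), the option argument could be mapped like this:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; the patch itself uses copy_mode_t. */
enum copy_type {
	COPY_SW,      /* software copy */
	COPY_HW,      /* hardware (DMA) copy */
	COPY_INVALID  /* anything else: caller should print usage and exit */
};

/* Map the argument of "-c" to a copy type. */
static enum copy_type
parse_copy_type(const char *arg)
{
	if (arg == NULL)
		return COPY_INVALID;
	if (strcmp(arg, "sw") == 0)
		return COPY_SW;
	if (strcmp(arg, "hw") == 0)
		return COPY_HW;
	return COPY_INVALID;
}
```

With that, the usage line could read "[-c <sw|hw>]" and any other value is
rejected during argument parsing.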
> +
> +where,
> +
> +* p MASK: A hexadecimal bitmask of the ports to configure
> +
> +* c CT: Performed packet copy type: software (sw) or hardware using
> + DMA (rawdev)
> +
> +* s RS: size of IOAT rawdev ring for hardware copy mode or rte_ring for
> + software copy mode
> +
This parameter is missing from the summary above.
> +* --[no-]mac-updating: Whether MAC address of packets should be changed
> + or not
> +
> +The application can be launched in 2 different configurations:
> +
> +* Performing software packet copying
> +
> +* Performing hardware packet copying
Two thoughts here:
a) Is this not obvious from the parameter list?
b) Are there not more than two configurations, given that you can have:
* sw copy with mac updating
* sw copy without mac updating
* etc.
not including the possible port-mask, ring size and single-core vs two-core
configurations.
> +
> +Each port needs 2 lcores: one of them receives incoming traffic and makes
> +a copy of each packet. The second lcore then updates MAC address and sends
> +the copy. For each configuration an additional lcore is needed since
> +master lcore in use which is responsible for configuration, statistics
> +printing and safe deinitialization of all ports and devices.
> +
I believe the app also supports running with 1 or 2 cores total, right?
> +The application can use a maximum of 8 ports.
Why this limitation?
> +
> +To run the application in a Linux environment with 3 lcores (one of them
> +is master lcore), 1 port (port 0), software copying and MAC updating issue
> +the command:
> +
> +.. code-block:: console
> +
> + $ ./build/ioatfwd -l 0-2 -n 2 -- -p 0x1 --mac-updating -c sw
> +
> +To run the application in a Linux environment with 5 lcores (one of them
> +is master lcore), 2 ports (ports 0 and 1), hardware copying and no MAC
> +updating issue the command:
> +
> +.. code-block:: console
> +
> + $ ./build/ioatfwd -l 0-4 -n 1 -- -p 0x3 --no-mac-updating -c rawdev
> +
> +Refer to the *DPDK Getting Started Guide* for general information on
> +running applications and the Environment Abstraction Layer (EAL) options.
> +
<snip>
> diff --git a/examples/ioat/ioatfwd.c b/examples/ioat/ioatfwd.c
> new file mode 100644
> index 000000000..8463d82f3
> --- /dev/null
> +++ b/examples/ioat/ioatfwd.c
> @@ -0,0 +1,1010 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2019 Intel Corporation
> + */
> +
> +#include <stdint.h>
> +#include <getopt.h>
> +#include <signal.h>
> +#include <stdbool.h>
> +#include <unistd.h>
> +
> +#include <rte_malloc.h>
> +#include <rte_ethdev.h>
> +#include <rte_rawdev.h>
> +#include <rte_ioat_rawdev.h>
> +
> +/* size of ring used for software copying between rx and tx. */
> +#define RTE_LOGTYPE_IOAT RTE_LOGTYPE_USER1
> +#define MAX_PKT_BURST 32
Seems a low max; I assume this is actually the default burst size?
> +#define MEMPOOL_CACHE_SIZE 512
> +#define MIN_POOL_SIZE 65536U
> +#define CMD_LINE_OPT_MAC_UPDATING "mac-updating"
> +#define CMD_LINE_OPT_NO_MAC_UPDATING "no-mac-updating"
> +#define CMD_LINE_OPT_PORTMASK "portmask"
> +#define CMD_LINE_OPT_NB_QUEUE "nb-queue"
> +#define CMD_LINE_OPT_COPY_TYPE "copy-type"
> +#define CMD_LINE_OPT_RING_SIZE "ring-size"
> +
> +/* configurable number of RX/TX ring descriptors */
> +#define RX_DEFAULT_RINGSIZE 1024
> +#define TX_DEFAULT_RINGSIZE 1024
> +
> +/* max number of RX queues per port */
> +#define MAX_RX_QUEUES_COUNT 8
> +
> +struct rxtx_port_config {
> + /* common config */
> + uint16_t rxtx_port;
> + uint16_t nb_queues;
> + /* for software copy mode */
> + struct rte_ring *rx_to_tx_ring;
> + /* for IOAT rawdev copy mode */
> + uint16_t ioat_ids[MAX_RX_QUEUES_COUNT];
> +};
> +
> +struct rxtx_transmission_config {
> + struct rxtx_port_config ports[RTE_MAX_ETHPORTS];
> + uint16_t nb_ports;
> + uint16_t nb_lcores;
> +};
> +
> +/* per-port statistics struct */
> +struct ioat_port_statistics {
> + uint64_t rx[RTE_MAX_ETHPORTS];
> + uint64_t tx[RTE_MAX_ETHPORTS];
> + uint64_t tx_dropped[RTE_MAX_ETHPORTS];
> + uint64_t copy_dropped[RTE_MAX_ETHPORTS];
> +};
> +struct ioat_port_statistics port_statistics;
> +
> +struct total_statistics {
> + uint64_t total_packets_dropped;
> + uint64_t total_packets_tx;
> + uint64_t total_packets_rx;
> + uint64_t total_successful_enqueues;
> + uint64_t total_failed_enqueues;
> +};
> +
> +typedef enum copy_mode_t {
> +#define COPY_MODE_SW "sw"
> + COPY_MODE_SW_NUM,
> +#define COPY_MODE_IOAT "rawdev"
> + COPY_MODE_IOAT_NUM,
> + COPY_MODE_INVALID_NUM,
> + COPY_MODE_SIZE_NUM = COPY_MODE_INVALID_NUM
> +} copy_mode_t;
> +
> +/* mask of enabled ports */
> +static uint32_t ioat_enabled_port_mask;
> +
> +/* number of RX queues per port */
> +static uint16_t nb_queues = 1;
> +
> +/* MAC updating enabled by default. */
> +static int mac_updating = 1;
> +
> +/* hardare copy mode enabled by default. */
> +static copy_mode_t copy_mode = COPY_MODE_IOAT_NUM;
> +
> +/* size of IOAT rawdev ring for hardware copy mode or
> + * rte_ring for software copy mode
> + */
> +static unsigned short ring_size = 2048;
> +
> +/* global transmission config */
> +struct rxtx_transmission_config cfg;
> +
> +/* configurable number of RX/TX ring descriptors */
> +static uint16_t nb_rxd = RX_DEFAULT_RINGSIZE;
> +static uint16_t nb_txd = TX_DEFAULT_RINGSIZE;
> +
> +static volatile bool force_quit;
> +
> +/* ethernet addresses of ports */
> +static struct rte_ether_addr ioat_ports_eth_addr[RTE_MAX_ETHPORTS];
> +
> +static struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
> +struct rte_mempool *ioat_pktmbuf_pool;
> +
> +/* Print out statistics for one port. */
> +static void
> +print_port_stats(uint16_t port_id)
> +{
> + printf("\nStatistics for port %u ------------------------------"
> + "\nPackets sent: %34"PRIu64
> + "\nPackets received: %30"PRIu64
> + "\nPackets dropped on tx: %25"PRIu64
> + "\nPackets dropped on copy: %23"PRIu64,
> + port_id,
> + port_statistics.tx[port_id],
> + port_statistics.rx[port_id],
> + port_statistics.tx_dropped[port_id],
> + port_statistics.copy_dropped[port_id]);
> +}
> +
> +/* Print out statistics for one IOAT rawdev device. */
> +static void
> +print_rawdev_stats(uint32_t dev_id, uint64_t *xstats,
> + uint16_t nb_xstats, struct rte_rawdev_xstats_name *names_xstats)
> +{
> + uint16_t i;
> +
> + printf("\nIOAT channel %u", dev_id);
> + for (i = 0; i < nb_xstats; i++)
> + if (strstr(names_xstats[i].name, "enqueues"))
> + printf("\n\t %s: %*"PRIu64,
> + names_xstats[i].name,
> + (int)(37 - strlen(names_xstats[i].name)),
> + xstats[i]);
> +}
> +
> +static void
> +print_total_stats(struct total_statistics *ts)
> +{
> + printf("\nAggregate statistics ==============================="
> + "\nTotal packets sent: %28"PRIu64
> + "\nTotal packets received: %24"PRIu64
> + "\nTotal packets dropped: %25"PRIu64,
> + ts->total_packets_tx,
> + ts->total_packets_rx,
> + ts->total_packets_dropped);
> +
> + if (copy_mode == COPY_MODE_IOAT_NUM) {
> + printf("\nTotal IOAT successful enqueues: %16"PRIu64
> + "\nTotal IOAT failed enqueues: %20"PRIu64,
> + ts->total_successful_enqueues,
> + ts->total_failed_enqueues);
> + }
> +
> + printf("\n====================================================\n");
> +}
> +
For these stats, it would be nice to have deltas i.e. pps, rather than (or
as well as) the raw packet count numbers. Since your main stats loop below
has a "sleep(1)" at the start, just computing the deltas should give a good
enough PPS value.
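A minimal sketch of the delta computation, assuming the loop keeps the
previous iteration's totals around (the struct and function names here are
made up for illustration and are not from the patch):

```c
#include <stdint.h>

/* Hypothetical snapshot of the totals the stats loop already sums up. */
struct pkt_totals {
	uint64_t rx;
	uint64_t tx;
	uint64_t dropped;
};

/*
 * Difference between two snapshots. If the snapshots are taken one second
 * apart, the result can be printed directly as packets per second.
 */
static struct pkt_totals
pkt_totals_delta(const struct pkt_totals *now, const struct pkt_totals *prev)
{
	struct pkt_totals d = {
		.rx = now->rx - prev->rx,
		.tx = now->tx - prev->tx,
		.dropped = now->dropped - prev->dropped,
	};

	return d;
}
```

Calling this once per loop iteration on the current and previous totals, and
then overwriting the previous snapshot with the current one, should be enough
given the one-second sleep.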
> +/* Print out statistics on packets dropped. */
> +static void
> +print_stats(char *prgname)
> +{
> + struct total_statistics ts;
> + uint32_t i, port_id, dev_id;
> + struct rte_rawdev_xstats_name *names_xstats;
> + uint64_t *xstats;
> + unsigned int *ids_xstats;
> + unsigned int nb_xstats, id_fail_enq, id_succ_enq;
> + char status_string[120]; /* to print at the top of the output */
> + int status_strlen;
> +
> +
> + const char clr[] = { 27, '[', '2', 'J', '\0' };
> + const char topLeft[] = { 27, '[', '1', ';', '1', 'H', '\0' };
> +
> + status_strlen = snprintf(status_string, sizeof(status_string),
> + "%s, ", prgname);
> + status_strlen += snprintf(status_string + status_strlen,
> + sizeof(status_string) - status_strlen,
> + "Worker Threads = %d, ",
> + rte_lcore_count() > 2 ? 2 : 1);
> + status_strlen += snprintf(status_string + status_strlen,
> + sizeof(status_string) - status_strlen,
> + "Copy Mode = %s,\n", copy_mode == COPY_MODE_SW_NUM ?
> + COPY_MODE_SW : COPY_MODE_IOAT);
> + status_strlen += snprintf(status_string + status_strlen,
> + sizeof(status_string) - status_strlen,
> + "Updating MAC = %s, ", mac_updating ?
> + "enabled" : "disabled");
> + status_strlen += snprintf(status_string + status_strlen,
> + sizeof(status_string) - status_strlen,
> + "Rx Queues = %d, ", nb_queues);
> + status_strlen += snprintf(status_string + status_strlen,
> + sizeof(status_string) - status_strlen,
> + "Ring Size = %d\n", ring_size);
> +
> + /* Allocate memory for xstats names and values */
> + nb_xstats = rte_rawdev_xstats_names_get(
> + cfg.ports[0].ioat_ids[0], NULL, 0);
> +
> + names_xstats = malloc(sizeof(*names_xstats) * nb_xstats);
> + if (names_xstats == NULL) {
> + rte_exit(EXIT_FAILURE,
> + "Error allocating xstat names memory\n");
> + }
> + rte_rawdev_xstats_names_get(cfg.ports[0].ioat_ids[0],
> + names_xstats, nb_xstats);
> +
> + ids_xstats = malloc(sizeof(*ids_xstats) * nb_xstats);
> + if (ids_xstats == NULL) {
> + rte_exit(EXIT_FAILURE,
> + "Error allocating xstat ids_xstats memory\n");
> + }
> +
> + for (i = 0; i < nb_xstats; i++)
> + ids_xstats[i] = i;
> +
> + xstats = malloc(sizeof(*xstats) * nb_xstats);
> + if (xstats == NULL) {
> + rte_exit(EXIT_FAILURE,
> + "Error allocating xstat memory\n");
> + }
> +
> + /* Get failed/successful enqueues stats index */
> + id_fail_enq = id_succ_enq = nb_xstats;
> + for (i = 0; i < nb_xstats; i++) {
> + if (!strcmp(names_xstats[i].name, "failed_enqueues"))
> + id_fail_enq = i;
> + else if (!strcmp(names_xstats[i].name, "successful_enqueues"))
> + id_succ_enq = i;
> + if (id_fail_enq < nb_xstats && id_succ_enq < nb_xstats)
> + break;
> + }
> + if (id_fail_enq == nb_xstats || id_succ_enq == nb_xstats) {
> + rte_exit(EXIT_FAILURE,
> + "Error getting failed/successful enqueues stats index\n");
> + }
> +
> + while (!force_quit) {
> + /* Sleep for 1 second each round - init sleep allows reading
> + * messages from app startup.
> + */
> + sleep(1);
> +
> + /* Clear screen and move to top left */
> + printf("%s%s", clr, topLeft);
> +
> + memset(&ts, 0, sizeof(struct total_statistics));
> +
> + printf("%s", status_string);
> +
> + for (i = 0; i < cfg.nb_ports; i++) {
> + port_id = cfg.ports[i].rxtx_port;
> + print_port_stats(port_id);
> +
> + ts.total_packets_dropped +=
> + port_statistics.tx_dropped[port_id]
> + + port_statistics.copy_dropped[port_id];
> + ts.total_packets_tx += port_statistics.tx[port_id];
> + ts.total_packets_rx += port_statistics.rx[port_id];
> +
> + if (copy_mode == COPY_MODE_IOAT_NUM) {
> + uint32_t j;
> +
> + for (j = 0; j < cfg.ports[i].nb_queues; j++) {
> + dev_id = cfg.ports[i].ioat_ids[j];
> + rte_rawdev_xstats_get(dev_id,
> + ids_xstats, xstats, nb_xstats);
> +
> + print_rawdev_stats(dev_id, xstats,
> + nb_xstats, names_xstats);
> +
> + ts.total_successful_enqueues +=
> + xstats[id_succ_enq];
> + ts.total_failed_enqueues +=
> + xstats[id_fail_enq];
> + }
> + }
> + }
> + printf("\n");
> +
> + print_total_stats(&ts);
> + }
> +
> + free(names_xstats);
> + free(xstats);
> + free(ids_xstats);
> +}
<snip>