[dpdk-dev] [PATCH v2] librte_pmd_packet: add PMD for AF_PACKET-based virtual devices
    Zhou, Danny 
    danny.zhou at intel.com
       
    Tue Jul 15 02:15:49 CEST 2014
    
    
  
According to my performance measurements with 64B packets, 1 queue performs better than 16 queues (1.35M pps vs. 0.93M pps). This makes sense to me: in the 16-queue case, more CPU cycles are spent in kernel land (87% with 16 queues vs. 80% with 1 queue), because the NAPI-enabled ixgbe driver has to switch between polling and interrupt modes to service the per-queue rx interrupts, which adds context-switch overhead. Also, since the eth_packet_rx/eth_packet_tx routines involve two memory copies between the DPDK mbuf and pbuf for each packet, the PMD can hardly achieve high performance unless packets are DMA'd directly into mbufs, which would need support from the ixgbe driver.
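
To put the two rates in context, here is a rough sketch of the arithmetic. It assumes a 10 Gbit/s link (ixgbe drives 10GbE NICs), that the 64B frame size includes the CRC, and the usual 20 bytes of per-frame wire overhead; the helper names are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions (not stated in the measurement): 10 Gbit/s link, and 20 bytes
 * of wire overhead per frame (7B preamble + 1B SFD + 12B inter-frame gap). */
#define LINK_BPS            10000000000ULL
#define WIRE_OVERHEAD_BYTES 20u

/* Theoretical maximum packets per second for a frame size that includes CRC. */
static uint64_t line_rate_pps(unsigned frame_bytes)
{
	assert(frame_bytes >= 64);
	return LINK_BPS / ((uint64_t)(frame_bytes + WIRE_OVERHEAD_BYTES) * 8u);
}

/* Measured rate relative to line rate, in hundredths of a percent. */
static unsigned pct_x100_of_line_rate(uint64_t measured_pps, unsigned frame_bytes)
{
	return (unsigned)(measured_pps * 10000u / line_rate_pps(frame_bytes));
}
```

Under those assumptions, 1.35M pps is roughly 9% of the 64B line rate and 0.93M pps roughly 6%, so both configurations are far from saturating the link.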
> -----Original Message-----
> From: John W. Linville [mailto:linville at tuxdriver.com]
> Sent: Tuesday, July 15, 2014 2:25 AM
> To: dev at dpdk.org
> Cc: Thomas Monjalon; Richardson, Bruce; Zhou, Danny
> Subject: [PATCH v2] librte_pmd_packet: add PMD for AF_PACKET-based virtual
> devices
> 
> This is a Linux-specific virtual PMD driver backed by an AF_PACKET socket.  This
> implementation uses mmap'ed ring buffers to limit copying and user/kernel
> transitions.  The PACKET_FANOUT_HASH behavior of AF_PACKET is used for
> frame reception.  In the current implementation, Tx and Rx queues are always paired,
> and therefore are always equal in number -- changing this would be a Simple Matter
> Of Programming.
> 
> Interfaces of this type are created with a command line option like
> "--vdev=eth_packet0,iface=...".  There are a number of options available as
> arguments:
> 
>  - Interface is chosen by "iface" (required)
>  - Number of queue pairs set by "qpairs" (optional, default: 1)
>  - AF_PACKET MMAP block size set by "blocksz" (optional, default: 4096)
>  - AF_PACKET MMAP frame size set by "framesz" (optional, default: 2048)
>  - AF_PACKET MMAP frame count set by "framecnt" (optional, default: 512)
> 
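
As a sanity check on the defaults listed above, a minimal sketch of the ring geometry they imply. It follows the patch's own `blockcount = framecount / (blocksize / framesize)` computation and its `2 * tp_block_size * tp_block_nr` mmap length; the struct and function names are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: derived AF_PACKET ring sizes for one queue pair. */
struct ring_geometry {
	unsigned block_count; /* value that would go into tp_block_nr */
	size_t ring_bytes;    /* size of one ring (Rx or Tx) */
	size_t map_bytes;     /* Rx ring and Tx ring mapped back to back */
};

/* Returns 0 on success, -1 for parameter combinations the patch also rejects
 * (zero sizes, frame larger than block, or a resulting block count of zero). */
static int compute_ring_geometry(unsigned blocksz, unsigned framesz,
				 unsigned framecnt, struct ring_geometry *g)
{
	unsigned frames_per_block, blocks;

	if (blocksz == 0 || framesz == 0 || framesz > blocksz)
		return -1;

	frames_per_block = blocksz / framesz;
	blocks = framecnt / frames_per_block;
	if (blocks == 0)
		return -1;

	g->block_count = blocks;
	g->ring_bytes = (size_t)blocksz * blocks;
	g->map_bytes = 2 * g->ring_bytes;
	return 0;
}
```

With the defaults (blocksz=4096, framesz=2048, framecnt=512) this gives 256 blocks, 1 MiB per ring, and 2 MiB mapped per queue pair.
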
> Signed-off-by: John W. Linville <linville at tuxdriver.com>
> ---
> This PMD is intended to provide a means for using DPDK on a broad range of
> hardware without hardware-specific PMDs and (hopefully) with better performance
> than what PCAP offers in Linux.  This might be useful as a development platform for
> DPDK applications when DPDK-supported hardware is expensive or unavailable.
> 
> New in v2:
> 
> -- fixup some style issues found by checkpatch
> -- use if_index as part of fanout group ID
> -- set default number of queue pairs to 1
> 
>  config/common_bsdapp                   |   5 +
>  config/common_linuxapp                 |   5 +
>  lib/Makefile                           |   1 +
>  lib/librte_eal/linuxapp/eal/Makefile   |   1 +
>  lib/librte_pmd_packet/Makefile         |  60 +++
>  lib/librte_pmd_packet/rte_eth_packet.c | 826 +++++++++++++++++++++++++++++++++
>  lib/librte_pmd_packet/rte_eth_packet.h |  55 +++
>  mk/rte.app.mk                          |   4 +
>  8 files changed, 957 insertions(+)
>  create mode 100644 lib/librte_pmd_packet/Makefile
>  create mode 100644 lib/librte_pmd_packet/rte_eth_packet.c
>  create mode 100644 lib/librte_pmd_packet/rte_eth_packet.h
> 
> diff --git a/config/common_bsdapp b/config/common_bsdapp
> index 943dce8f1ede..c317f031278e 100644
> --- a/config/common_bsdapp
> +++ b/config/common_bsdapp
> @@ -226,6 +226,11 @@ CONFIG_RTE_LIBRTE_PMD_PCAP=y
>  CONFIG_RTE_LIBRTE_PMD_BOND=y
> 
>  #
> +# Compile software PMD backed by AF_PACKET sockets (Linux only)
> +#
> +CONFIG_RTE_LIBRTE_PMD_PACKET=n
> +
> +#
>  # Do prefetch of packet data within PMD driver receive function
>  #
>  CONFIG_RTE_PMD_PACKET_PREFETCH=y
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index 7bf5d80d4e26..f9e7bc3015ec 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -249,6 +249,11 @@ CONFIG_RTE_LIBRTE_PMD_PCAP=n
>  CONFIG_RTE_LIBRTE_PMD_BOND=y
> 
>  #
> +# Compile software PMD backed by AF_PACKET sockets (Linux only)
> +#
> +CONFIG_RTE_LIBRTE_PMD_PACKET=y
> +
> +#
>  # Compile Xen PMD
>  #
>  CONFIG_RTE_LIBRTE_PMD_XENVIRT=n
> diff --git a/lib/Makefile b/lib/Makefile
> index 10c5bb3045bc..930fadf29898 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -47,6 +47,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += librte_pmd_i40e
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += librte_pmd_bond
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_RING) += librte_pmd_ring
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += librte_pmd_pcap
> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_PACKET) += librte_pmd_packet
>  DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += librte_pmd_virtio
>  DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += librte_pmd_vmxnet3
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += librte_pmd_xenvirt
> diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
> index 756d6b0c9301..feed24a63272 100644
> --- a/lib/librte_eal/linuxapp/eal/Makefile
> +++ b/lib/librte_eal/linuxapp/eal/Makefile
> @@ -44,6 +44,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_ether
>  CFLAGS += -I$(RTE_SDK)/lib/librte_ivshmem
>  CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_ring
>  CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_pcap
> +CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_packet
>  CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_xenvirt
>  CFLAGS += $(WERROR_FLAGS) -O3
> 
> diff --git a/lib/librte_pmd_packet/Makefile b/lib/librte_pmd_packet/Makefile
> new file mode 100644
> index 000000000000..e1266fb992cd
> --- /dev/null
> +++ b/lib/librte_pmd_packet/Makefile
> @@ -0,0 +1,60 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2014 John W. Linville <linville at redhat.com>
> +#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> +#   Copyright(c) 2014 6WIND S.A.
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +#     * Redistributions of source code must retain the above copyright
> +#       notice, this list of conditions and the following disclaimer.
> +#     * Redistributions in binary form must reproduce the above copyright
> +#       notice, this list of conditions and the following disclaimer in
> +#       the documentation and/or other materials provided with the
> +#       distribution.
> +#     * Neither the name of Intel Corporation nor the names of its
> +#       contributors may be used to endorse or promote products derived
> +#       from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_pmd_packet.a
> +
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS)
> +
> +#
> +# all source are stored in SRCS-y
> +#
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_PACKET) += rte_eth_packet.c
> +
> +#
> +# Export include files
> +#
> +SYMLINK-y-include += rte_eth_packet.h
> +
> +# this lib depends upon:
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_PACKET) += lib/librte_mbuf
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_PACKET) += lib/librte_ether
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_PACKET) += lib/librte_malloc
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_PACKET) += lib/librte_kvargs
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_pmd_packet/rte_eth_packet.c b/lib/librte_pmd_packet/rte_eth_packet.c
> new file mode 100644
> index 000000000000..9c82d16e730f
> --- /dev/null
> +++ b/lib/librte_pmd_packet/rte_eth_packet.c
> @@ -0,0 +1,826 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2014 John W. Linville <linville at tuxdriver.com>
> + *
> + *   Originally based upon librte_pmd_pcap code:
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   Copyright(c) 2014 6WIND S.A.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <rte_mbuf.h>
> +#include <rte_ethdev.h>
> +#include <rte_malloc.h>
> +#include <rte_kvargs.h>
> +#include <rte_dev.h>
> +
> +#include <linux/if_ether.h>
> +#include <linux/if_packet.h>
> +#include <arpa/inet.h>
> +#include <net/if.h>
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <sys/ioctl.h>
> +#include <sys/mman.h>
> +#include <unistd.h>
> +#include <poll.h>
> +
> +#include "rte_eth_packet.h"
> +
> +#define ETH_PACKET_IFACE_ARG		"iface"
> +#define ETH_PACKET_NUM_Q_ARG		"qpairs"
> +#define ETH_PACKET_BLOCKSIZE_ARG	"blocksz"
> +#define ETH_PACKET_FRAMESIZE_ARG	"framesz"
> +#define ETH_PACKET_FRAMECOUNT_ARG	"framecnt"
> +
> +#define DFLT_BLOCK_SIZE		(1 << 12)
> +#define DFLT_FRAME_SIZE		(1 << 11)
> +#define DFLT_FRAME_COUNT	(1 << 9)
> +
> +struct pkt_rx_queue {
> +	int sockfd;
> +
> +	struct iovec *rd;
> +	uint8_t *map;
> +	unsigned int framecount;
> +	unsigned int framenum;
> +
> +	struct rte_mempool *mb_pool;
> +
> +	volatile unsigned long rx_pkts;
> +	volatile unsigned long err_pkts;
> +};
> +
> +struct pkt_tx_queue {
> +	int sockfd;
> +
> +	struct iovec *rd;
> +	uint8_t *map;
> +	unsigned int framecount;
> +	unsigned int framenum;
> +
> +	volatile unsigned long tx_pkts;
> +	volatile unsigned long err_pkts;
> +};
> +
> +struct pmd_internals {
> +	unsigned nb_queues;
> +
> +	int if_index;
> +	struct ether_addr eth_addr;
> +
> +	struct tpacket_req req;
> +
> +	struct pkt_rx_queue rx_queue[RTE_PMD_PACKET_MAX_RINGS];
> +	struct pkt_tx_queue tx_queue[RTE_PMD_PACKET_MAX_RINGS];
> +};
> +
> +static const char *valid_arguments[] = {
> +	ETH_PACKET_IFACE_ARG,
> +	ETH_PACKET_NUM_Q_ARG,
> +	ETH_PACKET_BLOCKSIZE_ARG,
> +	ETH_PACKET_FRAMESIZE_ARG,
> +	ETH_PACKET_FRAMECOUNT_ARG,
> +	NULL
> +};
> +
> +static const char *drivername = "AF_PACKET PMD";
> +
> +static struct rte_eth_link pmd_link = {
> +	.link_speed = 10000,
> +	.link_duplex = ETH_LINK_FULL_DUPLEX,
> +	.link_status = 0
> +};
> +
> +static uint16_t
> +eth_packet_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) {
> +	unsigned i;
> +	struct tpacket2_hdr *ppd;
> +	struct rte_mbuf *mbuf;
> +	uint8_t *pbuf;
> +	struct pkt_rx_queue *pkt_q = queue;
> +	uint16_t num_rx = 0;
> +	unsigned int framecount, framenum;
> +
> +	if (unlikely(nb_pkts == 0))
> +		return 0;
> +
> +	/*
> +	 * Reads the given number of packets from the AF_PACKET socket one by
> +	 * one and copies the packet data into a newly allocated mbuf.
> +	 */
> +	framecount = pkt_q->framecount;
> +	framenum = pkt_q->framenum;
> +	for (i = 0; i < nb_pkts; i++) {
> +		/* point at the next incoming frame */
> +		ppd = (struct tpacket2_hdr *) pkt_q->rd[framenum].iov_base;
> +		if ((ppd->tp_status & TP_STATUS_USER) == 0)
> +			break;
> +
> +		/* allocate the next mbuf */
> +		mbuf = rte_pktmbuf_alloc(pkt_q->mb_pool);
> +		if (unlikely(mbuf == NULL))
> +			break;
> +
> +		/* packet will fit in the mbuf, go ahead and receive it */
> +		mbuf->pkt.pkt_len = mbuf->pkt.data_len = ppd->tp_snaplen;
> +		pbuf = (uint8_t *) ppd + ppd->tp_mac;
> +		memcpy(mbuf->pkt.data, pbuf, mbuf->pkt.data_len);
> +
> +		/* release incoming frame and advance ring buffer */
> +		ppd->tp_status = TP_STATUS_KERNEL;
> +		if (++framenum >= framecount)
> +			framenum = 0;
> +
> +		/* account for the receive frame */
> +		bufs[i] = mbuf;
> +		num_rx++;
> +	}
> +	pkt_q->framenum = framenum;
> +	pkt_q->rx_pkts += num_rx;
> +	return num_rx;
> +}
> +
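
The receive loop above relies on a per-frame ownership flag: the kernel marks a frame as belonging to user space, the consumer copies it out, hands the frame back, and advances with wrap-around. A minimal stand-alone sketch of that handoff follows. It uses an in-memory ring rather than a real AF_PACKET socket, and every name in it (`sim_frame`, `sim_ring`, `sim_ring_rx`, the `FRAME_OWNED_BY_*` constants) is a hypothetical stand-in for the `tpacket2_hdr` / `TP_STATUS_*` machinery.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for TP_STATUS_KERNEL / TP_STATUS_USER. */
#define FRAME_OWNED_BY_KERNEL 0u
#define FRAME_OWNED_BY_USER   1u

struct sim_frame {
	uint32_t status; /* who currently owns the frame */
	uint32_t len;    /* captured length, like tp_snaplen */
};

struct sim_ring {
	struct sim_frame *frames;
	unsigned framecount; /* number of frames in the ring */
	unsigned framenum;   /* next frame to inspect */
};

/* Consume up to max frames that user space owns; store their lengths in
 * out_len and return how many were consumed. Stops at the first frame the
 * kernel still owns, exactly once around the ring at most per frame. */
static unsigned sim_ring_rx(struct sim_ring *r, uint32_t *out_len, unsigned max)
{
	unsigned n = 0;

	assert(r->framecount > 0 && r->framenum < r->framecount);
	while (n < max) {
		struct sim_frame *f = &r->frames[r->framenum];

		if ((f->status & FRAME_OWNED_BY_USER) == 0)
			break;

		out_len[n++] = f->len;

		/* hand the frame back and advance, wrapping at the end */
		f->status = FRAME_OWNED_BY_KERNEL;
		if (++r->framenum >= r->framecount)
			r->framenum = 0;
	}
	return n;
}
```

The sketch stops at the first kernel-owned frame, as the patch does, so a burst can return fewer packets than requested without blocking.
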
> +/*
> + * Callback to handle sending packets through a real NIC.
> + */
> +static uint16_t
> +eth_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) {
> +	struct tpacket2_hdr *ppd;
> +	struct rte_mbuf *mbuf;
> +	uint8_t *pbuf;
> +	unsigned int framecount, framenum;
> +	struct pollfd pfd;
> +	struct pkt_tx_queue *pkt_q = queue;
> +	uint16_t num_tx = 0;
> +	int i;
> +
> +	if (unlikely(nb_pkts == 0))
> +		return 0;
> +
> +	memset(&pfd, 0, sizeof(pfd));
> +	pfd.fd = pkt_q->sockfd;
> +	pfd.events = POLLOUT;
> +	pfd.revents = 0;
> +
> +	framecount = pkt_q->framecount;
> +	framenum = pkt_q->framenum;
> +	ppd = (struct tpacket2_hdr *) pkt_q->rd[framenum].iov_base;
> +	for (i = 0; i < nb_pkts; i++) {
> +		/* point at the next incoming frame */
> +		if ((ppd->tp_status != TP_STATUS_AVAILABLE) &&
> +		    (poll(&pfd, 1, -1) < 0))
> +				continue;
> +
> +		/* copy the tx frame data */
> +		mbuf = bufs[num_tx];
> +		pbuf = (uint8_t *) ppd + TPACKET2_HDRLEN -
> +			sizeof(struct sockaddr_ll);
> +		memcpy(pbuf, mbuf->pkt.data, mbuf->pkt.data_len);
> +		ppd->tp_len = ppd->tp_snaplen = mbuf->pkt.data_len;
> +
> +		/* release incoming frame and advance ring buffer */
> +		ppd->tp_status = TP_STATUS_SEND_REQUEST;
> +		if (++framenum >= framecount)
> +			framenum = 0;
> +		ppd = (struct tpacket2_hdr *) pkt_q->rd[framenum].iov_base;
> +
> +		num_tx++;
> +		rte_pktmbuf_free(mbuf);
> +	}
> +
> +	/* kick-off transmits */
> +	sendto(pkt_q->sockfd, NULL, 0, MSG_DONTWAIT, NULL, 0);
> +
> +	pkt_q->framenum = framenum;
> +	pkt_q->tx_pkts += num_tx;
> +	pkt_q->err_pkts += nb_pkts - num_tx;
> +	return num_tx;
> +}
> +
> +static int
> +eth_dev_start(struct rte_eth_dev *dev)
> +{
> +	dev->data->dev_link.link_status = 1;
> +	return 0;
> +}
> +
> +/*
> + * This function gets called when the current port gets stopped.
> + */
> +static void
> +eth_dev_stop(struct rte_eth_dev *dev)
> +{
> +	unsigned i;
> +	int sockfd;
> +	struct pmd_internals *internals = dev->data->dev_private;
> +
> +	for (i = 0; i < internals->nb_queues; i++) {
> +		sockfd = internals->rx_queue[i].sockfd;
> +		if (sockfd != -1)
> +			close(sockfd);
> +		sockfd = internals->tx_queue[i].sockfd;
> +		if (sockfd != -1)
> +			close(sockfd);
> +	}
> +
> +	dev->data->dev_link.link_status = 0;
> +}
> +
> +static int
> +eth_dev_configure(struct rte_eth_dev *dev __rte_unused) {
> +	return 0;
> +}
> +
> +static void
> +eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info
> +*dev_info) {
> +	struct pmd_internals *internals = dev->data->dev_private;
> +
> +	dev_info->driver_name = drivername;
> +	dev_info->if_index = internals->if_index;
> +	dev_info->max_mac_addrs = 1;
> +	dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
> +	dev_info->max_rx_queues = (uint16_t)internals->nb_queues;
> +	dev_info->max_tx_queues = (uint16_t)internals->nb_queues;
> +	dev_info->min_rx_bufsize = 0;
> +	dev_info->pci_dev = NULL;
> +}
> +
> +static void
> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
> +{
> +	unsigned i, imax;
> +	unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
> +	const struct pmd_internals *internal = dev->data->dev_private;
> +
> +	memset(igb_stats, 0, sizeof(*igb_stats));
> +
> +	imax = (internal->nb_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS ?
> +	        internal->nb_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS);
> +	for (i = 0; i < imax; i++) {
> +		igb_stats->q_ipackets[i] = internal->rx_queue[i].rx_pkts;
> +		rx_total += igb_stats->q_ipackets[i];
> +	}
> +
> +	imax = (internal->nb_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS ?
> +	        internal->nb_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS);
> +	for (i = 0; i < imax; i++) {
> +		igb_stats->q_opackets[i] = internal->tx_queue[i].tx_pkts;
> +		igb_stats->q_errors[i] = internal->tx_queue[i].err_pkts;
> +		tx_total += igb_stats->q_opackets[i];
> +		tx_err_total += igb_stats->q_errors[i];
> +	}
> +
> +	igb_stats->ipackets = rx_total;
> +	igb_stats->opackets = tx_total;
> +	igb_stats->oerrors = tx_err_total;
> +}
> +
> +static void
> +eth_stats_reset(struct rte_eth_dev *dev) {
> +	unsigned i;
> +	struct pmd_internals *internal = dev->data->dev_private;
> +
> +	for (i = 0; i < internal->nb_queues; i++)
> +		internal->rx_queue[i].rx_pkts = 0;
> +
> +	for (i = 0; i < internal->nb_queues; i++) {
> +		internal->tx_queue[i].tx_pkts = 0;
> +		internal->tx_queue[i].err_pkts = 0;
> +	}
> +}
> +
> +static void
> +eth_dev_close(struct rte_eth_dev *dev __rte_unused) { }
> +
> +static void
> +eth_queue_release(void *q __rte_unused) { }
> +
> +static int
> +eth_link_update(struct rte_eth_dev *dev __rte_unused,
> +                int wait_to_complete __rte_unused) {
> +	return 0;
> +}
> +
> +static int
> +eth_rx_queue_setup(struct rte_eth_dev *dev,
> +                   uint16_t rx_queue_id,
> +                   uint16_t nb_rx_desc __rte_unused,
> +                   unsigned int socket_id __rte_unused,
> +                   const struct rte_eth_rxconf *rx_conf __rte_unused,
> +                   struct rte_mempool *mb_pool) {
> +	struct pmd_internals *internals = dev->data->dev_private;
> +	struct pkt_rx_queue *pkt_q = &internals->rx_queue[rx_queue_id];
> +	struct rte_pktmbuf_pool_private *mbp_priv;
> +	uint16_t buf_size;
> +
> +	pkt_q->mb_pool = mb_pool;
> +
> +	/* Now get the space available for data in the mbuf */
> +	mbp_priv = rte_mempool_get_priv(pkt_q->mb_pool);
> +	buf_size = (uint16_t) (mbp_priv->mbuf_data_room_size -
> +	                       RTE_PKTMBUF_HEADROOM);
> +
> +	if (ETH_FRAME_LEN > buf_size) {
> +		RTE_LOG(ERR, PMD,
> +			"%s: %d bytes will not fit in mbuf (%d bytes)\n",
> +			dev->data->name, ETH_FRAME_LEN, buf_size);
> +		return -ENOMEM;
> +	}
> +
> +	dev->data->rx_queues[rx_queue_id] = pkt_q;
> +
> +	return 0;
> +}
> +
> +static int
> +eth_tx_queue_setup(struct rte_eth_dev *dev,
> +                   uint16_t tx_queue_id,
> +                   uint16_t nb_tx_desc __rte_unused,
> +                   unsigned int socket_id __rte_unused,
> +                   const struct rte_eth_txconf *tx_conf __rte_unused) {
> +
> +	struct pmd_internals *internals = dev->data->dev_private;
> +
> +	dev->data->tx_queues[tx_queue_id] = &internals->tx_queue[tx_queue_id];
> +	return 0;
> +}
> +
> +static struct eth_dev_ops ops = {
> +	.dev_start = eth_dev_start,
> +	.dev_stop = eth_dev_stop,
> +	.dev_close = eth_dev_close,
> +	.dev_configure = eth_dev_configure,
> +	.dev_infos_get = eth_dev_info,
> +	.rx_queue_setup = eth_rx_queue_setup,
> +	.tx_queue_setup = eth_tx_queue_setup,
> +	.rx_queue_release = eth_queue_release,
> +	.tx_queue_release = eth_queue_release,
> +	.link_update = eth_link_update,
> +	.stats_get = eth_stats_get,
> +	.stats_reset = eth_stats_reset,
> +};
> +
> +/*
> + * Opens an AF_PACKET socket
> + */
> +static int
> +open_packet_iface(const char *key __rte_unused,
> +                  const char *value __rte_unused,
> +                  void *extra_args)
> +{
> +	int *sockfd = extra_args;
> +
> +	/* Open an AF_PACKET socket... */
> +	*sockfd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
> +	if (*sockfd == -1) {
> +		RTE_LOG(ERR, PMD, "Could not open AF_PACKET socket\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +rte_pmd_init_internals(const char *name,
> +                       const int sockfd,
> +                       const unsigned nb_queues,
> +                       unsigned int blocksize,
> +                       unsigned int blockcnt,
> +                       unsigned int framesize,
> +                       unsigned int framecnt,
> +                       const unsigned numa_node,
> +                       struct pmd_internals **internals,
> +                       struct rte_eth_dev **eth_dev,
> +                       struct rte_kvargs *kvlist) {
> +	struct rte_eth_dev_data *data = NULL;
> +	struct rte_pci_device *pci_dev = NULL;
> +	struct rte_kvargs_pair *pair = NULL;
> +	struct ifreq ifr;
> +	size_t ifnamelen;
> +	unsigned k_idx;
> +	struct sockaddr_ll sockaddr;
> +	struct tpacket_req *req;
> +	struct pkt_rx_queue *rx_queue;
> +	struct pkt_tx_queue *tx_queue;
> +	int rc, tpver, discard, bypass;
> +	unsigned int i, q, rdsize;
> +	int qsockfd, fanout_arg;
> +
> +	for (k_idx = 0; k_idx < kvlist->count; k_idx++) {
> +		pair = &kvlist->pairs[k_idx];
> +		if (strstr(pair->key, ETH_PACKET_IFACE_ARG) != NULL)
> +			break;
> +	}
> +	if (pair == NULL) {
> +		RTE_LOG(ERR, PMD,
> +			"%s: no interface specified for AF_PACKET ethdev\n",
> +		        name);
> +		goto error;
> +	}
> +
> +	RTE_LOG(INFO, PMD,
> +		"%s: creating AF_PACKET-backed ethdev on numa socket %u\n",
> +		name, numa_node);
> +
> +	/*
> +	 * now do all data allocation - for eth_dev structure, dummy pci driver
> +	 * and internal (private) data
> +	 */
> +	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
> +	if (data == NULL)
> +		goto error;
> +
> +	pci_dev = rte_zmalloc_socket(name, sizeof(*pci_dev), 0, numa_node);
> +	if (pci_dev == NULL)
> +		goto error;
> +
> +	*internals = rte_zmalloc_socket(name, sizeof(**internals),
> +	                                0, numa_node);
> +	if (*internals == NULL)
> +		goto error;
> +
> +	req = &((*internals)->req);
> +
> +	req->tp_block_size = blocksize;
> +	req->tp_block_nr = blockcnt;
> +	req->tp_frame_size = framesize;
> +	req->tp_frame_nr = framecnt;
> +
> +	ifnamelen = strlen(pair->value);
> +	if (ifnamelen < sizeof(ifr.ifr_name)) {
> +		memcpy(ifr.ifr_name, pair->value, ifnamelen);
> +		ifr.ifr_name[ifnamelen] = '\0';
> +	} else {
> +		RTE_LOG(ERR, PMD,
> +			"%s: I/F name too long (%s)\n",
> +			name, pair->value);
> +		goto error;
> +	}
> +	if (ioctl(sockfd, SIOCGIFINDEX, &ifr) == -1) {
> +		RTE_LOG(ERR, PMD,
> +			"%s: ioctl failed (SIOCGIFINDEX)\n",
> +		        name);
> +		goto error;
> +	}
> +	(*internals)->if_index = ifr.ifr_ifindex;
> +
> +	if (ioctl(sockfd, SIOCGIFHWADDR, &ifr) == -1) {
> +		RTE_LOG(ERR, PMD,
> +			"%s: ioctl failed (SIOCGIFHWADDR)\n",
> +		        name);
> +		goto error;
> +	}
> +	memcpy(&(*internals)->eth_addr, ifr.ifr_hwaddr.sa_data, ETH_ALEN);
> +
> +	memset(&sockaddr, 0, sizeof(sockaddr));
> +	sockaddr.sll_family = AF_PACKET;
> +	sockaddr.sll_protocol = htons(ETH_P_ALL);
> +	sockaddr.sll_ifindex = (*internals)->if_index;
> +
> +	fanout_arg = (getpid() ^ (*internals)->if_index) & 0xffff;
> +	fanout_arg |= (PACKET_FANOUT_HASH | PACKET_FANOUT_FLAG_DEFRAG |
> +	               PACKET_FANOUT_FLAG_ROLLOVER) << 16;
> +
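
For readers unfamiliar with the PACKET_FANOUT argument layout used just above: the low 16 bits carry the fanout group ID and the high 16 bits carry the fanout type ORed with flags. A minimal sketch of the same encoding, using unsigned arithmetic so the shift into bit 31 is well defined. The constants are redefined locally with the values I believe `linux/if_packet.h` uses, and all names are hypothetical; please check against the header before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror PACKET_FANOUT_HASH, PACKET_FANOUT_FLAG_ROLLOVER and
 * PACKET_FANOUT_FLAG_DEFRAG from linux/if_packet.h; redefined here only to
 * keep the sketch free of Linux-specific headers. */
#define SKETCH_FANOUT_HASH          0x0000u
#define SKETCH_FANOUT_FLAG_ROLLOVER 0x1000u
#define SKETCH_FANOUT_FLAG_DEFRAG   0x8000u

/* Build a fanout argument: group ID from pid XOR if_index in the low 16
 * bits, type and flags in the high 16 bits. */
static uint32_t make_fanout_arg(uint32_t pid, uint32_t if_index)
{
	uint32_t group_id = (pid ^ if_index) & 0xffffu;
	uint32_t type_flags = SKETCH_FANOUT_HASH |
			      SKETCH_FANOUT_FLAG_DEFRAG |
			      SKETCH_FANOUT_FLAG_ROLLOVER;

	return group_id | (type_flags << 16);
}
```

Mixing `if_index` into the group ID means two interfaces opened by the same process land in different fanout groups, which matches the "use if_index as part of fanout group ID" item in the v2 notes.
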
> +	for (q = 0; q < nb_queues; q++) {
> +		/* Open an AF_PACKET socket for this queue... */
> +		qsockfd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
> +		if (qsockfd == -1) {
> +			RTE_LOG(ERR, PMD,
> +			        "%s: could not open AF_PACKET socket\n",
> +			        name);
> +			return -1;
> +		}
> +
> +		tpver = TPACKET_V2;
> +		rc = setsockopt(qsockfd, SOL_PACKET, PACKET_VERSION,
> +				&tpver, sizeof(tpver));
> +		if (rc == -1) {
> +			RTE_LOG(ERR, PMD,
> +				"%s: could not set PACKET_VERSION on AF_PACKET "
> +				"socket for %s\n", name, pair->value);
> +			goto error;
> +		}
> +
> +		discard = 1;
> +		rc = setsockopt(qsockfd, SOL_PACKET, PACKET_LOSS,
> +				&discard, sizeof(discard));
> +		if (rc == -1) {
> +			RTE_LOG(ERR, PMD,
> +				"%s: could not set PACKET_LOSS on "
> +			        "AF_PACKET socket for %s\n", name, pair->value);
> +			goto error;
> +		}
> +
> +		bypass = 1;
> +		rc = setsockopt(qsockfd, SOL_PACKET, PACKET_QDISC_BYPASS,
> +				&bypass, sizeof(bypass));
> +		if (rc == -1) {
> +			RTE_LOG(ERR, PMD,
> +				"%s: could not set PACKET_QDISC_BYPASS "
> +			        "on AF_PACKET socket for %s\n", name,
> +			        pair->value);
> +			goto error;
> +		}
> +
> +		rc = setsockopt(qsockfd, SOL_PACKET, PACKET_RX_RING, req, sizeof(*req));
> +		if (rc == -1) {
> +			RTE_LOG(ERR, PMD,
> +				"%s: could not set PACKET_RX_RING on AF_PACKET "
> +				"socket for %s\n", name, pair->value);
> +			goto error;
> +		}
> +
> +		rc = setsockopt(qsockfd, SOL_PACKET, PACKET_TX_RING, req, sizeof(*req));
> +		if (rc == -1) {
> +			RTE_LOG(ERR, PMD,
> +				"%s: could not set PACKET_TX_RING on AF_PACKET "
> +				"socket for %s\n", name, pair->value);
> +			goto error;
> +		}
> +
> +		rx_queue = &((*internals)->rx_queue[q]);
> +		rx_queue->framecount = req->tp_frame_nr;
> +
> +		rx_queue->map = mmap(NULL, 2 * req->tp_block_size * req->tp_block_nr,
> +				    PROT_READ | PROT_WRITE, MAP_SHARED | MAP_LOCKED,
> +				    qsockfd, 0);
> +		if (rx_queue->map == MAP_FAILED) {
> +			RTE_LOG(ERR, PMD,
> +				"%s: call to mmap failed on AF_PACKET socket for %s\n",
> +				name, pair->value);
> +			goto error;
> +		}
> +
> +		/* rdsize is same for both Tx and Rx */
> +		rdsize = req->tp_frame_nr * sizeof(*(rx_queue->rd));
> +
> +		rx_queue->rd = rte_zmalloc_socket(name, rdsize, 0, numa_node);
> +		for (i = 0; i < req->tp_frame_nr; ++i) {
> +			rx_queue->rd[i].iov_base = rx_queue->map + (i * framesize);
> +			rx_queue->rd[i].iov_len = req->tp_frame_size;
> +		}
> +		rx_queue->sockfd = qsockfd;
> +
> +		tx_queue = &((*internals)->tx_queue[q]);
> +		tx_queue->framecount = req->tp_frame_nr;
> +
> +		tx_queue->map = rx_queue->map + req->tp_block_size * req->tp_block_nr;
> +
> +		tx_queue->rd = rte_zmalloc_socket(name, rdsize, 0, numa_node);
> +		for (i = 0; i < req->tp_frame_nr; ++i) {
> +			tx_queue->rd[i].iov_base = tx_queue->map + (i * framesize);
> +			tx_queue->rd[i].iov_len = req->tp_frame_size;
> +		}
> +		tx_queue->sockfd = qsockfd;
> +
> +		rc = bind(qsockfd, (const struct sockaddr*)&sockaddr, sizeof(sockaddr));
> +		if (rc == -1) {
> +			RTE_LOG(ERR, PMD,
> +				"%s: could not bind AF_PACKET socket to %s\n",
> +			        name, pair->value);
> +			goto error;
> +		}
> +
> +		rc = setsockopt(qsockfd, SOL_PACKET, PACKET_FANOUT,
> +				&fanout_arg, sizeof(fanout_arg));
> +		if (rc == -1) {
> +			RTE_LOG(ERR, PMD,
> +				"%s: could not set PACKET_FANOUT on AF_PACKET socket "
> +				"for %s\n", name, pair->value);
> +			goto error;
> +		}
> +	}
> +
> +	/* reserve an ethdev entry */
> +	*eth_dev = rte_eth_dev_allocate(name);
> +	if (*eth_dev == NULL)
> +		goto error;
> +
> +	/*
> +	 * now put it all together
> +	 * - store queue data in internals,
> +	 * - store numa_node info in pci_driver
> +	 * - point eth_dev_data to internals and pci_driver
> +	 * - and point eth_dev structure to new eth_dev_data structure
> +	 */
> +
> +	(*internals)->nb_queues = nb_queues;
> +
> +	data->dev_private = *internals;
> +	data->port_id = (*eth_dev)->data->port_id;
> +	data->nb_rx_queues = (uint16_t)nb_queues;
> +	data->nb_tx_queues = (uint16_t)nb_queues;
> +	data->dev_link = pmd_link;
> +	data->mac_addrs = &(*internals)->eth_addr;
> +
> +	pci_dev->numa_node = numa_node;
> +
> +	(*eth_dev)->data = data;
> +	(*eth_dev)->dev_ops = &ops;
> +	(*eth_dev)->pci_dev = pci_dev;
> +
> +	return 0;
> +
> +error:
> +	if (data)
> +		rte_free(data);
> +	if (pci_dev)
> +		rte_free(pci_dev);
> +	for (q = 0; q < nb_queues; q++) {
> +		if ((*internals)->rx_queue[q].rd)
> +			rte_free((*internals)->rx_queue[q].rd);
> +		if ((*internals)->tx_queue[q].rd)
> +			rte_free((*internals)->tx_queue[q].rd);
> +	}
> +	if (*internals)
> +		rte_free(*internals);
> +	return -1;
> +}
> +
> +static int
> +rte_eth_from_packet(const char *name,
> +                    int const *sockfd,
> +                    const unsigned numa_node,
> +                    struct rte_kvargs *kvlist) {
> +	struct pmd_internals *internals = NULL;
> +	struct rte_eth_dev *eth_dev = NULL;
> +	struct rte_kvargs_pair *pair = NULL;
> +	unsigned k_idx;
> +	unsigned int blockcount;
> +	unsigned int blocksize = DFLT_BLOCK_SIZE;
> +	unsigned int framesize = DFLT_FRAME_SIZE;
> +	unsigned int framecount = DFLT_FRAME_COUNT;
> +	unsigned int qpairs = 1;
> +
> +	/* do some parameter checking */
> +	if (*sockfd < 0)
> +		return -1;
> +
> +	/*
> +	 * Walk arguments for configurable settings
> +	 */
> +	for (k_idx = 0; k_idx < kvlist->count; k_idx++) {
> +		pair = &kvlist->pairs[k_idx];
> +		if (strstr(pair->key, ETH_PACKET_NUM_Q_ARG) != NULL) {
> +			qpairs = atoi(pair->value);
> +			if (qpairs < 1 ||
> +			    qpairs > RTE_PMD_PACKET_MAX_RINGS) {
> +				RTE_LOG(ERR, PMD,
> +					"%s: invalid qpairs value\n",
> +				        name);
> +				return -1;
> +			}
> +			continue;
> +		}
> +		if (strstr(pair->key, ETH_PACKET_BLOCKSIZE_ARG) != NULL) {
> +			blocksize = atoi(pair->value);
> +			if (!blocksize) {
> +				RTE_LOG(ERR, PMD,
> +					"%s: invalid blocksize value\n",
> +				        name);
> +				return -1;
> +			}
> +			continue;
> +		}
> +		if (strstr(pair->key, ETH_PACKET_FRAMESIZE_ARG) != NULL) {
> +			framesize = atoi(pair->value);
> +			if (!framesize) {
> +				RTE_LOG(ERR, PMD,
> +					"%s: invalid framesize value\n",
> +				        name);
> +				return -1;
> +			}
> +			continue;
> +		}
> +		if (strstr(pair->key, ETH_PACKET_FRAMECOUNT_ARG) != NULL) {
> +			framecount = atoi(pair->value);
> +			if (!framecount) {
> +				RTE_LOG(ERR, PMD,
> +					"%s: invalid framecount value\n",
> +				        name);
> +				return -1;
> +			}
> +			continue;
> +		}
> +	}
> +
> +	if (framesize > blocksize) {
> +		RTE_LOG(ERR, PMD,
> +			"%s: AF_PACKET MMAP frame size exceeds block size!\n",
> +		        name);
> +		return -1;
> +	}
> +
> +	blockcount = framecount / (blocksize / framesize);
> +	if (!blockcount) {
> +		RTE_LOG(ERR, PMD,
> +			"%s: invalid AF_PACKET MMAP parameters\n", name);
> +		return -1;
> +	}
> +
> +	RTE_LOG(INFO, PMD, "%s: AF_PACKET MMAP parameters:\n", name);
> +	RTE_LOG(INFO, PMD, "%s:\tblock size %d\n", name, blocksize);
> +	RTE_LOG(INFO, PMD, "%s:\tblock count %d\n", name, blockcount);
> +	RTE_LOG(INFO, PMD, "%s:\tframe size %d\n", name, framesize);
> +	RTE_LOG(INFO, PMD, "%s:\tframe count %d\n", name, framecount);
> +
> +	if (rte_pmd_init_internals(name, *sockfd, qpairs,
> +	                           blocksize, blockcount,
> +	                           framesize, framecount,
> +	                           numa_node, &internals, &eth_dev,
> +	                           kvlist) < 0)
> +		return -1;
> +
> +	eth_dev->rx_pkt_burst = eth_packet_rx;
> +	eth_dev->tx_pkt_burst = eth_packet_tx;
> +
> +	return 0;
> +}
> +
> +int
> +rte_pmd_packet_devinit(const char *name, const char *params) {
> +	unsigned numa_node;
> +	int ret;
> +	struct rte_kvargs *kvlist;
> +	int sockfd = -1;
> +
> +	RTE_LOG(INFO, PMD, "Initializing pmd_packet for %s\n", name);
> +
> +	numa_node = rte_socket_id();
> +
> +	kvlist = rte_kvargs_parse(params, valid_arguments);
> +	if (kvlist == NULL)
> +		return -1;
> +
> +	/*
> +	 * If iface argument is passed we open the NICs and use them for
> +	 * reading / writing
> +	 */
> +	if (rte_kvargs_count(kvlist, ETH_PACKET_IFACE_ARG) == 1) {
> +
> +		ret = rte_kvargs_process(kvlist, ETH_PACKET_IFACE_ARG,
> +		                         &open_packet_iface, &sockfd);
> +		if (ret < 0)
> +			return -1;
> +	}
> +
> +	ret = rte_eth_from_packet(name, &sockfd, numa_node, kvlist);
> +	close(sockfd); /* no longer needed */
> +
> +	if (ret < 0)
> +		return -1;
> +
> +	return 0;
> +}
> +
> +static struct rte_driver pmd_packet_drv = {
> +	.name = "eth_packet",
> +	.type = PMD_VDEV,
> +	.init = rte_pmd_packet_devinit,
> +};
> +
> +PMD_REGISTER_DRIVER(pmd_packet_drv);
> diff --git a/lib/librte_pmd_packet/rte_eth_packet.h b/lib/librte_pmd_packet/rte_eth_packet.h
> new file mode 100644
> index 000000000000..f685611da3e9
> --- /dev/null
> +++ b/lib/librte_pmd_packet/rte_eth_packet.h
> @@ -0,0 +1,55 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_ETH_PACKET_H_
> +#define _RTE_ETH_PACKET_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#define RTE_ETH_PACKET_PARAM_NAME "eth_packet"
> +
> +#define RTE_PMD_PACKET_MAX_RINGS 16
> +
> +/**
> + * For use by the EAL only. Called as part of EAL init to set up any dummy NICs
> + * configured on command line.
> + */
> +int rte_pmd_packet_devinit(const char *name, const char *params);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index 34dff2a02a05..a6994c4dbe93 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -210,6 +210,10 @@ ifeq ($(CONFIG_RTE_LIBRTE_PMD_PCAP),y)
>  LDLIBS += -lrte_pmd_pcap -lpcap
>  endif
> 
> +ifeq ($(CONFIG_RTE_LIBRTE_PMD_PACKET),y)
> +LDLIBS += -lrte_pmd_packet
> +endif
> +
>  endif # plugins
> 
>  LDLIBS += $(EXECENV_LDLIBS)
> --
> 1.9.3
    
    