[dpdk-dev] [PATCH] librte_pmd_packet: add PMD for AF_PACKET-based virtual devices

Stephen Hemminger stephen at networkplumber.org
Fri Jul 11 17:16:23 CEST 2014


On Fri, 11 Jul 2014 15:06:25 +0000
"Richardson, Bruce" <bruce.richardson at intel.com> wrote:

> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of John W. Linville
> > Sent: Friday, July 11, 2014 7:49 AM
> > To: Stephen Hemminger
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] librte_pmd_packet: add PMD for AF_PACKET-
> > based virtual devices
> > 
> > On Fri, Jul 11, 2014 at 06:11:47AM -0700, Stephen Hemminger wrote:
> > > On Thu, 10 Jul 2014 16:32:49 -0400
> > > "John W. Linville" <linville at tuxdriver.com> wrote:
> > >
> > > > This is a Linux-specific virtual PMD driver backed by an AF_PACKET
> > > > socket.  This implementation uses mmap'ed ring buffers to limit copying
> > > > and user/kernel transitions.  The PACKET_FANOUT_HASH behavior of
> > > > AF_PACKET is used for frame reception.  In the current implementation,
> > > > Tx and Rx queues are always paired, and therefore are always equal
> > > > in number -- changing this would be a Simple Matter Of Programming.
> > > >
> > > > Interfaces of this type are created with a command line option like
> > > > > "--vdev=eth_packet0,iface=...".  There are a number of options available
> > > > as arguments:
> > > >
> > > >  - Interface is chosen by "iface" (required)
> > > >  - Number of queue pairs set by "qpairs" (optional, default: 16)
> > > >  - AF_PACKET MMAP block size set by "blocksz" (optional, default: 4096)
> > > >  - AF_PACKET MMAP frame size set by "framesz" (optional, default: 2048)
> > > >  - AF_PACKET MMAP frame count set by "framecnt" (optional, default: 512)
> > > >
> > > > Signed-off-by: John W. Linville <linville at tuxdriver.com>
> > > > ---
> > > > This PMD is intended to provide a means for using DPDK on a broad
> > > > range of hardware without hardware-specific PMDs and (hopefully)
> > > > with better performance than what PCAP offers in Linux.  This might
> > > > be useful as a development platform for DPDK applications when
> > > > DPDK-supported hardware is expensive or unavailable.
> > > >
> > > >  config/common_bsdapp                   |   5 +
> > > >  config/common_linuxapp                 |   5 +
> > > >  lib/Makefile                           |   1 +
> > > >  lib/librte_eal/linuxapp/eal/Makefile   |   1 +
> > > >  lib/librte_pmd_packet/Makefile         |  60 +++
> > > > >  lib/librte_pmd_packet/rte_eth_packet.c | 826 +++++++++++++++++++++++++++++++++
> > > >  lib/librte_pmd_packet/rte_eth_packet.h |  55 +++
> > > >  mk/rte.app.mk                          |   4 +
> > > >  8 files changed, 957 insertions(+)
> > > >  create mode 100644 lib/librte_pmd_packet/Makefile
> > > >  create mode 100644 lib/librte_pmd_packet/rte_eth_packet.c
> > > >  create mode 100644 lib/librte_pmd_packet/rte_eth_packet.h
> > > >
> > > > diff --git a/config/common_bsdapp b/config/common_bsdapp
> > > > index 943dce8f1ede..c317f031278e 100644
> > > > --- a/config/common_bsdapp
> > > > +++ b/config/common_bsdapp
> > > > @@ -226,6 +226,11 @@ CONFIG_RTE_LIBRTE_PMD_PCAP=y
> > > >  CONFIG_RTE_LIBRTE_PMD_BOND=y
> > > >
> > > >  #
> > > > +# Compile software PMD backed by AF_PACKET sockets (Linux only)
> > > > +#
> > > > +CONFIG_RTE_LIBRTE_PMD_PACKET=n
> > > > +
> > > > +#
> > > >  # Do prefetch of packet data within PMD driver receive function
> > > >  #
> > > >  CONFIG_RTE_PMD_PACKET_PREFETCH=y
> > > > diff --git a/config/common_linuxapp b/config/common_linuxapp
> > > > index 7bf5d80d4e26..f9e7bc3015ec 100644
> > > > --- a/config/common_linuxapp
> > > > +++ b/config/common_linuxapp
> > > > @@ -249,6 +249,11 @@ CONFIG_RTE_LIBRTE_PMD_PCAP=n
> > > >  CONFIG_RTE_LIBRTE_PMD_BOND=y
> > > >
> > > >  #
> > > > +# Compile software PMD backed by AF_PACKET sockets (Linux only)
> > > > +#
> > > > +CONFIG_RTE_LIBRTE_PMD_PACKET=y
> > > > +
> > > > +#
> > > >  # Compile Xen PMD
> > > >  #
> > > >  CONFIG_RTE_LIBRTE_PMD_XENVIRT=n
> > > > diff --git a/lib/Makefile b/lib/Makefile
> > > > index 10c5bb3045bc..930fadf29898 100644
> > > > --- a/lib/Makefile
> > > > +++ b/lib/Makefile
> > > > @@ -47,6 +47,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) +=
> > librte_pmd_i40e
> > > >  DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += librte_pmd_bond
> > > >  DIRS-$(CONFIG_RTE_LIBRTE_PMD_RING) += librte_pmd_ring
> > > >  DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += librte_pmd_pcap
> > > > +DIRS-$(CONFIG_RTE_LIBRTE_PMD_PACKET) += librte_pmd_packet
> > > >  DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += librte_pmd_virtio
> > > >  DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += librte_pmd_vmxnet3
> > > >  DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += librte_pmd_xenvirt
> > > > > diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
> > > > index 756d6b0c9301..feed24a63272 100644
> > > > --- a/lib/librte_eal/linuxapp/eal/Makefile
> > > > +++ b/lib/librte_eal/linuxapp/eal/Makefile
> > > > @@ -44,6 +44,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_ether
> > > >  CFLAGS += -I$(RTE_SDK)/lib/librte_ivshmem
> > > >  CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_ring
> > > >  CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_pcap
> > > > +CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_packet
> > > >  CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_xenvirt
> > > >  CFLAGS += $(WERROR_FLAGS) -O3
> > > >
> > > > > diff --git a/lib/librte_pmd_packet/Makefile b/lib/librte_pmd_packet/Makefile
> > > > new file mode 100644
> > > > index 000000000000..e1266fb992cd
> > > > --- /dev/null
> > > > +++ b/lib/librte_pmd_packet/Makefile
> > > > @@ -0,0 +1,60 @@
> > > > +#   BSD LICENSE
> > > > +#
> > > > +#   Copyright(c) 2014 John W. Linville <linville at redhat.com>
> > > > +#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > > > +#   Copyright(c) 2014 6WIND S.A.
> > > > +#   All rights reserved.
> > > > +#
> > > > +#   Redistribution and use in source and binary forms, with or without
> > > > +#   modification, are permitted provided that the following conditions
> > > > +#   are met:
> > > > +#
> > > > +#     * Redistributions of source code must retain the above copyright
> > > > +#       notice, this list of conditions and the following disclaimer.
> > > > +#     * Redistributions in binary form must reproduce the above copyright
> > > > +#       notice, this list of conditions and the following disclaimer in
> > > > +#       the documentation and/or other materials provided with the
> > > > +#       distribution.
> > > > +#     * Neither the name of Intel Corporation nor the names of its
> > > > +#       contributors may be used to endorse or promote products derived
> > > > +#       from this software without specific prior written permission.
> > > > +#
> > > > > +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > > > +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > > > +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > > > +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > > > +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > > > +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > > > +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > > > +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > > > +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > > > +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > > > +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > > > +
> > > > +include $(RTE_SDK)/mk/rte.vars.mk
> > > > +
> > > > +#
> > > > +# library name
> > > > +#
> > > > +LIB = librte_pmd_packet.a
> > > > +
> > > > +CFLAGS += -O3
> > > > +CFLAGS += $(WERROR_FLAGS)
> > > > +
> > > > +#
> > > > +# all source are stored in SRCS-y
> > > > +#
> > > > +SRCS-$(CONFIG_RTE_LIBRTE_PMD_PACKET) += rte_eth_packet.c
> > > > +
> > > > +#
> > > > +# Export include files
> > > > +#
> > > > +SYMLINK-y-include += rte_eth_packet.h
> > > > +
> > > > +# this lib depends upon:
> > > > +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_PACKET) += lib/librte_mbuf
> > > > +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_PACKET) += lib/librte_ether
> > > > +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_PACKET) += lib/librte_malloc
> > > > +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_PACKET) += lib/librte_kvargs
> > > > +
> > > > +include $(RTE_SDK)/mk/rte.lib.mk
> > > > > diff --git a/lib/librte_pmd_packet/rte_eth_packet.c b/lib/librte_pmd_packet/rte_eth_packet.c
> > > > new file mode 100644
> > > > index 000000000000..fceb6258aad6
> > > > --- /dev/null
> > > > +++ b/lib/librte_pmd_packet/rte_eth_packet.c
> > > > @@ -0,0 +1,826 @@
> > > > +/*-
> > > > + *   BSD LICENSE
> > > > + *
> > > > + *   Copyright(c) 2014 John W. Linville <linville at tuxdriver.com>
> > > > + *
> > > > + *   Originally based upon librte_pmd_pcap code:
> > > > + *
> > > > + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > > > + *   Copyright(c) 2014 6WIND S.A.
> > > > + *   All rights reserved.
> > > > + *
> > > > + *   Redistribution and use in source and binary forms, with or without
> > > > + *   modification, are permitted provided that the following conditions
> > > > + *   are met:
> > > > + *
> > > > + *     * Redistributions of source code must retain the above copyright
> > > > + *       notice, this list of conditions and the following disclaimer.
> > > > + *     * Redistributions in binary form must reproduce the above copyright
> > > > + *       notice, this list of conditions and the following disclaimer in
> > > > + *       the documentation and/or other materials provided with the
> > > > + *       distribution.
> > > > + *     * Neither the name of Intel Corporation nor the names of its
> > > > + *       contributors may be used to endorse or promote products derived
> > > > + *       from this software without specific prior written permission.
> > > > + *
> > > > > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > > > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > > > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > > > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > > > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > > > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > > > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > > > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > > > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > > > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > > > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > > > + */
> > > > +
> > > > +#include <rte_mbuf.h>
> > > > +#include <rte_ethdev.h>
> > > > +#include <rte_malloc.h>
> > > > +#include <rte_kvargs.h>
> > > > +#include <rte_dev.h>
> > > > +
> > > > +#include <linux/if_ether.h>
> > > > +#include <linux/if_packet.h>
> > > > +#include <arpa/inet.h>
> > > > +#include <net/if.h>
> > > > +#include <sys/types.h>
> > > > +#include <sys/socket.h>
> > > > +#include <sys/ioctl.h>
> > > > +#include <sys/mman.h>
> > > > +#include <unistd.h>
> > > > +#include <poll.h>
> > > > +
> > > > +#include "rte_eth_packet.h"
> > > > +
> > > > +#define ETH_PACKET_IFACE_ARG		"iface"
> > > > +#define ETH_PACKET_NUM_Q_ARG		"qpairs"
> > > > +#define ETH_PACKET_BLOCKSIZE_ARG	"blocksz"
> > > > +#define ETH_PACKET_FRAMESIZE_ARG	"framesz"
> > > > +#define ETH_PACKET_FRAMECOUNT_ARG	"framecnt"
> > > > +
> > > > +#define DFLT_BLOCK_SIZE		(1 << 12)
> > > > +#define DFLT_FRAME_SIZE		(1 << 11)
> > > > +#define DFLT_FRAME_COUNT	(1 << 9)
> > > > +
> > > > +struct pkt_rx_queue {
> > > > +	int sockfd;
> > > > +
> > > > +	struct iovec *rd;
> > > > +	uint8_t *map;
> > > > +	unsigned int framecount;
> > > > +	unsigned int framenum;
> > > > +
> > > > +	struct rte_mempool *mb_pool;
> > > > +
> > > > +	volatile unsigned long rx_pkts;
> > > > +	volatile unsigned long err_pkts;
> > >
> > > Use of volatile will generate slow code, don't think
> > > it is necessary, especially when only one CPU can use a queue
> > > at a time.
> > 
> > That is a good point, worth checking out.  FWIW, those lines are
> > boilerplate originally copied from the pcap PMD. :-)
> > 
> 
> 
> Yes, I agree it's worth checking out if there is a performance impact, but if we assume that the stats for RX/TX are possibly going to be read by another core, they really should be volatile for correctness.

Since only one core updates the counters, that is not necessary. The add will
generate a valid value, and a reader will read a valid value.
Only if two CPUs were using the same queue would it be possible for two adds
to collide, but the DPDK queue documentation specifically says queues are not MP safe.
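
To make the single-writer argument concrete, here is a minimal sketch. It is
not taken from the patch or from DPDK: the struct and function names are
hypothetical, and it swaps in C11 relaxed atomics for a plain unsigned long,
because ISO C formally treats an unsynchronized concurrent read of a plain
variable as a data race. The point it illustrates is the same: with one
writer per queue, a relaxed load plus a relaxed store is enough, and neither
volatile nor an atomic read-modify-write is required.

```c
#include <stdatomic.h>

/* Hypothetical per-queue statistics, loosely modelled on the rx_pkts and
 * err_pkts fields in the patch; not the driver's actual layout. */
struct queue_stats {
	atomic_ulong rx_pkts;
	atomic_ulong err_pkts;
};

/* Called only by the one core that owns the queue. With a single writer
 * there is no competing add to lose, so a relaxed load followed by a
 * relaxed store is sufficient; no atomic fetch-and-add is needed. */
static inline void
stats_add_rx(struct queue_stats *s, unsigned long n)
{
	unsigned long cur;

	cur = atomic_load_explicit(&s->rx_pkts, memory_order_relaxed);
	atomic_store_explicit(&s->rx_pkts, cur + n, memory_order_relaxed);
}

/* May be called from any core; returns a value the writer stored at some
 * point, which is all a statistics reader needs. */
static inline unsigned long
stats_read_rx(struct queue_stats *s)
{
	return atomic_load_explicit(&s->rx_pkts, memory_order_relaxed);
}

/* Simulate a receive loop on the owning core: 4 bursts of 32 packets. */
static unsigned long
stats_demo(void)
{
	struct queue_stats s;
	int i;

	atomic_init(&s.rx_pkts, 0);
	atomic_init(&s.err_pkts, 0);

	for (i = 0; i < 4; i++)
		stats_add_rx(&s, 32);

	return stats_read_rx(&s);
}
```

If two cores ever did share a queue, the load and store in stats_add_rx()
could interleave and lose an update, which is exactly the collision case
described above.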

