[dpdk-dev] [RFC PATCH 1/4] Add example pktdev implementation

Bruce Richardson bruce.richardson at intel.com
Tue Apr 21 11:23:37 CEST 2015


On Tue, Apr 21, 2015 at 09:40:05AM +0100, Ananyev, Konstantin wrote:
> > -----Original Message-----
> > From: Richardson, Bruce
> > Sent: Monday, April 20, 2015 4:03 PM
> > To: Ananyev, Konstantin
> > Cc: dev at dpdk.org; Wiles, Keith
> > Subject: Re: [dpdk-dev] [RFC PATCH 1/4] Add example pktdev implementation
> > 
> > On Mon, Apr 20, 2015 at 12:26:43PM +0100, Ananyev, Konstantin wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > > > Sent: Friday, April 17, 2015 4:17 PM
> > > > To: dev at dpdk.org; Wiles, Keith
> > > > Subject: [dpdk-dev] [RFC PATCH 1/4] Add example pktdev implementation
> > > >
> > > > This commit demonstrates what a minimal API for all packet handling
> > > > types would look like. It simply provides the necessary parts for
> > > > receiving and transmiting packets, and is based off the ethdev
> > > > implementation.
> > > > ---
> > > >  config/common_bsdapp           |   5 ++
> > > >  config/common_linuxapp         |   5 ++
> > > >  lib/Makefile                   |   1 +
> > > >  lib/librte_pktdev/Makefile     |  56 ++++++++++++++++
> > > >  lib/librte_pktdev/rte_pktdev.c |  35 ++++++++++
> > > >  lib/librte_pktdev/rte_pktdev.h | 144 +++++++++++++++++++++++++++++++++++++++++
> > > >  6 files changed, 246 insertions(+)
> > > >  create mode 100644 lib/librte_pktdev/Makefile
> > > >  create mode 100644 lib/librte_pktdev/rte_pktdev.c
> > > >  create mode 100644 lib/librte_pktdev/rte_pktdev.h
> > > >
> > > > diff --git a/config/common_bsdapp b/config/common_bsdapp
> > > > index 8ff4dc2..d2b932c 100644
> > > > --- a/config/common_bsdapp
> > > > +++ b/config/common_bsdapp
> > > > @@ -132,6 +132,11 @@ CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y
> > > >  CONFIG_RTE_LIBRTE_KVARGS=y
> > > >
> > > >  #
> > > > +# Compile generic packet handling device library
> > > > +#
> > > > +CONFIG_RTE_LIBRTE_PKTDEV=y
> > > > +
> > > > +#
> > > >  # Compile generic ethernet library
> > > >  #
> > > >  CONFIG_RTE_LIBRTE_ETHER=y
> > > > diff --git a/config/common_linuxapp b/config/common_linuxapp
> > > > index 09a58ac..5bda416 100644
> > > > --- a/config/common_linuxapp
> > > > +++ b/config/common_linuxapp
> > > > @@ -129,6 +129,11 @@ CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y
> > > >  CONFIG_RTE_LIBRTE_KVARGS=y
> > > >
> > > >  #
> > > > +# Compile generic packet handling device library
> > > > +#
> > > > +CONFIG_RTE_LIBRTE_PKTDEV=y
> > > > +
> > > > +#
> > > >  # Compile generic ethernet library
> > > >  #
> > > >  CONFIG_RTE_LIBRTE_ETHER=y
> > > > diff --git a/lib/Makefile b/lib/Makefile
> > > > index d94355d..4db5ee0 100644
> > > > --- a/lib/Makefile
> > > > +++ b/lib/Makefile
> > > > @@ -32,6 +32,7 @@
> > > >  include $(RTE_SDK)/mk/rte.vars.mk
> > > >
> > > >  DIRS-y += librte_compat
> > > > +DIRS-$(CONFIG_RTE_LIBRTE_PKTDEV) += librte_pktdev
> > > >  DIRS-$(CONFIG_RTE_LIBRTE_EAL) += librte_eal
> > > >  DIRS-$(CONFIG_RTE_LIBRTE_MALLOC) += librte_malloc
> > > >  DIRS-$(CONFIG_RTE_LIBRTE_RING) += librte_ring
> > > > diff --git a/lib/librte_pktdev/Makefile b/lib/librte_pktdev/Makefile
> > > > new file mode 100644
> > > > index 0000000..2d3b3a1
> > > > --- /dev/null
> > > > +++ b/lib/librte_pktdev/Makefile
> > > > @@ -0,0 +1,56 @@
> > > > +#   BSD LICENSE
> > > > +#
> > > > +#   Copyright(c) 2015 Intel Corporation. All rights reserved.
> > > > +#   All rights reserved.
> > > > +#
> > > > +#   Redistribution and use in source and binary forms, with or without
> > > > +#   modification, are permitted provided that the following conditions
> > > > +#   are met:
> > > > +#
> > > > +#     * Redistributions of source code must retain the above copyright
> > > > +#       notice, this list of conditions and the following disclaimer.
> > > > +#     * Redistributions in binary form must reproduce the above copyright
> > > > +#       notice, this list of conditions and the following disclaimer in
> > > > +#       the documentation and/or other materials provided with the
> > > > +#       distribution.
> > > > +#     * Neither the name of Intel Corporation nor the names of its
> > > > +#       contributors may be used to endorse or promote products derived
> > > > +#       from this software without specific prior written permission.
> > > > +#
> > > > +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > > +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > > +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > > +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > > +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > > +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > > +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > > +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > > +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > > +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > > +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > > > +
> > > > +include $(RTE_SDK)/mk/rte.vars.mk
> > > > +
> > > > +#
> > > > +# library name
> > > > +#
> > > > +LIB = libpktdev.a
> > > > +
> > > > +CFLAGS += -O3
> > > > +CFLAGS += $(WERROR_FLAGS)
> > > > +
> > > > +EXPORT_MAP := rte_pktdev_version.map
> > > > +
> > > > +LIBABIVER := 1
> > > > +
> > > > +SRCS-y += rte_pktdev.c
> > > > +
> > > > +#
> > > > +# Export include files
> > > > +#
> > > > +SYMLINK-y-include += rte_pktdev.h
> > > > +
> > > > +# this lib depends upon no others:
> > > > +DEPDIRS-y +=
> > > > +
> > > > +include $(RTE_SDK)/mk/rte.lib.mk
> > > > diff --git a/lib/librte_pktdev/rte_pktdev.c b/lib/librte_pktdev/rte_pktdev.c
> > > > new file mode 100644
> > > > index 0000000..4c32d86
> > > > --- /dev/null
> > > > +++ b/lib/librte_pktdev/rte_pktdev.c
> > > > @@ -0,0 +1,36 @@
> > > > +/*-
> > > > + *   BSD LICENSE
> > > > + *
> > > > + *   Copyright(c) 2015 Intel Corporation. All rights reserved.
> > > > + *   All rights reserved.
> > > > + *
> > > > + *   Redistribution and use in source and binary forms, with or without
> > > > + *   modification, are permitted provided that the following conditions
> > > > + *   are met:
> > > > + *
> > > > + *     * Redistributions of source code must retain the above copyright
> > > > + *       notice, this list of conditions and the following disclaimer.
> > > > + *     * Redistributions in binary form must reproduce the above copyright
> > > > + *       notice, this list of conditions and the following disclaimer in
> > > > + *       the documentation and/or other materials provided with the
> > > > + *       distribution.
> > > > + *     * Neither the name of Intel Corporation nor the names of its
> > > > + *       contributors may be used to endorse or promote products derived
> > > > + *       from this software without specific prior written permission.
> > > > + *
> > > > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > > > + */
> > > > +
> > > > +#include "rte_pktdev.h"
> > > > +
> > > > +/* For future use */
> > > > diff --git a/lib/librte_pktdev/rte_pktdev.h b/lib/librte_pktdev/rte_pktdev.h
> > > > new file mode 100644
> > > > index 0000000..8a5699a
> > > > --- /dev/null
> > > > +++ b/lib/librte_pktdev/rte_pktdev.h
> > > > @@ -0,0 +1,144 @@
> > > > +/*-
> > > > + *   BSD LICENSE
> > > > + *
> > > > + *   Copyright(c) 2015 Intel Corporation. All rights reserved.
> > > > + *   All rights reserved.
> > > > + *
> > > > + *   Redistribution and use in source and binary forms, with or without
> > > > + *   modification, are permitted provided that the following conditions
> > > > + *   are met:
> > > > + *
> > > > + *     * Redistributions of source code must retain the above copyright
> > > > + *       notice, this list of conditions and the following disclaimer.
> > > > + *     * Redistributions in binary form must reproduce the above copyright
> > > > + *       notice, this list of conditions and the following disclaimer in
> > > > + *       the documentation and/or other materials provided with the
> > > > + *       distribution.
> > > > + *     * Neither the name of Intel Corporation nor the names of its
> > > > + *       contributors may be used to endorse or promote products derived
> > > > + *       from this software without specific prior written permission.
> > > > + *
> > > > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > > > + */
> > > > +
> > > > +#ifndef _RTE_PKTDEV_H_
> > > > +#define _RTE_PKTDEV_H_
> > > > +
> > > > +#include <stdint.h>
> > > > +
> > > > +/**
> > > > + * @file
> > > > + *
> > > > + * RTE Packet Processing Device API
> > > > + */
> > > > +
> > > > +#ifdef __cplusplus
> > > > +extern "C" {
> > > > +#endif
> > > > +
> > > > +/* forward definition of mbuf structure. We don't need full mbuf header here */
> > > > +struct rte_mbuf;
> > > > +
> > > > +#define RTE_PKT_NAME_MAX_LEN (32)
> > > > +
> > > > +typedef uint16_t (*pkt_rx_burst_t)(void *rxq,
> > > > +				   struct rte_mbuf **rx_pkts,
> > > > +				   uint16_t nb_pkts);
> > > > +/**< @internal Retrieve packets from a queue of a device. */
> > > > +
> > > > +typedef uint16_t (*pkt_tx_burst_t)(void *txq,
> > > > +				   struct rte_mbuf **tx_pkts,
> > > > +				   uint16_t nb_pkts);
> > > > +/**< @internal Send packets on a queue of a device. */
> > > > +
> > > > +#define RTE_PKT_DEV_HDR(structname) struct { \
> > > > +	pkt_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */ \
> > > > +	pkt_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */ \
> > > > +	struct structname ## _data *data;  /**< Pointer to device data */ \
> > > > +}
> > > > +
> > > > +#define RTE_PKT_DEV_DATA_HDR struct { \
> > > > +	char name[RTE_PKT_NAME_MAX_LEN]; /**< Unique identifier name */ \
> > > > +\
> > > > +	void **rx_queues; /**< Array of pointers to RX queues. */ \
> > > > +	void **tx_queues; /**< Array of pointers to TX queues. */ \
> > > > +	uint16_t nb_rx_queues; /**< Number of RX queues. */ \
> > > > +	uint16_t nb_tx_queues; /**< Number of TX queues. */ \
> > > > +}
> > > > +
> > > > +struct rte_pkt_dev {
> > > > +	RTE_PKT_DEV_HDR(rte_pkt_dev);
> > > > +};
> > > > +
> > > > +struct rte_pkt_dev_data {
> > > > +	RTE_PKT_DEV_DATA_HDR;
> > > > +};
> > > > +
> > > > +/**
> > > > + *
> > > > + * Retrieve a burst of input packets from a receive queue of a
> > > > + * device. The retrieved packets are stored in *rte_mbuf* structures whose
> > > > + * pointers are supplied in the *rx_pkts* array.
> > > > + *
> > > > + * @param dev
> > > > + *   The device to be polled for packets
> > > > + * @param queue_id
> > > > + *   The index of the receive queue from which to retrieve input packets.
> > > > + * @param rx_pkts
> > > > + *   The address of an array of pointers to *rte_mbuf* structures that
> > > > + *   must be large enough to store *nb_pkts* pointers in it.
> > > > + * @param nb_pkts
> > > > + *   The maximum number of packets to retrieve.
> > > > + * @return
> > > > + *   The number of packets actually retrieved, which is the number
> > > > + *   of pointers to *rte_mbuf* structures effectively supplied to the
> > > > + *   *rx_pkts* array.
> > > > + */
> > > > +static inline uint16_t
> > > > +rte_pkt_rx_burst(struct rte_pkt_dev *dev, uint16_t queue_id,
> > > > +		 struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
> > > > +{
> > > > +	return (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
> > > > +			rx_pkts, nb_pkts);
> > > > +}
> > > > +
> > > > +/**
> > > > + * Send a burst of output packets on a transmit queue of a device.
> > > > + *
> > > > + * @param dev
> > > > + *   The device to be given the packets.
> > > > + * @param queue_id
> > > > + *   The index of the queue through which output packets must be sent.
> > > > + * @param tx_pkts
> > > > + *   The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
> > > > + *   which contain the output packets.
> > > > + * @param nb_pkts
> > > > + *   The maximum number of packets to transmit.
> > > > + * @return
> > > > + *   The number of output packets actually stored in transmit descriptors of
> > > > + *   the transmit ring. The return value can be less than the value of the
> > > > + *   *tx_pkts* parameter when the transmit ring is full or has been filled up.
> > > > + */
> > > > +static inline uint16_t
> > > > +rte_pkt_tx_burst(struct rte_pkt_dev *dev, uint16_t queue_id,
> > > > +		 struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> > > > +{
> > > > +	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
> > > > +}
> > >
> > > That one looks much more lightweight, then Keith one :)
> > > I have a question here:
> > > Why are you guys so confident, that all foreseeable devices would fit into current eth_dev rx_burst/tx_burst API?
> > > As I understand, QAT devices have HW request/response ring pairs, so the idea probably is to make tx_burst()
> > > populate an request ring and rx_burst() to read from response ring, right?
> > > Though right now rte_mbuf contains a lot of flags and data fields that are specific for ethdev, and probably have no sense for crypto dev.
> > > From other side, for QAT devices, I suppose you'll need crypto specific flags and data:
> > > encrypt/decrypt, source and destination buffer addresses, some cipher specific data, etc.
> > > Wonder do you plan to fit all that into current rte_mbuf structure, or do you plan to have some superset structure,
> > > that would consist of rte_mbuf plus some extra stuff, or ... ?
> > > Konstantin
> > >
> > >
> > 
> > From my point of view, the purpose of a pktdev API to to define a common API
> > for devices that read/write (i.e. RX/TX) mbufs. If other devices don't fit that
> > model, then they don't use it. However, we already have a number of different
> > device types that already read/write mbufs, and there are likely more in future,
> > so it would be nice to have a common API that can be used across those types so
> > the data path can be source-neutral when processing mbufs. So far, we have
> > three proposals on how that might be achieved. :-)
> 
> Ok, probably I misunderstood previous RFCs about pktdev.
> So pktdev is not going to be a generic abstraction for any foreseeable device types: eth, crypto, compress, dpi, etc?
> It would represent only subset of dev types: real eth devices, plus SW emulated ones 
> that can mimic rx_burst/tx_burst, but don't support any other eth specific ops (kni, pmd_ring)?
> Basically what we support right now?
> And for future device types to support (crypto, etc) there would be something else? 
>

Not necessarily. I would expect future device types to also use this model -
but I wouldn't want to be prescriptive, if it really doesn't make sense.
I was really hoping to get multiple-benefit from a pktdev abstraction:
* commonality of interface across existing pkt-handling types
* framework for any new types to actually use.

> > 
> > As for the crypto, there is still a lot of investigation work to be done here,
> > and Keith is probably better placed than me to comment on it. But if you look at any
> > application using DPDK for packet processing, the mbuf is still going to be at
> > the heart of it, so whatever the actual crypto API is like, it's going to have
> > to have to fit into an "mbuf world" anyway.
> > If the crypto-dev APIs don't work
> > with mbufs, then it's pushing work onto the app (or higher level libs) to extract
> > the meaningful fields from the mbuf to pass to the dev.
> 
> As I understand application (or some upper layer lib) would have to do that work anyway: 
> RX packet, find and read related to this packet crypto data, and pass all that info to the crypto dev, right?
> I can hardly imagine, that in reality you always would be able to do:
> 
> n = rx_burst(eth_dev, pkts, n);
> n = tx_burst(crypto_dev, pkts, n);
> 
> without any processing of incoming packets in between.
> The difference is only how that information will be passed into crypto device:
> Would you need to to update an mbuf, or fill some other structure.  

Possibly not. However, I would hope we may be able to get to something like:

 n = rx_burst(eth_dev, pkts, n);
 for (i = 0; i < n; i++)
	 bufs[i]->crypto_context = lookup_sa(bufs[i]);
 n = tx_burst(crypto_dev, pkts, n);

> 
> > Therefore, in a DPDK
> > library, I would think it better to have the dev work with mbufs directly, as
> > the rest of DPDK does.
> 
> I wouldn't say that all DPDK libs accepts data as mbufs only.
> Only those where it makes sense: PMDs, ip_frag/reassembly, pipeline fw.
> Let say rte_hash and rte_acl accept input data in a neutral way.
> You wouldn't expect rte_hash_lookup() or rte_acl_classify() to accept pointer(s) to the mbuf.
> if you'll have a SW implemented encrypt()/decrypt() routines, you probably wouldn't require an mbuf as input parameter either.
> Wouldn't it be more plausible to create a new struct rte_crypto_buf (or something) to pass information to/from crypto devices.
> Then we can provide for it an ability to attach/detach to rte_mbuf(s) buf_addr(s).
> That way we can probably make crypto_buf to attach to 2 mbufs at once (source and destination).

Maybe what you suggest is the way to go. However, on the hash side, I actually
think it would be very useful to have our hash library have APIs to work directly
off mbufs. For a single lookup pulling out a key is ok, but when working off a
full burst of traffic, doing optimized code to extract keys from packets to pass
as an array into a hash is awkward for the app, and also prevents full code
pipelining in the lookup. [Ideally, you want your key extraction to be running
in parallel with lookups of previously extracted keys. Something a bit similar
to what is done for lookups in lib/librte_table/rte_table_hash_key16.c]

/Bruce
 


More information about the dev mailing list