[dpdk-dev] [RFC] Adding multiple device types to DPDK.
Wiles, Keith
keith.wiles at intel.com
Thu Apr 2 16:16:27 CEST 2015
Hi All, just to make a comment on my own email :-)
On 4/1/15, 7:44 AM, "Wiles, Keith" <keith.wiles at intel.com> wrote:
>Hi all, (hoping format of the text is maintained)
>
>Bruce and I are submitting this RFC in the hope of providing discussion
>points for the idea. Please do not get carried away with the code
>included; it is there to help everyone understand the proposal/RFC.
>
>The RFC describes a change we are proposing to make to DPDK to
>add more device types. We would like to add to DPDK the idea of a
>generic packet-device or "pktdev", which can be thought of as a thin layer
>for all device classes, including potential other device types such as a
>"cryptodev" or "dpidev". One of the main goals is to not affect
>performance and not require any current application to be modified. The
>pktdev layer provides a light framework for developers to add a device
>to DPDK.
>
>Reason for Change
>-----------------
>
>The reasons we are looking to introduce these concepts to DPDK are:
>
>* Expand the scope of DPDK so that it can provide APIs for more than just
>packet acquisition and transmission, but also provide APIs that can be
>used to work with other hardware and software offloads, such as
>cryptographic accelerators, or accelerated libraries for cryptographic
>functions. [The reason why both software and hardware are mentioned is so
>that the same APIs can be used whether or not a hardware accelerator is
>actually available].
>* Provide a minimal common basis for device abstraction in DPDK, that can
>be used to unify the different types of packet I/O devices already
>existing in DPDK. To this end, the ethdev APIs are a good starting point,
>but the ethdev library contains too many functions which are NIC-specific
>to be a general-purpose set of APIs across all devices.
> Note: The idea was previously touched on here:
>http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/13545
>
>Description of Proposed Change
>------------------------------
>
>The basic idea behind "pktdev" is to abstract out a few common routines
>and structures/members of structures. Using the ethdev structures as
>a starting point, we cut each one down to little more than a few members
>and then possibly add just rx_burst and tx_burst. The resulting
>structures become the starting point for writing a device type. Currently we
>have the rx_burst/tx_burst routines moved to the pktdev, and it seems like
>moving a couple more common functions may be reasonable. It could be that the
>Rx/Tx routines in pktdev should be left as is, but the code below shows a
>possible reason to abstract a few routines into a common set of files.
>
>From there, we have the ethdev type which adds in the existing functions
>specific to Ethernet devices, and also, for example, a cryptodev which may
>add in functions specific for cryptographic offload. As now, with the
>ethdev, the specific drivers provide concrete implementations of the
>functionality exposed by the interface. This hierarchy is shown in the
>diagram below, using the existing ethdev and ixgbe drivers as a reference,
>alongside a hypothetical cryptodev class and driver implementation
>(catchingly called) "X":
>
>                   ,---------------------.
>                   | struct rte_pktdev   |
>                   +---------------------+
>                   | rte_pkt_rx_burst()  |
>           .-------| rte_pkt_tx_burst()  |-----------.
>           |       `---------------------'           |
>           |                                         |
>           |                                         |
> ,-------------------------------.   ,------------------------------.
> | struct rte_ethdev             |   | struct rte_cryptodev         |
> +-------------------------------+   +------------------------------+
> | rte_eth_dev_configure()       |   | rte_crypto_init_sym_session()|
> | rte_eth_allmulticast_enable() |   | rte_crypto_del_sym_session() |
> | rte_eth_filter_ctrl()         |   |                              |
> `-------------------------------'   `---------------.--------------'
>           |                                         |
>           |                                         |
> ,---------'---------------------.   ,---------------'--------------.
> | struct rte_pmd_ixgbe          |   | struct rte_pmd_X             |
> +-------------------------------+   +------------------------------+
> | .configure -> ixgbe_configure |   | .init_session -> X_init_ses()|
> | .tx_burst -> ixgbe_xmit_pkts  |   | .tx_burst -> X_handle_pkts() |
> `-------------------------------'   `------------------------------'
>
>We are not attempting to create a real class model here; we are only
>looking at creating a very basic common set of APIs and structures for
>other device types.
>
>In terms of code changes for this, we obviously need to add in new
>interface libraries for pktdev and cryptodev. The pktdev library can
>define a skeleton structure for the first few elements of the nested
>structures to ensure consistency. Each of the defines below illustrates the
>common members in device structures, which gives some basic structure to the
>device framework. Each define is placed at the top of the matching device
>structure and allows the device to contain both common and private
>data. The pktdev structures overlay the first common set of members for
>each device type.
>
>For example:
>------------
>
>We are using macros to reduce code changes to DPDK, but nested structures
>are a better solution:
>
>#define RTE_PKT_COMMON_DEV(_t) \
>    pkt_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */ \
>    pkt_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */ \
>    struct rte_##_t##_dev_data *data; /**< Pointer to device data */ \
>    const struct _t##_driver *driver; /**< Driver for this device */ \
>    struct _t##_dev_ops *dev_ops; /**< Functions exported by PMD */ \
>    struct rte_pci_device *pci_dev; /**< PCI info. supplied by probing */ \
>    /** User application callback for interrupts if present */ \
>    struct rte_##_t##_dev_cb_list link_intr_cbs; \
>    /** \
>     * User-supplied functions called from rx_burst to post-process \
>     * received packets before passing them to the user \
>     */ \
>    struct rte_##_t##_rxtx_callback **post_rx_burst_cbs; \
>    /** \
>     * User-supplied functions called from tx_burst to pre-process \
>     * received packets before passing them to the driver for transmission. \
>     */ \
>    struct rte_##_t##_rxtx_callback **pre_tx_burst_cbs; \
>    enum rte_pkt_dev_type dev_type; /**< Flag indicating the device type */ \
>    uint8_t attached /**< Flag indicating the port is attached */
>    /* Possible alignment or a hole in the structure */
>
>#define RTE_PKT_NAME_MAX_LEN (32)
>
>#define RTE_PKT_COMMON_DEV_DATA \
>    char name[RTE_PKT_NAME_MAX_LEN]; /**< Unique identifier name */ \
>    \
>    void **rx_queues; /**< Array of pointers to RX queues. */ \
>    void **tx_queues; /**< Array of pointers to TX queues. */ \
>    uint16_t nb_rx_queues; /**< Number of RX queues. */ \
>    uint16_t nb_tx_queues; /**< Number of TX queues. */ \
>    \
>    uint16_t flags; /**< Bit fields for xyzdev's to use. */ \
>    uint16_t mtu; /**< Maximum Transmission Unit. */ \
>    uint8_t unit_id; /**< Unit ID for this instance */ \
>    uint8_t _filler[7]; /* alignment filler */ \
>    \
>    /* 64bit alignment starts here */ \
>    void *dev_private; /**< PMD-specific private data */ \
>    uint64_t rx_mbuf_alloc_failed; /**< RX ring mbuf allocation failures. */ \
>    uint32_t min_rx_buf_size; /**< Common rx buffer size handled by all queues */ \
>    uint32_t _pad0
>
>#define port_id unit_id
>
>#define RTE_PKT_COMMON_DEV_INFO \
>    struct rte_pci_device *pci_dev; /**< Device PCI information. */ \
>    const char *driver_name; /**< Device Driver name. */ \
>    unsigned int if_index; /**< Index to bound host interface, or 0 if none. */ \
>    /* Use if_indextoname() to translate into an interface name. */ \
>    uint32_t _pad0
>
>The above attempts to collect the common members to be placed at the
>top of private device structures, as we feel these members should be fairly
>common among the device types.
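
A minimal sketch of the nested-structure alternative mentioned above, assuming a C11 compiler. The names pkt_common, eth_dev_sketch and eth_as_pkt are made up for illustration and are not DPDK APIs; only the idea that the common members sit at the start of every device structure comes from the RFC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical common header; stands in for RTE_PKT_COMMON_DEV. */
struct pkt_common {
    uint16_t (*rx_burst)(void *queue, void **pkts, uint16_t nb_pkts);
    uint16_t (*tx_burst)(void *queue, void **pkts, uint16_t nb_pkts);
    uint8_t dev_type;   /* device class tag */
    uint8_t attached;   /* non-zero when the port is attached */
};

/* Hypothetical Ethernet device: common part first, private part after. */
struct eth_dev_sketch {
    struct pkt_common common;  /* must stay the first member */
    uint32_t max_rx_pktlen;    /* Ethernet-specific data */
};

/* The overlay only works if the common part sits at offset 0. */
_Static_assert(offsetof(struct eth_dev_sketch, common) == 0,
               "common header must be the first member");

/* View an Ethernet device through its generic header without a cast. */
static inline struct pkt_common *
eth_as_pkt(struct eth_dev_sketch *dev)
{
    assert(dev != NULL);
    return &dev->common;
}
```

With this layout, generic code can take a struct pkt_common pointer while Ethernet code keeps using the full structure, and the compile-time check catches anyone moving the common block away from offset 0.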
>
>/**
>* @internal
>* The generic data structure associated with each device.
>*
>* Pointers to burst-oriented packet receive and transmit functions are
>* located at the beginning of the structure, along with the pointer to
>* where all the data elements for the particular device are stored in
>* shared memory. This split allows the function pointer and driver data
>* to be per-process, while the actual configuration data for the device
>* is shared.
>*/
>struct rte_pkt_dev {
> RTE_PKT_COMMON_DEV(pkt);
>};
>
>/**
>* @internal
>* The data part, with no function pointers, associated with each device.
>*
>* This structure is safe to place in shared memory to be common among
>* different processes in a multi-process configuration.
>*/
>struct rte_pkt_dev_data {
> RTE_PKT_COMMON_DEV_DATA;
>};
>
>------
>
>The existing ethdev code can then have minor updates such as those shown
>below:
>
>struct rte_eth_dev_info {
> RTE_PKT_COMMON_DEV_INFO;
>
> /* Private device data may be here */
> uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
> uint32_t max_rx_pktlen; /**< Maximum configurable length of RX pkt. */
> ...
>
>struct rte_eth_dev_data {
> RTE_PKT_COMMON_DEV_DATA; /**< Define located in <rte_pkt.h> */
>
> /* Private device data may be here */
> struct rte_eth_dev_sriov sriov; /**< SRIOV data */
>
> struct rte_eth_link dev_link; /**< Link-level information & status */
> ...
>
>struct rte_eth_dev {
> RTE_PKT_COMMON_DEV(eth);
> /* Private device data may be here */
>};
>
>/* Bit defines for flags in common pkt structure */
>#define promiscuous   0x0008 /**< RX promiscuous mode ON(1) / OFF(0). */
>#define scattered_rx  0x0004 /**< RX of scattered packets is ON(1) / OFF(0) */
>#define all_multicast 0x0002 /**< RX all multicast mode ON(1) / OFF(0). */
>#define dev_started   0x0001 /**< Device state: STARTED(1)/STOPPED(0) */
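
As a rough illustration of how the bit defines could be used against the common flags word, here is a small sketch. The bit values are the ones listed in the RFC; the PKT_F_* names, the pkt_dev_data_sketch structure and the helper functions are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values taken from the RFC; the PKT_F_* names are hypothetical. */
#define PKT_F_DEV_STARTED   0x0001u
#define PKT_F_ALL_MULTICAST 0x0002u
#define PKT_F_SCATTERED_RX  0x0004u
#define PKT_F_PROMISCUOUS   0x0008u

/* Hypothetical stand-in for the shared data block holding the flags word. */
struct pkt_dev_data_sketch {
    uint16_t flags;
};

/* Turn on every bit in mask. */
static inline void
pkt_flag_set(struct pkt_dev_data_sketch *d, uint16_t mask)
{
    assert(d != NULL);
    d->flags |= mask;
}

/* Turn off every bit in mask. */
static inline void
pkt_flag_clear(struct pkt_dev_data_sketch *d, uint16_t mask)
{
    assert(d != NULL);
    d->flags &= (uint16_t)~mask;
}

/* True if at least one bit in mask is set. */
static inline bool
pkt_flag_test(const struct pkt_dev_data_sketch *d, uint16_t mask)
{
    assert(d != NULL);
    return (d->flags & mask) != 0;
}
```

Using distinct upper-case names here avoids one side effect of the lower-case defines: a macro called promiscuous would also rewrite any structure member or variable with that name.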
>
>The advantage of a common set of members is that the existing ethdev
>structures and APIs can remain exactly the same, but every ethdev is also
>a pktdev and can be used as either, as appropriate. Similarly, for
>crypto devices or dpi devices (or software rings or KNI devices, if we
>so desire), we can base them off this common minimal framework and use
>them all in a similar manner.
>
>Moving some basic common functions and structures into a common set of
>files gives everyone a clean starting point for a new device and adds a
>light framework. The pktdev code is normally not called directly from the
>application, but from the device itself via a define in the device
>header files. The pktdev RX/TX routines can be called from the
>application, but the application first needs to get the device structure
>pointer based on the port id.
>
>The cryptodev API may be very different from other devices and may follow
>some type of Open Crypto API. The goal is not to restrict the device API,
>but to try to give some type of structure to the design. Does it make sense
>to have an mbuf-based Rx/Tx API? Maybe not. Could the mbuf-based APIs be
>hidden in the pktdev code? Very possibly. We have a lot of options here.
Adding a modified API that closely resembles the OpenCrypto API or the Linux
Kernel Crypto API is a good way to extend and create a common API for
crypto in DPDK. Following a standard set of APIs should help adoption.
>
>How the two Rx/Tx routines are defined:
>---------------------------------------
>
>/**
> *
> * Retrieve a burst of input packets from a receive queue of an Ethernet
> * device.
> *
><SNIP>
> */
>#define rte_eth_rx_burst(_pid, _qid, _pkts, _nb_pkts) \
>    rte_pkt_rx_burst((struct rte_pkt_dev *)&rte_eth_devices[_pid], _qid, \
>        _pkts, _nb_pkts)
>
>/**
> * Send a burst of output packets on a transmit queue of an Ethernet
> * device.
> *
><SNIP>
> */
>#define rte_eth_tx_burst(_pid, _qid, _pkts, _nb_pkts) \
>    rte_pkt_tx_burst((struct rte_pkt_dev *)&rte_eth_devices[_pid], _qid, \
>        _pkts, _nb_pkts)
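
The RFC does not show the body of rte_pkt_rx_burst(), so the following is only a guess at its general shape: look up the queue in the shared data block and call the driver's function pointer. Every name ending in _sketch, plus toy_rx_burst, is hypothetical, and the structures are cut down to the members the example needs.

```c
#include <assert.h>
#include <stdint.h>

struct mbuf_sketch;  /* opaque stand-in for struct rte_mbuf */

typedef uint16_t (*pkt_burst_fn)(void *queue, struct mbuf_sketch **pkts,
                                 uint16_t nb_pkts);

/* Hypothetical cut-down versions of the RFC's data and device structures. */
struct pkt_dev_data_sketch {
    void **rx_queues;
    void **tx_queues;
    uint16_t nb_rx_queues;
    uint16_t nb_tx_queues;
};

struct pkt_dev_sketch {
    pkt_burst_fn rx_pkt_burst;
    pkt_burst_fn tx_pkt_burst;
    struct pkt_dev_data_sketch *data;
};

/* Generic receive: pick the queue, then dispatch to the driver. */
static inline uint16_t
pkt_rx_burst_sketch(struct pkt_dev_sketch *dev, uint16_t queue_id,
                    struct mbuf_sketch **pkts, uint16_t nb_pkts)
{
    assert(queue_id < dev->data->nb_rx_queues);
    return dev->rx_pkt_burst(dev->data->rx_queues[queue_id], pkts, nb_pkts);
}

/* Toy driver: the "queue" is just a counter of packets still available. */
static uint16_t
toy_rx_burst(void *queue, struct mbuf_sketch **pkts, uint16_t nb_pkts)
{
    uint16_t *avail = queue;
    uint16_t n = *avail < nb_pkts ? *avail : nb_pkts;

    (void)pkts;  /* a real driver would fill this array with mbufs */
    *avail = (uint16_t)(*avail - n);
    return n;
}
```

A transmit path would look the same with tx_queues and tx_pkt_burst. The generic code never needs to know which device class it is talking to, which is what lets the ethdev macros simply cast and forward.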
>
>A snippet of code showing some advantages and a use case of the pktdev API:
>---------------------------------------------------------------------------
>
>This is not the complete code, it has not been tested, and it is only an
>example of how one could use the design.
>
>/*
> * The lcore main. This is the main thread that does the work, reading
> * from an input port and writing to an output port.
> */
>static __attribute__((noreturn)) void
>do_work(const struct pipeline_params *p)
>{
>    printf("\nCore %u forwarding packets. %s -> %s\n",
>        rte_lcore_id(),
>        p->src->data->name,
>        p->dst->data->name);
>
>    /* Run until the application is quit or killed. */
>    for (;;) {
>        /*
>         * Receive packets on a src device and forward them on out
>         * the dst device.
>         */
>        /* Get burst of RX packets, from first port of pair. */
>        struct rte_mbuf *bufs[BURST_SIZE];
>        const uint16_t nb_rx = rte_pkt_rx_burst(p->src, 0,
>            bufs, BURST_SIZE);
>
>        if (unlikely(nb_rx == 0))
>            continue;
>
>        /* Send burst of TX packets, to second port of pair. */
>        const uint16_t nb_tx = rte_pkt_tx_burst(p->dst, 0,
>            bufs, nb_rx);
>
>        /* Free any unsent packets. */
>        if (unlikely(nb_tx < nb_rx)) {
>            uint16_t buf;
>            for (buf = nb_tx; buf < nb_rx; buf++)
>                rte_pktmbuf_free(bufs[buf]);
>        }
>    }
>}
>
>/*
> * The main function, which does initialization and calls the per-lcore
> * functions.
> */
>int
>main(int argc, char *argv[])
>{
>    struct pipeline_params p[RTE_MAX_LCORE];
>    struct rte_mempool *mbuf_pool;
>    unsigned nb_ports, lcore_id;
>    uint8_t portid;
>
>    /* Initialize the Environment Abstraction Layer (EAL). */
>    int ret = rte_eal_init(argc, argv);
>    if (ret < 0)
>        rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");
>
>    argc -= ret;
>    argv += ret;
>
>    /* Check that there is an even number of ports to send/receive on. */
>    nb_ports = rte_eth_dev_count();
>    if (nb_ports < 2 || (nb_ports & 1))
>        rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n");
>
>    /* Creates a new mempool in memory to hold the mbufs. */
>    mbuf_pool = rte_mempool_create("MBUF_POOL",
>        NUM_MBUFS * nb_ports,
>        MBUF_SIZE,
>        MBUF_CACHE_SIZE,
>        sizeof(struct rte_pktmbuf_pool_private),
>        rte_pktmbuf_pool_init, NULL,
>        rte_pktmbuf_init, NULL,
>        rte_socket_id(),
>        0);
>
>    if (mbuf_pool == NULL)
>        rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");
>
>    /* Initialize all ports. */
>    for (portid = 0; portid < nb_ports; portid++)
>        if (port_init(portid, mbuf_pool) != 0)
>            rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n",
>                portid);
>
>    struct rte_pkt_dev *in = rte_eth_get_dev(0);
>    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
>        char name[RTE_RING_NAMESIZE];
>        snprintf(name, sizeof(name), "RING_from_%u", lcore_id);
>        struct rte_pkt_dev *out = rte_ring_get_dev(
>            rte_ring_create(name, 4096, rte_socket_id(), 0));
>
>        p[lcore_id].src = in;
>        p[lcore_id].dst = out;
>        rte_eal_remote_launch((lcore_function_t *)do_work,
>            &p[lcore_id], lcore_id);
>        in = out; // next pipeline stage reads from my output.
>    }
>    // now finish pipeline on master lcore
>    lcore_id = rte_lcore_id();
>    p[lcore_id].src = in;
>    p[lcore_id].dst = rte_eth_get_dev(1);
>    do_work(&p[lcore_id]);
>
>    return 0;
>}
>
>
>Changes to rte_ethdev.[ch]
>--------------------------
>
>Most of the changes to rte_ethdev.[ch] were to use the new defines from
>rte_pkt.h. All of the references to the globals in ethdev had to be
>replaced with a reference to a global structure in ethdev. Moving the
>global or private data into a device-specific structure seemed reasonable
>to reduce namespace issues with new devices. The rx_burst/tx_burst
>routines were removed as they now exist in the rte_pktdev.c file. If we
>use nested structures instead of macros, then more of the code will need to
>be converted, or macros used to map the old member names onto the nested
>structures.
>
>Example:
>#define rx_pkt_burst dev_data.rx_pkt_burst
>#define tx_pkt_burst dev_data.tx_pkt_burst
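
To make the alias trick above concrete, here is a small sketch of how such defines might behave with a nested structure. The types common_part and eth_dev_nested and the functions fake_rx and legacy_rx are hypothetical; only the two alias defines follow the RFC example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*burst_fn)(void);

/* Hypothetical nested common block; must be declared before the aliases. */
struct common_part {
    burst_fn rx_pkt_burst;
    burst_fn tx_pkt_burst;
};

struct eth_dev_nested {
    struct common_part dev_data;  /* nested instead of macro-expanded */
    uint32_t eth_private;         /* device-specific data */
};

/* Aliases in the style of the RFC example: old member names keep working. */
#define rx_pkt_burst dev_data.rx_pkt_burst
#define tx_pkt_burst dev_data.tx_pkt_burst

/* Stand-in driver function that pretends three packets arrived. */
static uint16_t
fake_rx(void)
{
    return 3;
}

/* Legacy-style code: dev->rx_pkt_burst expands to dev->dev_data.rx_pkt_burst. */
static uint16_t
legacy_rx(struct eth_dev_nested *dev)
{
    assert(dev->rx_pkt_burst != NULL);
    return dev->rx_pkt_burst();
}
```

One caveat with this approach: once the aliases are defined, writing dev->dev_data.rx_pkt_burst explicitly no longer compiles, because the member name is expanded a second time. The structure definitions therefore have to appear before the defines, and new code has to pick one spelling.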
>
>
>Impact to Existing Applications
>-------------------------------
>
>None. The existing APIs should all remain unchanged, only the underlying
>library code needs to change. [Obviously changes to apps will be needed to
>take advantage of new device classes as we make them available].
>
>The crypto API could be similar to the Open Crypto APIs, which seem
>reasonable, but using mbufs to hold data is just an attempt to use that
>container type to provide some common structure to the system. Some of the
>crypto data will be in the form of packets and some in the form of chunks
>of data, which the API should account for in the design.
>
>My goal is to provide a lightweight framework for adding more devices and
>not try to make everything look like an Ethernet device.
>
>Regards,
>++Keith and Bruce
>
>