[dpdk-dev] [RFC] Adding multiple device types to DPDK.

Wiles, Keith keith.wiles at intel.com
Thu Apr 2 16:16:27 CEST 2015


Hi All, just to make a comment on my own email :-)

On 4/1/15, 7:44 AM, "Wiles, Keith" <keith.wiles at intel.com> wrote:

>Hi all, (hoping format of the text is maintained)
>
>Bruce and I are submitting this RFC in the hope of providing discussion
>points for the idea. Please do not get carried away with the code
>included; it is there only to help everyone understand the proposal/RFC.
>
>The RFC describes a change we are proposing to make to DPDK to add more
>device types. We would like to add to DPDK the idea of a generic
>packet-device or "pktdev", which can be thought of as a thin layer common
>to all device classes, including potential new device types such as a
>"cryptodev" or "dpidev". One of the main goals is not to affect
>performance and not to require any current application to be modified. The
>pktdev layer provides a light framework for developers to add a device
>to DPDK.
>
>Reason for Change
>-----------------
>
>The reasons why we are looking to introduce these concepts to DPDK are:
>
>* Expand the scope of DPDK so that it can provide APIs for more than just
>packet acquisition and transmission, but also provide APIs that can be
>used to work with other hardware and software offloads, such as
>cryptographic accelerators, or accelerated libraries for cryptographic
>functions. [The reason why both software and hardware are mentioned is so
>that the same APIs can be used whether or not a hardware accelerator is
>actually available].
>* Provide a minimal common basis for device abstraction in DPDK, that can
>be used to unify the different types of packet I/O devices already
>existing in DPDK. To this end, the ethdev APIs are a good starting point,
>but the ethdev library contains too many functions which are NIC-specific
>to be a general-purpose set of APIs across all devices.
>     Note: The idea was previously touched on here:
>http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/13545
>
>Description of Proposed Change
>------------------------------
>
>The basic idea behind "pktdev" is to abstract out a few common routines
>and structures/members of structures. We take the ethdev structures as a
>starting point, cut them down to little more than a few members in each
>structure, and then possibly add just rx_burst and tx_burst. Those
>structures then serve as the starting point for writing a device type.
>Currently we have the rx_burst/tx_burst routines moved to the pktdev, and
>it seems that moving a couple more common functions may be reasonable. It
>could be that the Rx/Tx routines in pktdev should be left as is, but the
>code below shows a possible reason to abstract a few routines into a
>common set of files.
>
>From there, we have the ethdev type which adds in the existing functions
>specific to Ethernet devices, and also, for example, a cryptodev which may
>add in functions specific for cryptographic offload. As now, with the
>ethdev, the specific drivers provide concrete implementations of the
>functionality exposed by the interface. This hierarchy is shown in the
>diagram below, using the existing ethdev and ixgbe drivers as a reference,
>alongside a hypothetical cryptodev class and driver implementation
>(catchily called) "X":
>
>                    ,---------------------.
>                    | struct rte_pktdev   |
>                    +---------------------+
>                    | rte_pkt_rx_burst()  |
>            .-------| rte_pkt_tx_burst()  |-----------.
>            |       `---------------------'           |
>            |                                         |
>            |                                         |
>  ,-------------------------------.    ,------------------------------.
>  |    struct rte_ethdev          |    |      struct rte_cryptodev    |
>  +-------------------------------+    +------------------------------+
>  | rte_eth_dev_configure()       |    | rte_crypto_init_sym_session()|
>  | rte_eth_allmulticast_enable() |    | rte_crypto_del_sym_session() |
>  | rte_eth_filter_ctrl()         |    |                              |
>  `-------------------------------'    `---------------.--------------'
>            |                                          |
>            |                                          |
>  ,---------'---------------------.    ,---------------'--------------.
>  |    struct rte_pmd_ixgbe       |    |      struct rte_pmd_X        |
>  +-------------------------------+    +------------------------------+
>  | .configure -> ixgbe_configure |    | .init_session -> X_init_ses()|
>  | .tx_burst  -> ixgbe_xmit_pkts |    | .tx_burst -> X_handle_pkts() |
>  `-------------------------------'    `------------------------------'
>
>We are not attempting to create a real class model here, only looking at
>creating a very basic common set of APIs and structures for other device
>types.
>
>In terms of code changes for this, we obviously need to add new
>interface libraries for pktdev and cryptodev. The pktdev library can
>define a skeleton structure for the first few elements of the nested
>structures to ensure consistency. Each of the defines below lists the
>common members of the device structures, which gives some basic structure
>to the device framework. Each define is placed at the top of the matching
>device structure, allowing the device to contain both common and private
>data. The pktdev structures overlay the first common set of members of
>each device type.
>
>For example:
>------------
>
>We are using macros to reduce code changes to DPDK, but nested structures
>are a better solution:
>
>#define RTE_PKT_COMMON_DEV(_t) \
>    pkt_rx_burst_t              rx_pkt_burst;   /**< Pointer to PMD receive function. */ \
>    pkt_tx_burst_t              tx_pkt_burst;   /**< Pointer to PMD transmit function. */ \
>    struct rte_##_t##_dev_data  *data;          /**< Pointer to device data */ \
>    const struct _t##_driver    *driver;        /**< Driver for this device */ \
>    struct _t##_dev_ops         *dev_ops;       /**< Functions exported by PMD */ \
>    struct rte_pci_device       *pci_dev;       /**< PCI info. supplied by probing */ \
>    /** User application callback for interrupts if present */ \
>    struct rte_##_t##_dev_cb_list   link_intr_cbs; \
>    /** \
>     * User-supplied functions called from rx_burst to post-process \
>     * received packets before passing them to the user \
>     */ \
>    struct rte_##_t##_rxtx_callback **post_rx_burst_cbs; \
>    /** \
>     * User-supplied functions called from tx_burst to pre-process \
>     * received packets before passing them to the driver for transmission. \
>     */ \
>    struct rte_##_t##_rxtx_callback **pre_tx_burst_cbs; \
>    enum rte_pkt_dev_type       dev_type;       /**< Flag indicating the device type */ \
>    uint8_t                     attached        /**< Flag indicating the port is attached */
>    /* Possible alignment or a hole in the structure */
>
>#define RTE_PKT_NAME_MAX_LEN (32)
>
>#define RTE_PKT_COMMON_DEV_DATA \
>    char name[RTE_PKT_NAME_MAX_LEN]; /**< Unique identifier name */ \
>    \
>    void **rx_queues;               /**< Array of pointers to RX queues. */ \
>    void **tx_queues;               /**< Array of pointers to TX queues. */ \
>    uint16_t nb_rx_queues;          /**< Number of RX queues. */ \
>    uint16_t nb_tx_queues;          /**< Number of TX queues. */ \
>    \
>    uint16_t flags;                 /**< Bit fields for xyzdev's to use. */ \
>    uint16_t mtu;                   /**< Maximum Transmission Unit. */ \
>    uint8_t unit_id;                /**< Unit ID for this instance */ \
>    uint8_t _filler[7];             /* alignment filler */ \
>    \
>    /* 64bit alignment starts here */ \
>    void    *dev_private;           /**< PMD-specific private data */ \
>    uint64_t rx_mbuf_alloc_failed;  /**< RX ring mbuf allocation failures. */ \
>    uint32_t min_rx_buf_size;       /**< Common rx buffer size handled by all queues */ \
>    uint32_t _pad0
>
>#define port_id     unit_id
>
>#define RTE_PKT_COMMON_DEV_INFO \
>    struct rte_pci_device   *pci_dev;       /**< Device PCI information. */ \
>    const char              *driver_name;   /**< Device Driver name. */ \
>    unsigned int            if_index;       /**< Index to bound host interface, or 0 if none. */ \
>        /* Use if_indextoname() to translate into an interface name. */ \
>    uint32_t _pad0
>
>The above is attempting to collect the common members to be placed at the
>top of private device structures, as we feel these members should be fairly
>common among the device types.
>
>/**
>* @internal
>* The generic data structure associated with each device.
>*
>* Pointers to burst-oriented packet receive and transmit functions are
>* located at the beginning of the structure, along with the pointer to
>* where all the data elements for the particular device are stored in
>* shared memory. This split allows the function pointer and driver data
>* to be per-process, while the actual configuration data for the device
>* is shared.
>*/
>struct rte_pkt_dev {
>    RTE_PKT_COMMON_DEV(pkt);
>};
>
>/**
>* @internal
>* The data part, with no function pointers, associated with each device.
>*
>* This structure is safe to place in shared memory to be common among
>* different processes in a multi-process configuration.
>*/
>struct rte_pkt_dev_data {
>    RTE_PKT_COMMON_DEV_DATA;
>};
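
For illustration only: a minimal sketch of the nested-structure alternative
mentioned above, where the common members live in one struct that each
device type embeds as its first member. All names here (pkt_dev_common,
eth_dev, link_speed, eth_to_common) are hypothetical stand-ins and are not
part of the RFC code or of DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mbuf; /* opaque stand-in for struct rte_mbuf */

typedef uint16_t (*pkt_burst_fn)(void *queue, struct mbuf **pkts,
                                 uint16_t nb_pkts);

/* Hypothetical common header shared by every device type. */
struct pkt_dev_common {
    pkt_burst_fn rx_pkt_burst; /* PMD receive function */
    pkt_burst_fn tx_pkt_burst; /* PMD transmit function */
    void        *data;         /* shared device data */
    uint8_t      dev_type;     /* device class identifier */
    uint8_t      attached;     /* non-zero once the port is attached */
};

/* Hypothetical Ethernet device: common header first, private data after. */
struct eth_dev {
    struct pkt_dev_common common;
    uint32_t              link_speed; /* illustrative private member */
};

/* The overlay only works if the common header sits at offset 0. */
static_assert(offsetof(struct eth_dev, common) == 0,
              "common header must come first");

/* Generic code only ever sees the common header. */
static inline struct pkt_dev_common *eth_to_common(struct eth_dev *dev)
{
    return &dev->common;
}
```

Because the header is the first member, a pointer to the device and a
pointer to its header have the same address, which is the property the
macro-based layout above relies on implicitly.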
>
>------
>
>The existing ethdev code can then have minor updates such as those shown
>below:
>
>struct rte_eth_dev_info {
>    RTE_PKT_COMMON_DEV_INFO;
>
>    /* Private device data may go here */
>    uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
>    uint32_t max_rx_pktlen; /**< Maximum configurable length of RX pkt. */
>    ...
>
>struct rte_eth_dev_data {
>    RTE_PKT_COMMON_DEV_DATA; /**< Define located in <rte_pkt.h> */
>
>    /* Private device data may go here */
>    struct rte_eth_dev_sriov sriov; /**< SRIOV data */
>
>    struct rte_eth_link dev_link; /**< Link-level information & status */
>    ...
>
>struct rte_eth_dev {
>    RTE_PKT_COMMON_DEV(eth);
>    /* Private device data may go here */
>};
>
>/* Bit defines for flags in common pkt structure */
>#define promiscuous     0x0008   /**< RX promiscuous mode ON(1) / OFF(0). */
>#define scattered_rx    0x0004   /**< RX of scattered packets is ON(1) / OFF(0) */
>#define all_multicast   0x0002   /**< RX all multicast mode ON(1) / OFF(0). */
>#define dev_started     0x0001   /**< Device state: STARTED(1)/STOPPED(0) */
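
As a rough sketch of how the bits in the common "flags" member could be
manipulated: the bit values below are the ones listed above, but the
PKT_FLAG_* names and the helper functions are hypothetical and only meant
to show the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values taken from the defines above; the names are illustrative. */
enum {
    PKT_FLAG_DEV_STARTED   = 0x0001,
    PKT_FLAG_ALL_MULTICAST = 0x0002,
    PKT_FLAG_SCATTERED_RX  = 0x0004,
    PKT_FLAG_PROMISCUOUS   = 0x0008
};

/* Turn a flag bit on. */
static inline void pkt_flag_set(uint16_t *flags, uint16_t bit)
{
    assert(flags != NULL);
    *flags = (uint16_t)(*flags | bit);
}

/* Turn a flag bit off. */
static inline void pkt_flag_clear(uint16_t *flags, uint16_t bit)
{
    assert(flags != NULL);
    *flags = (uint16_t)(*flags & ~bit);
}

/* Report whether a flag bit is on. */
static inline bool pkt_flag_test(uint16_t flags, uint16_t bit)
{
    return (flags & bit) != 0;
}
```

A device would then call, for example,
pkt_flag_set(&data->flags, PKT_FLAG_PROMISCUOUS) where the existing code
assigns to a bit-field member.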
>
>The advantage of having a common set of members is that the existing
>ethdev structures and APIs can remain exactly the same, but every ethdev
>is also a pktdev and can be used as either, as appropriate. Similarly, for
>crypto devices or dpi devices (or software rings or KNI devices, if we
>so desire), we can base them on this common minimal framework and use
>them all in a similar manner.
>
>Moving some basic common functions and structures into a common set of
>files gives everyone a clean starting point for a new device, plus a
>light framework. The pktdev code is normally not called directly from the
>application, but from the device API itself via a define in the device
>header files. The pktdev RX/TX routines can be called from the
>application, but the application first needs to get the device structure
>pointer based on the port id.
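
A minimal sketch of such a port-id lookup, assuming a per-class device
table and an "attached" flag in the common part. The names (eth_get_dev,
eth_devices, MAX_ETH_PORTS) and the simplified structures are hypothetical
and not taken from DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ETH_PORTS 32 /* illustrative table size */

/* Simplified common part of every device. */
struct pkt_dev {
    uint8_t attached; /* non-zero once the port is attached */
};

/* Simplified Ethernet device with the common part first. */
struct eth_dev {
    struct pkt_dev common;
    uint32_t       private_field; /* illustrative private member */
};

static_assert(offsetof(struct eth_dev, common) == 0,
              "common part must come first");

/* Per-class device table, indexed by port id. */
static struct eth_dev eth_devices[MAX_ETH_PORTS];

/* Return the generic device for a port id, or NULL if it is not usable. */
static inline struct pkt_dev *eth_get_dev(uint8_t port_id)
{
    if (port_id >= MAX_ETH_PORTS || !eth_devices[port_id].common.attached)
        return NULL;
    return &eth_devices[port_id].common;
}
```

The application would do this lookup once at setup time and then pass the
returned pointer to the generic RX/TX routines in its fast path.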
>
>The cryptodev API may be very different from other devices and may follow
>some type of Open Crypto API. The goal is not to restrict the device API,
>but to try to give some type of structure to the design. Does it make sense
>to have an mbuf-based Rx/Tx API? Maybe not. Could the mbuf-based APIs be
>hidden in the pktdev code? Very possibly. We have a lot of options here.

Adding some modified API looking very close to the OpenCrypto API or Linux
Kernel Crypto API is a good way to extend and create a common API for
crypto in DPDK. Following a standard set of APIs should help adoption.
>
>How the two Rx/Tx routines are defined:
>---------------------------------------
>
>/**
> *
> * Retrieve a burst of input packets from a receive queue of an Ethernet
> * device.
> *
><SNIP>
> */
>#define rte_eth_rx_burst(_pid, _qid, _pkts, _nb_pkts) \
>    rte_pkt_rx_burst((struct rte_pkt_dev *)&rte_eth_devices[_pid], _qid, \
>                     _pkts, _nb_pkts)
>
>/**
> * Send a burst of output packets on a transmit queue of an Ethernet
> * device.
> *
><SNIP>
> */
>#define rte_eth_tx_burst(_pid, _qid, _pkts, _nb_pkts) \
>    rte_pkt_tx_burst((struct rte_pkt_dev *)&rte_eth_devices[_pid], _qid, \
>                     _pkts, _nb_pkts)
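
To make the dispatch concrete, here is a small self-contained sketch of
what a generic rx burst routine behind such a define might look like. It
is an assumption about the shape of the code, not the RFC implementation:
the structures are cut down, and toy_rx_burst is a made-up driver whose
"queue" is just a counter of pending packets.

```c
#include <assert.h>
#include <stdint.h>

struct mbuf { int id; }; /* stand-in for struct rte_mbuf */

typedef uint16_t (*pkt_rx_burst_t)(void *rxq, struct mbuf **pkts,
                                   uint16_t nb_pkts);

struct pkt_dev_data {
    void   **rx_queues;    /* array of pointers to RX queues */
    uint16_t nb_rx_queues; /* number of RX queues */
};

struct pkt_dev {
    pkt_rx_burst_t       rx_pkt_burst; /* PMD receive function */
    struct pkt_dev_data *data;         /* shared device data */
};

/* Generic receive: dispatch through the driver's function pointer. */
static inline uint16_t
pkt_rx_burst(struct pkt_dev *dev, uint16_t queue_id,
             struct mbuf **pkts, uint16_t nb_pkts)
{
    assert(queue_id < dev->data->nb_rx_queues);
    return dev->rx_pkt_burst(dev->data->rx_queues[queue_id], pkts, nb_pkts);
}

/* Ethernet device table; the common part is the first member. */
struct eth_dev { struct pkt_dev common; };
static struct eth_dev eth_devices[4];

/* Ethernet-facing name kept as a thin wrapper over the generic call. */
#define eth_rx_burst(_pid, _qid, _pkts, _nb_pkts) \
    pkt_rx_burst(&eth_devices[_pid].common, _qid, _pkts, _nb_pkts)

/* Toy driver: hands out up to nb_pkts of the pending count, fills nothing. */
static uint16_t toy_rx_burst(void *rxq, struct mbuf **pkts, uint16_t nb_pkts)
{
    uint16_t *pending = rxq;
    uint16_t n = *pending < nb_pkts ? *pending : nb_pkts;

    (void)pkts;
    *pending = (uint16_t)(*pending - n);
    return n;
}
```

Taking the address of the embedded common member avoids the pointer cast
used in the defines above, at the cost of requiring nested structures.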
>
>A snippet of code showing some advantages and a use case for the pktdev API:
>----------------------------------------------------------------------------
>
>This is not the complete code, it has not been tested, and it is only an
>example of how one could use the design.
>
>/*
> * The lcore main. This is the main thread that does the work, reading
> * from an input port and writing to an output port.
> */
>static __attribute__((noreturn)) void
>do_work(const struct pipeline_params *p)
>{
>    printf("\nCore %u forwarding packets. %s -> %s\n",
>            rte_lcore_id(),
>            p->src->data->name,
>            p->dst->data->name);
>
>    /* Run until the application is quit or killed. */
>    for (;;) {
>        /*
>         * Receive packets on a src device and forward them on out
>         * the dst device.
>         */
>        /* Get burst of RX packets, from first port of pair. */
>        struct rte_mbuf *bufs[BURST_SIZE];
>        const uint16_t nb_rx = rte_pkt_rx_burst(p->src, 0,
>                bufs, BURST_SIZE);
>
>        if (unlikely(nb_rx == 0))
>            continue;
>
>        /* Send burst of TX packets, to second port of pair. */
>        const uint16_t nb_tx = rte_pkt_tx_burst(p->dst, 0,
>                bufs, nb_rx);
>
>        /* Free any unsent packets. */
>        if (unlikely(nb_tx < nb_rx)) {
>            uint16_t buf;
>            for (buf = nb_tx; buf < nb_rx; buf++)
>                rte_pktmbuf_free(bufs[buf]);
>        }
>    }
>}
>
>/*
> * The main function, which does initialization and calls the per-lcore
> * functions.
> */
>int
>main(int argc, char *argv[])
>{
>    struct pipeline_params p[RTE_MAX_LCORE];
>    struct rte_mempool *mbuf_pool;
>    unsigned nb_ports, lcore_id;
>    uint8_t portid;
>
>    /* Initialize the Environment Abstraction Layer (EAL). */
>    int ret = rte_eal_init(argc, argv);
>    if (ret < 0)
>        rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");
>
>    argc -= ret;
>    argv += ret;
>
>    /* Check that there is an even number of ports to send/receive on. */
>    nb_ports = rte_eth_dev_count();
>    if (nb_ports < 2 || (nb_ports & 1))
>        rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n");
>
>    /* Creates a new mempool in memory to hold the mbufs. */
>    mbuf_pool = rte_mempool_create("MBUF_POOL",
>                       NUM_MBUFS * nb_ports,
>                       MBUF_SIZE,
>                       MBUF_CACHE_SIZE,
>                       sizeof(struct rte_pktmbuf_pool_private),
>                       rte_pktmbuf_pool_init, NULL,
>                       rte_pktmbuf_init,      NULL,
>                       rte_socket_id(),
>                       0);
>
>    if (mbuf_pool == NULL)
>        rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");
>
>    /* Initialize all ports. */
>    for (portid = 0; portid < nb_ports; portid++)
>        if (port_init(portid, mbuf_pool) != 0)
>            rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n",
>                    portid);
>
>    struct rte_pkt_dev *in = rte_eth_get_dev(0);
>    RTE_LCORE_FOREACH_SLAVE(lcore_id){
>        char name[RTE_RING_NAMESIZE];
>        snprintf(name, sizeof(name), "RING_from_%u", lcore_id);
>        struct rte_pkt_dev *out = rte_ring_get_dev(
>                rte_ring_create(name, 4096, rte_socket_id(), 0));
>
>        p[lcore_id].src = in;
>        p[lcore_id].dst = out;
>        rte_eal_remote_launch((lcore_function_t *)do_work,
>                &p[lcore_id], lcore_id);
>        in = out; // next pipeline stage reads from my output.
>    }
>    //now finish pipeline on master lcore
>    lcore_id = rte_lcore_id();
>    p[lcore_id].src = in;
>    p[lcore_id].dst = rte_eth_get_dev(1);
>    do_work(&p[lcore_id]);
>
>    return 0;
>}
>
>
>Changes to rte_ethdev.[ch]
>--------------------------
>
>Most of the changes to rte_ethdev.[ch] were to use the new defines from
>rte_pkt.h. All of the references to the globals in ethdev had to be
>replaced with a reference to a global structure in ethdev. Moving the
>global or private data into a device-specific structure seemed reasonable
>to reduce namespace issues with new devices. The rx_burst/tx_burst
>routines were removed as they now exist in the rte_pktdev.c file. If we
>use nested structures instead of macros, then more of the code will need to
>be converted, or macros used to map the members onto the nested
>structures.
>
>Example:
>#define rx_pkt_burst    dev_data.rx_pkt_burst
>#define tx_pkt_burst    dev_data.tx_pkt_burst
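
A small sketch of how such alias defines let existing member accesses keep
compiling after the members move into a nested structure. The nested
member is called "common" here rather than "dev_data", and every name in
the block is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*burst_fn)(void *queue, void **pkts, uint16_t nb_pkts);

/* Common members, now grouped in a nested structure. */
struct pkt_dev_common {
    burst_fn rx_pkt_burst;
    burst_fn tx_pkt_burst;
};

struct eth_dev {
    struct pkt_dev_common common;
};

/*
 * Alias defines, placed after the structure definitions: an old-style
 * access such as dev->rx_pkt_burst now expands to dev->common.rx_pkt_burst.
 */
#define rx_pkt_burst common.rx_pkt_burst
#define tx_pkt_burst common.tx_pkt_burst

/* "Legacy" code written against the flat layout, unchanged. */
static inline void legacy_set_rx(struct eth_dev *dev, burst_fn fn)
{
    assert(dev != NULL);
    dev->rx_pkt_burst = fn;
}

/* Do-nothing burst function used only to exercise the alias. */
static uint16_t noop_burst(void *queue, void **pkts, uint16_t nb_pkts)
{
    (void)queue;
    (void)pkts;
    (void)nb_pkts;
    return 0;
}
```

One caveat with this trick: once the defines are in scope, writing
dev->common.rx_pkt_burst explicitly would expand to
dev->common.common.rx_pkt_burst and fail to compile, so converted and
unconverted code cannot easily share a header.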
>
>
>Impact to Existing Applications
>-------------------------------
>
>None. The existing APIs should all remain unchanged, only the underlying
>library code needs to change. [Obviously changes to apps will be needed to
>take advantage of new device classes as we make them available].
>
>The crypto API could be similar to the Open Crypto APIs, which seem
>reasonable, but using mbufs to hold data is just an attempt to use that
>container type to provide some common structure to the system. Some of the
>crypto data will be in the form of packets and some in the form of chunks
>of data, which the API should account for in the design.
>
>My goal is to provide a lightweight framework for adding more devices and
>not try to make everything look like an Ethernet device.
>
>Regards,
>++Keith and Bruce
>
>


