[dpdk-dev] [RFC PATCH 0/4] pktdev

Wiles, Keith keith.wiles at intel.com
Fri Apr 17 21:50:40 CEST 2015


Hi Marc and Bruce,

On 4/17/15, 1:49 PM, "Marc Sune" <marc.sune at bisdn.de> wrote:

>
>
>On 17/04/15 17:16, Bruce Richardson wrote:
>> Hi all,
>>
>> to continue this discussion a bit more, here is my, slightly different,
>>slant
>> on what a pktdev abstraction may look like.
>>
>> The primary objective I had in mind when drafting this is to provide the
>> minimal abstraction that can be *easily* used as a common device
>>abstraction for
>> existing (and future) device types to be passed to dataplane code. The
>>patchset
>> demonstrates this by defining a minimal interface for pktdev - since I
>>firmly
>> believe the interface should be as small as possible - and then showing
>>how that
>> common interface can be used to unify rings and ethdevs under a common
>>API for the
>> datapath. I believe any attempt to unify things much beyond this to the
>>control
>> plane or setup phase is not worth doing - at least not initially - as at
>> init time the code always needs to be aware of the underlying resource
>>type in
>> order to configure it properly for dataplane use.
>>
>> The overall objective I look to achieve is illustrated by the final
>>patch in
>> the series, which is a sample app where the same code is used for all
>>cores,
>> irrespective of the underlying device type.
>>
>> To get to that point, patch 1 defines the minimal API - just RX and TX.
>>The .c
>> file in the library is empty for simplicity, though I would see some
>> functionality moving there when/if it makes sense e.g. the callback
>>support
>> from ethdev, as is done in Keith's patchset.
>> Patch 2 then makes very minimal changes to ethdev to allow ethdevs to
>>be used
>> as pktdevs, and to make use of the pktdev functions when appropriate
>> Patch 3 was, for me, the key test for this implementation - how hard
>>was it to
>> make an rte_ring usable as a pktdev too. Two single-line functions for
>>RX/TX
>> and a separate "converter" function proved to be all that was necessary
>>here -
>> and I believe simpler solutions may be possible too, as the extra
>>structures
>> allocated on conversion could be merged into the rte_ring structure
>>itself and
>> initialized on ring creation if we prefer that option. It is
>>hoped/presumed that
>> wrapping other structures, such as KNI, may prove to be just as easily
>>done.
>> [Not attempted yet - left as an exercise for the reader :-)].
>>
>> Now, in terms of pktdev vs ethdev, there is nothing in this proposal
>>that
>> cannot also be done using ethdev AFAIK. However, pktdev as outlined here
>> should make the process far easier than trying to create a full PMD for
>>something.
>> All NIC specific functions, including things like stop/start, are
>>stripped out,
>> as they don't make sense for an rte_ring or other software objects.
>> Also, the other thing this provides is that we can move away from just
>>using
>> port ids. Instead in the same way as we now reference
>>rings/mempools/KNIs etc
>> via pointer, we can do the same with ethernet ports as pktdevs on the
>>data path.
>> There was discussion previously on moving beyond 8-bit port ids. If we
>>look to
>> use ethdev as a common abstraction, I feel that change will soon have
>>to be made
>> causing a large amount of code churn.
>
>Hi Richard,
>
>First thank you both for taking the time to look at this. I did not not
>reply to Keith because you Richard summarized most of my concerns.
>
>I had a brief look to this second proposal. It is more aligned to what I
>had in mind. But still I feel it is slightly too complicated. I don't
>like much the necessary (in your approach) MACRO-like pkt_dev_data
>struct. It is also slightly inconvenient that the user has to do:
>
>+	struct rte_pkt_dev *in = rte_eth_get_dev(0);
>
>+		struct rte_pkt_dev *out = rte_ring_get_dev(
>+				rte_ring_create(name, 4096, rte_socket_id(), 0));
>
>
>
>What about something like (~pseudo-code):
>
>rte_pkt_dev_data.h:
>
>    enum rte_pkt_dev_type{
>         RTE_PKT_DEV_ETH,
>         RTE_PKT_DEV_RING,
>         RTE_PKT_DEV_KNI,
>         //Keep adding as more PMDs are supported
>    };
>
>
>    //This struct may be redundant if there is nothing more
>    struct rte_pkt_dev_data{
>         enum rte_pkt_dev_type;
>         //Placeholder, maybe we need more...
>    };
>
>    //Make RX/TX pktdev APIs more readable, but not really needed
>    typedef void pkt_dev_t;
>
>(In all PMDs and e.g. KNI and RINGs):
>
>  struct rte_eth_dev {
>     struct rte_pkt_dev_data pkt_dev;//
><++++++++++++++++++++++++++++++++++
>     eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
>     eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function.
>*/
>     struct rte_eth_dev_data *data;  /**< Pointer to device data */
>     /...
>
>rte_pkt_dev.h:
>
>       #include <rte_ethdev.h>
>       //Include PMD (and non-PMD) TX/RX headers...
>
>    static inline uint16_t
>    rte_pkt_tx_burst(pkt_dev_t* dev, uint16_t queue_id,
>              struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>    {
>         switch (((struct rte_pkt_dev_data*)dev)->type){
>             case  RTE_PKT_DEV_ETH:
>                 struct rte_eth_dev* eth_dev = (struct
>rte_eth_dev*)pkt_dev;
>                 rte_pkt_tx_burst(eth_dev, queue_id, tx_pkts, nb_pkts);
>                 break;
>             case RTE_PKT_DEV_RING:
>                 //...
>         }
>    }
>
>    //...

I do not see the functional different from my proposal other then a bit
more search/replace, which does effect some of the PMD¹s. The changes to
the PMD is just to use the macros as we now have a nested structure.

The nesting and movement some structures and code to a common location is
to provide a better way to add other devices. Moving code into a common
set of structures and APIs should only improve the code as we get more
devices added to DPDK. The basic flow and design is the same.

>From rte_common_device.h (format screwed with cut/paste and not all of the
code)


struct rte_dev_data {
	char name[RTE_DEV_NAME_MAX_LEN]; /**< Unique identifier name */

	void	**rx_queues;            /**< Array of pointers to RX queues. */
	void	**tx_queues;            /**< Array of pointers to TX queues. */

	uint16_t nb_rx_queues;          /**< Number of RX queues. */
	uint16_t nb_tx_queues;          /**< Number of TX queues. */

	uint16_t mtu;                   /**< Maximum Transmission Unit. */
	uint8_t  unit_id;               /**< Unit/Port ID for this instance */
	uint8_t  _pad0;             	/* alignment filler */

	void	*dev_private;           /**< PMD-specific private data */
	uint64_t rx_mbuf_alloc_failed;  /**< RX ring mbuf allocation failures. */
	uint32_t min_rx_buf_size;       /**< Common rx buffer size handled by all
queues */
};

struct rte_dev_info {
	struct rte_pci_device   *pci_dev;   /**< Device PCI information. */
	const char      *driver_name;   	/**< Device Driver name. */
	unsigned int    if_index;       	/**< Index to bound host interface, or 0
if none. */
	/* Use if_indextoname() to translate into an interface name. */
	unsigned int	_pad0;
};



struct rte_dev {
	dev_rx_burst_t          rx_pkt_burst;   /**< Pointer to PMD receive
function. */
	dev_tx_burst_t          tx_pkt_burst;   /**< Pointer to PMD transmit
function. */
	void				  	*data;      	/**< Pointer to device data */
	struct rte_dev_global	*gbl_data;      /**< Pointer to device global data
*/
	const struct rte_dev_drv *driver;       /**< Driver for this device */
	void					*dev_ops;       /**< Functions exported by PMD */
	struct rte_pci_device   *pci_dev;       /**< PCI info. supplied by
probing */
	struct rte_dev_info		*dev_info;		/**< Pointer to the device info
structure */

	/** User application callback for interrupts if present */
	struct rte_dev_cb_list  link_intr_cbs;
	/**
	* User-supplied functions called from rx_burst to post-process
	* received packets before passing them to the user
	*/
	struct rte_dev_rxtx_callback **post_rx_burst_cbs;
	/**
	* User-supplied functions called from tx_burst to pre-process
	* received packets before passing them to the driver for transmission.
	*/
	struct rte_dev_rxtx_callback **pre_tx_burst_cbs;
	enum rte_dev_type		dev_type;       /**< Flag indicating the device type */
	uint8_t                 attached;        /**< Flag indicating the port is
attached */
};

>From rte_ethdev.h

struct rte_eth_dev_info {
	struct rte_dev_info	di;	/**< Common device information */

	/* Rest of structure is for private device data */
	uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
	uint32_t max_rx_pktlen; /**< Maximum configurable length of RX pkt. */
	uint16_t max_rx_queues; /**< Maximum number of RX queues. */
	uint16_t max_tx_queues; /**< Maximum number of TX queues. */
	uint32_t max_mac_addrs; /**< Maximum number of MAC addresses. */
	uint32_t max_hash_mac_addrs;
	/** Maximum number of hash MAC addresses for MTA and UTA. */
	uint16_t max_vfs; /**< Maximum number of VFs. */
	uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */
	uint32_t rx_offload_capa; /**< Device RX offload capabilities. */
	uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
	uint16_t reta_size;
	/**< Device redirection table size, the total number of entries. */
	/** Bit mask of RSS offloads, the bit offset also means flow type */
	uint64_t flow_type_rss_offloads;
	struct rte_eth_rxconf default_rxconf; /**< Default RX configuration */
	struct rte_eth_txconf default_txconf; /**< Default TX configuration */
	uint16_t vmdq_queue_base; /**< First queue ID for VMDQ pools. */
	uint16_t vmdq_queue_num; /**< Queue number for VMDQ pools. */
	uint16_t vmdq_pool_base; /**< First ID of VMDQ pools. */
};

struct rte_eth_dev_data {
	struct rte_dev_data		dd;			/**< Common device data */

	/* Rest of the structure is private data */
	struct rte_eth_dev_sriov sriov;    /**< SRIOV data */

	struct rte_eth_link dev_link;
	/**< Link-level information & status */

	struct rte_eth_conf dev_conf;   /**< Configuration applied to device. */
	struct ether_addr* mac_addrs;	/**< Device Ethernet Link address. */
	uint64_t mac_pool_sel[ETH_NUM_RECEIVE_MAC_ADDR];
	/** bitmap array of associating Ethernet MAC addresses to pools */
	struct ether_addr* hash_mac_addrs;

	/** Device Ethernet MAC addresses of hash filtering. */
	uint8_t promiscuous  : 1,   /**< RX promiscuous mode ON(1) / OFF(0). */
		scattered_rx : 1,	/**< RX of scattered packets is ON(1) / OFF(0) */
		all_multicast: 1,   /**< RX all multicast mode ON(1) / OFF(0). */
		dev_started  : 1;   /**< Device state: STARTED(1) / STOPPED(0). */
};

In the case of rte_ethdev.h the rte_eth_dev turns into a macro '#define
rte_eth_dev rte_dev', in other devices they may have a rte_xyz_dev
structure rte_dev as the first member and then private members. Also later
if we find something private to ethdev then a structure can be created.


Most of the code changes outside of these changes are because using a
macro to fix the code was easier. Rewriting the functions to use real
variables instead of casting void pointer could yield without the macros.
Not sure as the performance impact in rewriting some of these functions to
eliminate the macros.

This is the heart of what I am proposing and it just so happens to allow
Bruce¹ ideas as well. I have another patch I have not sent that includes
Bruce¹s ideas using my suggestions.

Being able to use a single API for all devices does have its benefits IMO,
but I want to cleanup ethdev to allow for other devices to be added to the
system.

At some point (when the cryptodev is done) I want to add it to this
framework, plus add more APIs for crypto similar to Linux Crypto API or
something we can agree is a reasonable set of APIs for DPDK.


Regards,

++Keith
>
>
>Maybe it is oversimplified? With this approach tough, we would barely
>touch the PMDs and the functionality can be built on top of the existing
>PMDs.
>
>Thoughts?
>Marc
>
>
>>
>> Bruce Richardson (4):
>>    Add example pktdev implementation
>>    Make ethdev explicitly a subclass of pktdev
>>    add support for a ring to be a pktdev
>>    example app showing pktdevs used in a chain
>>
>>   config/common_bsdapp           |   5 +
>>   config/common_linuxapp         |   5 +
>>   examples/pktdev/Makefile       |  57 +++++++++++
>>   examples/pktdev/basicfwd.c     | 222
>>+++++++++++++++++++++++++++++++++++++++++
>>   lib/Makefile                   |   1 +
>>   lib/librte_ether/rte_ethdev.h  |  26 ++---
>>   lib/librte_pktdev/Makefile     |  56 +++++++++++
>>   lib/librte_pktdev/rte_pktdev.c |  35 +++++++
>>   lib/librte_pktdev/rte_pktdev.h | 144 ++++++++++++++++++++++++++
>>   lib/librte_ring/rte_ring.c     |  41 ++++++++
>>   lib/librte_ring/rte_ring.h     |   3 +
>>   11 files changed, 582 insertions(+), 13 deletions(-)
>>   create mode 100644 examples/pktdev/Makefile
>>   create mode 100644 examples/pktdev/basicfwd.c
>>   create mode 100644 lib/librte_pktdev/Makefile
>>   create mode 100644 lib/librte_pktdev/rte_pktdev.c
>>   create mode 100644 lib/librte_pktdev/rte_pktdev.h
>>
>



More information about the dev mailing list