[dpdk-dev] [RFC] Adding multiple device types to DPDK.

Wiles, Keith keith.wiles at intel.com
Sat Apr 4 00:32:01 CEST 2015


Hi Neil,

On 4/3/15, 12:00 PM, "Neil Horman" <nhorman at tuxdriver.com> wrote:

>On Wed, Apr 01, 2015 at 12:44:54PM +0000, Wiles, Keith wrote:
>> Hi all, (hoping format of the text is maintained)
>> 
>> Bruce and I are submitting this RFC in hopes of providing discussion
>> points for the idea. Please do not get carried away with the code
>> included; it was to help everyone understand the proposal/RFC.
>> 
>> The RFC is to describe a proposed change we are looking to make to DPDK to
>> add more device types. We would like to add to DPDK the idea of a
>> generic packet-device or "pktdev", which can be thought of as a thin layer
>> for all device classes, including other device types such as, potentially,
>> a "cryptodev" or "dpidev". One of the main goals is to not affect
>> performance and not require any current application to be modified. The
>> pktdev layer provides a light framework for developers to add a device
>> to DPDK.
>> 
>> Reason for Change
>> -----------------
>> 
>> The reasons why we are looking to introduce these concepts to DPDK are:
>> 
>> * Expand the scope of DPDK so that it can provide APIs for more than just
>> packet acquisition and transmission, but also provide APIs that can be
>> used to work with other hardware and software offloads, such as
>> cryptographic accelerators, or accelerated libraries for cryptographic
>> functions. [The reason why both software and hardware are mentioned is so
>> that the same APIs can be used whether or not a hardware accelerator is
>> actually available].
>> * Provide a minimal common basis for device abstraction in DPDK that can
>> be used to unify the different types of packet I/O devices already
>> existing in DPDK. To this end, the ethdev APIs are a good starting point,
>> but the ethdev library contains too many functions which are NIC-specific
>> to be a general-purpose set of APIs across all devices.
>>      Note: The idea was previously touched on here:
>> http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/13545
>> 
>> Description of Proposed Change
>> ------------------------------
>> 
>> The basic idea behind "pktdev" is to abstract out a few common routines
>> and structures/members of structures. Starting from the ethdev structures,
>> cut each one down to little more than a few members, then possibly add
>> just rx_burst and tx_burst, and use the resulting structures as a starting
>> point for writing a device type. Currently we have the rx_burst/tx_burst
>> routines moved to the pktdev, and it seems like moving a couple more
>> common functions may be reasonable. It could be that the Rx/Tx routines
>> in pktdev should be left as is, but the code below shows a possible
>> reason to abstract a few routines into a common set of files.
>> 
>> From there, we have the ethdev type which adds in the existing functions
>> specific to Ethernet devices, and also, for example, a cryptodev which may
>> add in functions specific to cryptographic offload. As now, with the
>> ethdev, the specific drivers provide concrete implementations of the
>> functionality exposed by the interface. This hierarchy is shown in the
>> diagram below, using the existing ethdev and ixgbe drivers as a reference,
>> alongside a hypothetical cryptodev class and driver implementation
>> (catchily called) "X":
>> 
>>                     ,---------------------.
>>                     | struct rte_pktdev   |
>>                     +---------------------+
>>                     | rte_pkt_rx_burst()  |
>>             .-------| rte_pkt_tx_burst()  |-----------.
>>             |       `---------------------'           |
>>             |                                         |
>>             |                                         |
>>   ,-------------------------------.    ,------------------------------.
>>   |    struct rte_ethdev          |    |      struct rte_cryptodev    |
>>   +-------------------------------+    +------------------------------+
>>   | rte_eth_dev_configure()       |    | rte_crypto_init_sym_session()|
>>   | rte_eth_allmulticast_enable() |    | rte_crypto_del_sym_session() |
>>   | rte_eth_filter_ctrl()         |    |                              |
>>   `-------------------------------'    `---------------.--------------'
>>             |                                          |
>>             |                                          |
>>   ,---------'---------------------.    ,---------------'--------------.
>>   |    struct rte_pmd_ixgbe       |    |      struct rte_pmd_X        |
>>   +-------------------------------+    +------------------------------+
>>   | .configure -> ixgbe_configure |    | .init_session -> X_init_ses()|
>>   | .tx_burst  -> ixgbe_xmit_pkts |    | .tx_burst -> X_handle_pkts() |
>>   `-------------------------------'    `------------------------------'
>> 
>> We are not attempting to create a real class model here, only looking at
>> creating a very basic common set of APIs and structures for other device
>> types.
>> 
>> In terms of code changes for this, we obviously need to add in new
>> interface libraries for pktdev and cryptodev. The pktdev library can
>> define a skeleton structure for the first few elements of the nested
>> structures to ensure consistency. Each of the defines below illustrates
>> the common members in device structures, which give some basic structure
>> to the device framework. Each of the defines is placed at the top of the
>> device's matching structure and allows the devices to contain common and
>> private data. The pktdev structures overlay the first common set of
>> members for each device type.
>> 
>
>
>Keith and I discussed this offline, and for the purposes of completeness
>I'll offer my rebuttal to this proposal here.
>
>In short, I don't think the segregation of the transmit and receive
>routines into their own separate structure (and ostensibly their own
>librte_pktdev library) is particularly valuable.  While it does provide
>some minimal code savings when new device classes are introduced, the
>savings are not significant (approximately 0.5kb per device class if the
>rte_ethdev generic tx and rx routines are any sort of indicator).  It does,
>however, come with significant costs in the sense that it binds a device
>class to using an I/O model (in this case packet based receive and
>transmit) for which the device class may not be suited.

The saving is only 0.5KB because you are only looking at moving the Rx/Tx
routines into pktdev. If we decide to move a number of other common
functions, such as start/stop, the saving becomes much bigger. Do we need
this saving? Maybe not, but pktdev does provide a single-call API,
rte_pkt_rx_burst/rte_pkt_tx_burst, so the application no longer has to make
sure it is calling the correct device-specific Rx/Tx routines. All that is
required is passing in the device pointer and the rest is handled for the
application. Bruce added some code to that effect in the original RFC email.
>
>To illustrate the difference in design ideas, currently the dpdk data
>pipeline looks like this:
>
>+------------+   +----------+   +---------+
>|            |   |          |   |         |
>|  ARP       |   |  ethdev  |   |         |   +----------+
>|  handler   +-->+  api     +-->+  PMD    +-->+ Wire     |
>|            |   |          |   |         |   +----------+
>|            |   |          |   |         |
>+------------+   +----------+   +---------+

You did not add crypto to this picture, although it appears in the picture
below. To make the two comparable, it should look like this:

+-----+  +---------+  +---------+  +------+
|     |  |         |  |         |  |      |  +------+
| ARP +--+ ethdev  +--+ crypto  +--+ PMD  +--+ Wire |
|     |  |         |  |         |  |      |  +------+
+-----+  +---------+  +---------+  +------+


>
>
>Where the ARP handler code is just some code that knows how to manage arp
>requests and responses, and only transmits and receives frames.
>
>Keith's idea would introduce this new pktdev handler structure and make the
>dataplane pipeline look like this:
>
>+------------+ +------------+  +------------+  +--------+
>|            | |            |  |            |  |        |
>|  ARP       | | pktdev api |  | pktdev_api |  |        |  +---------+
>|  handler   +-+            +--+            +--+ PMD    +--+Wire     |
>|            | |            |  |            |  |        |  +---------+
>|            | |            |  |            |  |        |
>+------------+ |            |  |            |  |        |
>               |            |  |            |  +--------+
>               |            |  |            |
>               |            |  |            |
>               |            |  |            |
>               | rte_ethdev |  | rte_crypto |
>               |            |  |            |
>               |            |  |            |
>               +------------+  +------------+

You appear to be drawing this picture to make pktdev look like another
function-call layer, when it is just a single macro in the rte_ethdev
header pointing to the rte_pkt_tx_burst routine. There is no function-call
overhead, as the macro in rte_ethdev maps rte_eth_tx_burst to the
rte_pkt_tx_burst routine, which BTW is the same routine that is in
rte_ethdev today. Both pktdev and ethdev call into the PMD tx routine via
the dev_ops function pointer structure, which also adds no extra overhead.

If you call the rte_pkt_tx_burst routine directly, you just need to pass
the device pointer instead of the port id value. The above turns into
something like this:

+-----+  +---------+  +--------+  +-----+
|     |  | ethdev  |  |        |  |     |  +------+
| ARP +--+ map to  +--+ crypto +--+ PMD +--+ Wire |
|     |  | pktdev  |  |        |  |     |  +------+
+-----+  +---------+  +--------+  +-----+

So the data path is the same; a macro simply renames the call to the
rte_eth_tx_burst routine to rte_pkt_tx_burst. If you call the pktdev
routine directly, the macro is not used.
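
To make the macro mapping concrete, here is a minimal sketch of the idea.
It is not the code from the RFC: the structure layout, port_table,
port_to_dev() and dummy_xmit_pkts() are simplified, hypothetical stand-ins
used only to show that the ethdev name and the pktdev name end up at the
same PMD routine through one function pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the real DPDK definitions. */
struct rte_mbuf;                     /* opaque packet buffer */
struct rte_pktdev;

typedef uint16_t (*pkt_tx_burst_t)(struct rte_pktdev *dev, uint16_t queue_id,
                                   struct rte_mbuf **pkts, uint16_t nb_pkts);

struct rte_pktdev {
    pkt_tx_burst_t tx_pkt_burst;     /* filled in by the PMD */
};

/* Common TX entry point: a single indirect call into the PMD. */
static inline uint16_t
rte_pkt_tx_burst(struct rte_pktdev *dev, uint16_t queue_id,
                 struct rte_mbuf **pkts, uint16_t nb_pkts)
{
    return dev->tx_pkt_burst(dev, queue_id, pkts, nb_pkts);
}

/* Assumed port-id to device lookup, only for this sketch. */
#define MAX_PORTS 4
static struct rte_pktdev *port_table[MAX_PORTS];

static inline struct rte_pktdev *
port_to_dev(uint8_t port_id)
{
    /* Sanity check for the sketch; a real fast path may omit it. */
    assert(port_id < MAX_PORTS && port_table[port_id] != NULL);
    return port_table[port_id];
}

/* The existing ethdev name becomes a rename of the pktdev routine. */
#define rte_eth_tx_burst(port_id, queue_id, pkts, nb_pkts) \
    rte_pkt_tx_burst(port_to_dev(port_id), (queue_id), (pkts), (nb_pkts))

/* Dummy PMD TX routine that claims to send every packet it is given. */
static uint16_t
dummy_xmit_pkts(struct rte_pktdev *dev, uint16_t queue_id,
                struct rte_mbuf **pkts, uint16_t nb_pkts)
{
    (void)dev; (void)queue_id; (void)pkts;
    return nb_pkts;
}
```

An existing application keeps calling rte_eth_tx_burst(port, queue, pkts, n)
unchanged, while new code that already holds a device pointer can call
rte_pkt_tx_burst(dev, queue, pkts, n) and skip the port lookup.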


>
>The idea being that now all devices in the dataplane are pktdev devices
>and code that transmits and receives frames only needs to know that a
>device can transmit and receive frames.  The crypto device in this chain is
>ostensibly performing some sort of ipsec functionality so that arp frames
>are properly encrypted and encapsulated for sending via a tunnel.
>
>On the surface this seems reasonable, and in a sense it is.  However, my
>assertion is that we already have this functionality, and it is the
>rte_ethdev device.  To illustrate further, in my view we can do the above
>already:
>
>+------------+  +---------+ +---------+  +---------+  +--------+
>|            |  |         | |         |  |         |  |        |
>|            |  |ethdev   | | ipsec   |  |ethdev   +--+        |
>| ARP handler+->+api      +-+ tunnel  +->+api      |  | PMD    |
>|            |  |         | | PMD     |  |         |  |        |
>|            |  |         | |         |  |         |  |        |
>+------------+  +---------+ +---+-----+  +---------+  +--------+
>                                |
>                             +--+-----+
>                             |        |
>                             |crypto  |
>                             |api     |
>                             |        |
>                             |        |
>                             +--------+
>
>Using the rte_ethdev we can already codify the ipsec functionality as a
>pmd that registers an ethdev, and stack it with other pmds using methods
>similar to what the bonding pmd does (or via some other more generalized
>dataplane indexing function).  This still leaves us with the creation of
>the crypto api, which is advantageous because:

The proposal does not remove the bonding method, so it can still be used,
correct?
I do not see the difference here between using the pktdev-style routines
and using the ethdev routines.
>
>1) It is not constrained by the i/o model of the dataplane (it may include
>packet based i/o, but can build on more rudimentary (and performant)
>interfaces).  For instance, in addition to async block based i/o, a crypto
>device may also operate synchronously, meaning a call can be saved with
>each transaction (2 calls for a tx/rx vs one for an encrypt operation).
>
>2) It is not constrained by use case.  That is to say the API can be
>constructed for more natural use with other functions (for instance
>encryption of files on disk or via a pipe to another process), which may
>not have any relation to the data plane of DPDK.
>
>Neil

OK, you snipped the text of the email here, and IMO that makes the context
wrong without the rest of the code. I will try to explain without the text
that was omitted, but it would be best for anyone missing the original
email to read it for more details. I know the formatting got messed up a
bit :-(

http://dpdk.org/ml/archives/dev/2015-April/016124.html


The rest of that text shows the points I wanted to make here and how
little overhead pktdev adds.

Let's just say we do not move the TX/RX routines from rte_ethdev into
rte_pktdev and only have a header file with a few structures and macros to
help define some common parts between the new device types being added.
I could see that as an option, but I do not see the big issues you are
pointing out here.
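
As a rough illustration of that header-only option, the sketch below uses
one macro to place the same leading members at the top of each device
structure. The names RTE_COMMON_DEV_FIELDS, rte_eth_dev_sketch and
rte_cryptodev_sketch, and the fields themselves, are made up for this
example and are not taken from the RFC code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the header from the RFC. */
struct rte_mbuf;                     /* opaque packet buffer */

typedef uint16_t (*pkt_burst_t)(void *queue, struct rte_mbuf **pkts,
                                uint16_t nb_pkts);

/* Members every device type is assumed to share, in a fixed order. */
#define RTE_COMMON_DEV_FIELDS   \
    pkt_burst_t rx_pkt_burst;   \
    pkt_burst_t tx_pkt_burst;   \
    void *dev_private

/* The generic view: only the common members. */
struct rte_pktdev {
    RTE_COMMON_DEV_FIELDS;
};

/* An Ethernet-like device: common members first, then its own data. */
struct rte_eth_dev_sketch {
    RTE_COMMON_DEV_FIELDS;
    uint16_t mtu;
};

/* A crypto-like device: same leading layout, different private tail. */
struct rte_cryptodev_sketch {
    RTE_COMMON_DEV_FIELDS;
    uint32_t max_sessions;
};

/* Compile-time checks that the generic view overlays both device types. */
static_assert(offsetof(struct rte_eth_dev_sketch, tx_pkt_burst) ==
              offsetof(struct rte_pktdev, tx_pkt_burst),
              "ethdev must start with the common members");
static_assert(offsetof(struct rte_cryptodev_sketch, tx_pkt_burst) ==
              offsetof(struct rte_pktdev, tx_pkt_burst),
              "cryptodev must start with the common members");
```

With only this header, no routine moves out of rte_ethdev; each device
type keeps its own Rx/Tx code and just agrees on where the common members
live.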

You did have some great comments about how crypto is used and that the
Linux kernel crypto API model is proven, and I do not disagree.

In my reply to my own email I tried to point out that we could add
something very similar to the Linux Kernel Crypto API, and it would be a
model most would be able to understand. Creating my own crypto API is not
my goal; the goal is to use standards where they make the most sense for
DPDK.

The email linked above is the one in which I suggested the Linux Kernel
Crypto API would be reasonable.

Regards,
++Keith


