[dpdk-dev] [RFC] Adding multiple device types to DPDK.

Wiles, Keith keith.wiles at intel.com
Sat Apr 4 17:16:08 CEST 2015



On 4/4/15, 8:11 AM, "Neil Horman" <nhorman at tuxdriver.com> wrote:

>On Fri, Apr 03, 2015 at 10:32:01PM +0000, Wiles, Keith wrote:
>> Hi Neil,
>> 
>> On 4/3/15, 12:00 PM, "Neil Horman" <nhorman at tuxdriver.com> wrote:
>> 
>> >On Wed, Apr 01, 2015 at 12:44:54PM +0000, Wiles, Keith wrote:
>> >> Hi all, (hoping the format of the text is maintained)
>> >> 
>> >> Bruce and I are submitting this RFC in hopes of providing discussion
>> >> points for the idea. Please do not get carried away with the code
>> >> included; it is there to help everyone understand the proposal/RFC.
>> >> 
>> >> The RFC is to describe a proposed change we are looking to make to
>> >> DPDK to add more device types. We would like to add to DPDK the idea
>> >> of a generic packet-device or "pktdev", which can be thought of as a
>> >> thin layer common to all device classes, on which other device types
>> >> such as a "cryptodev" or "dpidev" could be built. One of the main
>> >> goals is to not affect performance and to not require any current
>> >> application to be modified. The pktdev layer provides a light
>> >> framework for developers to add a device to DPDK.
>> >> 
>> >> Reason for Change
>> >> -----------------
>> >> 
>> >> The reasons why we are looking to introduce these concepts to DPDK
>> >> are:
>> >> 
>> >> * Expand the scope of DPDK so that it can provide APIs for more than
>> >> just packet acquisition and transmission, but also provide APIs that
>> >> can be used to work with other hardware and software offloads, such
>> >> as cryptographic accelerators, or accelerated libraries for
>> >> cryptographic functions. [The reason why both software and hardware
>> >> are mentioned is so that the same APIs can be used whether or not a
>> >> hardware accelerator is actually available.]
>> >> * Provide a minimal common basis for device abstraction in DPDK that
>> >> can be used to unify the different types of packet I/O devices
>> >> already existing in DPDK. To this end, the ethdev APIs are a good
>> >> starting point, but the ethdev library contains too many functions
>> >> which are NIC-specific to be a general-purpose set of APIs across
>> >> all devices.
>> >>      Note: The idea was previously touched on here:
>> >> http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/13545
>> >>      Note: The idea was previously touched on here:
>> >> http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/13545
>> >> 
>> >> Description of Proposed Change
>> >> ------------------------------
>> >> 
>> >> The basic idea behind "pktdev" is to abstract out a few common
>> >> routines and structure members, using the ethdev structures as a
>> >> starting point, cutting them down to little more than a few members
>> >> in each structure, then possibly adding just rx_burst and tx_burst.
>> >> The structures then serve as a starting point for writing a new
>> >> device type. Currently we have the rx_burst/tx_burst routines moved
>> >> to the pktdev, and it seems like moving a couple more common
>> >> functions may be reasonable. It could be that the Rx/Tx routines in
>> >> pktdev should be left as is, but the code below gives a possible
>> >> reason to abstract a few routines into a common set of files.
>> >> From there, we have the ethdev type, which adds in the existing
>> >> functions specific to Ethernet devices, and also, for example, a
>> >> cryptodev which may add in functions specific to cryptographic
>> >> offload. As now with the ethdev, the specific drivers provide
>> >> concrete implementations of the functionality exposed by the
>> >> interface. This hierarchy is shown in the diagram below, using the
>> >> existing ethdev and ixgbe drivers as a reference, alongside a
>> >> hypothetical cryptodev class and driver implementation (catchingly
>> >> called) "X":
>> >> 
>> >>                     ,---------------------.
>> >>                     | struct rte_pktdev   |
>> >>                     +---------------------+
>> >>                     | rte_pkt_rx_burst()  |
>> >>             .-------| rte_pkt_tx_burst()  |-----------.
>> >>             |       `---------------------'           |
>> >>             |                                         |
>> >>             |                                         |
>> >>   ,-------------------------------.    ,------------------------------.
>> >>   |    struct rte_ethdev          |    |     struct rte_cryptodev     |
>> >>   +-------------------------------+    +------------------------------+
>> >>   | rte_eth_dev_configure()       |    | rte_crypto_init_sym_session()|
>> >>   | rte_eth_allmulticast_enable() |    | rte_crypto_del_sym_session() |
>> >>   | rte_eth_filter_ctrl()         |    |                              |
>> >>   `---------.---------------------'    `--------------.---------------'
>> >>             |                                         |
>> >>             |                                         |
>> >>   ,---------'---------------------.    ,--------------'---------------.
>> >>   |    struct rte_pmd_ixgbe       |    |      struct rte_pmd_X        |
>> >>   +-------------------------------+    +------------------------------+
>> >>   | .configure -> ixgbe_configure |    | .init_session -> X_init_ses()|
>> >>   | .tx_burst  -> ixgbe_xmit_pkts |    | .tx_burst -> X_handle_pkts() |
>> >>   `-------------------------------'    `------------------------------'
>> >> 
>> >> We are not attempting to create a real class model here, only
>> >> looking at creating a very basic common set of APIs and structures
>> >> for other device types.
>> >> 
>> >> In terms of code changes for this, we obviously need to add in new
>> >> interface libraries for pktdev and cryptodev. The pktdev library can
>> >> define a skeleton structure for the first few elements of the nested
>> >> structures to ensure consistency. Each of the defines below
>> >> illustrates the common members in the device structures, which gives
>> >> some basic structure to the device framework. Each of the defines is
>> >> placed at the top of the device's matching structures and allows the
>> >> devices to contain common and private data. The pktdev structures
>> >> overlay the first common set of members for each device type.
>> >> 
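To make the overlay idea concrete, here is a minimal sketch of what such a common header could look like. All names here (RTE_PKTDEV_COMMON, rte_ethdev_sketch) are invented for illustration and are not taken from the RFC patch:

```c
#include <stdint.h>
#include <stddef.h>

struct rte_mbuf;  /* opaque here; the real definition lives in rte_mbuf.h */

/* Hypothetical common header: the macro expands to the first few members
 * shared by every device type, so a pointer to any device structure can
 * also be treated as a struct rte_pktdev. */
#define RTE_PKTDEV_COMMON                                                   \
	uint16_t (*rx_burst)(void *rxq, struct rte_mbuf **pkts, uint16_t n); \
	uint16_t (*tx_burst)(void *txq, struct rte_mbuf **pkts, uint16_t n);

struct rte_pktdev {
	RTE_PKTDEV_COMMON
};

/* A device type begins with the same members, then adds its own. */
struct rte_ethdev_sketch {
	RTE_PKTDEV_COMMON
	int (*configure)(void *dev);  /* NIC-specific ops follow the header */
};
```

Because both structures start with the identical macro expansion, the offsets of the common members line up, which is what makes the overlay safe.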
>> >
>> >
>> >Keith and I discussed this offline, and for the purposes of
>>completeness
>> >I'll
>> >offer my rebuttal to this proposal here.
>> >
>> >In short, I don't think the segregation of the transmit and receive
>> >routines into their own separate structure (and ostensibly their own
>> >librte_pktdev library) is particularly valuable.  While it does
>> >provide some minimal code savings when new device classes are
>> >introduced, the savings are not significant (approximately 0.5kb per
>> >device class if the rte_ethdev generic tx and rx routines are any
>> >sort of indicator).  It does, however, come with significant costs in
>> >the sense that it binds a device class to using an I/O model (in this
>> >case packet based receive and transmit) for which the device class
>> >may not be suited.
>> 
>> The only reason that we have a 0.5Kb saving is you are only looking at
>> moving the Rx/Tx routines into pktdev, but what happens if we decide
>> to move a number of common functions like start/stop and others?  Then
>> you start to see a much bigger saving.
>You're right of course, but let's just try the math here.  If we assume
>that, since all these libraries are just inlined redirection functions,
>we can guess that each one is about .25kb of code (.5kb / 2 functions).
>So if you add two more functions you're looking at 1kb of code savings.
>Of course if you start doing so, you beg the two questions that I've
>been trying to pose here:
>
>1) How much 'common' API do you want to impose on a device class that
>may not be conducive to the common set
>
>2) At what point does your common API just become an rte_ethdev?

You are taking the code size argument to the extreme, which is not the
real point here. I agree it does not matter in footprint savings, as
most DPDK systems are pretty big. The reason for the common code was to
reduce the amount of code the developer has to write for a new device
type. If the developer of a new device type writes his own set of Rx/Tx
routines, the community will determine whether they are required,
whether the current APIs work, or whether we require them to support
both.

Please try not to get hung up on the footprint savings, and I am sorry
if I made this a huge point.

> 
>> Do we need this saving, maybe not, but it does
>I would say definitely not, given that the DPDK houses entire API sets
>that are always-inline, and any savings from the above are easily
>eclipsed by the fanout that occurs from use in multiple call sites.
>Let's be honest, performance is the DPDK's primary concern.  Code size
>is not a consideration.  It's only an argument here because it makes
>this proposal look favorable.  It's not a bad thing mind you, but if
>this proposal caused any code bloat, it wouldn't even be mentioned.

In the current design the pktdev APIs are still inline routines; only
the debug routines are not.
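As a sketch of why the rename costs nothing, here is one way the macro could work; names are hypothetical (the actual RFC patch may differ), but the point is that the only indirect call is the one today's rte_eth_tx_burst already makes via dev_ops:

```c
#include <stdint.h>

struct rte_mbuf;  /* opaque; stands in for the real rte_mbuf */

/* Hypothetical cut-down pktdev holding just the burst pointer and queue. */
struct rte_pktdev {
	uint16_t (*tx_burst)(void *txq, struct rte_mbuf **pkts, uint16_t n);
	void *tx_queue;
};

/* Inline wrapper: compiles down to the same single indirect call the
 * existing ethdev burst routine performs. */
static inline uint16_t
rte_pkt_tx_burst(struct rte_pktdev *dev, struct rte_mbuf **pkts, uint16_t n)
{
	return dev->tx_burst(dev->tx_queue, pkts, n);
}

/* The ethdev header keeps the old name as a pure rename, so existing
 * applications recompile unchanged. */
#define rte_eth_tx_burst(dev, pkts, n) rte_pkt_tx_burst((dev), (pkts), (n))

/* Tiny stand-in PMD tx routine so the sketch is self-contained. */
static uint16_t demo_tx(void *txq, struct rte_mbuf **pkts, uint16_t n)
{
	(void)txq; (void)pkts;
	return n;  /* pretend every packet was sent */
}
```

Calling either name ends up in the same PMD routine, which is the "no extra overhead" claim in a nutshell.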

>
>> provide a single call API rte_pkt_rx/tx_burst to use instead of the
>> application having to make sure it is calling the correct device Rx/Tx
>> routines.
>Given that that is a compile time issue, I really don't see why that is
>a big deal.  Especially if you, as I suggest, just use the rte_ethdev
>as your common dataplane device model.  Then by virtue of the fact that
>they're all rte_ethdevs, you get common API naming.
>
>> All that is required is passing in the device pointer and it is
>> handled for the application. Bruce added some code below to that effect.
>Yes, and you have that now, its an rte_ethdev.
>
>> >
>> >To illustrate the difference in design ideas, currently the dpdk data
>> >pipeline looks like this:
>> >
>> >+------------+   +----------+   +---------+
>> >|            |   |          |   |         |
>> >|  ARP       |   |  ethdev  |   |         |   +----------+
>> >|  handler   +-->+  api     +-->+  PMD    +-->+ Wire     |
>> >|            |   |          |   |         |   +----------+
>> >|            |   |          |   |         |
>> >+------------+   +----------+   +---------+
>> 
>> You did not add the crypto to this picture as it is in the picture below
>> to make them the same.
>> 
>No I didn't, because it doesn't exist right now.  Thats what I mean by
>'currently'.  See my description below after you added your drawing.
>
>> +-----+  +---------+  +---------+  +------+
>> |     |  |         |  |         |  |      |  +------+
>> | ARP +--+ ethdev  +--+ crypto  +--+ PMD  +--+ Wire |
>> |     |  |         |  |         |  |      |  +------+
>> +-----+  +---------+  +---------+  +------+
>> 
>> 
>> >
>> >
>> >Where the ARP handler code is just some code that knows how to manage
>> >arp requests and responses, and only transmits and receives frames.
>> >
>> >Keith's idea would introduce this new pktdev handler structure and
>> >make the dataplane pipeline look like this:
>> >
>> >+------------+ +------------+  +------------+  +--------+
>> >|            | |            |  |            |  |        |
>> >|  ARP       | | pktdev api |  | pktdev_api |  |        |  +---------+
>> >|  handler   +-+            +--+            +--+ PMD    +--+Wire     |
>> >|            | |            |  |            |  |        |  +---------+
>> >|            | |            |  |            |  |        |
>> >+------------+ |            |  |            |  |        |
>> >               |            |  |            |  +--------+
>> >               |            |  |            |
>> >               |            |  |            |
>> >               |            |  |            |
>> >               | rte_ethdev |  | rte_crypto |
>> >               |            |  |            |
>> >               |            |  |            |
>> >               +------------+  +------------+
>> 
>> You are drawing this picture, it appears, trying to make the pktdev
>> another function call layer, when it is just a single macro in the
>> rte_ethdev
>Not trying to imply a new function call layer, just trying to
>illustrate your proposal: that you wish to make rte_ethdevs and
>rte_cryptodevs all look like rte_pktdevs so that the dataplane has a
>common element for passing mbufs around, regardless of its true device
>class.  Not sure where you see an extra function call layer here.

Please see my comment above, as I am not trying to make them all look
the same; you are taking my words to the extreme. We need some type of
structure for adding more device types to DPDK, and I am just trying to
suggest a few common parts as a solution.

Some parts of the structures should be common, and possibly a few
common routines; yes, a big part of each device will be different. The
rte_ethdev structures are what we have today and were a good starting
point, but a big part of the code will be different.

Should we allow someone to write a new device type that is completely
different, down to typedefs for variable types and all new structures?
No, we should not. Giving them some guidelines by providing a set of
routines and structures to start with, by all means let's do it.

>
>> header pointing to the rte_pktdev tx_burst routine. No function call
>> overhead, as the macro in rte_ethdev changes the rte_eth_tx_burst to
>> the rte_pkt_tx_burst routine, which BTW is the same routine that was
>> in rte_ethdev today. The pktdev and ethdev are calling into the PMD
>> tx routine via the dev_ops function pointers structure, which is also
>> no extra overhead.
>> 
>Never suggested it was extra overhead, where did you get that from?
>I'm just illustrating what it is you want to do.  If I didn't get it
>right I apologize, please clarify.

No need to clarify as we agree and maybe I read too much into your
comments.
>
>> If you are calling the rte_pkt_tx_burst routine directly it just
>> means you need to get the device pointer to pass instead of the port
>> id value in the rte_pkt_tx_burst routine. The above turns into
>> something like this:
>> 
>> +-----+  +---------+  +--------+  +-----+
>> |     |  | ethdev  |  |        |  |     |  +------+
>> | ARP +--+ map to  +--+ crypto +--+ PMD +--+ Wire |
>> |     |  | pktdev  |  |        |  |     |  +------+
>> +-----+  +---------+  +--------+  +-----+
>> 
>> So the path of the data is the same; only a macro does a simple
>> rename of the rte_eth_tx_burst call. If you call the pktdev routine
>> directly then the macro is not used.
>> 
>> 
>Let me quash this by stipulating that you are absolutely correct, there
>is no additional overhead in your design; it should provide the exact
>same performance that an rte_ethdev currently does.  I have no problem
>with that.  My point is: It provides the exact same performance that an
>rte_ethdev does, so let's just use rte_ethdev if we're trying to model
>devices in the dataplane.

The ethdev model does not give us a different device type, it just
forces the new device type to look like an ethdev; this is what I am
reading from your email here. I do not believe you are suggesting that
an ethdev look-alike is the answer to a new device type.

The bonding model allows you to link some devices together, and that is
good IMO. The bonding model is kind of like a pipeline model at the
device layer. The pipeline style at the application layer could have
provided the same feature as the bonding model, just being done at two
different layers in the system (application and device layer).

Now the pipeline model is a bit more generic and may add some small
overhead compared to the bonding model. If we require every device type
to support a bonding style, then we would need them all to follow some
type of common structure, correct?
>
>> >
>> >The idea being that now all devices in the dataplane are pktdev
>> >devices, and code that transmits and receives frames only needs to
>> >know that a device can transmit and receive frames.  The crypto
>> >device in this chain is ostensibly performing some sort of ipsec
>> >functionality, so that arp frames are properly encrypted and
>> >encapsulated for sending via a tunnel.
>> >
>> >On the surface this seems reasonable, and in a sense it is.
>> >However, my assertion is that we already have this functionality,
>> >and it is the rte_ethdev device.  To illustrate further, in my view
>> >we can do the above already:
>> >
>> >+------------+  +---------+ +---------+  +---------+  +--------+
>> >|            |  |         | |         |  |         |  |        |
>> >|            |  |ethdev   | | ipsec   |  |ethdev   +--+        |
>> >| ARP handler+->+api      +-+ tunnel  +->+api      |  | PMD    |
>> >|            |  |         | | PMD     |  |         |  |        |
>> >|            |  |         | |         |  |         |  |        |
>> >+------------+  +---------+ +---+-----+  +---------+  +--------+
>> >                                |
>> >                             +--+-----+
>> >                             |        |
>> >                             |crypto  |
>> >                             |api     |
>> >                             |        |
>> >                             |        |
>> >                             +--------+
>> >
>> >Using the rte_ethdev we can already codify the ipsec functionality
>> >as a pmd that registers an ethdev, and stack it with other pmds
>> >using methods similar to what the bonding pmd does (or via some
>> >other more generalized dataplane indexing function).  This still
>> >leaves us with the creation of the crypto api, which is advantageous
>> >because:
>> 
>> The proposal does not remove the bonding method and can still be used,
>> correct?
>Not sure I fully understand the question, but no, I'm not proposing any
>specific change to bonding here, just using it to illustrate that we
>have ways currently to stack PMD's on top of one another to create a
>multi-stage data pipeline.

Look at the above comment for this one.
>
>> I do not see the difference here using the pktdev style routines or
>> using the ethdev routines.
>Exactly my point!  There is no difference to the dataplane API, it
>remains unchanged, just as in your proposal, except that there is no
>work required to create a pktdev device that doesn't change anything.
>The advantage that I'm proposing here is that my model separates
>whatever crypto device class API you want to design from the dataplane
>(rte_ethdev) api.  Doing this lets you create a crypto API that is more
>appropriately suited to the crypto class of device.  If there is
>overlap with rte_ethdev, that's fine, you can create similar functions
>(even giving them identical names to the rte_ethdev apis), and that
>will be properly resolved at compile time.  But more importantly, where
>there is not overlap, you're not forced into creating an API that
>inherits the common API functions you propose, when they are not best
>suited to your device.

I am still not trying to suggest the cryptodev be a copy of the ethdev
here. We need some type of structure and common guidelines for a new
device, for which pktdev is a very reasonable, plus very light,
framework.

I am a bit confused now; it seems like you are suggesting a new device
should be an ethdev, and then you seem to be stating it should be
completely different...

Let's just agree that some common parts are good between device types
and some are just not. Making all devices look like an ethdev is just
not going to fly, and making every new device type into something
completely different is not going to fly either.

Then the only solution is to have some things in common to help unify
the designs, but still allow the developer to add APIs to that device
which make the most sense.
>
>> >
>> >1) It is not constrained by the i/o model of the dataplane (it may
>> >include packet based i/o, but can build on more rudimentary (and
>> >performant) interfaces).  For instance, in addition to async block
>> >based i/o, a crypto device may also operate synchronously, meaning a
>> >call can be saved with each transaction (2 calls for a tx/rx vs one
>> >for an encrypt operation).
>> >
>> >2) It is not constrained by use case.  That is to say, the API can
>> >be constructed for more natural use with other functions (for
>> >instance encryption of files on disk or via a pipe to another
>> >process), which may not have any relation to the data plane of DPDK.
>> >
>> >Neil
>> 
>> Ok, you snipped the text of the email here and it makes the context wrong
>I figured your illustration above was sufficient to make your point and
>mine.
>If your code tells a different story, I apologize.
>
>> without the rest of the code IMO. I will try to explain without the
>> text that was omitted, but it would be best for anyone missing the
>> original email to read it for more details. I know the formatting got
>> messed up a bit :-(
>> 
>> http://dpdk.org/ml/archives/dev/2015-April/016124.html
>> 
>> 
>> In the rest of the text it does show the points I wanted to make here
>> and how little overhead it added.
>> 
>> 
>As above, I'm stipulating to the fact that there is no performance
>overhead.  That is not, and was never, my point.  My point was/is that
>we already have an API to do what you want to do.  The commonality of
>functions that you seem to focus on is already provided by the fact
>that we have a common device for the dataplane (the rte_ethdev).  Your
>proposal doesn't buy us anything beyond that.  By your own assertion,
>your proposal:
>
>1) Is just as performant as an rte_ethdev
>
>2) Doesn't require any code change to applications
>
>Given that, and the fact that it does require that new device classes
>inherit an

Please do not bring inheritance into this again; I wish Bruce had never
used that term :-( In hindsight it is not a very good way to describe
the design. It seems apt at a very high level, but it seems to drag
everyone down to the bits and bytes way too much. We never stated that
all the APIs are common between ethdev and a new device (or are
inherited), only a couple, which seemed to make sense to me. Making the
structures a bit similar was my goal, to provide a good light framework.
 
>API set that may not be relevant to said device class, my one and only
>question is: what's the advantage?

I have been trying to lay out the advantages:
 1) - we provide a very light framework for the developer of a new
device type to start with in his design
 2) - we look at making some routines common between devices if it
makes sense; some non-performance APIs do make sense to be common, but
the structures need to be common in some way IMO
 3) - we make a few structures look similar, using pktdev structures as
an overlay, to allow for a clean way to pass around these different
device types
 4) - adopt standard APIs or translate those APIs into something
similar for DPDK (e.g. Linux kernel crypto APIs)
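For the overlay point, a sketch of what it buys: a stage that only moves packets can take any device through its common view, whatever the device class. All names in this sketch are invented for illustration and are not the RFC code:

```c
#include <stdint.h>

struct rte_mbuf;  /* opaque stand-in for the real rte_mbuf */

/* Hypothetical common header shared by every device type. */
struct rte_pktdev {
	uint16_t (*rx_burst)(void *rxq, struct rte_mbuf **pkts, uint16_t n);
	uint16_t (*tx_burst)(void *txq, struct rte_mbuf **pkts, uint16_t n);
	void *rx_queue;
	void *tx_queue;
};

/* A cryptodev would embed the common header first, then add its own ops. */
struct rte_cryptodev_sketch {
	struct rte_pktdev pkt;           /* common members come first */
	int (*init_session)(void *dev);  /* class-specific API follows */
};

/* A forwarding stage needs only the common view, so an ethdev, a
 * cryptodev, or any future device type can be plugged in here. */
static inline uint16_t
forward_burst(struct rte_pktdev *in, struct rte_pktdev *out,
              struct rte_mbuf **burst, uint16_t n)
{
	uint16_t nb = in->rx_burst(in->rx_queue, burst, n);
	return out->tx_burst(out->tx_queue, burst, nb);
}

/* Stand-in burst routines so the sketch is self-contained. */
static uint16_t demo_rx(void *q, struct rte_mbuf **p, uint16_t n)
{ (void)q; (void)p; return n < 4 ? n : 4; }  /* pretend 4 packets arrived */
static uint16_t demo_tx(void *q, struct rte_mbuf **p, uint16_t n)
{ (void)q; (void)p; return n; }              /* pretend all were sent */
```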

>You claim that common API naming makes application

I never claimed that common API naming makes applications common, only
that a few common constructs or API semantics could help.

>code generic, but we have that today.  If you want to do IPsec with a
>crypto offload device, write a pmd that interfaces to the data pipeline
>via the rte_ethdev API and uses whatever crypto offload device API you
>want to construct to do the crypto work.  That way, if a given
>application wants to use the crypto
>crypto

I am trying to suggest the crypto device can have other APIs better
suited to how we want to use that device; again, we can have a few
common items between ethdev and cryptodev, but it does not make sense
to use all of ethdev.

I am getting confused here, as you seem to be suggesting we use the
ethdev model, but I am sure you are not, correct?

>hardware for something non-network related (say disk/file encryption),
>it can use the crypto device layer in a way that is not packet/network
>oriented more easily, at which point having a common API is completely
>irrelevant, as the application code is being specifically written to
>talk to a crypto device.
>
>
>
>
>> Let's just say we do not move the TX/RX routines from rte_ethdev into
>> rte_pktdev and only have a header file with a few structures and
>> macros to help define some common parts between each of the new
>> device types being added. I could see that as an option, but I do not
>> see the big issues you are pointing out here.
>> 
>My big issue is that I see no point in creating a common API.

I am not trying to create a common API between all devices; it only
seemed like a reasonable design to make Rx/Tx look similar. Also,
making some of the structures look similar is a good thing IMO.

>You're focused on finding a common set of API's that apply to all
>device classes, and I'm asserting that there is no safe set of common
>API routines.  While it may be possible to shoehorn multiple device
>classes into the same i/o API model, you are doing yourself a
>disservice by mandating that.

I feel like you are taking the extreme view here again; it was not my
intent to make everything the same, only where it makes sense. An
ethdev and a cryptodev are two different animals, but they do have some
similarities, and I want to make sure we investigate those
similarities, right?

>
>Put another way, ipsec devices and crypto devices are separate device
>classes and deserve to be modeled differently.  An IPsec device is a
>network element that is very nicely modeled as an ethernet interface,
>for which you have a very robust API (the rte_ethdev).  Crypto offload
>devices are not network elements, and can perform i/o in a multitude
>of ways (synchronous, asynchronous, scatter/gather, etc), and have
>many non ethernet API facets.  The latter deserves to have its own API
>design, not to be forced to inherit part of an ethernet device
>(rte_pktdev) and be forced to work that way.
>
>> You did have some great comments about how crypto is used, and the
>> APIs from the Linux Kernel crypto model are proven; I do not disagree.
>> 
>> In my reply to my own email I tried to point out we could add something
>Yeah, sorry about the broken thread, the in-reply-to field got somehow
>html-mangled.  Not sure how that happened as I use mutt as my MUA.
>
>> very similar to the Linux Kernel Crypto API, and it would be a model
>> most would be able to understand. Creating my own crypto API is not
>> my goal, but to use standards where they make the most sense for DPDK.
>> 
>> The email link above is the email where I suggested the Linux Kernel
>> Crypto API would be reasonable.
>> 
>> 
>And that's good.  It doesn't need to be the linux kernel crypto API by
>any stretch, and I don't expect it to be.  My take away from review of
>the linux crypto api is that, while it uses some similar notions to
>parts of the network API (scatter gather i/o, optional async operation,
>completion queues), it doesn't create any sort of common API structure
>between it and the network subsystem, because they're separate devices.
>Even though crypto is largely used in the network datapath for various
>ipsec tunnels, crypto still uses its own API methods and data types,
>because it's not exclusive to the use of networking, and doing so
>allows for more flexible use outside the datapath.  That's what I'm
>trying to get at here.  That users might want to use crypto outside of
>the network path, and they should have an API that is well suited to
>that task.

We agree here; I only want some similarities between devices, not an
exact common API.

I am afraid I am not following you now. At one point it seems like all
of the devices must be completely different, then in another place
ethdev is the best way to go as a common API. I think we are saying the
same thing here, just from different points of view, and I have tried
to understand your view point. I have even relaxed my design goals to
meet you part way. It seems like we agree and disagree here, which does
not make a lot of sense to me at this point.

Maybe we need to get into a room and talk F2F to make sure we are both
being understood here. Maybe we can then make sure we are both being
heard and come to some type of solution for DPDK. Maybe the community
meeting is a good place, but I feel it is going to be too big and too
short to discuss this correctly. I am up for a conf call, or I can
travel to you if you feel a F2F is better.

>
>Neil
>> Regards,
>> ++Keith
>> 
>> 


