[dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API

Zhang, Roy Fan roy.fan.zhang at intel.com
Mon Oct 14 01:07:31 CEST 2019


Hi Akhil,

Thanks for the review and comments! 
I know you are extremely busy, so here are my points in brief:
I think placing the CPU synchronous crypto in rte_security makes sense, as:

1. rte_security already contains the inline crypto and lookaside crypto action types, so adding a cpu_crypto action type is reasonable.
2. rte_security contains security features that may not be supported by all devices, such as crypto, IPsec, and PDCP. cpu_crypto follows this category - again, crypto.
3. placing the CPU synchronous crypto API in rte_security is natural - inline mode works synchronously too, whereas cryptodev doesn't.
4. placing the CPU synchronous crypto API in rte_security helps boost SW crypto performance; I have already provided a simple perf test inside the unit tests in the patchset for the user to try out - just compare its output against the DPDK crypto perf app output.
5. placing the CPU synchronous crypto API in cryptodev will never serve HW lookaside crypto PMDs, as making them work synchronously carries a huge performance penalty. The cryptodev framework's existing design, however, is to provide APIs that work on all crypto PMDs (rte_cryptodev_enqueue_burst / dequeue_burst for example), so a sync-only API does not fit cryptodev's principle.
6. placing the CPU synchronous crypto API in cryptodev confuses the user, as:
	- the session created for async mode may not work in sync mode;
	- both enqueue/dequeue and cpu_crypto_process do the same crypto processing, but one PMD may support only one API (set), another may support the other, and a third may support both. We would have to provide yet another API to let the user query which one is supported;
	- there would be two completely different code paths for async/sync mode.
7. You said at the end of the email that placing the CPU synchronous crypto API into rte_security is not acceptable as it does not do any rte_security stuff - is crypto not such stuff? You may call this a quibble, but in my view both PMD implementations in the patchset do offload the work to the CPU's special circuits designed specifically to accelerate crypto processing.

To me, cryptodev is the one place the CPU synchronous crypto API should not go into; rte_security is where it belongs.
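
To illustrate, below is a rough sketch of how an application would use the
proposed action type. Names follow this RFC patchset
(RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO, rte_security_process_cpu_crypto_bulk);
the exact signatures may still change, so treat it as a sketch rather than
the final API:

    /* create a CPU-crypto session via the rte_security action type */
    struct rte_security_session_conf conf = {
        .action_type = RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO,
        .crypto_xform = &xform,  /* the usual cryptodev sym xforms */
    };
    struct rte_security_session *sess =
        rte_security_session_create(sec_ctx, &conf, sess_mp);

    /* synchronous call: buffers are processed upon return -
     * no enqueue/dequeue pair, no crypto-ops involved */
    rte_security_process_cpu_crypto_bulk(sec_ctx, sess, buf, iv,
            aad, digest, status, num);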

Regards,
Fan

> -----Original Message-----
> From: Akhil Goyal [mailto:akhil.goyal at nxp.com]
> Sent: Friday, October 11, 2019 2:24 PM
> To: Ananyev, Konstantin <konstantin.ananyev at intel.com>; 'dev at dpdk.org'
> <dev at dpdk.org>; De Lara Guarch, Pablo <pablo.de.lara.guarch at intel.com>;
> 'Thomas Monjalon' <thomas at monjalon.net>; Zhang, Roy Fan
> <roy.fan.zhang at intel.com>; Doherty, Declan <declan.doherty at intel.com>
> Cc: 'Anoob Joseph' <anoobj at marvell.com>
> Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action type and
> API
> 
> Hi Konstantin,
> 
> >
> > Hi Akhil,
> >
> ..[snip]
> 
> > > > > > > OK let us assume that you have a separate structure. But I
> > > > > > > have a few queries:
> > > > > > > 1. how can multiple drivers use the same session
> > > > > >
> > > > > > As a short answer: they can't.
> > > > > > It is pretty much the same approach as with rte_security -
> > > > > > each device needs to create/init its own session.
> > > > > > So the upper layer would need to maintain its own array
> > > > > > (or so) for such a case.
> > > > > > Though the question is why would you like to have the same
> > > > > > session over multiple SW backed devices?
> > > > > > As it would be anyway just a synchronous function call that
> > > > > > will be executed on the same cpu.
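> > > > > >
> > > > > > E.g. the upper layer would keep something like this (just a
> > > > > > sketch; create_cpu_sym_session() is a placeholder name):
> > > > > >
> > > > > > /* one session per SW backed device, same xform */
> > > > > > struct rte_crypto_cpu_sym_session *sess[MAX_SW_DEVS];
> > > > > >
> > > > > > for (i = 0; i != nb_devs; i++)
> > > > > >         sess[i] = create_cpu_sym_session(dev[i], &xform);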
> > > > >
> > > > > I may have a single FAT tunnel which may be distributed over
> > > > > multiple cores, with each core affined to a different SW device.
> > > >
> > > > If it is pure SW, then we don't need multiple devices for such scenario.
> > > > Device in that case is pure abstraction that we can skip.
> > >
> > > Yes agreed, but that liberty is given to the application: whether it
> > > needs multiple devices with a single queue or a single device with
> > > multiple queues.
> > > I think that independence should not be broken in this new API.
> > > >
> > > > > So a single session may be accessed by multiple devices.
> > > > >
> > > > > One more example would be depending on packet sizes, I may
> > > > > switch between HW/SW PMDs with the same session.
> > > >
> > > > Sure, but then we'll have multiple sessions.
> > >
> > > No, the session will be the same and it will have multiple private
> > > data areas, one for each of the PMDs.
> > >
> > > > BTW, we have the same thing now - these private session pointers
> > > > are just stored inside the same rte_crypto_sym_session.
> > > > And if the user wants to support this model, he would also need to
> > > > store a <dev_id, queue_id> pair for each HW device anyway.
> > >
> > > Yes agreed, but how is that happening in your new struct? You
> > > cannot support that.
> >
> > User can store all this info in his own struct.
> > That's exactly what we have right now.
> > Let's say ipsec-secgw has to store for each IPsec SA:
> > a pointer to the crypto session and/or a pointer to the security
> > session, plus (for lookaside devices) cdev_id_qp that allows it to
> > extract dev_id + queue_id information.
> > As I understand that works for now, as each ipsec_sa uses only one
> > dev+queue. Though if someone would like to use multiple devices/queues
> > for the same SA - he would need to have an array of these <dev+queue>
> > pairs.
> > So even right now rte_cryptodev_sym_session is not self-consistent and
> > requires extra information to be maintained by the user.
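> >
> > Roughly, that per-SA state looks like this (simplified sketch, field
> > names abridged from ipsec-secgw):
> >
> > struct ipsec_sa {
> > 	struct rte_cryptodev_sym_session *crypto_session;
> > 	struct rte_security_session *sec_session;
> > 	uint32_t cdev_id_qp;	/* lookaside: packed dev_id + queue_id */
> > 	/* ... */
> > };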
> 
> Why are you increasing the complexity for the user application?
> The new APIs and structs should be such that the application needs to
> make minimal changes in the stack, so that the stack is portable across
> multiple vendors.
> You should try to hide as much complexity as possible in the driver or
> lib and give the user simple APIs.
> 
> Having the same session for multiple devices was added by Intel only for
> some use cases.
> And we had split the session create API into 2. Now if those are not
> useful, shall we move back to a single API? I think @Doherty, Declan and
> @De Lara Guarch, Pablo can comment on this.
> 
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > > 2. Can somebody use the scheduler pmd for scheduling
> > > > > > > different types of payloads for the same session?
> > > > > >
> > > > > > In theory yes.
> > > > > > Though for that the scheduler pmd should have inside its
> > > > > > rte_crypto_cpu_sym_session an array of pointers to the
> > > > > > underlying devices' sessions.
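> > > > > >
> > > > > > I.e. conceptually (hypothetical sketch, not actual code):
> > > > > >
> > > > > > struct scheduler_cpu_sym_session {
> > > > > > 	uint32_t nb_workers;
> > > > > > 	/* sessions of the underlying worker devices */
> > > > > > 	struct rte_crypto_cpu_sym_session *worker_sess[];
> > > > > > };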
> > > > > >
> > > > > > >
> > > > > > > With your proposal the APIs would be very specific to your
> > > > > > > use case only.
> > > > > >
> > > > > > Yes in some way.
> > > > > > I consider that API specific to SW backed crypto PMDs.
> > > > > > I can hardly see how any 'real HW' PMDs (lksd-none,
> > > > > > lksd-proto) will benefit from it.
> > > > > > The current crypto-op API is very much HW oriented.
> > > > > > Which is ok, that's what it was intended for, but I think we
> > > > > > also need one that would be designed with SW backed
> > > > > > implementations in mind.
> > > > >
> > > > > We may re-use your API for HW PMDs as well, which do not have
> > > > > the requirement of crypto-op/mbuf etc.
> > > > > The return type of your new process API may have a status which
> > > > > says 'processed' or 'enqueued'. So if it is 'enqueued', we may
> > > > > have a new API for raw bufs dequeue as well.
> > > > >
> > > > > This requirement can be for any hardware PMDs like QAT as well.
> > > >
> > > > I don't think it is a good idea to extend this API for async
> > > > (lookaside) devices.
> > > > You'll need to:
> > > >  - provide dev_id and queue_id for each process (enqueue) and
> > > >    dequeue operation.
> > > >  - provide IOVA for all buffers passed to that function (data
> > > >    buffers, digest, IV, aad).
> > > >  - On dequeue, provide some way to associate dequeued data and
> > > >    digest buffers with the crypto session that was used (and
> > > >    probably with the mbuf).
> > > >  So most likely we'll end up with just another version of our
> > > > current crypto-op structure.
> > > > If you'd like to get rid of the mbuf dependency within the current
> > > > crypto-op API, that's understandable, but I don't think we should
> > > > have the same API for both sync (CPU) and async
> > > > (lookaside) cases.
> > > > It doesn't seem feasible at all and voids the whole purpose of
> > > > that patch.
> > >
> > > At this moment we are not much concerned about the dequeue API and
> > > about the HW PMD support. It is just that the new API should be
> > > generic enough to be used in some future scenarios as well. I am
> > > just highlighting the possible use cases which can be there in
> > > future.
> >
> > Sorry, but I strongly disagree with such an approach.
> > We should stop adding/modifying API 'just in case' and because 'it
> > might be useful for some future HW'.
> > Inside DPDK we already have too many dev level APIs without any
> > implementations.
> > That's quite a bad practice and very disorienting for end-users.
> > I think to justify API additions/changes we need at least one proper
> > implementation for it, or at least some strong evidence that people
> > are really committed to support it in the nearest future.
> > BTW, that's what the TB agreed on, nearly a year ago.
> >
> > This new API (if we'll go ahead with it of course) would stay
> > experimental for some time anyway to make sure we don't miss anything
> > needed (I think for about a one-year time-frame).
> > So if you guys *really* want to extend it to support _async_ devices
> > too - I am open for modifications/additions here.
> > Though personally I think such an addition would over-complicate
> > things and we'll end up with another reincarnation of the current
> > crypto-op.
> > We actually discussed it internally, and decided to drop that idea
> > because of that.
> > Again, my opinion - for lookaside devices it might be better to try to
> > optimize the current crypto-op path (remove the mbuf requirement,
> > probably add the ability to group by session on enqueue/dequeue, etc.).
> 
> I agree that the new API is experimental and can be modified later. So
> no issues with that, but we can keep some things in mind while defining
> APIs. These were some comments from my side; if they are impacting the
> current scenario, you can drop them. We will take care of those later.
> 
> >
> > >
> > > What is the issue that you face in making a dev-op for this new API?
> > > Do you see any performance impact with that?
> >
> > There are two main things:
> > 1. user would need to maintain and provide for each process() call
> > dev_id+queue_id.
> > That means extra (and totally unnecessary for SW) overhead.
> 
> You are using a crypto device for performing the processing; you must
> use dev_id to identify which SW device it is. This is how the DPDK
> framework works.
> 
> > 2. yes I would expect some perf overhead too - it would be an extra
> > call or branch.
> > Again, as it would be a data dependency - most likely the cpu wouldn't
> > be able to pipeline it efficiently:
> >
> > rte_crypto_sym_process(uint8_t dev_id, uint16_t qp_id,
> > 	struct rte_crypto_sym_session *sess, ...)
> > {
> >      struct rte_cryptodev *dev = &rte_cryptodevs[dev_id];
> >      return (*dev->process)(sess->data[dev->driver_id], ...);
> > }
> >
> > driver_specific_process(driver_specific_sym_session *sess)
> > {
> >      return sess->process(sess, ...);
> > }
> >
> > I didn't make any exact measurements but I am sure it would be slower
> > than just:
> > session_udata->process(session_udata->sess, ...);
> > Again it would be much more noticeable on low end cpus.
> > Let say here:
> > http://mails.dpdk.org/archives/dev/2019-September/144350.html
> > Jerin claims 1.5-3% drop for introducing extra call via hiding eth_dev
> > contents - I suppose we would have something similar here.
> > I do realize that in the majority of cases crypto is more expensive
> > than RX/TX, but still.
> >
> > If it were a really unavoidable tradeoff (supporting an already
> > existing API, or so) I wouldn't mind, but I don't see any real need
> > for it right now.
> 
> Calling session_udata->process(session_udata->sess, ...); from the
> application, with the application needing to maintain the process() API
> for each PMD in its memory, will make the application not portable to
> other vendors.
> 
> What we are doing here is defining another way to create sessions for
> the same stuff that is already done. This makes applications
> non-portable and confusing for the application writer.
> 
> I would say you should do some profiling first. As you also mentioned,
> the crypto workload is more cycle-consuming, so it will not impact this
> case.
> 
> 
> >
> > >
> > > >
> > > > > That is why a dev-op would be a better option.
> > > > >
> > > > > >
> > > > > > > When you add more functionality to this sync API/struct, it
> > > > > > > will end up being the same API/struct.
> > > > > > >
> > > > > > > Let us see how close/far we are from the existing APIs
> > > > > > > when the actual implementation is done.
> > > > > > >
> > > > > > > > > I am not sure if that would be needed.
> > > > > > > > > It would be internal to the driver that if synchronous
> > > > > > > > > processing is supported (from the feature flag) and the
> > > > > > > > > relevant fields in the xform (the newly added ones,
> > > > > > > > > which are packed as per your suggestions) are set,
> > > > > > > > > it will create that type of session.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > + * Main points:
> > > > > > > > > > + * - Current crypto-dev API is reasonably mature and
> > > > > > > > > > + *   it is desirable to keep it unchanged (API/ABI
> > > > > > > > > > + *   stability). From the other side, this new sync
> > > > > > > > > > + *   API is a new one and probably would require
> > > > > > > > > > + *   extra changes. Having it as a new one allows to
> > > > > > > > > > + *   mark it as experimental, without affecting the
> > > > > > > > > > + *   existing one.
> > > > > > > > > > + * - Fully opaque cpu_sym_session structure gives
> > > > > > > > > > + *   more flexibility to the PMD writers and again
> > > > > > > > > > + *   allows to avoid ABI breakages in future.
> > > > > > > > > > + * - process() function per set of xforms
> > > > > > > > > > + *   allows to expose different process() functions
> > > > > > > > > > + *   for different xform combinations. PMD writer can
> > > > > > > > > > + *   decide whether he wants to push all supported
> > > > > > > > > > + *   algorithms into one process() function, or
> > > > > > > > > > + *   spread it across several ones.
> > > > > > > > > > + *   I.e. more flexibility for the PMD writer.
> > > > > > > > >
> > > > > > > > > Which process function should be chosen is internal to
> > > > > > > > > the PMD; how would that info be visible to the
> > > > > > > > > application or the library? These will get stored in the
> > > > > > > > > session private data. It would be up to the PMD writer
> > > > > > > > > to store the per-session process function in the session
> > > > > > > > > private data.
> > > > > > > > >
> > > > > > > > > Process function would be a dev op just like enq/deq
> > > > > > > > > operations and it should call the respective process API
> > > > > > > > > stored in the session private data.
> > > > > > > >
> > > > > > > > That model (via devops) is possible, but has several
> > > > > > > > drawbacks from my perspective:
> > > > > > > >
> > > > > > > > 1. It means we'll need to pass dev_id as a parameter to
> > > > > > > > the process() function.
> > > > > > > > Though in fact dev_id is not relevant information for us
> > > > > > > > here (all we need is a pointer to the session and a
> > > > > > > > pointer to the function to call) and I tried to avoid
> > > > > > > > using it in the data-path functions for that API.
> > > > > > >
> > > > > > > You have a single vdev, but someone may have multiple vdevs,
> > > > > > > one for each thread, or may have the same dev with multiple
> > > > > > > queues, one for each core.
> > > > > >
> > > > > > That's fine. As I said above it is a SW backed implementation.
> > > > > > Each session has to be a separate entity that contains all
> > > > > > necessary information (keys, alg/mode info, etc.) to process
> > > > > > input buffers.
> > > > > > Plus we need the actual function pointer to call.
> > > > > > I just don't see what we need a dev_id for in that situation.
> > > > >
> > > > > To iterate over the session private data in the session.
> > > > >
> > > > > > Again, here we don't need to care about queues and their
> > > > > > pinning to cores.
> > > > > > If, let's say, someone would like to process buffers from the
> > > > > > same IPsec SA on 2 different cores in parallel, he can just
> > > > > > create 2 sessions for the same xform, give one to thread #1
> > > > > > and the second to thread #2.
> > > > > > After that both threads are free to call
> > > > > > process(this_thread_ses, ...) at will.
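> > > > > >
> > > > > > I.e. (sketch only, with placeholder create/process names):
> > > > > >
> > > > > > /* same xform, one session per worker thread */
> > > > > > ses1 = create_cpu_sym_session(dev, &xform); /* thread #1 */
> > > > > > ses2 = create_cpu_sym_session(dev, &xform); /* thread #2 */
> > > > > > /* then each thread independently calls: */
> > > > > > process(this_thread_ses, ...);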
> > > > >
> > > > > Say you have a 16-core device to handle 100G of traffic on a
> > > > > single tunnel.
> > > > > Will we make 16 sessions with the same parameters?
> > > >
> > > > The absolutely same question we can ask for the current crypto-op
> > > > API.
> > > > You have a lookaside crypto-dev with 16 HW queues, each queue
> > > > serviced by a different CPU.
> > > > For the same SA, do you need a separate session per queue, or is
> > > > it ok to reuse the current one?
> > > > AFAIK, right now this is a grey area not clearly defined.
> > > > For the crypto-devs I am aware of - the user can reuse the same
> > > > session (as the PMD uses it read-only).
> > > > But again, right now I think it is not clearly defined and is
> > > > implementation specific.
> > >
> > > User can use the same session, that is what I am also insisting,
> > > but it may have separate session private data. The cryptodev session
> > > create API provides that functionality and we can leverage that.
> >
> > rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which
> > means we can't use the same rte_cryptodev_sym_session to hold sessions
> > for both sync and async mode for the same device. Of course we can
> > add a hard requirement that any driver that wants to support process()
> > has to create sessions that can handle both process and
> > enqueue/dequeue, but then again, why create such overhead?
> >
> > BTW, to be honest, I don't consider the current
> > rte_cryptodev_sym_session construct for multiple device_ids:
> > __extension__ struct {
> >                 void *data;
> >                 uint16_t refcnt;
> >         } sess_data[0];
> >         /**< Driver specific session material, variable size */
> >
> Yes I also feel the same. I was also not in favor of this when it was introduced.
> Please go ahead and remove this. I have no issues with that.
> 
> > as an advantage.
> > It looks too error prone for me:
> > 1. Simultaneous session initialization/de-initialization for devices
> > with the same driver_id is not possible.
> > 2. It assumes that all device drivers will be loaded before we start
> > to create session pools.
> >
> > Right now it seems ok, as no-one requires such functionality, but I
> > don't know how it will be in the future.
> > For me the rte_security session model, where for each security context
> > the user has to create a new session, looks much more robust.
> Agreed
> 
> >
> > >
> > > BTW, I can see a v2 of this RFC which is still based on the security
> > > library.
> >
> > Yes, v2 concentrated on fixing the found issues and some code
> > restructuring, i.e. changes that would be needed anyway whatever API
> > approach we choose.
> >
> > > When do you plan to submit the patches for the crypto based APIs?
> > > We have an RC1 merge deadline for this patchset on 21st Oct.
> >
> > We'd like to start working on it ASAP, but it seems we still have a
> > major disagreement about what this crypto-dev API should look like.
> > Which makes me think - should we return to our original proposal via
> > rte_security?
> > It still looks to me like a clean and straightforward way to enable
> > this new API, and probably wouldn't cause that much controversy.
> > What do you think?
> 
> I cannot spend more time discussing this until the RC1 date. I have
> some other stuff pending.
> You can send the patches early next week with the approach that I
> mentioned, or else we can discuss this post RC1 (which would mean
> deferring to 20.02).
> 
> But moving back to security is not acceptable to me. The code should be put
> where it is intended and not where it is easy to put. You are not doing any
> rte_security stuff.
> 
> 
> Regards,
> Akhil

