[dpdk-dev] [v2 1/2] cryptodev: support enqueue callback functions

Gujjar, Abhinandan S abhinandan.gujjar at intel.com
Fri Oct 9 18:51:29 CEST 2020


Hi Konstantin,


> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.ananyev at intel.com>
> Sent: Friday, October 9, 2020 8:10 PM
> To: Gujjar, Abhinandan S <abhinandan.gujjar at intel.com>; Honnappa
> Nagarahalli <Honnappa.Nagarahalli at arm.com>; dev at dpdk.org; Doherty,
> Declan <declan.doherty at intel.com>
> Cc: jerinj at marvell.com; Akhil.goyal at nxp.com; Vangati, Narender
> <narender.vangati at intel.com>; nd <nd at arm.com>; nd <nd at arm.com>
> Subject: RE: [dpdk-dev] [v2 1/2] cryptodev: support enqueue callback functions
> 
> Hi Abhinandan,
> 
> >
> > Hi Konstantin & Honnappa,
> >
> > Thanks for all the inputs and feedback.
> >
> > @Ananyev, Konstantin,
> > I have measured the perf with and without callback on xeon. Here are the
> numbers:
> >
> > ./app/dpdk-test-crypto-perf -l 6-7
> > --vdev="crypto_openssl0,socket_id=0,max_nb_sessions=128" -- --ptest
> > throughput --devtype crypto_openssl --optype cipher-then-auth
> > --cipher-algo aes-cbc --cipher-op encrypt --cipher-key-sz 16
> > --auth-algo sha1-hmac --auth-op generate --auth-key-sz 64 --digest-sz
> > 12 --total-ops 10000000 --burst-sz 32 --buffer-sz 64
> >
> > With callback(+ RCU - totally opaque to data-plane threads)
> >     lcore id    Buf Size  Burst Size    Enqueued    Dequeued  Failed Enq  Failed
> Deq        MOps        Gbps  Cycles/Buf
> >            7          64          32                10000000    10000000           0           0
> 0.8129      0.4162     2694.09
> >            7          64          32                10000000    10000000           0           0
> 0.8152      0.4174     2686.31
> >            7          64          32                10000000    10000000           0           0
> 0.8198      0.4197     2671.48
> >
> > Without callback:
> >     lcore id    Buf Size  Burst Size    Enqueued    Dequeued  Failed Enq  Failed
> Deq        MOps        Gbps  Cycles/Buf
> >
> >            7          64          32               10000000    10000000           0           0
> 0.8234      0.4216     2659.81
> >            7          64          32               10000000    10000000           0           0
> 0.8247      0.4222     2655.63
> >            7          64          32               10000000    10000000           0           0
> 0.8123      0.4159     2695.90
> 
> 
> Just to cofirm:
> You implemented crypto enqueue callbacks using RCU QSBR online /offline (as
> suggested below) and numbers above are for:
> 1) callback code in place and some dummy callback installed
That's right. (+ RCU calling online + offline APIs inside rte_cryptodev_enqueue_burst())
> 2) callback code in place but no callbacks installed
No callback code. i.e. Original code.
> 
> Is my understanding correct here?
> Thanks
> Konstantin
> 
> 
> >
> > Regards
> > Abhinandan
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin <konstantin.ananyev at intel.com>
> > > Sent: Tuesday, September 29, 2020 2:33 PM
> > > To: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>; Gujjar,
> > > Abhinandan S <abhinandan.gujjar at intel.com>; dev at dpdk.org; Doherty,
> > > Declan <declan.doherty at intel.com>
> > > Cc: jerinj at marvell.com; Akhil.goyal at nxp.com; Vangati, Narender
> > > <narender.vangati at intel.com>; nd <nd at arm.com>; nd <nd at arm.com>
> > > Subject: RE: [dpdk-dev] [v2 1/2] cryptodev: support enqueue callback
> > > functions
> > >
> > >
> > > >
> > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +#ifdef RTE_CRYPTODEV_CALLBACKS int
> > > > > > > > > > > > > > +rte_cryptodev_rcu_qsbr_add(uint8_t dev_id,
> > > > > > > > > > > > > > +struct rte_rcu_qsbr
> > > > > > > > > > > > > > +*qsbr) {
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +	struct rte_cryptodev *dev;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +	if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) {
> > > > > > > > > > > > > > +		CDEV_LOG_ERR("Invalid dev_id=%"
> PRIu8,
> > > > > dev_id);
> > > > > > > > > > > > > > +		return -EINVAL;
> > > > > > > > > > > > > > +	}
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +	dev = &rte_crypto_devices[dev_id];
> > > > > > > > > > > > > > +	dev->qsbr = qsbr;
> > > > > > > > > > > > > > +	return 0;
> > > > > > > > > > > > > > +}
> > > > > > > > > > > > >
> > > > > > > > > > > > > So if I understand your patch correctly you
> > > > > > > > > > > > > propose a new working model for
> > > > > > > > > > > > > crypto-devs:
> > > > > > > > > > > > > 1. Control-plane has to allocate/setup rcu_qsbr
> > > > > > > > > > > > > and do rte_cryptodev_rcu_qsbr_add().
> > > > > > > > > > > > > 2. Data-plane has somehow to obtain pointer to
> > > > > > > > > > > > > that rcu_qsbr and wrap
> > > > > > > > > > > > > cryptodev_enqueue()
> > > > > > > > > > > > >    with rcu_qsbr_quiescent()  or
> > > > > > > rcu_qsbr_online()/rcu_qsbr_offline().
> > > > > > > > > > > > Yes. I think, it is not a new model. It is same as
> > > > > > > > > > > > RCU integration with
> > > > > > > > > LPM.
> > > > > > > > > > > > Please refer:
> > > > > > > > > > > > https://patches.dpdk.org/cover/73673/
> > > > > > > > > > >
> > > > > > > > > > > I am talking about new working model for crypto-dev
> > > > > > > enqueue/dequeue.
> > > > > > > > > > > As I said above now it becomes data-plane thread
> > > > > > > > > > > responsibility
> > > to:
> > > > > > > > > > >  -somehow to obtain pointer to that rcu_qsbr for
> > > > > > > > > > > each cryptodev it is
> > > > > > > > > using.
> > > > > > > > > > >  -call rcu sync functions (quiescent/online/offline)
> > > > > > > > > > > on a regular
> > > > > basis.
> > > > > > > > > > It is not on regular basis. When data plane comes up,
> > > > > > > > > > they report
> > > > > online.
> > > > > > > > > > They report quiescent when they are done with critical
> > > > > > > > > > section or shared
> > > > > > > > > structure.
> > > > > > > > >
> > > > > > > > > I understand that, but it means all existing apps have
> > > > > > > > > to be changed that
> > > > > > > way.
> > > > > > > > >
> > > > > > > > > > All though, there is some dataplane changes involved
> > > > > > > > > > here, I don't think, it
> > > > > > > > > is major.
> > > > > > > > >
> > > > > > > > > I still think our goal here should be to make no visible
> > > > > > > > > changes to the dataplane.
> > > > > > > > > I.E. all necessary data-plane changes need to be hidden
> > > > > > > > > inside CB invocation part.
> > > > > > > > Please note that this is being implemented using the
> > > > > > > > memory reclamation framework documented at
> > > > > > > > https://doc.dpdk.org/guides/prog_guide/rcu_lib.html#resour
> > > > > > > > ce-r
> > > > > > > > ecla
> > > > > > > > mati
> > > > > > > > on-framework-for-dpdk
> > > > > > > >
> > > > > > > > While using RCU there are couple of trade-offs that
> > > > > > > > applications have to
> > > > > > > consider:
> > > > > > > > 1) Performance - reporting the quiescent state too often
> > > > > > > > results in performance impact on data plane
> > > > > > > > 2) Amount of outstanding memory to reclaim - reporting
> > > > > > > > less often results in more outstanding memory to reclaim
> > > > > > > >
> > > > > > > > Hence, the quiescent state reporting is left to the application.
> > > > > > > > The application decides how often it reports the quiescent
> > > > > > > > state and has control
> > > > > > > over the data plane performance and the outstanding memory
> > > > > > > to
> > > reclaim.
> > > > > > > >
> > > > > > > > When you say "new working model for crypto-dev
> > > > > > > > enqueue/dequeue",
> > > > > > > >
> > > > > > > > 1) are you comparing these with existing crypto-dev
> > > > > > > > enqueue/dequeue
> > > > > > > APIs? If yes, these are new APIs, it is not breaking anything.
> > > > > > > > 2) are you comparing these with existing call back
> > > > > > > > functions in ethdev enqueue/dequeue APIs? If yes, agree
> > > > > > > > that this is a new model. But, it is
> > > > > > > possible to support what ethdev supports along with the RCU
> > > > > > > method used in this patch.
> > > > > > >
> > > > > > > What I am talking about:
> > > > > > >
> > > > > > > Existing cryptodev enqueue/dequeue model doesn't require for
> > > > > > > the user to manage any RCU QSBR state manually.
> > > > > > > I believe that addition of ability to add/remove
> > > > > > > enqueue/dequeue callbacks shouldn't change existing working
> model.
> > > > > > > I think that adding/removing such callbacks has to be opaque
> > > > > > > to the user DP code and shouldn't require user to change it.
> > > > > > > Same as we have now for ethdev callback implementation.
> > > > > > The ethdev callback implementation conveniently leaves the
> > > > > > problem of
> > > > > freeing memory to the user to resolve, it does not handle the issue.
> > > > > > Hence, it "looks" to be opaque to the DP code. However, if the
> > > > > > application has to implement a safe way to free the call back
> > > > > > memory, its
> > > > > DP is affected based on call backs are being used or not.
> > > > >
> > > > > Yes, I think that's big drawback in initial ethdev callback
> > > > > implementation - it simply ignores DP/CP sync problem completely.
> > > > > Though I think it is possible to have both here:
> > > > >  keep callback "opaque" to DP code and provide some sync
> > > > > mechanism between DP/CP.
> > > > > Hopefully one day we can fix ethdev callbacks too.
> > > > The solution we develop can be used in ethdev too.
> > > >
> > > > >
> > > > > > > I think that forcing DP code to be aware that callbacks are
> > > > > > > present or not and to modify its behaviour depending on that
> > > > > > > nearly voids the purpose of having callbacks at all.
> > > > > > > In that case DP can just invoke callback function directly
> > > > > > > from it's
> > > > > codepath .
> > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Note that now data-plane thread would have to do
> > > > > > > > > > > that always
> > > > > > > > > > > - even if there are now callbacks installed for that
> > > > > > > > > > > cryptodev queue
> > > > > > > right now.
> > > > > > > > > > > All that changes behaviour of existing apps and I
> > > > > > > > > > > presume would reduce adoption of  that fature.
> > > > > > > > If I understand this correct, you are talking about a case
> > > > > > > > where in the application might be
> > > > > > > > registering/unregistering multiple times during its
> > > > > > > > lifetime. In this case, yes, the application might be
> > > > > > > > reporting the
> > > > > > > quiescent state even when it has not registered the call backs.
> > > > > > > But, it has the flexibility to not report it if it implements additional
> logic.
> > > > > > > > Note that we are assuming that the application has to
> > > > > > > > report quiescent state only for using callback functions.
> > > > > > > > Most probably the application has
> > > > > > > other requirements to use RCU.
> > > > > > > > Why not support what is done for ethdev call back
> > > > > > > > functions along with
> > > > > > > providing RCU method?
> > > > > > > >
> > > > > > > > > > There is always trade off involved!
> > > > > > > > > > In the previous patch, you suggested that some lazy
> > > > > > > > > > app may not free up the memory allocated by add cb.
> > > > > > > > > > For such apps, this patch has sync mechanism with some
> > > > > > > > > > additional cost of CP & DP
> > > > > changes.
> > > > > > > > >
> > > > > > > > > Sigh, it is not about laziness of the app.
> > > > > > > > > The problem with current ethedev cb mechanism and yours
> > > > > > > > > V1 (which was just a clone of it) - CP doesn't know when
> > > > > > > > > it is safe after CB removal to free related memory.
> > > > > > > > >
> > > > > > > > > > > I still think all this callback mechanism should be
> > > > > > > > > > > totally opaque to data-plane threads - user
> > > > > > > > > > > shouldn't change his app code depending on would
> > > > > > > > > > > some enqueue/dequeue callbacks be
> > > > > > > installed or not.
> > > > > > > > > > I am not sure, how that can be implemented with
> > > > > > > > > > existing RCU
> > > > > design.
> > > > > > > > >
> > > > > > > > > As I said below the simplest way - with calling rcu
> > > > > > > > > onine/offline inside CB invocation block.
> > > > > > > > > That's why I asked you - did you try that approach and
> > > > > > > > > what is the perf numbers?
> > > > > > > > > I presume with no callbacks installed the perf change
> > > > > > > > > should be nearly
> > > > > > > zero.
> > > > > > > > >
> > > > > > > > > > @Honnappa Nagarahalli, Do you have any suggestions?
> > > > > > > > Reporting quiescent state in the call back functions has
> > > > > > > > several
> > > > > > > disadvantages:
> > > > > > > > 1) it will have performance impacts and the impacts will
> > > > > > > > increase as the
> > > > > > > number of data plane threads increase.
> > > > > > > > 2) It will require additional configuration parameters to
> > > > > > > > control how often the quiescent state is reported to
> > > > > > > > control the performance
> > > > > impact.
> > > > > > > > 3) Does not take advantage of the fact that most probably
> > > > > > > > the application is using RCU already
> > > > > > > > 4) There are few difficulties as well, please see below.
> > > > > > >
> > > > > > > I suggested Abhinandan to use RCU library because it is
> > > > > > > already there, and I thought it would be good not to re-implement
> the wheel.
> > > > > > > Though if you feel librte_rcu doesn't match that task -
> > > > > > > fine, let's do it without librte_rcu.
> > > > > > > After all, what we need here - just an atomic ref count per
> > > > > > > queue that we are going to increment at entering and leaving
> > > > > > > list of callbacks inside enqueue/dequeue.
> > > > > > Ok, looks like I missed the point that a queue is used by a
> > > > > > single data plane
> > > > > thread.
> > > > > > Along with ref count increment you need the memory orderings
> > > > > > to avoid
> > > > > race conditions. These will be the same ones used in RCU.
> > > > > > On the control plane, you need to read this counter and poll
> > > > > > for the
> > > > > counter updates. All this is same cost as in RCU.
> > > > >
> > > > > Agree.
> > > > >
> > > > > > To control the cost, you
> > > > > > will have to control the rate of quiescent state reporting and
> > > > > > might have to
> > > > > expose this as a configuration parameter.
> > > > > >
> > > > > > The other important information you have to consider is if the
> > > > > > thread is making any blocking calls, which may be in some
> > > > > > other library. The thread is supposed to call
> > > > > > rcu_qsbr_thread_offline API before calling a
> > > > > blocking call. This allows the RCU to know that this particular
> > > > > thread will not report quiescent state. The cryptodev library
> > > > > might not have
> > > that information.
> > > > > >
> > > > > > If you want to go ahead with this design, you can still use
> > > > > > RCU with single thread configuration (like you have mentioned
> > > > > > below) and hide the
> > > > > details from the application.
> > > > >
> > > > > Yes,  same thought here -  use rcu_qsbr online/offline for DP
> > > > > part and hide actual sync details inside callback code.
> > > > We can give it a try. If we can have the performance numbers, we
> > > > can decide about how to control how often these APIs are called on
> > > > the data
> > > plane.
> > >
> > > To avoid misunderstanding: I am talking about calling online/offline
> > > with every
> > > cryptodev_enqueue() traversal over CB list.
> > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > That seems quite a big change and I don't think
> > > > > > > > > > > > > it is acceptable for most users.
> > > > > > > > > > > > > From my perspective adding/installing call-backs
> > > > > > > > > > > > > to the dev has to be opaque to the data-plane code.
> > > > > > > > > > > > > Also note that different callbacks can be
> > > > > > > > > > > > > installed by different entities (libs) and might
> > > > > > > > > > > > > have no idea about each
> > > > > other.
> > > > > > > > > > > > > That's why I thought it would be better to make
> > > > > > > > > > > > > all this RCU stuff internal inside cryptodev:
> > > > > > > > > > > > >     hide all this rcu_qsbr allocation/setup
> > > > > > > > > > > > > inside cryptod somehow to
> > > > > > > > > > > obtain pointer to that rcu_qsbr ev init/queue setup
> > > > > > > > > > > > >     invoke rcu_qsbr_online()/rcu_qsbr_offline()
> > > > > > > > > > > > > inside
> > > > > > > > > > > cryptodev_enqueue().
> > > > > > > > This will bring in the application related information
> > > > > > > > such as the thread ID
> > > > > > > into the library.
> > > > > > >
> > > > > > > I don't think it would.
> > > > > > > Cryptodev enqueue/dequeue functions are not supposed to be
> > > > > > > thread safe (same as rx/tx burst).
> > > > > > > So we can always use RCU with just one thread(thread_id = 0).
> > > > > > Agree, the memory that needs to be freed is accessed by a
> > > > > > single thread
> > > > > on the data plane. RCU with one thread would suffice.
> > > > > >
> > > > > > > But as I said above - if you feel RCU lib is an overhead
> > > > > > > here, that's fine - I think it would be easy enough to do without
> librte_rcu.
> > > > > > >
> > > > > > > > If the same API calls are being made from multiple data
> > > > > > > > plane threads, you need a way to configure that
> > > > > > > > information to the library. So, it is better to leave
> > > > > > > > those details for the application to
> > > handle.
> > > > > > > >
> > > > > > > > > > > > I have already tried exploring above stuffs. There
> > > > > > > > > > > > are too many
> > > > > > > > > constraints.
> > > > > > > > > > > > The changes don't fit in, as per RCU design.
> > > > > > > > > > >
> > > > > > > > > > > Hmm could you be more specific here - what
> > > > > > > > > > > constraints are you referring to?
> > > > > > > > > > >
> > > > > > > > > > > > Moreover, having rcu api under enqueue_burst()
> > > > > > > > > > > > will affect the
> > > > > > > > > > > performance.
> > > > > > > > > > >
> > > > > > > > > > > It most likely will. Though my expectation it will
> > > > > > > > > > > affect performance only when some callbacks are
> > > > > > > > > > > installed. My thought
> > > > > > > here:
> > > > > > > > > > > callback function by itself will affect
> > > > > > > > > > > cryptdev_enqueue performance anyway,
> > > > > > > > > > With existing callback design, I have measured the
> > > > > > > > > > performance(with
> > > > > > > > > crypto perf test) on xeon.
> > > > > > > > > > It was almost negligible and same was shared with Declan.
> > > > > > > > >
> > > > > > > > > I am asking about different thing: did you try alternate
> > > > > > > > > approach I described, that wouldn't require changes in
> > > > > > > > > the user data-
> > > > > plane code.
> > > > > > > > >
> > > > > > > > > > That is one of the reasons, I didn't want to add to
> > > > > > > > > > many stuffs in to the
> > > > > > > > > callback.
> > > > > > > > > > The best part of existing design is crypto lib is not much
> modified.
> > > > > > > > > > The changes are either pushed to CP or DP.
> > > > > > > > > >
> > > > > > > > > > so adding extra overhead for sync is probably ok here.
> > > > > > > > >
> > > > > > > > > I think that extra overhead when callbacks are present
> > > > > > > > > is expected and probably acceptable.
> > > > > > > > > Changes in the upper-layer data-plane code - probably not.
> > > > > > > > >
> > > > > > > > > > > Though for situation when no callbacks are installed
> > > > > > > > > > > - perfomance should be left unaffected (or impact
> > > > > > > > > > > should be as small
> > > > > > > as possible).
> > > > > > > > > > >
> > > > > > > > > > > > The changes are more on control plane side, which
> > > > > > > > > > > > is one
> > > time.
> > > > > > > > > > > > The data plane changes are minimal.
> > > > > > > > > > >
> > > > > > > > > > > I still think upper layer data-plane code should
> > > > > > > > > > > stay unaffected (zero changes).
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > >
> > > > > > <snip>


More information about the dev mailing list