[dpdk-dev] [RFC v1] doc compression API for DPDK

Verma, Shally Shally.Verma at cavium.com
Tue Dec 26 12:15:40 CET 2017


Hi Fiona

> [Fiona] Would it be preferable to issue v2 even with design so far?
Sure, will do.
Please see inline for feedback on other points.

> -----Original Message-----
> From: Trahe, Fiona [mailto:fiona.trahe at intel.com]
> Sent: 22 December 2017 20:43
> To: Verma, Shally <Shally.Verma at cavium.com>; dev at dpdk.org
> Cc: Athreya, Narayana Prasad <NarayanaPrasad.Athreya at cavium.com>;
> Gupta, Ashish <Ashish.Gupta at cavium.com>; Sahu, Sunila
> <Sunila.Sahu at cavium.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch at intel.com>; Challa, Mahipal
> <Mahipal.Challa at cavium.com>; Jain, Deepak K <deepak.k.jain at intel.com>;
> Hemant Agrawal <hemant.agrawal at nxp.com>; Roy Pledge
> <roy.pledge at nxp.com>; Youri Querry <youri.querry_1 at nxp.com>; Ahmed
> Mansour <ahmed.mansour at nxp.com>; Trahe, Fiona
> <fiona.trahe at intel.com>
> Subject: RE: [RFC v1] doc compression API for DPDK
> 
> Hi Shally,
> 
> > -----Original Message-----
> > From: Verma, Shally [mailto:Shally.Verma at cavium.com]
> > Sent: Friday, December 22, 2017 7:46 AM
> > To: Trahe, Fiona <fiona.trahe at intel.com>; dev at dpdk.org
> > Cc: Athreya, Narayana Prasad <NarayanaPrasad.Athreya at cavium.com>;
> Gupta, Ashish
> > <Ashish.Gupta at cavium.com>; Sahu, Sunila <Sunila.Sahu at cavium.com>;
> De Lara Guarch, Pablo
> > <pablo.de.lara.guarch at intel.com>; Challa, Mahipal
> <Mahipal.Challa at cavium.com>; Jain, Deepak K
> > <deepak.k.jain at intel.com>; Hemant Agrawal
> <hemant.agrawal at nxp.com>; Roy Pledge
> > <roy.pledge at nxp.com>; Youri Querry <youri.querry_1 at nxp.com>;
> Ahmed Mansour
> > <ahmed.mansour at nxp.com>; Trahe, Fiona <fiona.trahe at intel.com>
> > Subject: RE: [RFC v1] doc compression API for DPDK
> >
> > Hi Fiona
> >
> > > -----Original Message-----
> > > From: Trahe, Fiona [mailto:fiona.trahe at intel.com]
> > > Sent: 20 December 2017 21:03
> > > To: Verma, Shally <Shally.Verma at cavium.com>; dev at dpdk.org
> > > Cc: Athreya, Narayana Prasad <NarayanaPrasad.Athreya at cavium.com>;
> > > Gupta, Ashish <Ashish.Gupta at cavium.com>; Sahu, Sunila
> > > <Sunila.Sahu at cavium.com>; De Lara Guarch, Pablo
> > > <pablo.de.lara.guarch at intel.com>; Challa, Mahipal
> > > <Mahipal.Challa at cavium.com>; Jain, Deepak K
> <deepak.k.jain at intel.com>;
> > > Hemant Agrawal <hemant.agrawal at nxp.com>; Roy Pledge
> > > <roy.pledge at nxp.com>; Youri Querry <youri.querry_1 at nxp.com>;
> Ahmed
> > > Mansour <ahmed.mansour at nxp.com>; Trahe, Fiona
> > > <fiona.trahe at intel.com>
> > > Subject: RE: [RFC v1] doc compression API for DPDK
> > >
> > > Hi Shally,
> > >
> > > I think we are almost in sync now - a few comments below with just one
> > > open question which I suspect was a typo.
> > > If this is ok then no need for a meeting I think.
> > > In this case will you issue a v2 of this doc ?
> > >
> > >
> > > > -----Original Message-----
> > > > From: Verma, Shally [mailto:Shally.Verma at cavium.com]
> > > > Sent: Wednesday, December 20, 2017 7:15 AM
> > > > To: Trahe, Fiona <fiona.trahe at intel.com>; dev at dpdk.org
> > > > Cc: Athreya, Narayana Prasad
> <NarayanaPrasad.Athreya at cavium.com>;
> > > Gupta, Ashish
> > > > <Ashish.Gupta at cavium.com>; Sahu, Sunila
> <Sunila.Sahu at cavium.com>;
> > > De Lara Guarch, Pablo
> > > > <pablo.de.lara.guarch at intel.com>; Challa, Mahipal
> > > <Mahipal.Challa at cavium.com>; Jain, Deepak K
> > > > <deepak.k.jain at intel.com>; Hemant Agrawal
> > > <hemant.agrawal at nxp.com>; Roy Pledge
> > > > <roy.pledge at nxp.com>; Youri Querry <youri.querry_1 at nxp.com>;
> > > Ahmed Mansour
> > > > <ahmed.mansour at nxp.com>
> > > > Subject: RE: [RFC v1] doc compression API for DPDK
> > > >
> > > > Hi Fiona
> > > >
> > > > Please refer to my comments below with my understanding on two
> major
> > > points OUT_OF_SPACE and
> > > > Stateful Design.
> > > > If you believe we still need a meeting to converge on same please
> share
> > > meeting details to me.
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Trahe, Fiona [mailto:fiona.trahe at intel.com]
> > > > > Sent: 15 December 2017 23:11
> > > > > To: Verma, Shally <Shally.Verma at cavium.com>; dev at dpdk.org
> > > > > Cc: Athreya, Narayana Prasad
> <NarayanaPrasad.Athreya at cavium.com>;
> > > > > Challa, Mahipal <Mahipal.Challa at cavium.com>; De Lara Guarch, Pablo
> > > > > <pablo.de.lara.guarch at intel.com>; Gupta, Ashish
> > > > > <Ashish.Gupta at cavium.com>; Sahu, Sunila
> <Sunila.Sahu at cavium.com>;
> > > > > Trahe, Fiona <fiona.trahe at intel.com>; Jain, Deepak K
> > > > > <deepak.k.jain at intel.com>
> > > > > Subject: RE: [RFC v1] doc compression API for DPDK
> > > > >
> > > > > Hi Shally,
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Verma, Shally [mailto:Shally.Verma at cavium.com]
> > > > > > Sent: Thursday, December 7, 2017 5:43 AM
> > > > > > To: Trahe, Fiona <fiona.trahe at intel.com>; dev at dpdk.org
> > > > > > Cc: Athreya, Narayana Prasad
> > > <NarayanaPrasad.Athreya at cavium.com>;
> > > > > Challa, Mahipal
> > > > > > <Mahipal.Challa at cavium.com>; De Lara Guarch, Pablo
> > > > > <pablo.de.lara.guarch at intel.com>; Gupta, Ashish
> > > > > > <Ashish.Gupta at cavium.com>; Sahu, Sunila
> > > <Sunila.Sahu at cavium.com>
> > > > > > Subject: RE: [RFC v1] doc compression API for DPDK
> > > > >
> > > > > //snip....
> > > > >
> > > > > > > > > > Please note any time output buffer ran out of space during
> write
> > > > > then
> > > > > > > > > operation will turn “Stateful”.  See
> > > > > > > > > > more on Stateful under respective section.
> > > > > > > > > [Fiona] Let's come back to this later. An alternative is that
> > > > > > > OUT_OF_SPACE is
> > > > > > > > > returned and the  application
> > > > > > > > > must treat as a fail and resubmit the operation with a larger
> > > > > destination
> > > > > > > > > buffer.
> > > > > > > >
> > > > > > > > [Shally] Then I propose to add a feature flag
> > > > > > > "FF_SUPPORT_OUT_OF_SPACE" per xform type for flexible
> > > > > > > > PMD design.
> > > > > > > > As there're devices which treat it as error on compression but
> not
> > > on
> > > > > > > decompression.
> > > > > > > > If it is not supported, then it should be treated as failure
> condition
> > > and
> > > > > app
> > > > > > > can resubmit operation.
> > > > > > > > if supported, behaviour *To-be-Defined* under stateful.
> > > > > > > [Fiona] Can you explain 'turn stateful' some more?
> > > > > > > If compressor runs out of space during stateless operation, either
> > > comp
> > > > > or
> > > > > > > decomp, and turns stateful, how would the app know? And what
> > > would
> > > > > be in
> > > > > > > status, consumed and produced?
> > > > > > > Could it return OUT_OF_SPACE, and if both consumed and
> produced
> > > == 0
> > > > > >
> > > > > > [Shally] If consumed = produced == 0, then it's not OUT_OF_SPACE
> > > > > condition.
> > > > > >
> > > > > > > then the whole op must be resubmitted with a bigger output
> buffer.
> > > But
> > > > > if
> > > > > > > consumed and produced > 0 then app could take the output and
> > > submit
> > > > > next
> > > > > > > op
> > > > > > > continuing from consumed+1.
> > > > > > >
> > > > > >
> > > > > > [Shally] consumed and produced will *always* be > 0 in case of OUT_OF_SPACE.
> > > > > > OUT_OF_SPACE means the output buffer was exhausted while writing data into it
> > > > > > and the PMD may have more to write to it. So in such a case, the PMD should set
> > > > > > produced = complete length of output buffer
> > > > > > status = OUT_OF_SPACE
> > > > > > consumed, with the following possibilities:
> > > > > > 1. consumed = complete length of src mbuf, meaning the PMD has read the full input, OR
> > > > > > 2. consumed = partial length of src mbuf, meaning the PMD has read partial input
> > > > > >
> > > > > > On seeing this status, the app should consume the output and re-enqueue the same
> > > > > > op with an empty output buffer and src = consumed+1.
> > > > > [Fiona] As this was a stateless op, the PMD cannot be expected to
> have
> > > > > stored the history and state and so
> > > > > cannot be expected to continue from consumed+1. This would be
> > > stateful
> > > > > behaviour.
> > > >
> > > > [Shally] Exactly.
> > > >
> > > > > But it seems you are saying that even on in this stateless case you'd
> like
> > > the
> > > > > PMDs who can store state
> > > > > to have the option of converting to stateful. So
> > > > > a PMD which can support this could return OUT_OF_SPACE with
> > > > > produced/consumed as you describe above.
> > > > > a PMD which can't support it should return an error.
> > > > > The appl can continue on from consumed+1 in the former case and
> > > resubmit
> > > > > the full request
> > > > > with a bigger buffer in the latter case.
> > > > > Is this the behaviour you're looking for?
> > > > > If so the error could be something like NEED_BIGGER_DST_BUF?
> > > > > However, wouldn't OUT_OF_SPACE with produced=consumed=0
> convey
> > > the
> > > > > same information on the API?
> > > > > It may correspond to an error on the underlying PMD, but would it be
> > > simpler
> > > > > on the compressdev API
> > > > >
> > > > >
> > > > > > Please note as per current proposal, app should call
> > > > > rte_compdev_enqueue_stream() version of API if it
> > > > > > doesn't know output size beforehand.
> > > > > [Fiona] True. But above is only trying to describe behaviour in the
> > > stateless
> > > > > error case.
> > > >
> > > > [Shally] Ok. Now I get the point of confusion with the term 'turns stateful' here.
> > > > No, it's not a stateless-to-stateful conversion. A stateless operation is stateless
> > > > only, and in stateless we don't expect an OUT_OF_SPACE error. So, now I also
> > > > understand what you're trying to imply with produced=consumed=0.
> > > >
> > > > So, let me summarise the redefinition of OUT_OF_SPACE based on RFC v3:
> > > >
> > > > Interpreting OUT_OF_SPACE condition:
> > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > A. Stateless Operations:
> > > > ----------------------------------
> > > > A.1 If the operation is stateless, i.e. rte_comp_op.op_type ==
> > > > RTE_COMP_OP_STATELESS, and the PMD runs out of buffer during compression or
> > > > decompression, then it is an error condition for the PMD. It will reset itself and
> > > > return produced = consumed = 0 with status OUT_OF_SPACE. On seeing this, the
> > > > application should resubmit the full request with a bigger output buffer size.
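To illustrate A.1 from the application side, a minimal sketch follows. All names here
(op_type, status, produced, consumed, the OUT_OF_SPACE status, enqueue/dequeue_burst) are
only the provisional ones used in this thread, not a final API.

/* Sketch only - provisional names from this thread */
static int
compress_stateless(uint8_t dev_id, uint16_t qp_id, struct rte_comp_op *op)
{
	op->op_type = RTE_COMP_OP_STATELESS;

	if (rte_compressdev_enqueue_burst(dev_id, qp_id, &op, 1) < 1)
		return -1;                     /* qp full, retry later */

	while (rte_compressdev_dequeue_burst(dev_id, qp_id, &op, 1) == 0)
		;                              /* poll until the op completes */

	if (op->status == OUT_OF_SPACE) {      /* provisional status name */
		/* Stateless case: the PMD has reset itself, produced == consumed == 0.
		 * The application must resubmit the whole request with a larger
		 * destination buffer attached to the op. */
		return -1;
	}
	return 0;
}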
> > > >
> > > > B. Stateful Operations:
> > > > -------------------------------
> > > > B.1 If operation is stateful i.e. rte_comp_op.op_type ==
> > > RTE_COMP_OP_STATEFUL,  and PMD runs out
> > > > of buffer during compression or decompression, then PMD will update
> > > > produced=consumed (as mentioned above)
> > > [Fiona] ? Did you mean to say "will update produced & consumed" ?
> >
> > [Shally] Yes, you're right, that was a typo. It should be produced & consumed.
> >
> > > I think
> > >   - consumed would be <= input length (typically <)
> > >   - produced would be <= output buffer len (typically =, but could be a
> few
> > > bytes less)
> > >   - status would be OUT_OF_SPACE
> > > Do you agree?
> > >
> >
> > [Shally] Yes.
> >
> > > > and the app should resubmit the op with input from consumed+1 and an output
> > > > buffer with free space.
> > > > Please note for such a case, the application should allocate a stream via a call to
> > > > rte_comp_stream_create(), attach it to the op, and pass it along every time the
> > > > pending op is enqueued, until op processing is complete with status set to
> > > > SUCCESS/FAILURE.
> > > [Fiona] Agreed
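To make the agreed B.1 behaviour concrete, a rough application-side sketch follows, again
using only the provisional names from this thread; drain_output() and advance_input() are
hypothetical application helpers, not proposed API.

/* Sketch only - provisional names; one op from this stream in flight at a time */
static int
compress_stateful_op(uint8_t dev_id, uint16_t qp_id, rte_comp_session *sess,
		struct rte_comp_op *op)
{
	void *stream = NULL;

	rte_comp_stream_create(dev_id, sess, &stream);
	op->op_type = RTE_COMP_OP_STATEFUL;
	op->stream_private = stream;     /* same stream handle on every resubmission */

	do {
		rte_compressdev_enqueue_burst(dev_id, qp_id, &op, 1);
		while (rte_compressdev_dequeue_burst(dev_id, qp_id, &op, 1) == 0)
			;                /* wait for completion */

		if (op->status == OUT_OF_SPACE) {        /* provisional status name */
			drain_output(op, op->produced);  /* hypothetical app helper */
			advance_input(op, op->consumed); /* hypothetical: src = consumed+1,
							  * emptied output buffer */
		}
	} while (op->status == OUT_OF_SPACE);

	rte_comp_stream_free(dev_id, stream);
	return 0;
}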
> > >
> > > >
> > > > >
> > > > > //snip.....
> > > > >
> > > > > > > > > >
> > > > > > > > > > D.2.1.2 Stateful operation state maintenance
> > > > > > > > > >  -------------------------------------------------------------
> > > > > > > > > > This section starts with description of our understanding
> about
> > > > > > > > > compression API support for stateful.
> > > > > > > > > > Depending upon understanding build upon these concepts,
> we
> > > will
> > > > > > > identify
> > > > > > > > > required data structure/param
> > > > > > > > > > to maintain in-progress operation context by PMD.
> > > > > > > > > >
> > > > > > > > > > For stateful compression, batch of dependent packets starts
> at a
> > > > > packet
> > > > > > > > > having
> > > > > > > > > > RTE_NO_FLUSH/RTE_SYNC_FLUSH flush value and end at
> packet
> > > > > having
> > > > > > > > > RTE_FULL_FLUSH/FINAL_FLUSH.
> > > > > > > > > > i.e. array of operations will carry structure like this:
> > > > > > > > > >
> > > > > > > > > > ---------------------------------------------------------------------------------
> > > > > > > > > > | op1.no_flush | op2.no_flush | op3.no_flush | op4.full_flush |
> > > > > > > > > > ---------------------------------------------------------------------------------
> > > > > > > > > >
> > > > > > > [Fiona] I think it needs to be more constrained than your
> examples
> > > > > below.
> > > > > > > Only 1 operation from a stream can be in a burst. As each
> operation
> > > > > > > in a stateful stream must complete, as next operation needs state
> > > and
> > > > > > > history
> > > > > > > of previous operation to be complete before it can be processed.
> > > > > > > And if one failed, e.g. due to OUT_OF_SPACE, this should affect
> > > > > > > the following operation in the same stream.
> > > > > > > Worst case this means bursts of 1. Burst can be >1 if there are
> > > multiple
> > > > > > > independent streams with available data for processing. Or if
> there is
> > > > > > > data available which can be statelessly processed.
> > > > > > >
> > > > > > > If there are multiple buffers available from a stream , then instead
> > > they
> > > > > can
> > > > > > > be linked together in an mbuf chain sent in a single operation.
> > > > > > >
> > > > > > > To handle the sequences below would mean the PMD
> > > > > > > would need to store ops sending one at a time to be processed.
> > > > > > >
> > > > > > > As this is significantly different from what you describe below, I'll
> wait
> > > for
> > > > > > > further feedback
> > > > > > > before continuing.
> > > > > >
> > > > > > [Shally] I concur with your thoughts. And they're not significantly different
> > > > > > from the concept presented below.
> > > > > >
> > > > > > Yes, as you mentioned, even for burst_size > 1 the PMD will have to serialize
> > > > > > each op internally, i.e. it has to wait for the previous one to finish before
> > > > > > putting the next one up for processing, which is as good as the application
> > > > > > making serialised calls passing one op at a time, or, if the stream consists of
> > > > > > multiple buffers, making a scatter-gather list of them and then enqueuing it as
> > > > > > one op at a time, which is more efficient and the ideal usage.
> > > > > > However, in order to allow extensibility, I didn't mention a limitation on
> > > > > > burst_size. Because if a PMD doesn't support burst_size > 1, it can always
> > > > > > return nb_enqueued = 1, in which case the app can enqueue the next op, with the
> > > > > > condition that it should wait for the previous one to complete before making
> > > > > > the next enqueue call.
> > > > > >
> > > > > > So, if we take a simple example of compressing 2k of data with src mbuf
> > > > > > size = 1k, then with burst_size=1 the expected call flow would be (this is
> > > > > > just one flow, other variations are also possible, such as making a chain of
> > > > > > 1k buffers and passing the whole data in one go):
> > > > > >
> > > > > > 1. fill 1st 1k chunk of data in op.msrc
> > > > > > 2. enqueue_stream(..., |op.flush = no_flush|, 1, ptr_stream);
> > > > > > 3. dequeue_burst(|op|, 1);
> > > > > > 4. refill next 1k chunk in op.msrc
> > > > > > 5. enqueue_stream(..., |op.flush = full_flush|, 1, ptr_stream);
> > > > > > 6. dequeue_burst(|op|, 1);
> > > > > > 7. end
> > > > > >
> > > > > > So, I don’t see much of a change in API call flow from here to design
> > > > > presented below except nb_ops = 1 in
> > > > > > each call.
> > > > > > However I am assuming that op structure would still be same for
> > > stateful
> > > > > processing i.e. it would start with
> > > > > > op.flush value = NO/SYNC_FLUSH and end at op with flush value =
> FULL
> > > > > FLUSH.
> > > > > > Are we on same page here?
> > > > > >
> > > > > > Thanks
> > > > > > Shally
> > > > >
> > > > > [Fiona] We still have a different understanding of the stateful flow
> > > needed
> > > > > on the API.
> > > > > I’ll try to clarify and maybe we can set up a meeting to discuss.
> > > > > My assumptions first:
> > > > > •	Order of ops on a qp must be maintained – ops should be
> dequeued
> > > > > in same sequence they are enqueued.
> > > > > •	Ops from many streams can be enqueued on same qp.
> > > > > •	Ops from a qp may be fanned out to available hw or sw
> engines and
> > > > > processed in parallel, so each op must be independent.
> > > > > •	Stateless and stateful ops can be enqueued on the same qp
> > > > >
> > > > > Submitting a burst of stateless ops to a qp is no problem.
> > > > > Submitting more than 1 op at a time from the same stateful stream to
> a
> > > qp is
> > > > > a problem.
> > > > > Example:
> > > > > Appl submits 2 ops in same stream in a burst, each has src and dest
> > > mbufs,
> > > > > input length/offset and
> > > > > requires checksum to be calculated.
> > > > > The first op must be processed to completion before the second can
> be
> > > > > started as it needs the history and the checksum so far.
> > > > > If each dest mbuf is big enough so no overflow, each dest mbuf will
> be
> > > > > partially filled. This is probably not
> > > > > what’s desired, and will force an extra copy to make the output data
> > > > > contiguous.
> > > > > If the dest mbuf in the first op is too small, then does the PMD alloc
> more
> > > > > memory in the dest mbuf?
> > > > > Or alloc another mbuf? Or fail and the whole burst must be
> resubmitted?
> > > > > Or store the 2nd op, wait, on seeing the OUT_OF_SPACE on the 1st
> op,
> > > > > overwrite the src, dest, len etc of the 2nd op
> > > > > to include the unprocessed part of the 1st op?
> > > > > In the meantime, are all other ops on the qp blocked behind these?
> > > > > For hw accelerators it’s worse, as PMD would normally return once
> ops
> > > are
> > > > > offloaded and the dequeue would
> > > > > pass processed ops straight back to the appl. Instead, the enqueue
> would
> > > > > need to kick off a thread to
> > > > > dequeue ops and filter to find the stateful one, storing the others til
> the
> > > next
> > > > > application dequeue is called.
> > > > >
> > > > > Above scenarios don’t lend themselves to accelerating a packet
> > > processing
> > > > > workload.
> > > > > It pushes a workload down to all PMDs which I believe belongs above
> this
> > > API
> > > > > as
> > > > > that work is not about offloading the compute intensive compression
> > > work
> > > > > but
> > > > > about the sequencing of data and so is better coded once, above the
> API
> > > in
> > > > > an application layer
> > > > > common to all PMDs. (See Note1 in
> > > http://dpdk.org/ml/archives/dev/2017-
> > > > > October/078944.html )
> > > > > If an application has several packets with data from a stream that it
> needs
> > > to
> > > > > (de)compress statefully,
> > > > > what it probably wants is for the output data to fill each output buffer
> > > > > completely before writing to the next buffer.
> > > > > Chaining the src mbufs in these pkts into one chain and sending as
> one op
> > > > > allows the output
> > > > > data to be packed into a dest mbuf or mbuf chain.
> > > > > I think what’s needed is a layer above the API to accumulate incoming
> > > > > packets while waiting for the
> > > > > previous set of packets to be compressed. Forwarding to the PMD to
> > > queue
> > > > > there is not the right place
> > > > > to buffer them as the queue should be per stream rather than on the
> > > > > accelerator engine’s queue
> > > > > which has lots of other independent packets.
> > > > >
> > > >
> > > > [Shally] Ok. I believe I get it.
> > > > In general I agree with this proposal. However, I have a concern on one point
> > > > here, i.e. order maintenance. Please see further below for more explanation.
> > > >
> > > > >
> > > > > Proposal:
> > > > > • Ops from a qp may be fanned out to available hw or sw engines and
> > > > >     processed in parallel, so each op must be independent.
> > > > [Shally] Possible only if  PMD support combination of SW and HW
> > > processing. Right?
> > > [Fiona] Not necessarily, Intel QuickAssist accelerators are HW and can
> > > process ops from same qp in parallel
> > >
> > >
> > > > > • Order of ops on a qp must be maintained – ops should be
> dequeued in
> > > > > same sequence they are enqueued.
> > > > [Shally] If each op is independent, then why do we need to maintain ordering?
> > > > Since they're independent and thus can be processed in parallel, they can well
> > > > complete quite out-of-order and be available for dequeue as soon as completed.
> > > > Serializing them will limit HW throughput capability. And I can envision that
> > > > some apps may not care about ordering, just completion.
> > > > So I would suggest that an application needing ordering should tag each op with
> > > > some id or serial number in the op user_data area to identify enqueue order, OR
> > > > we may add a flag in the enqueue_burst() API to enforce serialized dequeuing,
> > > > if that's a hard requirement for anyone.
> > > [Fiona] Ok,  I think you're right, this requirement isn't needed.
> > > In stateless ops it's not needed.
> > > For stateful the appl should only have one op per stream inflight at any
> time
> > > so manages the ordering.
> > > So we can specify on the API that ordering is not necessarily maintained
> on
> > > the qp and PMDs may return responses out-of-order.
> > > The responsibility is on the application to maintain order if it's needed.
> > > If later we find some argument for maintaining order I'd suggest a
> > > configuration param per qp or even per device rather than on the
> > > enqueue_burst()
> >
> > [Shally] Done.
> >
> > >
> > >
> > > > > • Stateless and stateful ops can be enqueued on the same qp
> > > > > • Stateless and stateful ops can be enqueued in the same burst
> > > > > • Only 1 op at a time may be enqueued to the qp from any stateful
> > > stream.
> > > > > • A burst can have multiple stateful ops, but each must be from a
> > > different
> > > > > stream.
> > > > > • All ops will have a session attached – this will only contain
> immutable
> > > data
> > > > > which
> > > > >    can be used by many ops, devices and or drivers at the same time.
> > > > > • All stateful ops will have a stream attached for maintaining state and
> > > > >    history, this can only be used by one op at a time.
> > > > [Shally] So, you mean:
> > > >
> > > > A single enqueue_burst() *can* carry multiple streams, i.e. this is allowed both
> > > > within a burst and on a qp (say, when multiple threads call enqueue_burst() on
> > > > the same qp):
> > > >
> > > >   ---------------------------------------------------------------------------------------------
> > > >   enqueue_burst (| op1.no_flush | op2.no_flush | op3.flush_final | op4.no_flush | op5.no_flush |)
> > > >   ---------------------------------------------------------------------------------------------
> > > > Where,
> > > > op1, op2...op5 all belong to *different* streams. op3 can be stateless/stateful
> > > > depending upon the op_type value, and each can have the *same or different*
> > > > sessions.
> > > [Fiona] Exactly
> > >
> > > > If I understand this right, then yes, it looks good to me. However, this also
> > > > brings up one minor point for discussion, but I will wait to initiate that until
> > > > we close on the current open points.
> > > >
> > > > Thanks
> > > > Shally
> >
> > [Shally] Since we are in sync now, I will bring up another point for discussion.
> >
> > I'm thinking we should probably have a stream regardless of op_type, where it is
> > marked *mandatory* for stateful but *optional* (or maybe mandatory) for stateless,
> > as having it for stateless may help some PMDs gain performance in the data path.
> > The reason is here:
> >
> > Currently we see the stream as a resource which maintains state (et al.) for
> > stateful processing, but it is also a placeholder where the PMD can choose to do
> > one-time resource setup common to both op_types (such as allocating an instruction
> > from its internal pool and initializing it with session params).
> > So for such PMD designs it will be beneficial to use a stream for stateless as
> > well, as it will help minimize instruction setup time on the data path, since all
> > one-time operations will be done in stream_create().
> > In the case of stateless, a stream being present would mean it is available for
> > next use as soon as the last associated op is dequeued, as it holds no context of
> > the last op, only common resources re-usable for the next op.
> [Fiona] We intend to use session private area for similar.  But I don't see a
> problem with what you suggest.
> We can either add a capability which appl must check to know if it should call
> stream_create() for STATELESS sessions OR we could say stream_create
> should be called for every session, if it returns non-NULL for stateless session
> then it should be attached to every op sent to the session?
> 

[Shally] I would prefer the second option, i.e. "stream_create should be called for every session".
It gives more flexibility to the PMD, as it can then decide its support per session. It also keeps the spec simple.
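For clarity, a small sketch of what that option would mean for the application (names are
the provisional ones from this thread, not a final API):

/* Sketch only: stream_create() called for every session, stateless or stateful */
void *stream = NULL;

rte_comp_stream_create(dev_id, sess, &stream);

op->op_type = RTE_COMP_OP_STATELESS;
op->stream_private = stream;   /* may legitimately be NULL if this PMD needs no
				* per-stream resource for stateless sessions */
rte_compressdev_enqueue_burst(dev_id, qp_id, &op, 1);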

> 
> > Apart from that, there's another point. We can enable the API spec to leverage the
> > mempool object cache concept by allowing the PMD to allocate the stream from the op
> > pool as per-op private data, i.e. each object elt_size =
> > sizeof(rte_comp_op) + user_size + stream_size.
> > This would help the PMD reduce memory access time if caching is enabled, as each
> > op's stream resides with it in cache rather than having them come from different
> > pools with different policies.
> [Fiona] I'm not sure about this. The intention was the op-pool would be
> device-independent.
> So the appl would have to retrieve the size of stream from all PMDs and size
> for the largest.
> So could be wasteful for memory.
> For stateful I can see, if memory was not an issue this would be good, as one
> op could be re-used, stream would be already attached.
> But for stateless, I think you're suggesting there would be one stream used
> by many ops in a burst or even in different bursts. So how could that work as
> each op would have a separate stream?

[Shally] No, that's not what I intend. Each op would still have a separate stream. The only requirement I raised is that the stream could also be kept in the per-op private area to utilize object caching, because in any case each op uses a separate stream.
It can be implemented in multiple ways (maybe add another API, say alloc_stream_from_op_priv_area(), or simply add get_stream_size()).
Surely, this makes the op pool device-dependent, so I see it as an additional feature to meet the specific requirement of an app using mempool caching.
However, I do not see it as a blocker to freezing the current spec, and thus suggest leaving it open for discussion and adding it later as an incremental patch if we see value in it.
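To make the caching idea concrete, a rough sketch under the assumptions above.
rte_comp_get_stream_size() is one of the hypothetical options just mentioned, and NB_OPS,
CACHE_SIZE and user_size are placeholders; the elt_size layout follows the earlier
proposal in this thread.

/* Hypothetical sketch only - object layout: rte_comp_op | user private data | PMD stream */
uint16_t stream_size = rte_comp_get_stream_size(dev_id);          /* hypothetical API */
unsigned int elt_size = sizeof(struct rte_comp_op) + user_size + stream_size;

struct rte_mempool *op_pool = rte_mempool_create("comp_op_pool",
		NB_OPS, elt_size, CACHE_SIZE,
		0, NULL, NULL, NULL, NULL,
		rte_socket_id(), 0);
/* With a per-lcore cache (CACHE_SIZE > 0), an op and its stream stay together
 * in one cached object, so the data path avoids a second pool lookup. */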

Thanks
Shally

> 
> 
> > If agreed, then it can be enabled in the API spec as below (this is just a proposal,
> > there could be others ...):
> >
> > - Modify the stream_create spec as follows:
> >
> > struct rte_comp_op_pool_private {
> > 	uint16_t user_size;
> > 	/**< Size of private user data with each operation. */
> > 	uint16_t dev_priv_data_size;
> > 	/**< Size of device private data with each operation. */
> > };
> >
> > int rte_comp_stream_create(uint32_t dev_id,
> >                                                rte_comp_session *sess,
> >                                                void **stream,
> >                                                rte_mempool *op_pool /* optional */);
> >
> > - This will map to the PMD op stream_create(). Here, if op_pool != NULL, then the PMD
> > can re-allocate the op pool with new elt_size = rte_comp_op + op_private->user_size +
> > dev_private_stream_size, so that the stream resides in the op private area.
> >
> > Or, we may add another API altogether to do stream allocations from the op_pool, such as
> > rte_comp_stream_create_from_pool(...);
> >
> > I will issue RFC doc v2 after your feedback on these.
> [Fiona] Would it be preferable to issue v2 even with design so far? To give
> Ahmed and the community a better doc to review. Also I'm on holidays until
> after Christmas so will not get back to this til 8th January.
> 
> 
> 
> > Thanks
> > Shally
> >
> > > > >
> > > > >
> > > > > Code artefacts:
> > > > >
> > > > > enum rte_comp_op_type {
> > > > >     RTE_COMP_OP_STATELESS,
> > > > >     RTE_COMP_OP_STATEFUL
> > > > > }
> > > > >
> > > > > Add the following to rte_comp_op:
> > > > >     enum rte_comp_op_type op_type;
> > > > >     void *stream_private;
> > > > >     /* location where the PMD maintains stream state – only required if
> > > > >        op_type is STATEFUL, else set to NULL */
> > > > >
> > > > > As the size of stream data will vary depending on the PMD, each PMD or device
> > > > > should allocate & manage its own mempool. Associated APIs are:
> > > > > rte_comp_stream_create(uint8_t dev_id, rte_comp_session *sess, void **stream);
> > > > > /* This should alloc a stream from the device's mempool and initialise it. This
> > > > >    handle will be passed to the PMD with every op in the stream. Q. Should qp_id
> > > > >    also be added, with the constraint that all ops in the same stream should be
> > > > >    sent to the same qp? */
> > > > > rte_comp_stream_free(uint8_t dev_id, void *stream);
> > > > > /* This should clear the stream and return it to the device's mempool */
> > > > >
> > > > > All ops are enqueued/dequeued to the device & qp using the same
> > > > > rte_compressdev_enqueue_burst()/…dequeue_burst();
> > > > >
> > > > > Re flush flags, a stateful stream would start with op.flush = NONE or SYNC and
> > > > > end with FULL or FINAL.
> > > > > STATELESS ops would just use either FULL or FINAL.
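Putting the artefacts above together, one possible end-to-end stateful flow could look
like the sketch below. All enum/field names (op_type, stream_private, flush,
RTE_SYNC_FLUSH/RTE_FULL_FLUSH, the status handling) are the provisional ones from this
thread, not a final API.

/* Sketch only - provisional names. Only one op from this stream is in flight
 * at any time, as agreed above. */
static int
compress_stream(uint8_t dev_id, uint16_t qp_id, rte_comp_session *sess,
		struct rte_comp_op *op, struct rte_mbuf **src, int nb_chunks)
{
	void *stream = NULL;
	int i;

	rte_comp_stream_create(dev_id, sess, &stream);

	for (i = 0; i < nb_chunks; i++) {
		op->op_type = RTE_COMP_OP_STATEFUL;
		op->stream_private = stream;
		op->m_src = src[i];
		/* intermediate chunks: NONE or SYNC flush; last chunk: FULL/FINAL */
		op->flush = (i == nb_chunks - 1) ? RTE_FULL_FLUSH : RTE_SYNC_FLUSH;

		if (rte_compressdev_enqueue_burst(dev_id, qp_id, &op, 1) < 1)
			break;
		/* the next chunk needs this one's state/history, so wait */
		while (rte_compressdev_dequeue_burst(dev_id, qp_id, &op, 1) == 0)
			;
		/* check op->status here; on OUT_OF_SPACE consume op->produced
		 * bytes and resubmit from consumed+1 as described earlier */
	}

	rte_comp_stream_free(dev_id, stream);
	return 0;
}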
> > > > >
> > > > >
> > > > > Let me know if you want to set up a meeting - it might be a more
> > > effective
> > > > > way to
> > > > > arrive at an API that works for all PMDs.
> > > > >
> > > > > I'll send out a v3 today with above plus updates based on all the other
> > > > > feedback.
> > > > >
> > > > > Regards,
> > > > > Fiona


