[dpdk-dev] [RFC v2] doc compression API for DPDK

Verma, Shally Shally.Verma at cavium.com
Thu Feb 1 06:40:48 CET 2018



>-----Original Message-----
>From: Trahe, Fiona [mailto:fiona.trahe at intel.com]
>Sent: 01 February 2018 00:33
>To: Ahmed Mansour <ahmed.mansour at nxp.com>; Verma, Shally <Shally.Verma at cavium.com>; dev at dpdk.org
>Cc: Athreya, Narayana Prasad <NarayanaPrasad.Athreya at cavium.com>; Gupta, Ashish <Ashish.Gupta at cavium.com>; Sahu, Sunila
><Sunila.Sahu at cavium.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch at intel.com>; Challa, Mahipal
><Mahipal.Challa at cavium.com>; Jain, Deepak K <deepak.k.jain at intel.com>; Hemant Agrawal <hemant.agrawal at nxp.com>; Roy
>Pledge <roy.pledge at nxp.com>; Youri Querry <youri.querry_1 at nxp.com>; Trahe, Fiona <fiona.trahe at intel.com>
>Subject: RE: [RFC v2] doc compression API for DPDK
>
>Hi Ahmed, Shally,
>
>///snip///
>> >>>>> D.1.1 Stateless and OUT_OF_SPACE
>> >>>>> ------------------------------------------------
>> >>>>> OUT_OF_SPACE is a condition when output buffer runs out of space
>> >> and
>> >>>> where PMD still has more data to produce. If PMD run into such
>> >> condition,
>> >>>> then it's an error condition in stateless processing.
>> >>>>> In such case, PMD resets itself and return with status
>> >>>> RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced=consumed=0
>> >> i.e.
>> >>>> no input read, no output written.
>> >>>>> Application can resubmit an full input with larger output buffer size.
>> >>>> [Ahmed] Can we add an option to allow the user to read the data that
>> >> was
>> >>>> produced while still reporting OUT_OF_SPACE? this is mainly useful for
>> >>>> decompression applications doing search.
>> >>> [Shally] It is there but applicable for stateful operation type (please refer to
>> >> handling out_of_space under
>> >>> "Stateful Section").
>> >>> By definition, "stateless" here means that application (such as IPCOMP)
>> >> knows maximum output size
>> >>> guaranteedly and ensure that uncompressed data size cannot grow more
>> >> than provided output buffer.
>> >>> Such apps can submit an op with type = STATELESS and provide full input,
>> >> then PMD assume it has
>> >>> sufficient input and output and thus doesn't need to maintain any contexts
>> >> after op is processed.
>> >>> If application doesn't know about max output size, then it should process it
>> >> as stateful op i.e. setup op
>> >>> with type = STATEFUL and attach a stream so that PMD can maintain
>> >> relevant context to handle such
>> >>> condition.
>> >> [Fiona] There may be an alternative that's useful for Ahmed, while still
>> >> respecting the stateless concept.
>> >> In Stateless case where a PMD reports OUT_OF_SPACE in decompression
>> >> case
>> >> it could also return consumed=0, produced = x, where x>0. X indicates the
>> >> amount of valid data which has
>> >>  been written to the output buffer. It is not complete, but if an application
>> >> wants to search it it may be sufficient.
>> >> If the application still wants the data it must resubmit the whole input with a
>> >> bigger output buffer, and
>> >>  decompression will be repeated from the start, it
>> >>  cannot expect to continue on as the PMD has not maintained state, history
>> >> or data.
>> >> I don't think there would be any need to indicate this in capabilities, PMDs
>> >> which cannot provide this
>> >> functionality would always return produced=consumed=0, while PMDs which
>> >> can could set produced > 0.
>> >> If this works for you both, we could consider a similar case for compression.
>> >>
>> > [Shally] Sounds Fine to me. Though then in that case, consume should also be updated to actual
>> consumed by PMD.
>> > Setting consumed = 0 with produced > 0 doesn't correlate.
>> [Ahmed]I like Fiona's suggestion, but I also do not like the implication
>> of returning consumed = 0. At the same time returning consumed = y
>> implies to the user that it can proceed from the middle. I prefer the
>> consumed = 0 implementation, but I think a different return is needed to
>> distinguish it from OUT_OF_SPACE that the use can recover from. Perhaps
>> OUT_OF_SPACE_RECOVERABLE and OUT_OF_SPACE_TERMINATED. This also allows
>> future PMD implementations to provide recover-ability even in STATELESS
>> mode if they so wish. In this model STATELESS or STATEFUL would be a
>> hint for the PMD implementation to make optimizations for each case, but
>> it does not force the PMD implementation to limit functionality if it
>> can provide recover-ability.
>[Fiona] So you're suggesting the following:
>OUT_OF_SPACE - returned only on stateful operation. Not an error. Op.produced
>    can be used and next op in stream should continue on from op.consumed+1.
>OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
>    Error condition, no recovery possible.
>    consumed=produced=0. Application must resubmit all input data with
>    a bigger output buffer.
>OUT_OF_SPACE_RECOVERABLE - returned only on stateless operation, some recovery possible.
>     - consumed = 0, produced > 0. Application must resubmit all input data with
>        a bigger output buffer. However in decompression case, data up to produced
>        in dst buffer may be inspected/searched. Never happens in compression
>        case as output data would be meaningless.
>     - consumed > 0, produced > 0. PMD has stored relevant state and history and so
>        can convert to stateful, using op.produced and continuing from consumed+1.
>I don't expect our PMDs to use this last case, but maybe this works for others?
>I'm not convinced it's not just adding complexity. It sounds like a version of stateful
>without a stream, and maybe less efficient?
>If so should it respect the FLUSH flag? Which would have been FULL or FINAL in the op.
>Or treat it as FLUSH_NONE or SYNC? I don't know why an application would not
>simply have submitted a STATEFUL request if this is the behaviour it wants?
>
>
>
>
>> >
>> >>>>> D.2 Compression API Stateful operation
>> >>>>> ----------------------------------------------------------
>> >>>>>  A Stateful operation in DPDK compression means application invokes
>> >>>> enqueue burst() multiple times to process related chunk of data either
>> >>>> because
>> >>>>> - Application broke data into several ops, and/or
>> >>>>> - PMD ran into out_of_space situation during input processing
>> >>>>>
>> >>>>> In case of either one or all of the above conditions, PMD is required to
>> >>>> maintain state of op across enque_burst() calls and
>> >>>>> ops are setup with op_type RTE_COMP_OP_STATEFUL, and begin with
>> >>>> flush value = RTE_COMP_NO/SYNC_FLUSH and end at flush value
>> >>>> RTE_COMP_FULL/FINAL_FLUSH.
>> >>>>> D.2.1 Stateful operation state maintenance
>> >>>>> ---------------------------------------------------------------
>> >>>>> It is always an ideal expectation from application that it should parse
>> >>>> through all related chunk of source data making its mbuf-chain and
>> >> enqueue
>> >>>> it for stateless processing.
>> >>>>> However, if it need to break it into several enqueue_burst() calls, then
>> >> an
>> >>>> expected call flow would be something like:
>> >>>>> enqueue_burst( |op.no_flush |)
>> >>>> [Ahmed] The work is now in flight to the PMD.The user will call dequeue
>> >>>> burst in a loop until all ops are received. Is this correct?
>> >>>>
>> >>>>> deque_burst(op) // should dequeue before we enqueue next
>> >>> [Shally] Yes. Ideally every submitted op need to be dequeued. However
>> >> this illustration is specifically in
>> >>> context of stateful op processing to reflect if a stream is broken into
>> >> chunks, then each chunk should be
>> >>> submitted as one op at-a-time with type = STATEFUL and need to be
>> >> dequeued first before next chunk is
>> >>> enqueued.
>> >>>
>> >>>>> enqueue_burst( |op.no_flush |)
>> >>>>> deque_burst(op) // should dequeue before we enqueue next
>> >>>>> enqueue_burst( |op.full_flush |)
>> >>>> [Ahmed] Why now allow multiple work items in flight? I understand that
>> >>>> occasionaly there will be OUT_OF_SPACE exception. Can we just
>> >> distinguish
>> >>>> the response in exception cases?
>> >>> [Shally] Multiples ops are allowed in flight, however condition is each op in
>> >> such case is independent of
>> >>> each other i.e. belong to different streams altogether.
>> >>> Earlier (as part of RFC v1 doc) we did consider the proposal to process all
>> >> related chunks of data in single
>> >>> burst by passing them as ops array but later found that as not-so-useful for
>> >> PMD handling for various
>> >>> reasons. You may please refer to RFC v1 doc review comments for same.
>> >> [Fiona] Agree with Shally. In summary, as only one op can be processed at a
>> >> time, since each needs the
>> >> state of the previous, to allow more than 1 op to be in-flight at a time would
>> >> force PMDs to implement internal queueing and exception handling for
>> >> OUT_OF_SPACE conditions you mention.
>> [Ahmed] But we are putting the ops on qps which would make them
>> sequential. Handling OUT_OF_SPACE conditions would be a little bit more
>> complex but doable.
>[Fiona] In my opinion this is not doable, could be very inefficient.
>There may be many streams.
>The PMD would have to have an internal queue per stream so
>it could adjust the next src offset and length in the OUT_OF_SPACE case.
>And this may ripple back though all subsequent ops in the stream as each
>source len is increased and its dst buffer is not big enough.
>
>> The question is this mode of use useful for real
>> life applications or would we be just adding complexity? The technical
>> advantage of this is that processing of Stateful ops is interdependent
>> and PMDs can take advantage of caching and other optimizations to make
>> processing related ops much faster than switching on every op. PMDs have
>> maintain state of more than 32 KB for DEFLATE for every stream.
>> >> If the application has all the data, it can put it into chained mbufs in a single
>> >> op rather than
>> >> multiple ops, which avoids pushing all that complexity down to the PMDs.
>> [Ahmed] I think that your suggested scheme of putting all related mbufs
>> into one op may be the best solution without the extra complexity of
>> handling OUT_OF_SPACE cases, while still allowing the enqueuer extra
>> time If we have a way of marking mbufs as ready for consumption. The
>> enqueuer may not have all the data at hand but can enqueue the op with a
>> couple of empty mbus marked as not ready for consumption. The enqueuer
>> will then update the rest of the mbufs to ready for consumption once the
>> data is added. This introduces a race condition. A second flag for each
>> mbuf can be updated by the PMD to indicate that it processed it or not.
>> This way in cases where the PMD beat the application to the op, the
>> application will just update the op to point to the first unprocessed
>> mbuf and resend it to the PMD.
>[Fiona] This doesn't sound safe. You want to add data to a stream after you've
>enqueued the op. You would have to write to op.src.length at a time when the PMD
>might be reading it. Sounds like a lock would be necessary.
>Once the op has been enqueued, my understanding is its ownership is handed
>over to the PMD and the application should not touch it until it has been dequeued.
>I don't think it's a good idea to change this model.
>Can't the application just collect a stream of data in chained mbufs until it has
>enough to send an op, then construct the op and while waiting for that op to
>complete, accumulate the next batch of chained mbufs? Only construct the next op
>after the previous one is complete, based on the result of the previous one.
>
>
>> >>>> [Ahmed] Related to OUT_OF_SPACE. What status does the user recieve
>> >> in a
>> >>>> decompression case when the end block is encountered before the end
>> >> of
>> >>>> the input? Does the PMD continue decomp? Does it stop there and
>> >> return
>> >>>> the stop index?
>> >>>>
>> >>> [Shally] Before I could answer this, please help me understand your use
>> >> case . When you say  "when the
>> >>> end block is encountered before the end of the input?" Do you mean -
>> >>> "Decompressor process a final block (i.e. has BFINAL=1 in its header) and
>> >> there's some footer data after
>> >>> that?" Or
>> >>> you mean "decompressor process one block and has more to process till its
>> >> final block?"
>> >>> What is "end block" and "end of input" reference here?
>> [Ahmed] I meant BFINAL=1 by end block. The end of input is the end of
>> the input length.
>> e.g.
>> | input
>> length--------------------------------------------------------------|
>> |--data----data----data------data-------BFINAL-footer-unrelated data|
>> >>>
>[Fiona] I propose if BFINAL bit is detected before end of input
>the decompression should stop. In this case consumed will be < src.length.
>produced will be < dst buffer size. Do we need an extra STATUS response?
>STATUS_BFINAL_DETECTED  ?
[Shally] @fiona, I assume you mean here decompressor stop after processing Final block right? And if yes, and if it can process that final block successfully/unsuccessfully, then status could simply be SUCCESS/FAILED.
I don't see need of specific return code for this use case. Just to share, in past, we have practically run into such cases with boost lib, and decompressor has simply worked this way.

>Only thing I don't like this is it can impact on performance, as normally
>we can just look for STATUS == SUCCESS. Anything else should be an exception.
>Now the application would have to check for SUCCESS || BFINAL_DETECTED every time.
>Do you have a suggestion on how we should handle this?
>


More information about the dev mailing list