[dpdk-dev] [RFC PATCH] eventdev: add buffered enqueue and flush APIs

Eads, Gage gage.eads at intel.com
Mon Dec 12 18:56:32 CET 2016



>  -----Original Message-----
>  From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
>  Sent: Wednesday, December 7, 2016 10:42 PM
>  To: Eads, Gage <gage.eads at intel.com>
>  Cc: dev at dpdk.org; Richardson, Bruce <bruce.richardson at intel.com>; Van
>  Haaren, Harry <harry.van.haaren at intel.com>; hemant.agrawal at nxp.com
>  Subject: Re: [RFC PATCH] eventdev: add buffered enqueue and flush APIs
>  
>  On Mon, Dec 05, 2016 at 11:30:46PM +0000, Eads, Gage wrote:
>  >
>  > > On Dec 3, 2016, at 5:18 AM, Jerin Jacob
>  <jerin.jacob at caviumnetworks.com> wrote:
>  > >
>  > >> On Fri, Dec 02, 2016 at 01:45:56PM -0600, Gage Eads wrote:
>  > >> This commit adds buffered enqueue functionality to the eventdev API.
>  > >> It is conceptually similar to the ethdev API's tx buffering,
>  > >> however with a smaller API surface and no dropping of events.
>  > >
>  > > Hello Gage,
>  > > Different implementation may have different strategies to hold the buffers.
>  >
>  > A benefit of inlining the buffering logic in the header is that we avoid the
>  overhead of entering the PMD for what is a fairly simple operation (common
>  case: add event to an array, increment counter). If we make this
>  implementation-defined (i.e. use PMD callbacks), we lose that benefit.
>  In general, I agree from the system perspective. But there are a few general
>  issues with the eventdev integration part:
>  
>  1) What if the burst has ATOMIC flows and we are NOT enqueuing to the
>  implementation? Then other event ports won't get the packets from the same
>  ATOMIC tag. BAD. Right?

I'm not sure what scenario you're describing here. The buffered (as implemented in my patch) and non-buffered enqueue operations are functionally the same as long as the buffer is flushed; the difference lies only in when the events are moved from the application to the PMD.
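
For concreteness, here's a rough sketch of the buffering concept (the names and buffer size are illustrative, not the exact API from the patch): the buffered path only stages events in a per-port array, and flushing hands them to the PMD with the same rte_event_enqueue_burst() call a non-buffered caller would have made immediately.

#include <rte_common.h>
#include <rte_eventdev.h>

struct ev_enq_buffer {
	uint16_t count;
	struct rte_event events[32];
};

static inline uint16_t
ev_buffer_flush(uint8_t dev_id, uint8_t port_id, struct ev_enq_buffer *buf)
{
	/* Hand the staged events to the PMD in one burst. A real
	 * implementation must handle partial acceptance and retry. */
	uint16_t n = rte_event_enqueue_burst(dev_id, port_id,
					     buf->events, buf->count);
	buf->count = 0;
	return n;
}

static inline void
ev_buffer_event(uint8_t dev_id, uint8_t port_id,
		struct ev_enq_buffer *buf, const struct rte_event *ev)
{
	/* Common case: a 16B store and a counter increment. */
	buf->events[buf->count++] = *ev;
	if (buf->count == RTE_DIM(buf->events))
		ev_buffer_flush(dev_id, port_id, buf);
}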

>  2) At least in our HW implementation, the event buffer strategy is more like:
>  you get the events from dequeue ONLY if you enqueue to HW, provided op ==
>  RTE_EVENT_OP_FORWARD. So it will create a deadlock, i.e. the application
>  cannot hold the events with RTE_EVENT_OP_FORWARD

If I'm reading this correctly, you're concerned that buffered events can result in deadlock if they're not flushed. Whether the buffering is done in the app itself, inline in the API, or in the PMDs, not flushing the buffer is an application bug. For example, the app could be fixed by flushing its enqueue buffer after processing each dequeued burst of events, or only when dequeue returns 0 events.
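
In pseudo-application terms, something like the loop below (process() is a stand-in for the application's work, dev_id/port_id/done are assumed to be set up elsewhere, and the buffer helpers are the illustrative ones sketched above):

struct rte_event evs[32];
struct ev_enq_buffer buf = { .count = 0 };

while (!done) {
	uint16_t i, n = rte_event_dequeue_burst(dev_id, port_id, evs,
						RTE_DIM(evs), 0);

	for (i = 0; i < n; i++) {
		process(&evs[i]);                  /* application work */
		evs[i].op = RTE_EVENT_OP_FORWARD;
		ev_buffer_event(dev_id, port_id, &buf, &evs[i]);
	}

	/* Flush after each processed burst (and thus also when dequeue
	 * returns 0) so forwarded events are never held indefinitely. */
	ev_buffer_flush(dev_id, port_id, &buf);
}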

>  3) So, considering the above case, there is nothing like a flush for us.
>  4) In a real high-throughput benchmark case, we will get the packets at the
>  max burst rate, and then we always need to memcpy before we flush.
>  Otherwise there will be an ordering issue, as a burst can get us packets from
>  different flows (unlike polling mode).

I take it you're referring to the memcpy in the patch, and not an additional memcpy? At any rate, I'm hoping that SIMD instructions can optimize the 16B event copy.
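
Just to illustrate the point (a sketch of what the compiler sees, not code from the patch): the per-event copy is a fixed-size 16-byte struct assignment, which an optimizing compiler will usually lower to a single vector load/store pair rather than a call to memcpy().

/* sizeof(struct rte_event) == 16, so this assignment typically becomes
 * one 16B load plus one 16B store (e.g. movdqu/movups on x86). */
static inline void
ev_copy(struct rte_event *dst, const struct rte_event *src)
{
	*dst = *src;
}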

>  
>  >
>  > > and some does not need to hold the buffers if it is DDR backed.
>  >
>  > Though DDR-backed hardware doesn't need to buffer in software, doing so
>  would reduce the software overhead of enqueueing. Compared to N individual
>  calls to enqueue, buffering N events then calling enqueue burst once can
>  benefit from amortized (or parallelized) PMD-specific bookkeeping and error-
>  checking across the set of events, and will definitely benefit from the amortized
>  function call overhead and better I-cache behavior. (Essentially this is VPP from
>  the fd.io project.) This should result in higher overall event throughput
>  (agnostic of the underlying device).
>  
>  See above. I am not against burst processing in "application".
>  The flush does not make sense for us from a HW perspective, and it is costly for
>  us if we try to generalize it.
>  

Besides the data copy that buffering requires, are there additional costs from your perspective?

>  >
>  > I'm skeptical that other buffering strategies would emerge, but I can only
>  speculate on Cavium/NXP/etc. NPU software.
>  >
>  > > IHMO, This may not be the candidate for common code. I guess you can
>  > > move this to driver side and abstract under SW driver's enqueue_burst.
>  > >
>  >
>  > I don't think that will work without adding a flush API, otherwise we could
>  have indefinitely buffered events. I see three ways forward:
>  
>  I agree. A more portable way is to move the "flush" to the implementation and
>  "flush" whenever it makes sense to the PMD.
>  
>  >
>  > - The proposed approach
>  > - Add the proposed functions but make them implementation-specific.
>  > - Require the application to write its own buffering logic (i.e. no
>  > API change)
>  
>  I think, if the additional function call overhead cost is too much for the SW
>  implementation, then we can think of an implementation-specific API or a custom
>  application flow based on the SW driver.
>  
>  But I am not a fan of that (though tempted to do so these days). If we take that
>  route, we have a truckload of custom implementation-specific APIs, and now we
>  try to hide all the black magic under enqueue/dequeue to make it portable at
>  some expense.

Agreed, it's not worth special-casing the API with this relatively minor addition.

Thanks,
Gage

