[dpdk-dev] [RFC 1/3] eventdev: allow for event devices requiring maintenance

Jerin Jacob jerinjacobk at gmail.com
Tue Apr 14 18:15:20 CEST 2020


On Tue, Apr 14, 2020 at 9:27 PM Mattias Rönnblom
<mattias.ronnblom at ericsson.com> wrote:
>
> On 2020-04-10 15:00, Jerin Jacob wrote:
> > On Thu, Apr 9, 2020 at 7:32 PM Mattias Rönnblom
> > <mattias.ronnblom at ericsson.com> wrote:
> >> On 2020-04-09 15:32, Jerin Jacob wrote:
> >>> On Thu, Apr 9, 2020 at 5:51 PM Mattias Rönnblom
> >>> <mattias.ronnblom at ericsson.com> wrote:
> >>>> On 2020-04-08 21:36, Jerin Jacob wrote:
> >>>>> On Wed, Apr 8, 2020 at 11:27 PM Mattias Rönnblom
> >>>>> <mattias.ronnblom at ericsson.com> wrote:
> >>>>>> Extend Eventdev API to allow for event devices which require various
> >>>>>> forms of internal processing to happen, even when events are not
> >>>>>> enqueued to or dequeued from a port.
> >>>>>>
> >>>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom at ericsson.com>
> >>>>>> ---
> >>>>>>     lib/librte_eventdev/rte_eventdev.h     | 65 ++++++++++++++++++++++++++
> >>>>>>     lib/librte_eventdev/rte_eventdev_pmd.h | 14 ++++++
> >>>>>>     2 files changed, 79 insertions(+)
> >>>>>>
> >>>>>> diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
> >>>>>> index 226f352ad..d69150792 100644
> >>>>>> --- a/lib/librte_eventdev/rte_eventdev.h
> >>>>>> +++ b/lib/librte_eventdev/rte_eventdev.h
> >>>>>> @@ -289,6 +289,15 @@ struct rte_event;
> >>>>>>      * single queue to each port or map a single queue to many port.
> >>>>>>      */
> >>>>>>
> >>>>>> +#define RTE_EVENT_DEV_CAP_REQUIRES_MAINT (1ULL << 9)
> >>>>>> +/**< Event device requires calls to rte_event_maintain() during
> >>>>> This scheme would call for DSW-specific API handling in the fastpath.
> >>>> Initially this would be so, but buffering events might yield performance
> >>>> benefits for more event devices than DSW.
> >>>>
> >>>>
> >>>> In an application, it's often convenient, but sub-optimal from a
> >>>> performance point of view, to do single-event enqueue operations. The
> >>>> alternative is to use an application-level buffer, and then flush this
> >>>> buffer with rte_event_enqueue_burst(). If you allow the event device to
> >>>> buffer, you get the simplicity of single-event enqueue operations, but
> >>>> without taking any noticeable performance hit.
> >>> IMO, it is better for the application to aggregate the burst, as sending
> >>> events one by one to the driver to aggregate has a performance cost due
> >>> to function pointer overhead.
> >>
> >> That's a very slight overhead - but for optimal performance, sure. It'll
> >> come at a cost in terms of code complexity. Just look at the adapters.
> >> They do this already. I think some applications are ready to take the
> >> extra 5-10 clock cycles or so it'll cost them to do the function call
> >> (provided the event device had buffering support).
> > So is there any advantage in moving the aggregation logic to the PMD? It is costly.
>
>
> What do you mean by aggregation logic?

aggregation == buffering.
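
For reference, the application-level aggregation/buffering pattern being
discussed looks roughly like the sketch below (the buffer size and helper
names are made up for illustration):

#include <rte_eventdev.h>

#define APP_EVENT_BUF_SIZE 32

struct app_enq_buf {
        uint16_t count;
        struct rte_event events[APP_EVENT_BUF_SIZE];
};

static inline void
app_flush(uint8_t dev_id, uint8_t port_id, struct app_enq_buf *buf)
{
        uint16_t sent = 0;

        /* enqueue_burst() may accept fewer events than offered; retry
         * until the buffer is drained */
        while (sent < buf->count)
                sent += rte_event_enqueue_burst(dev_id, port_id,
                                                buf->events + sent,
                                                buf->count - sent);
        buf->count = 0;
}

static inline void
app_enqueue(uint8_t dev_id, uint8_t port_id, struct app_enq_buf *buf,
            const struct rte_event *ev)
{
        buf->events[buf->count++] = *ev;
        if (buf->count == APP_EVENT_BUF_SIZE)
                app_flush(dev_id, port_id, buf);
}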

>
>
> >
> >>
> >>> Another concern is the frequency at which the application must call the
> >>> rte_event_maintain() function, as the timing requirements will vary from
> >>> driver to driver and from application to application.
> >>> IMO, it is not portable, and I believe the application should not be
> >>> aware of those details. If the driver needs a specific maintenance
> >>> function for any other reason, it is better to use the DPDK SERVICE core infra.
> >>
> >> The only thing the application needs to be aware of is that it needs to
> >> call rte_event_maintain() as often as it would have called dequeue() in
> >> your "typical worker" example. Making sure this call is cheap enough is
> >> up to the driver, and this needs to hold true for all event devices that
> >> need maintenance.
> > Why can't the work of rte_event_maintain() be done in either dequeue() or
> > enqueue() in the driver context? Either one of them has to be called
> > periodically by the application in any case, right?
>
>
> No, producer-only ports can go idle for long times. Applications
> that don't "go idle" need not worry about the maintain function.

If I understand it correctly, if the worker does not call enqueue() or dequeue()
for a long time, then maintain() needs to be called by the application.

That's my concern with this API. What is the definition of a "long
time" (ns or ticks)?
Will it vary from driver to driver and from arch to arch? And it is
leaking the driver's requirements into the application.
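
To make the concern concrete, a producer-only port under this proposal would
presumably look something like the sketch below (assuming the RFC's
rte_event_maintain(dev_id, port_id) signature; BURST_SIZE and
app_generate_events() are hypothetical):

#include <rte_eventdev.h>

#define BURST_SIZE 32

/* hypothetical application function filling in up to 'max' events */
uint16_t app_generate_events(struct rte_event *ev, uint16_t max);

static void
producer_loop(uint8_t dev_id, uint8_t port_id)
{
        struct rte_event ev[BURST_SIZE];

        for (;;) {
                uint16_t n = app_generate_events(ev, BURST_SIZE);

                if (n > 0) {
                        /* retry logic for partial enqueues omitted */
                        rte_event_enqueue_burst(dev_id, port_id, ev, n);
                } else {
                        /* Idle: the port is neither enqueued to nor dequeued
                         * from, so give the PMD a chance to flush buffers,
                         * return credits, etc. How often this must happen is
                         * exactly the open question above. */
                        rte_event_maintain(dev_id, port_id);
                }
        }
}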


>
>
> >
> >>
> >> If you plan to use a non-buffering hardware device driver or a soft,
> >> centralized scheduler that doesn't need this, it will also not set the
> >> flag, and thus the application need not care about the
> >> rte_event_maintain() function. DPDK code such as the eventdev adapters
> >> does need to care, but the increase in complexity is slight, and the cost
> >> of calling rte_event_maintain() on a maintenance-free device is very
> >> low (since the then-NULL function pointer is in the eventdev struct,
> >> likely on a cache line already dragged in).
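
In other words, a device-agnostic caller would presumably check the proposed
capability flag once at setup, roughly as in this sketch (a fragment; dev_id
and port_id are assumed to be set up elsewhere):

        struct rte_event_dev_info info;
        bool needs_maint;

        rte_event_dev_info_get(dev_id, &info);
        needs_maint = (info.event_dev_cap &
                       RTE_EVENT_DEV_CAP_REQUIRES_MAINT) != 0;

        /* ... later, in code paths where the port may see neither
         * enqueue nor dequeue calls for a while ... */
        if (needs_maint)
                rte_event_maintain(dev_id, port_id);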
> >>
> >>
> >> Unfortunately, DPDK doesn't have a per-core delayed-work mechanism.
> >> Flushing event buffers (and other DSW "background work") can't be done
> >> on a service core, since they would work on non-MT-safe data structures
> >> on the worker thread's event ports.
> > Yes. Otherwise, DSW needs to be updated to be MT safe.
>
>
> I haven't looked at this in detail, but I suspect it will be both
> complex and not very performant. One of the problems that needs to be solved
> in such a solution is the "pausing" of flows during migration. All
> participating lcores need to ACK that a flow is paused.

Could you share a patch with details on how much it costs?

>
>
> >
> >>
> >>>>>> + * periods when neither rte_event_dequeue_burst() nor
> >>>>> The typical worker thread will be
> >>>>> while (1) {
> >>>>>                    rte_event_dequeue_burst();
> >>>>>                     ..process..
> >>>>>                    rte_event_enqueue_burst();
> >>>>> }
> >>>>> If so, why can't the DSW driver do the maintenance in the driver context
> >>>>> in the dequeue() call?
> >>>>>
> >>>> DSW already does maintenance on dequeue, and works well in the above
> >>>> scenario. The typical worker does not need to care about the
> >>>> rte_event_maintain() function, since it dequeues events on a regular basis.
> >>>>
> >>>>
> >>>> What this RFC addresses is the more atypical (but still fairly common)
> >>>> case of a port being neither dequeued from nor enqueued to on a regular
> >>>> basis. The timer and ethernet Rx adapters are examples of such.
> >>> If it is an adapter-specific use case problem, then maybe we have
> >>> an option to fix the problem in the adapter-specific API usage or in that area.
> >>>
> >> It's not adapter specific, I think. There might be producer-only ports,
> >> for example, which don't provide a constant stream of events, but
> >> rather intermittent bursts. A traffic generator is one example of such
> >> an application, and there might be other, less synthetic ones as well.
> > In that case, the application knows the purpose of the eventdev port.
> > Would changing the eventdev spec to allow configuring the "port" or "queue"
> > for that use case help? Meaning, DSW or
> > any driver could get the hint and change the function pointers
> > accordingly for the fastpath.
> > For instance, do maintenance on enqueue() for such ports.
>
>
> This is what DSW does already today. A dequeue() call with a zero-length
> event array serves the purpose of rte_event_maintain(). It's a bit of a
> hack, in my opinion.

I agree that it is a hack.
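
For reference, the workaround referred to here boils down to a zero-length
dequeue used only to trigger the PMD's internal processing:

        /* no events are received; the call only gives the PMD a chance to
         * flush buffers, return credits, etc. */
        rte_event_dequeue_burst(dev_id, port_id, NULL, 0, 0);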

One more concern we have is that we are adding an API for the new driver's
requirements without updating the example application. Sharing an example
application patch would enable us to understand the cost and usage
(i.e. the cost w.r.t. the other SW drivers).



>
>
> >
> >>
> >>>>>> + * rte_event_enqueue_burst() are called on a port. This will allow the
> >>>>>> + * event device to perform internal processing, such as flushing
> >>>>>> + * buffered events, return credits to a global pool, or process
> >>>>>> + * signaling related to load balancing.
> >>>>>> + */
>
>

