[dpdk-dev] [RFC 1/3] eventdev: allow for event devices requiring maintenance

Jerin Jacob jerinjacobk at gmail.com
Thu Apr 16 19:19:08 CEST 2020


On Tue, Apr 14, 2020 at 11:25 PM Mattias Rönnblom
<mattias.ronnblom at ericsson.com> wrote:
>
> On 2020-04-14 18:15, Jerin Jacob wrote:
> > On Tue, Apr 14, 2020 at 9:27 PM Mattias Rönnblom
> > <mattias.ronnblom at ericsson.com> wrote:
> >> On 2020-04-10 15:00, Jerin Jacob wrote:
> >>> On Thu, Apr 9, 2020 at 7:32 PM Mattias Rönnblom
> >>> <mattias.ronnblom at ericsson.com> wrote:
> >>>> On 2020-04-09 15:32, Jerin Jacob wrote:
> >>>>> On Thu, Apr 9, 2020 at 5:51 PM Mattias Rönnblom
> >>>>> <mattias.ronnblom at ericsson.com> wrote:
> >>>>>> On 2020-04-08 21:36, Jerin Jacob wrote:
> >>>>>>> On Wed, Apr 8, 2020 at 11:27 PM Mattias Rönnblom
> >>>>>>> <mattias.ronnblom at ericsson.com> wrote:
> >>>>>>>> Extend Eventdev API to allow for event devices which require various
> >>>>>>>> forms of internal processing to happen, even when events are not
> >>>>>>>> enqueued to or dequeued from a port.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom at ericsson.com>
> >>>>>>>> ---
> >>>>>>>>      lib/librte_eventdev/rte_eventdev.h     | 65 ++++++++++++++++++++++++++
> >>>>>>>>      lib/librte_eventdev/rte_eventdev_pmd.h | 14 ++++++
> >>>>>>>>      2 files changed, 79 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
> >>>>>>>> index 226f352ad..d69150792 100644
> >>>>>>>> --- a/lib/librte_eventdev/rte_eventdev.h
> >>>>>>>> +++ b/lib/librte_eventdev/rte_eventdev.h
> >>>>>>>> @@ -289,6 +289,15 @@ struct rte_event;
> >>>>>>>>       * single queue to each port or map a single queue to many port.
> >>>>>>>>       */
> >>>>>>>>
> >>>>>>>> +#define RTE_EVENT_DEV_CAP_REQUIRES_MAINT (1ULL << 9)
> >>>>>>>> +/**< Event device requires calls to rte_event_maintain() during
> >>>>>>> This scheme would call for DSW-specific API handling in the fast path.
> >>>>>> Initially this would be so, but buffering events might yield performance
> >>>>>> benefits for more event devices than DSW.
> >>>>>>
> >>>>>>
> >>>>>> In an application, it's often convenient, but sub-optimal from a
> >>>>>> performance point of view, to do single-event enqueue operations. The
> >>>>>> alternative is to use an application-level buffer, and then flush this
> >>>>>> buffer with rte_event_enqueue_burst(). If you allow the event device to
> >>>>>> buffer, you get the simplicity of single-event enqueue operations, but
> >>>>>> without taking any noticeable performance hit.
> >>>>> IMO, it is better to aggregate the burst in the application, as sending
> >>>>> events one by one to the driver to aggregate has a performance cost due
> >>>>> to the function pointer overhead.
> >>>> That's a very slight overhead - but for optimal performance, sure. It'll
> >>>> come at a cost in terms of code complexity. Just look at the adapters.
> >>>> They do this already. I think some applications are ready to take the
> >>>> extra 5-10 clock cycles or so it'll cost them to do the function call
> >>>> (provided the event device had buffering support).
> >>> So is there any advantage of moving the aggregation logic to the PMD? It is costly.
> >>
> >> What do you mean by aggregation logic?
> > aggregation == buffering.
>
>
> The reason I put it into DSW to begin with was that it yielded a
> significant performance benefit, for situations where the application
> would enqueue() few or even single events per call. For DSW it will
> translate to lower DPDK event ring overhead. I would imagine it could
> improve performance for hardware-based event devices as well.

Yes, we are already doing this in the application. It makes sense for buffering.
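
For reference, a minimal sketch of what such application-level buffering looks
like on the enqueue side (the burst size, dev_id/port_id and struct names here
are placeholders for illustration, not taken from an actual application):

#include <rte_eventdev.h>

#define APP_BURST_SIZE 32 /* arbitrary flush threshold for this sketch */

struct app_ev_buffer {
        struct rte_event events[APP_BURST_SIZE];
        uint16_t count;
};

/* Flush the buffered events as a single burst; retry until the event
 * device has accepted them all. */
static inline void
app_buffer_flush(uint8_t dev_id, uint8_t port_id, struct app_ev_buffer *buf)
{
        uint16_t sent = 0;

        while (sent < buf->count)
                sent += rte_event_enqueue_burst(dev_id, port_id,
                                                buf->events + sent,
                                                buf->count - sent);
        buf->count = 0;
}

/* Single-event "enqueue" from the application's point of view; the
 * actual enqueue to the event device happens in bursts. */
static inline void
app_buffer_enqueue(uint8_t dev_id, uint8_t port_id, struct app_ev_buffer *buf,
                   const struct rte_event *ev)
{
        buf->events[buf->count++] = *ev;
        if (buf->count == APP_BURST_SIZE)
                app_buffer_flush(dev_id, port_id, buf);
}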

>
>
> >>
> >>>>> Another concern is the frequency at which the application must call the
> >>>>> rte_event_maintain() function, as the timing requirements will vary from
> >>>>> driver to driver and application to application.
> >>>>> IMO, it is not portable and I believe the application should not be
> >>>>> aware of those details. If the driver needs a specific maintenance
> >>>>> function for any other reason, then it is better to use the DPDK SERVICE core infra.
> >>>> The only thing the application needs to be aware of is that it needs to
> >>>> call rte_event_maintain() as often as it would have called dequeue() in
> >>>> your "typical worker" example. Making sure this call is cheap enough is
> >>>> up to the driver, and this needs to hold true for all event devices that
> >>>> need maintenance.
> >>> Why can't the work of rte_event_maintain() be done in either dequeue() or
> >>> enqueue() in the driver context? Either one of them has to be called
> >>> periodically by the application
> >>> in any case.
> >>
> >> No, producer-only ports can go idle for long times. Applications
> >> that don't "go idle" need not worry about the maintain function.
> > If I understand it correctly, if the worker does not call enqueue() or dequeue()
> > for a long time, then maintain() needs to be called by the application.
> >
> > That's my concern with this API. What is the definition of a long
> > time frame (ns or ticks)?
> > Will it change from driver to driver and arch to arch? And it is
> > leaking the driver requirements to the application.
> >
>
> It's difficult to quantify exactly, but the rate should be on the same
> order of magnitude as you would call dequeue() on a consumer-type worker

The challenge is: if another driver has a different requirement for maintain()
in terms of frequency, and this is a public API, then how will we abstract that?

> port. All the RTE_EVENT_DEV_CAP_* flags essentially represent such
> leakage, where the event device driver and/or the underlying hardware
> express various capabilities and limitations.

I agree. But in the fast path, we do not introduce any such _functional_ difference.
If it is on the slow path, then there is no issue at all.
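
To illustrate: with the proposed capability flag, the fast-path difference for
an application would be limited to a conditional call, with the flag checked
once on the slow path. This is only a sketch; the exact rte_event_maintain()
signature is an assumption based on this RFC:

#include <stdbool.h>
#include <rte_eventdev.h>

/* Slow path: query once whether the device requires maintenance calls. */
static bool
dev_needs_maintenance(uint8_t dev_id)
{
        struct rte_event_dev_info info;

        rte_event_dev_info_get(dev_id, &info);
        return (info.event_dev_cap & RTE_EVENT_DEV_CAP_REQUIRES_MAINT) != 0;
}

/* Fast path: an otherwise idle port pays for a branch and, only when the
 * flag is set, the maintain call itself. Signature assumed to be
 * rte_event_maintain(dev_id, port_id) for this sketch. */
static inline void
port_idle(uint8_t dev_id, uint8_t port_id, bool needs_maint)
{
        if (needs_maint)
                rte_event_maintain(dev_id, port_id);
}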


>
>
> I'm not sure it needs to be much more complicated for the application to
> handle than the change to the event adapters I included in the patch.
> There, it boils down to the service function call rate, which would be high
> during low load (causing buffers to be flushed quickly, etc.), and a little
> lower during high lcore load.
>
>
> >>
> >>>> If you plan to use a non-buffering hardware device driver or a soft,
> >>>> centralized scheduler that doesn't need this, it will also not set the
> >>>> flag, and thus the application need not care about the
> >>>> rte_event_maintain() function. DPDK code such as the eventdev adapters
> >>>> does need to care, but the increase in complexity is slight, and the cost
> >>>> of calling rte_event_maintain() on a maintenance-free device is very
> >>>> low (since the then-NULL function pointer is in the eventdev struct,
> >>>> likely on a cache line already dragged in).
> >>>>
> >>>>
> >>>> Unfortunately, DPDK doesn't have a per-core delayed-work mechanism.
> >>>> Flushing event buffers (and other DSW "background work") can't be done
> >>>> on a service core, since they would work on non-MT-safe data structures
> >>>> on the worker thread's event ports.
> >>> Yes. Otherwise, DSW needs to be updated to be MT-safe.
> >>
> >> I haven't been looking at this in detail, but I suspect it will be both
> >> complex and not very performant. One of the problems that needs to be solved
> >> in such a solution is the "pausing" of flows during migration. All
> >> participating lcores need to ACK that a flow is paused.
> > Could you share a patch with details on how much it costs?
>
>
> I don't have a ready-made solution for making lcore ports thread-safe.

OK

>
>
> >
> >>
> >>>>>>>> + * periods when neither rte_event_dequeue_burst() nor
> >>>>>>> The typical worker thread will be
> >>>>>>> while (1) {
> >>>>>>>                     rte_event_dequeue_burst();
> >>>>>>>                      ..process..
> >>>>>>>                     rte_event_enqueue_burst();
> >>>>>>> }
> >>>>>>> If so, why can't the DSW driver do the maintenance in the driver
> >>>>>>> context in the dequeue() call?
> >>>>>>>
> >>>>>> DSW already does maintenance on dequeue, and works well in the above
> >>>>>> scenario. The typical worker does not need to care about the
> >>>>>> rte_event_maintain() function, since it dequeues events on a regular basis.
> >>>>>>
> >>>>>>
> >>>>>> What this RFC addresses is the more atypical (but still fairly common)
> >>>>>> case of a port being neither dequeued from nor enqueued to on a regular
> >>>>>> basis. The timer and ethernet rx adapters are examples of such.
> >>>>> If it is an adapter-specific use-case problem, then maybe we have
> >>>>> an option to fix the problem in the adapter-specific API usage or in that area.
> >>>>>
> >>>> It's not adapter specific, I think. There might be producer-only ports,
> >>>> for example, which don't provide a constant stream of events, but
> >>>> rather intermittent bursts. A traffic generator is one example of such
> >>>> an application, and there might be other, less synthetic ones as well.
> >>> In that case, the application knows the purpose of the eventdev port.
> >>> Would changing the eventdev spec to allow configuring the "port" or "queue" for that use
> >>> case help? Meaning, DSW or
> >>> any driver could get the hint and change the function pointers
> >>> accordingly for the fast path.
> >>> For instance, do maintenance on enqueue() for such ports.
> >>
> >> This is what DSW does already today. A dequeue() call with a zero-length
> >> event array serves the purpose of rte_event_maintain(). It's a bit of a
> >> hack, in my opinion.
> > I agree that it is a hack.
> >
> > One more concern we have is that we are adding an API for new driver requirements
> > without updating the example application. Sharing an example application
> > patch would
> > enable us to understand the cost and usage (i.e., the cost w.r.t. other SW drivers).
>
>
> Good point.

I suggest sharing a patch based on existing app/adapter usage; based on that, we
can analyze the abstraction and the cost further.
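
As a starting point for that discussion, below is roughly how a producer-only
worker might look under the proposed API. It is only a sketch: gen_events() is
a hypothetical application traffic source, error/backpressure handling is
omitted, and the maintain-call rate and the rte_event_maintain() signature are
assumptions, not taken from an existing example application.

#include <stdbool.h>
#include <rte_common.h>
#include <rte_eventdev.h>

static volatile bool stop;

/* Hypothetical application traffic source; fills 'evs' with up to 'max'
 * events and returns how many it produced. */
static uint16_t gen_events(struct rte_event *evs, uint16_t max);

/* Producer-only worker: enqueue bursts when traffic is available, and
 * otherwise call rte_event_maintain() at roughly the rate a consumer
 * would call dequeue(), so the event device can flush buffers and
 * handle load-balancing signaling. */
static int
producer_loop(uint8_t dev_id, uint8_t port_id, bool needs_maint)
{
        struct rte_event events[32];

        while (!stop) {
                uint16_t n = gen_events(events, RTE_DIM(events));

                if (n > 0)
                        rte_event_enqueue_burst(dev_id, port_id, events, n);
                else if (needs_maint)
                        rte_event_maintain(dev_id, port_id);
        }
        return 0;
}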



>
>
>
> >
> >>
> >>>>>>>> + * rte_event_enqueue_burst() are called on a port. This will allow the
> >>>>>>>> + * event device to perform internal processing, such as flushing
> >>>>>>>> + * buffered events, return credits to a global pool, or process
> >>>>>>>> + * signaling related to load balancing.
> >>>>>>>> + */
> >>
>

