[dpdk-dev] [PATCH 1/4] eventdev: introduce event driven programming model

Bruce Richardson bruce.richardson at intel.com
Mon Nov 28 10:16:10 CET 2016


On Sat, Nov 26, 2016 at 08:24:55AM +0530, Jerin Jacob wrote:
> On Fri, Nov 25, 2016 at 11:00:53AM +0000, Bruce Richardson wrote:
> > On Fri, Nov 25, 2016 at 05:53:34AM +0530, Jerin Jacob wrote:
> > > On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > > > 2016-11-24 07:29, Jerin Jacob:
> > > > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > > > 2016-11-18 11:14, Jerin Jacob:
> > > > > > > +Eventdev API - EXPERIMENTAL
> > > > > > > +M: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> > > > > > > +F: lib/librte_eventdev/
> > > > > > 
> > > 
> > > I don't think there is any portability issue here, I can explain.
> > > 
> > > At the application level, we have two more use cases that call for the
> > > non-burst variant:
> > > 
> > > - latency-critical work
> > > - on dequeue, when the application wants to deal with only one flow (i.e.
> > >   to avoid processing two different application flows and thrashing the cache)
> > > 
> > > Selection of the burst variants will be based on rte_event_dev_info_get()
> > > and rte_event_dev_configure() (see max_event_port_dequeue_depth,
> > > max_event_port_enqueue_depth, nb_event_port_dequeue_depth,
> > > nb_event_port_enqueue_depth). So I don't think there is a portability
> > > issue here, and I don't want to waste CPU cycles on the for loop if the
> > > application is known to be working with the non-burst variant, like below
> > > 
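For reference, a non-burst main loop of the kind being described might look
something like the sketch below; rte_event_dequeue() is the single-event
variant proposed in this patch, and process_event() merely stands in for the
application's handler:

while (!done) {
	struct rte_event ev;

	/* single-event dequeue: no event array and no for loop */
	if (rte_event_dequeue(dev_id, port_id, &ev, 0) == 0)
		continue;
	process_event(&ev);	/* application-specific work */
}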
> > 
> > If the application is known to be working with non-burst variants, then
> > it can always request a burst size of 1 and skip the loop completely.
> > There is no extra performance hit in that case in either the app or the
> > driver (since the non-burst driver always returns 1, irrespective of the
> > number requested).
> 
> Hmm. I am afraid there is.
> On the app side, the const "1" cannot be optimized away by the compiler,
> because underneath it is a function-pointer-based driver interface.
> On the driver side, the implementation would be for-loop based instead of
> a plain access.
> (The compiler can never see the const "1" across the driver interface.)
> 
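To illustrate the point: behind the function pointer the driver only ever
sees a runtime nb value, so the burst loop cannot be specialised away. A
minimal sketch, where drv_dequeue_one() is a purely illustrative internal:

/* generic burst implementation sitting behind the dequeue function pointer */
static uint16_t
drv_dequeue_burst(void *port, struct rte_event ev[], uint16_t nb)
{
	uint16_t i;

	/* nb is a runtime value here: the const "1" at the call site is
	 * invisible across the indirect call, so the loop overhead remains */
	for (i = 0; i < nb; i++)
		if (drv_dequeue_one(port, &ev[i]) == 0)
			break;
	return i;
}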
> We are planning to implement burst mode as a kind of emulation mode and
> have different schemes for burst and non-burst. We took a similar approach
> in introducing rte_event_schedule(), splitting the responsibility so that
> the SW driver can work without additional performance overhead and with a
> neat driver interface.
> 
> If you are concerned about usability or a regression on the SW driver,
> that's not the case: the application will use the non-burst variant only if
> dequeue_depth == 1 and/or in the explicit case where latency matters.
> 
> On the portability side, we support both cases, and an application written
> based on dequeue_depth will perform well in both implementations. IMO, there
> is no other shortcut for a performance-optimized application running on
> different models. I don't think it is an issue: in the event model each
> core is identical, and the main loop can be changed based on dequeue_depth
> if performance requires it (the main loop will be function pointer based
> anyway), along the lines sketched below.
> 
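Concretely, that selection might look like the following sketch; the
mainloop function names are illustrative only, and dev_conf is assumed to be
the configuration passed to rte_event_dev_configure():

/* pick the main loop once, after rte_event_dev_configure() */
typedef int (*mainloop_fn)(void *arg);
mainloop_fn mainloop;

if (dev_conf.nb_event_port_dequeue_depth == 1)
	mainloop = single_event_mainloop;	/* non-burst, latency critical */
else
	mainloop = burst_event_mainloop;	/* burst variant */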

Ok, I think I see your point now. Here is an alternative suggestion.

1. Keep the single user API.
2. Have both single and burst function pointers in the driver
3. Call the appropriate one in the eventdev layer based on the parameters.
For example:

rte_event_dequeue_burst(..., int num)
{
	if (num == 1 && single_dequeue_fn != NULL)
		return single_dequeue_fn(...);
	return burst_dequeue_fn(...);
}

This way drivers can optionally special-case single-event dequeue -
the function pointer check will be reliably predicted by the HW branch
predictor, making it a near-zero-cost check - while not forcing all
drivers to do so. It also reduces the public API surface, and gives us
a single enqueue and dequeue function.
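
On the driver side, a driver would then register both implementations, or
leave the single-event one NULL to fall back to the burst path. A sketch,
with a structure and signatures that are illustrative rather than from the
patch:

struct eventdev_dequeue_ops {
	/* mandatory burst implementation */
	uint16_t (*burst_dequeue_fn)(void *port, struct rte_event ev[],
				     uint16_t nb, uint64_t wait);
	/* optional single-event fast path; NULL if not provided */
	uint16_t (*single_dequeue_fn)(void *port, struct rte_event *ev,
				      uint64_t wait);
};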

/Bruce


