[dpdk-dev] [PATCH 1/4] eventdev: introduce event driven programming model

Jerin Jacob jerin.jacob at caviumnetworks.com
Thu Nov 24 02:59:13 CET 2016


On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> Hi Jerin,

Hi Thomas,

> 
> Thanks for bringing a big new piece in DPDK.
> 
> I made some comments below.

Thanks for the review.

> 
> 2016-11-18 11:14, Jerin Jacob:
> > +Eventdev API - EXPERIMENTAL
> > +M: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> > +F: lib/librte_eventdev/
> 
> OK to mark it experimental.
> What is the plan to remove the experimental word?

IMO, the EXPERIMENTAL status can be removed when
- At least two event drivers are available (Intel and Cavium are working
  on SW and HW event drivers)
- Functional test applications are fine with at least two drivers
- A portable example application showcases the features of the library
- eventdev is integrated with another DPDK subsystem such as ethdev

Thoughts? I am not sure what criteria were used in the cryptodev case.


> 
> > + * RTE event device drivers do not use interrupts for enqueue or dequeue
> > + * operation. Instead, Event drivers export Poll-Mode enqueue and dequeue
> > + * functions to applications.
> 
> To the question "what makes DPDK different" it could be answered
> that DPDK event drivers implement polling functions :)

Mostly taken from the ethdev API header file :-)

> 
> > +#include <stdbool.h>
> > +
> > +#include <rte_pci.h>
> > +#include <rte_dev.h>
> > +#include <rte_memory.h>
> 
> Is it possible to remove some of these includes from the API?

OK. I will scan through all the header files and remove the ones that are
not required.

> 
> > +
> > +#define EVENTDEV_NAME_SKELETON_PMD event_skeleton
> > +/**< Skeleton event device PMD name */
> 
> I do not understand this #define.

Applications can explicitly request a specific driver through the driver
name. This name will be passed as the argument to rte_event_dev_get_dev_id(const char *name).
The reason for keeping this #define in rte_eventdev.h is that the
application then needs to include only rte_eventdev.h, not rte_eventdev_pmd.h.

I will remove the definition from this patch and add it in the
skeleton driver patch (patch 03/04).
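
For example, something like this on the application side (just a sketch;
the RTE_STR() stringification and the error handling are illustrative,
only the rte_event_dev_get_dev_id(const char *name) prototype is from the
patch):

#include <rte_common.h>
#include <rte_eventdev.h>

	int dev_id;

	/* look up the device instantiated by the skeleton PMD */
	dev_id = rte_event_dev_get_dev_id(RTE_STR(EVENTDEV_NAME_SKELETON_PMD));
	if (dev_id < 0)
		return -1; /* skeleton event device not found */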

> And it is not properly prefixed.

OK. I will prefix with RTE_ in v2.

> 
> > +struct rte_event_dev_info {
> > +	const char *driver_name;	/**< Event driver name */
> > +	struct rte_pci_device *pci_dev;	/**< PCI information */
> 
> There is some work in progress to remove PCI information from ethdev.
> Please do not add any PCI related structure in eventdev.
> The generic structure is rte_device.

OK. Makes sense. A grep of "rte_device" shows that none of the subsystems
implement it yet and the work is in progress. I will change to rte_device
when it is in mainline. The skeleton eventdev driver is based on the PCI
bus and needs this for the moment.


> 
> > +struct rte_event_dev_config {
> > +	uint32_t dequeue_wait_ns;
> > +	/**< rte_event_dequeue() wait for *dequeue_wait_ns* ns on this device.
> 
> Please explain exactly when the wait occurs and why.

Here is the explanation from the rte_event_dequeue() API definition:
-
@param wait
0 - no-wait, returns immediately if there is no event.
>0 - wait for the event, if the device is configured with
RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT then this function will wait until
the event available or *wait* time.
if the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
then this function will wait until the event available or *dequeue_wait_ns*
                                                      ^^^^^^^^^^^^^^^^^^^^^^
ns which was previously supplied to rte_event_dev_configure()
-
This gives the application control over how long the implementation
should wait if an event is not available.

Let me know what exact changes are required if the details in the
rte_event_dequeue() API definition are not enough.
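
As a usage sketch (field and flag names as in the v1 patch; the
rte_event_dev_configure() and rte_event_dequeue() prototypes shown here
are my assumption of the final form):

	struct rte_event_dev_config config = {
		.dequeue_wait_ns = 10000, /* device-level wait of 10 us */
		/* other fields omitted */
	};
	rte_event_dev_configure(dev_id, &config);

	/* Without RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT, the per-call *wait*
	 * argument is ignored and dequeue waits up to dequeue_wait_ns. */
	struct rte_event ev;
	rte_event_dequeue(dev_id, port_id, &ev, 0);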

> 
> > +	 * This value should be in the range of *min_dequeue_wait_ns* and
> > +	 * *max_dequeue_wait_ns* which previously provided in
> > +	 * rte_event_dev_info_get()
> > +	 * \see RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
> 
> I think the @see syntax would be more consistent than \see.

OK. I will change to @see

> 
> > +	uint8_t nb_event_port_dequeue_depth;
> > +	/**< Number of dequeue queue depth for any event port on this device.
> 
> I think it deserves more explanations.

see below

> 
> > +	uint32_t event_dev_cfg;
> > +	/**< Event device config flags(RTE_EVENT_DEV_CFG_)*/
> 
> How this field differs from others in the struct?
> Should it be named flags?

OK. I will change to flags

> 
> > +	uint32_t event_queue_cfg; /**< Queue config flags(EVENT_QUEUE_CFG_) */
> 
> Same comment about the naming of this field for event_queue config sruct.

OK. I will change to flags

> 
> > +/** Event port configuration structure */
> > +struct rte_event_port_conf {
> > +	int32_t new_event_threshold;
> > +	/**< A backpressure threshold for new event enqueues on this port.
> > +	 * Use for *closed system* event dev where event capacity is limited,
> > +	 * and cannot exceed the capacity of the event dev.
> > +	 * Configuring ports with different thresholds can make higher priority
> > +	 * traffic less likely to  be backpressured.
> > +	 * For example, a port used to inject NIC Rx packets into the event dev
> > +	 * can have a lower threshold so as not to overwhelm the device,
> > +	 * while ports used for worker pools can have a higher threshold.
> > +	 * This value cannot exceed the *nb_events_limit*
> > +	 * which previously supplied to rte_event_dev_configure()
> > +	 */
> > +	uint8_t dequeue_depth;
> > +	/**< Configure number of bulk dequeues for this event port.
> > +	 * This value cannot exceed the *nb_event_port_dequeue_depth*
> > +	 * which previously supplied to rte_event_dev_configure()
> > +	 */
> > +	uint8_t enqueue_depth;
> > +	/**< Configure number of bulk enqueues for this event port.
> > +	 * This value cannot exceed the *nb_event_port_enqueue_depth*
> > +	 * which previously supplied to rte_event_dev_configure()
> > +	 */
> > +};
> 
> The depth configuration is not clear to me.

Basically, it is the maximum number of events that can be enqueued/dequeued
at a time from a given event port. A depth of one == non-burst mode.
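
For example (values purely illustrative; field names from the v1 struct):

	/* worker port: bulk mode, up to 16 events per enqueue/dequeue call */
	struct rte_event_port_conf worker_conf = {
		.new_event_threshold = 4096, /* <= nb_events_limit from configure */
		.dequeue_depth = 16,
		.enqueue_depth = 16,
	};

	/* NIC Rx injection port: depth of one == non-burst mode */
	struct rte_event_port_conf rx_conf = {
		.new_event_threshold = 1024,
		.dequeue_depth = 1,
		.enqueue_depth = 1,
	};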

> 
> > +/* Event types to classify the event source */
> 
> Why this classification is needed?

This is for application pipelining and for cases where the application
wants to know which subsystem generated the event.

example packet forwarding loop on the worker cores:
while (1) {
	ev = dequeue();
	if (ev.event_type == RTE_EVENT_TYPE_ETHDEV) {
		/* event from the ethdev subsystem:
		 * - swap the MAC addresses
		 * - push to an atomic queue for ingress flow order
		 *   maintenance by the core
		 */
	} else if (ev.event_type == RTE_EVENT_TYPE_CORE) {
		/* event generated by another core */
	}
	enqueue(ev);
}

> 
> > +#define RTE_EVENT_TYPE_ETHDEV           0x0
> > +/**< The event generated from ethdev subsystem */
> > +#define RTE_EVENT_TYPE_CRYPTODEV        0x1
> > +/**< The event generated from crypodev subsystem */
> > +#define RTE_EVENT_TYPE_TIMERDEV         0x2
> > +/**< The event generated from timerdev subsystem */
> > +#define RTE_EVENT_TYPE_CORE             0x3
> > +/**< The event generated from core.
> 
> What is core?

The events are generated by an lcore for pipelining. Any suggestion for a
better name? lcore?
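
For example, a worker lcore injecting a new event for the next pipeline
stage would do something like this (a sketch; the operation field name and
the remaining struct fields are my assumption, only RTE_EVENT_TYPE_CORE,
RTE_EVENT_OP_NEW and the enqueue prototype are from the patch):

	struct rte_event ev = {
		.event_type = RTE_EVENT_TYPE_CORE, /* produced by an lcore */
		.op = RTE_EVENT_OP_NEW,            /* no previous context */
		/* target queue, scheduling type, payload, ... omitted */
	};
	rte_event_enqueue(dev_id, port_id, &ev);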

> 
> > +/* Event enqueue operations */
> 
> I feel a longer explanation is needed here to describe
> what is an operation and where this data is useful.

I will try to add it. The v1 already has a lengthy description for RELEASE
because it is not self-explanatory.

> 
> > +#define RTE_EVENT_OP_NEW                0
> > +/**< New event without previous context */
> > +#define RTE_EVENT_OP_FORWARD            1
> > +/**< Re-enqueue previously dequeued event */
> > +#define RTE_EVENT_OP_RELEASE            2
> 
> There is no comment for the release operation.

It's there. See the next comment.

> 
> > +/**
> > + * Release the flow context associated with the schedule type.
> > + *
> [...]
> > + */
> 
> There is no function declaration below this comment.

This comment was for the previous RTE_EVENT_OP_RELEASE. I will fix the
doxygen formatting issue.

> 
> > +/**
> > + * The generic *rte_event* structure to hold the event attributes
> > + * for dequeue and enqueue operation
> > + */
> > +struct rte_event {
> > +	/** WORD0 */
> > +	RTE_STD_C11
> > +	union {
> > +		uint64_t event;
> [...]
> > +	};
> > +	/** WORD1 */
> > +	RTE_STD_C11
> > +	union {
> > +		uintptr_t event_ptr;
> 
> I wonder if it can be a problem to have the size of this field
> not constant across machines.

OK. Maybe I can make it "uint64_t u64" to reserve space, or I can
remove it.
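
i.e. WORD1 would become something like (sketch only):

	/** WORD1 */
	RTE_STD_C11
	union {
		uint64_t u64;
		/**< Opaque 64-bit value, same size on all machines */
		struct rte_mbuf *mbuf;
		/**< mbuf pointer if dequeued event is associated with mbuf */
	};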

> 
> > +		/**< Opaque event pointer */
> > +		struct rte_mbuf *mbuf;
> > +		/**< mbuf pointer if dequeued event is associated with mbuf */
> 
> How do we know that an event is associated with mbuf?

By looking at the event source/type RTE_EVENT_TYPE_*.

> Does it mean that such events are always converted into mbuf even if the
> application does not need it?

The hardware has a dependency on getting the physical address of the event,
so any struct that has "phys_addr_t buf_physaddr" works.

> 
> > +struct rte_eventdev_driver;
> > +struct rte_eventdev_ops;
> 
> I think it is better to split API and driver interface in two files.
> (we should do this split in ethdev)

I thought so, but then the "static inline" versions of the northbound
API (like rte_event_enqueue) would have to go into another file (due to the
fact that the implementation needs to dereference "dev->data->ports[port_id]").
Do you want it that way?
I would like to keep all northbound APIs in rte_eventdev.h and not any of
them in rte_eventdev_pmd.h.

Any suggestions?
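
To show the constraint, the fast-path wrapper has to look roughly like
this (the internal names here follow the usual ethdev-style layout and are
only illustrative; the point is the dev->data->ports[port_id] dereference):

static inline int
rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev)
{
	struct rte_eventdev *dev = &rte_eventdevs[dev_id];

	/* needs the internal device/port layout, so it cannot live in a
	 * header that hides those structures from the application */
	return (*dev->enqueue)(dev->data->ports[port_id], ev);
}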

> 
> > +/**
> > + * Enqueue the event object supplied in the *rte_event* structure on an
> > + * event device designated by its *dev_id* through the event port specified by
> > + * *port_id*. The event object specifies the event queue on which this
> > + * event will be enqueued.
> > + *
> > + * @param dev_id
> > + *   Event device identifier.
> > + * @param port_id
> > + *   The identifier of the event port.
> > + * @param ev
> > + *   Pointer to struct rte_event
> > + *
> > + * @return
> > + *  - 0 on success
> > + *  - <0 on failure. Failure can occur if the event port's output queue is
> > + *     backpressured, for instance.
> > + */
> > +static inline int
> > +rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev)
> 
> Is it really needed to have non-burst variant of enqueue/dequeue?

Yes. Certain HW can work only with non-burst variants.
> 
> > +/**
> > + * Converts nanoseconds to *wait* value for rte_event_dequeue()
> > + *
> > + * If the device is configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT flag then
> > + * application can use this function to convert wait value in nanoseconds to
> > + * implementations specific wait value supplied in rte_event_dequeue()
> 
> Why is it implementation-specific?
> Why this conversion is not internal in the driver?

This is a performance optimization; otherwise drivers would need to convert
ns to ticks in the "fast path".
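
i.e. the intended usage is to convert once in the slow path and reuse the
value on the fast path (the conversion helper's name below is a placeholder
for whatever the patch defines):

	/* slow path: convert ns to the implementation-specific wait value once */
	uint64_t wait = rte_event_dequeue_wait_time(dev_id, 10000 /* ns */);

	/* fast path: no per-dequeue ns-to-tick conversion */
	for (;;) {
		struct rte_event ev;
		rte_event_dequeue(dev_id, port_id, &ev, wait);
		/* ... process ev ... */
	}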

> 
> End of review for this patch ;)

