[dpdk-dev] [RFC PATCH 0/1] eventtimer: introduce event timer wheel

Jerin Jacob jerin.jacob at caviumnetworks.com
Thu Aug 17 18:11:03 CEST 2017


Some NPU-class networking hardware has a timer unit where the user can arm and
cancel event timers. On expiry of the timeout, the hardware posts the
notification as an event to the eventdev HW, instead of invoking a callback as
in CPU-based timer schemes. This enables high-resolution (1 us or so) timer
management using internal or external clock domains, and offloads the timer
housekeeping work from the worker lcores.

This RFC attempts to abstract such NPU-class timer hardware and introduces an
event timer wheel subsystem inside the eventdev, as the two are tightly coupled.

This RFC introduces the functionality to create an event timer wheel. This
allows an application to arm event timers, which shall enqueue an event to a
specified event queue on expiry of a given interval.

The event timer wheel uses an ops table to which the various event devices
(e.g. Cavium OCTEONTX, NXP dpaa2 and SW) register their timer subsystem
implementation-specific ops.
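
As a rough illustration of this dispatch, the sketch below models an ops table
with a SW fallback selected when no HW capability is reported. All names here
(sketch_timer_wheel_ops, select_ops, the sw_* functions) are invented for the
example and are not the RFC's actual definitions:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical ops table: one function pointer per timer operation. */
struct sketch_timer_wheel_ops {
	int (*arm_burst)(void *wheel, void **timers, uint16_t nb_timers);
	int (*cancel_burst)(void *wheel, void **timers, uint16_t nb_timers);
};

/* Stand-in SW implementation: pretend every timer arms/cancels fine. */
static int
sw_arm_burst(void *wheel, void **timers, uint16_t nb_timers)
{
	(void)wheel; (void)timers;
	return nb_timers;
}

static int
sw_cancel_burst(void *wheel, void **timers, uint16_t nb_timers)
{
	(void)wheel; (void)timers;
	return nb_timers;
}

static const struct sketch_timer_wheel_ops sw_ops = {
	.arm_burst = sw_arm_burst,
	.cancel_burst = sw_cancel_burst,
};

/* Capability probe: use the driver's ops when it provides them,
 * otherwise fall back to the SW implementation. */
static const struct sketch_timer_wheel_ops *
select_ops(const struct sketch_timer_wheel_ops *hw_ops)
{
	return hw_ops != NULL ? hw_ops : &sw_ops;
}
```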

The RFC extends the DPDK event-based programming model so that an event can be
of type timer, and the expiry event is delivered to the CPU over eventdev ports.

Some of the use cases of event timer wheel are Beacon Timers,
Generic SW Timeout, Wireless MAC Scheduling, 3G Frame Protocols,
Packet Scheduling, Protocol Retransmission Timers, Supervision Timers.
All these use cases require high resolution and low time drift.

The abstract working model of an event timer wheel is as follows:
=================================================================
                               timer_tick_ns
                                   +
                      +-------+    |
                      |       |    |
              +-------+ bkt 0 +----v---+
              |       |       |        |
              |       +-------+        |
          +---+---+                +---+---+  +---+---+---+---+
          |       |                |       |  |   |   |   |   |
          | bkt n |                | bkt 1 |<-> t0| t1| t2| tn|
          |       |                |       |  |   |   |   |   |
          +---+---+                +---+---+  +---+---+---+---+
              |       Timer wheel      |
          +---+---+                +---+---+
          |       |                |       |
          | bkt 4 |                | bkt 2 |<--- Current bucket
          |       |                |       |
          +---+---+                +---+---+
               |      +-------+       |
               |      |       |       |
               +------+ bkt 3 +-------+
                      |       |
                      +-------+

 - It has a virtual, monotonically increasing 64-bit timer wheel clock based on
   an *enum rte_event_timer_wheel_clk_src* clock source. The clock source could
   be a CPU clock or a platform-dependent external clock.

 - The application creates a timer wheel instance with a given clock source,
   the total number of event timers, and the resolution (expressed in ns) at
   which to traverse the buckets.

 - Each timer wheel may have 0 to n buckets based on the configured max
   timeout (max_tmo_ns) and resolution (timer_tick_ns). On timer wheel start,
   the timer starts ticking at *timer_tick_ns* resolution.

 - The application arms an event timer to expire a given number of
   *timer_tick_ns* ticks from now.

 - The application can cancel an armed timer if required.

 - If the timer is not canceled by the application and it expires, the library
   injects the timer expiry event into the designated event queue.

 - The timer expiry event is received through *rte_event_dequeue_burst*.

 - The application frees the created timer wheel instance.
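
The bucket arithmetic implied by the model above can be sketched as follows;
nb_buckets() and bucket_for_timer() are hypothetical helpers for illustration,
not part of the proposed rte_event_timer_wheel API:

```c
#include <stdint.h>

/* Buckets needed to cover max_tmo_ns at timer_tick_ns resolution,
 * rounded up. */
static uint64_t
nb_buckets(uint64_t max_tmo_ns, uint64_t timer_tick_ns)
{
	return (max_tmo_ns + timer_tick_ns - 1) / timer_tick_ns;
}

/* Bucket that a timer armed for timeout_ticks ticks from the current
 * bucket lands in; the wheel wraps around modulo its size. */
static uint64_t
bucket_for_timer(uint64_t current_bucket, uint64_t timeout_ticks,
		 uint64_t wheel_size)
{
	return (current_bucket + timeout_ticks) % wheel_size;
}
```

For example, a 3 minute max timeout at 100 ms resolution needs 1800 buckets,
and a timer armed for 30 ticks from bucket 2 lands in bucket 32.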

A more detailed description of the event timer wheel is contained in the
header's comments.

Implementation thoughts
=======================
The event devices have to provide a driver-level function that is used to get
the event timer subsystem capability and the respective event timer wheel ops.
If the event device is not capable, a software implementation of the event
timer wheel ops will be selected.

The software implementation of the timer wheel will make use of the existing
rte_timer[1] and rte_ring libraries and EAL service cores[2] to achieve event
generation. The worker cores call the event timer arm function, which enqueues
the event timer to an rte_ring. The registered service core then dequeues the
event timer from the rte_ring and uses the rte_timer library to register a
timer. The service core then invokes the rte_timer_manage() function to
retrieve expired timers and generate the associated events.
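
The flow above can be modeled in a minimal single-threaded sketch, with a plain
array standing in for the rte_ring and a per-tick slot table standing in for
the registered rte_timers and rte_timer_manage(); every name here is invented
for illustration and is not the proposed API:

```c
#include <stdint.h>

#define RING_SZ 64
#define MAX_TICKS 128

struct arm_req {
	uint64_t expiry_tick;
	int event_id;	/* stand-in for the expiry event payload */
};

/* Worker -> service core queue (models the rte_ring). */
static struct arm_req ring[RING_SZ];
static unsigned int ring_head, ring_tail;

/* One scheduled event per tick slot (models registered rte_timers);
 * 0 means the slot is empty. */
static int pending[MAX_TICKS];

/* Worker side: the arm call only enqueues the request. */
static int
arm(uint64_t expiry_tick, int event_id)
{
	if (ring_tail - ring_head == RING_SZ)
		return -1;	/* ring full */
	ring[ring_tail++ % RING_SZ] =
		(struct arm_req){ expiry_tick, event_id };
	return 0;
}

/* Service core: drain the ring into the timer table, then fire any
 * timer expiring at the current tick (models rte_timer_manage()). */
static int
service_run(uint64_t now_tick, int *fired_event)
{
	while (ring_head != ring_tail) {
		struct arm_req r = ring[ring_head++ % RING_SZ];
		pending[r.expiry_tick % MAX_TICKS] = r.event_id;
	}
	if (pending[now_tick % MAX_TICKS] != 0) {
		*fired_event = pending[now_tick % MAX_TICKS];
		pending[now_tick % MAX_TICKS] = 0;
		return 1;	/* one expiry event generated */
	}
	return 0;
}
```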

The implementation of event timer wheel subsystem for both hardware (Cavium
OCTEONTX) and software(if there are no volunteers) will be undertaken by Cavium.

[1] http://dpdk.org/doc/guides/prog_guide/timer_lib.html
[2] http://dpdk.org/ml/archives/dev/2017-May/065207.html

An example code snippet to show the proposed API usage
======================================================
Example: TCP retransmission in abstract form.

uint8_t
configure_event_dev(...)
{
	/* Create the event device. */
	const struct rte_event_dev_config config = {
		.nb_event_queues = 1,
		/* Event device related configuration. */
		...
	};

	rte_event_dev_configure(event_dev_id, &config);
	/* Event queue and port configuration. */
	...
	/* Start the event device.*/
	rte_event_dev_start(event_dev_id);
}

#define NSECPERSEC	(1000000000ULL) // Number of ns in 1 sec
uint8_t
configure_event_timer_wheel(...)
{
	/* Create an event timer wheel for reliable connections. */
	const struct rte_event_timer_wheel_config wheel_config = {
		.event_dev_id = event_dev_id,
		.timer_wheel_id = 0,
		.clk_src = RTE_EVENT_TIMER_WHEEL_CPU_CLK,
		.timer_tick_ns = NSECPERSEC / 10, // 100 milliseconds
		.max_tmo_nsec = 180 * NSECPERSEC, // 3 minutes
		.nb_timers = 40000, // Number of timers that the wheel can hold.
		.timer_wheel_flags = 0,
	};
	struct rte_event_timer_wheel *wheel = NULL;
	wheel = rte_event_timer_wheel_create(&wheel_config);
	if (wheel == NULL) {
		/* Failed to create the event timer wheel. */
		...
		return false;
	}
	/* Start the event timer wheel. */
	rte_event_timer_wheel_start(wheel);

	/* Create a mempool of event timers. */
	struct rte_mempool *event_timer_pool = NULL;

	event_timer_pool = rte_mempool_create("event_timer_mempool", SIZE,
			sizeof(struct rte_event_timer), ...);
	if (event_timer_pool == NULL) {
		/* Failed to create the event timer mempool. */
		...
		return false;
	}
}


uint8_t
process_tcp_data_packet(...)
{
	/* Classify based on type. */
	switch (...) {
	case ...:
		/* Setting up a new connection (Protocol dependent.) */
		...
		/* Setting up a new event timer. */
		conn->timer = NULL;
		rte_mempool_get(event_timer_pool, (void **)&conn->timer);
		if (conn->timer == NULL) {
			/* Failed to get an event timer instance. */
			/* Tear down the connection. */
			return false;
		}

		/* Set up the timer event. */
		conn->timer->ev.event_ptr = conn;
		conn->timer->ev.queue_id = event_queue_id;
		...
		/* All necessary resources successfully allocated */
		/* Compute the timer timeout ticks */
		conn->timer->timeout_ticks = 30; // 3 sec per RFC 1122 (TCP retransmission)
		/* Arm the timer with our timeout */
		ret = rte_event_timer_arm_burst(wheel, &conn->timer, 1);
		if (ret != 1) {
			/* Check return value for too early or too late expiration
			 * tick */
			...
			return false;
		}
		return true;
	case ...:
		/* An ack for the previous TCP data packet has been received. */
		/* Cancel the retransmission timer. */
		rte_event_timer_cancel_burst(wheel, &conn->timer, 1);
		break;
	}
}

uint8_t
process_timer_event(...)
{
	/* A retransmission timeout for the connection has been received. */
	conn = ev.event_ptr;
	/* Retransmit last packet (e.g. TCP segment). */
	...
	/* Re-arm timer using original values. */
	rte_event_timer_arm_burst(wheel, &conn->timer, 1);
}

void
events_processing_loop(...)
{
	while (...) {
		/* Receive events from the configured event port. */
		rte_event_dequeue_burst(event_dev_id, event_port, &ev, 1, 0);
		...
		/* Classify events based on event_type. */
		switch (ev.event_type) {
		case RTE_EVENT_TYPE_ETHDEV:
			...
			process_packets(...);
			break;
		case RTE_EVENT_TYPE_TIMER:
			process_timer_event(ev);
			...
			break;
		}
	}
}

int main()
{
	configure_event_dev();
	configure_event_timer_wheel();
	on_each_worker_lcores(events_processing_loop());
}

Jerin Jacob (1):
  eventtimer: introduce event timer wheel

 doc/api/doxy-api-index.md                   |   3 +-
 lib/librte_eventdev/Makefile                |   1 +
 lib/librte_eventdev/rte_event_timer_wheel.h | 493 ++++++++++++++++++++++++++++
 lib/librte_eventdev/rte_eventdev.h          |   4 +-
 4 files changed, 498 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_eventdev/rte_event_timer_wheel.h

-- 
2.14.1


