[dpdk-dev] [RFC] sched: new features and improvements

Dumitrescu, Cristian cristian.dumitrescu at intel.com
Fri Nov 30 15:14:32 CET 2018

Hi guys,

Here is a list of incremental features and improvements that we are considering prototyping and adding to the DPDK hierarchical scheduler SW library. This list is driven by our own findings as well as by feedback from various users. Please take a look and feel free to add more features to this list or to comment on the features below. Of course, all these items are subject to preserving the functional correctness, existing accuracy and performance of the current implementation.

1. Pipe level: Increase number of traffic classes (TCs). Allow a more flexible mapping of the pipe queues to traffic classes. Do not allocate memory for queues that are not needed.
a) Currently, each pipe has 16 queues that are hardwired into 4 TCs scheduled with strict priority (SP), and each TC has exactly 4 queues that are scheduled with Weighted Fair Queuing (WFQ). Specifically, TC0 = [Queue 0 .. Queue 3], TC1 = [Queue 4 .. Queue 7], TC2 = [Queue 8 .. Queue 11], TC3 = [Queue 12 .. Queue 15].
b) The plan is to support up to 16 TCs. All the high priority TCs (TC1, TC2, ...) will have exactly 1 queue, while the lowest priority TC, called Best Effort (BE), has 1, 4 or 8 queues. This is justified by the fact that typically all the high priority TCs are fully provisioned (small to medium traffic rates), while most of the traffic fits into the BE class, which is usually greatly oversubscribed. 
c) This leads to the following valid options for mapping pipe queues to TCs:
	i. BE class has 1 queue => Max number of TCs is 16
	ii. BE class has 4 queues => Max number of TCs is 13
	iii. BE class has 8 queues => Max number of TCs is 9
d) In order to keep implementation complexity under control, it is required that all pipes from the same subport share the same mapping of pipe queues to TCs.
e) Currently, all 16 pipe queues have to be configured (and memory allocated for them internally), even if not all of them are needed. Going forward, it shall be allowed to use fewer than 16 queues per pipe when not all 16 queues are needed, and no memory shall be allocated for the queues that are not needed.
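
To make the mapping options in 1.a) to 1.c) concrete, here is a minimal sketch in C. It is not existing or planned library code: the function names are hypothetical, TC indices are zero-based, and it assumes that the single-queue high priority TCs take the lowest queue indices while the BE queues come last.

```c
#include <stdint.h>

#define PIPE_QUEUES_MAX 16u /* queues per pipe, as stated in 1.a) */

/* Current layout: 4 TCs x 4 queues, so the TC is simply queue / 4. */
static inline uint32_t current_queue_to_tc(uint32_t queue)
{
	return queue / 4u;
}

/*
 * Proposed layout: max number of TCs for a given number of BE queues.
 * Every non-BE TC owns exactly one queue, BE owns the remaining ones.
 * Returns 0 for a BE queue count other than 1, 4 or 8.
 */
static inline uint32_t proposed_max_tcs(uint32_t n_be_queues)
{
	if (n_be_queues != 1u && n_be_queues != 4u && n_be_queues != 8u)
		return 0u;
	return PIPE_QUEUES_MAX - n_be_queues + 1u;
}

/*
 * Proposed layout: queue -> TC for a pipe with n_tcs TCs (n_tcs >= 1).
 * Assumption: queues 0 .. n_tcs-2 map one-to-one to the high priority
 * TCs, and all remaining queues belong to the BE TC (index n_tcs-1).
 */
static inline uint32_t proposed_queue_to_tc(uint32_t queue, uint32_t n_tcs)
{
	uint32_t be_tc = n_tcs - 1u;

	return queue < be_tc ? queue : be_tc;
}
```

For example, with 4 BE queues this gives 13 TCs, where queues 0 to 11 each get their own TC and queues 12 to 15 all map to the BE TC.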

2. Subport level: Allow different subports of the same port to have different configuration in terms of number of pipes, pipe queue sizes, pipe queue mapping to traffic classes, etc.
a) In order to keep the implementation complexity under control, it is required that all pipes within the same subport share the same configuration for these parameters.
b) Internal implications: each subport will likely need to have its own bitmap data structure.
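
As a rough illustration of what per-subport configuration could look like, here is a sketch in C. The struct and its fields are hypothetical and do not reflect the current rte_sched API. The bitmap size assumes a flat bitmap with one bit per queue, which is a simplification; the bitmap used by the library today has additional overhead.

```c
#include <stdint.h>

/* Hypothetical per-subport parameters, shared by all pipes of the subport. */
struct subport_cfg_sketch {
	uint32_t n_pipes;           /* number of pipes in this subport */
	uint32_t n_queues_per_pipe; /* at most 16, see item 1.e) */
	uint32_t queue_size;        /* packets per pipe queue */
};

/* Total number of queues that need state in this subport. */
static inline uint32_t subport_n_queues(const struct subport_cfg_sketch *c)
{
	return c->n_pipes * c->n_queues_per_pipe;
}

/* Assumption: flat bitmap, one bit per queue, rounded up to whole bytes. */
static inline uint32_t subport_bitmap_bytes(const struct subport_cfg_sketch *c)
{
	return (subport_n_queues(c) + 7u) / 8u;
}
```

With one bitmap per subport, a subport with few pipes or few queues per pipe only pays for the queues it actually uses.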

3. Redistribution of unused pipe BW to other pipes within the same subport: Enable the existing oversubscription mechanism by default.
a) Currently, this mechanism needs to be explicitly enabled at build time.
b) This change is subject to the performance impact not being significant.

4. Pipe TC level: Improve shaper accuracy.
a) The current pipe TC rate limiting mechanism is not robust and can result in deadlock for certain configurations. Currently, the pipe TC credits are periodically cleared and re-initialized to a fixed value (the period is configurable). Since credits do not accumulate across periods, this can result in deadlock if the number of pipe TC credits is smaller than the MTU.
b) The plan is to move the pipe TC rate limiting from the scheduler dequeue operation (shaping) to the scheduler enqueue operation (metering), by using one token bucket per pipe TC. Basically, packets that exceed the pipe TC rate will be detected and dropped earlier rather than later, which should be beneficial from the perspective of not spending cycles on packets that are going to be dropped later anyway.
c) Internal implications: Number of token buckets is multiplied 16 times. Need to improve the token bucket performance (e.g. by using branchless code) in order to get back some of the performance.
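
To illustrate the kind of per pipe TC token bucket meant in 4.b) and the branchless style mentioned in 4.c), here is a minimal sketch in C. All names are hypothetical and this is not the library implementation. It assumes that credits are counted in bytes, that time is an abstract tick counter, and that elapsed * rate does not overflow 64 bits.

```c
#include <stdint.h>

/* Hypothetical single-rate token bucket, one instance per pipe TC. */
struct tb_sketch {
	uint64_t rate;    /* credits (bytes) added per time tick */
	uint64_t size;    /* bucket capacity in bytes, should be >= MTU */
	uint64_t credits; /* credits currently available */
	uint64_t time;    /* tick of the last update */
};

/* Branchless min: mask is all ones when a < b, all zeros otherwise. */
static inline uint64_t min_u64(uint64_t a, uint64_t b)
{
	uint64_t mask = (uint64_t)0 - (uint64_t)(a < b);

	return (a & mask) | (b & ~mask);
}

/*
 * Meter one packet at enqueue time.
 * Returns 1 if the packet conforms (accept), 0 if it should be dropped.
 * Credits are consumed only when the packet is accepted.
 */
static inline int tb_meter(struct tb_sketch *tb, uint64_t now, uint32_t pkt_len)
{
	uint64_t elapsed = now - tb->time;
	uint64_t credits = min_u64(tb->credits + elapsed * tb->rate, tb->size);
	uint64_t ok = (uint64_t)(credits >= pkt_len);

	tb->time = now;
	/* Subtract pkt_len when ok == 1, subtract 0 when ok == 0. */
	tb->credits = credits - (pkt_len & ((uint64_t)0 - ok));
	return (int)ok;
}
```

Unlike the periodic clear-and-reinitialize scheme in 4.a), credits here accumulate over time up to the bucket size, so a packet of MTU size is eventually accepted as long as the bucket size is at least the MTU.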

Best regards,
Your faithful DPDK QoS implementers,
Cristian and Jasvinder
