[dpdk-users] how to design high performance QoS support for a large amount of subscribers
yuyong.zhang at casa-systems.com
Thu Aug 4 15:46:30 CEST 2016
Thank you very much Cristian for the insightful response.
Very much appreciated.
From: Dumitrescu, Cristian [mailto:cristian.dumitrescu at intel.com]
Sent: Thursday, August 4, 2016 9:01 AM
To: Yuyong Zhang <yuyong.zhang at casa-systems.com>; dev at dpdk.org; users at dpdk.org
Subject: RE: how to design high performance QoS support for a large amount of subscribers
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yuyong Zhang
> Sent: Tuesday, August 2, 2016 4:26 PM
> To: dev at dpdk.org; users at dpdk.org
> Subject: [dpdk-dev] how to design high performance QoS support for a
> large amount of subscribers
> I am trying to add QoS support for a high performance VNF with large
> amount of subscribers (millions).
Welcome to the world of DPDK QoS users!
> It requires to support guaranteed bit rate
> for different service level of subscribers. I.e. four service levels
> need to be
> * Diamond, 500M
> * Gold, 100M
> * Silver, 50M
> * Bronze, 10M
Service levels translate to pipe profiles in our DPDK implementation. The set of pipe profiles is defined per port.
> Here is the current pipeline design using DPDK:
> * 4 RX threads, does packet classification and load balancing
> * 10-20 worker thread, does application subscriber management
> * 4 TX threads, sends packets to TX NICs.
> * Ring buffers used among RX threads, Worker threads, and TX threads
> I read DPDK program guide for QoS framework regarding hierarchical
> scheduler: Port, sub-port, pipe, TC and queues, I am looking for
> advice on how to design QoS scheduler to support millions of
> subscribers (pipes) which traffic are processed in tens of worker
> threads where subscriber management processing are handled?
Having millions of pipes per port poses some challenges:
1. Does it actually make sense? Assuming the port rate is 10GbE and taking the smallest user rate you mention above (Bronze, 10Mbps/user), fully provisioning all users (i.e. making sure you can handle every user at full rate in the worst case scenario) gives a maximum of 1000 users per port. Assuming an oversubscription ratio of 50:1, this means a maximum of 50K users per port.
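The arithmetic above can be written down as a small helper. This is only a sketch of the back-of-envelope estimate; the function name and parameters are made up for illustration and are not part of any DPDK API.

```c
#include <stdint.h>

/*
 * Hypothetical helper: upper bound on subscribers per output port.
 * Rates are in bits per second; oversub is the oversubscription
 * ratio (1 means every user is fully provisioned).
 */
static uint64_t max_users_per_port(uint64_t port_rate_bps,
                                   uint64_t user_rate_bps,
                                   uint64_t oversub)
{
    return (port_rate_bps / user_rate_bps) * oversub;
}
```

With a 10 Gbps port and 10 Mbps Bronze users this gives 1000 users at 1:1 and 50000 users at 50:1, matching the figures above. The same port fits only 20 fully provisioned Diamond (500 Mbps) users.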
2. Memory challenge. The number of pipes per port is configurable -- hey, this is SW! :) -- but each of these pipes has 16 queues. For 4K pipes per port, this is 64K queues per port; for a typical value of 64 packets per queue, this is 4M packets per port. So in the worst case scenario we need to provision 4M packets in the buffer pool for each output port that has the hierarchical scheduler enabled; for a buffer size of ~2KB each, this means ~8GB of memory for each output port. If you go from 4K pipes per port to 4M pipes per port, this means 8TB of memory per port. Do you have enough memory in your system? :)
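The same memory estimate as code, in case you want to plug in your own numbers. This is a sketch: the function name is hypothetical, and the 16 queues per pipe is simply the figure quoted above.

```c
#include <stdint.h>

#define QUEUES_PER_PIPE 16 /* figure taken from the text above */

/*
 * Hypothetical helper: worst-case buffer pool size, in bytes, for one
 * output port, assuming every queue of every pipe is completely full.
 */
static uint64_t sched_worst_case_bytes(uint64_t pipes_per_port,
                                       uint64_t pkts_per_queue,
                                       uint64_t buf_bytes)
{
    return pipes_per_port * QUEUES_PER_PIPE * pkts_per_queue * buf_bytes;
}
```

For 4096 pipes, 64 packets per queue and 2048-byte buffers the result is 8 GiB; for 4M (4194304) pipes it is 8 TiB.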
One thing to realize is that even for millions of users in your system, not all of them are active at the same time. So maybe have a smaller number of pipes and only map the active users (those that have any packets to send now) to them (a fraction of the total set of users), with the set of active users changing over time.
You can also consider mapping several users to the same pipe.
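One possible way to map only the active users onto a small set of pipes is a free list of pipe ids plus a per-subscriber lookup table. The sketch below is an assumption about how an application might do this, not something the scheduler library provides; all names (pipe_pool, pipe_pool_get, pipe_pool_put) are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define PIPE_NONE UINT32_MAX /* marks "no pipe assigned" */

struct pipe_pool {
    uint32_t *user_to_pipe; /* indexed by subscriber id */
    uint32_t *free_pipes;   /* stack of currently unused pipe ids */
    uint32_t n_free;
};

static int pipe_pool_init(struct pipe_pool *p, uint32_t n_users,
                          uint32_t n_pipes)
{
    uint32_t i;

    p->user_to_pipe = malloc((size_t)n_users * sizeof(uint32_t));
    p->free_pipes = malloc((size_t)n_pipes * sizeof(uint32_t));
    if (p->user_to_pipe == NULL || p->free_pipes == NULL) {
        free(p->user_to_pipe);
        free(p->free_pipes);
        return -1;
    }
    for (i = 0; i < n_users; i++)
        p->user_to_pipe[i] = PIPE_NONE;
    /* Fill the stack in reverse so that pipe 0 is handed out first. */
    for (i = 0; i < n_pipes; i++)
        p->free_pipes[i] = n_pipes - 1 - i;
    p->n_free = n_pipes;
    return 0;
}

/* Return the user's pipe, taking a free one if the user has none yet.
 * Returns PIPE_NONE when every pipe is in use. */
static uint32_t pipe_pool_get(struct pipe_pool *p, uint32_t user)
{
    uint32_t pipe = p->user_to_pipe[user];

    if (pipe != PIPE_NONE)
        return pipe;
    if (p->n_free == 0)
        return PIPE_NONE;
    pipe = p->free_pipes[--p->n_free];
    p->user_to_pipe[user] = pipe;
    return pipe;
}

/* Give the user's pipe back once the user has gone idle. */
static void pipe_pool_put(struct pipe_pool *p, uint32_t user)
{
    uint32_t pipe = p->user_to_pipe[user];

    if (pipe == PIPE_NONE)
        return;
    p->free_pipes[p->n_free++] = pipe;
    p->user_to_pipe[user] = PIPE_NONE;
}
```

In a real application a pipe should presumably only be released once its queues are empty, and a reused pipe has to carry the pipe profile of the new user's service level. Neither of those is handled in this sketch.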
> One design thought is as the following:
> 8 ports (each one is associated with one physical port), 16-20
> sub-ports (each is used by one Worker thread), each sub-port supports
> 250K pipes for subscribers. Each worker thread manages one sub-port
> and does metering for the sub-port to get the color and, after identifying
> the subscriber flow, picks an unused pipe, does sched enqueue/dequeue, and
> then puts packets into TX rings for the TX threads, which send them to the TX NICs.
In the current implementation, each port scheduler object has to be owned by a single thread, i.e. you cannot split a port across multiple threads, therefore it is not straightforward to have different sub-ports handled by different threads. The workaround is to split the physical NIC port yourself into multiple port scheduler objects: for example, create 8 port scheduler objects, set the rate of each to 1/8 of 10GbE, and have each of them feed a different NIC TX queue of the same physical NIC port.
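A minimal sketch of that split, using plain structs rather than the real scheduler API. The names soft_port, split_nic_port and subscriber_to_sched are hypothetical, and the modulo mapping of subscribers to scheduler objects is just one simple assumption about how traffic might be distributed.

```c
#include <stdint.h>

/* Hypothetical description of one software scheduler object carved
 * out of a physical NIC port. */
struct soft_port {
    uint64_t rate_bps; /* share of the NIC rate, in bits per second */
    uint16_t tx_queue; /* NIC TX queue fed by this scheduler object */
};

/* Split one NIC port of nic_rate_bps into n equal scheduler objects,
 * each feeding its own TX queue. out must have room for n entries. */
static void split_nic_port(uint64_t nic_rate_bps, uint16_t n,
                           struct soft_port *out)
{
    uint16_t i;

    for (i = 0; i < n; i++) {
        out[i].rate_bps = nic_rate_bps / n;
        out[i].tx_queue = i;
    }
}

/* Pin every subscriber to exactly one scheduler object, so that each
 * object is only ever touched by the thread that owns it. */
static uint16_t subscriber_to_sched(uint32_t subscriber_id, uint16_t n)
{
    return (uint16_t)(subscriber_id % n);
}
```

For a 10GbE port split 8 ways, each scheduler object gets 1.25 Gbps. One consequence of a static split like this is that a scheduler object cannot borrow bandwidth left unused by the others.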
You can probably get this scenario (or a very similar one) up and running pretty quickly just by handcrafting a configuration file for the examples/ip_pipeline application.
> Are there functional and performance issues with above approach?
> Any advice and input are appreciated.