[dpdk-dev] rte_sched library performance question
Dumitrescu, Cristian
cristian.dumitrescu at intel.com
Thu Feb 16 20:08:05 CET 2017
Hi Zoltan,
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zoltan Kiss
> Sent: Thursday, February 16, 2017 3:14 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] rte_sched library performance question
>
> Hi,
>
> I'm experimenting a little bit with the scheduler library, and I got some
> performance numbers which seem to be worse than what I expected.
> I'm sending 64-byte packets on a 10G interface to a separate thread, and
> my simple test program (based on the qos_sched example) does the
> following:
>
> while (1) {
>         uint16_t ret = rte_ring_sc_dequeue_burst(it.ring,
>                 (void**)flushbatch, FLUSH_SIZE);
>         struct rte_mbuf **t = flushbatch;
>
>         if (!ret) {
>                 /* This call is necessary to make sure the TX completed
>                  * mbufs are returned to the pool even if there is nothing
>                  * to transmit */
>                 rte_eth_tx_burst(it.portid, lcore, t, 0);
>                 continue;
>         }
>         rte_sched_port_enqueue(it.port, flushbatch, ret);
>         ret = rte_sched_port_dequeue(it.port, flushbatch, FLUSH_SIZE);
Looks to me like the scheduler dequeue burst is equal to the enqueue burst size of FLUSH_SIZE, right?
In this case, you are always dequeuing exactly the packets that you just enqueued, and the scheduler dequeue has to work really hard to find precisely those FLUSH_SIZE queues, each of which holds a single packet at this point.
This is why the enqueue burst size should be bigger than the dequeue burst size. Basically, you let the reservoir fill up to a reasonable level before you start pouring from it into your glass, if you want to fill the glass quickly (see the sketch below the typical values).
Typical values used:
-for vector PMD: (enqueue = 32, dequeue = 24), (32, 28), (32, 16), etc
-for scalar PMD: (64, 48), (64, 32), ... We used (256, 248) for VPP
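As a rough sketch (reusing the it.ring / it.port / it.portid / lcore names from your snippet; 64/32 is just one reasonable pair of burst sizes), the TX thread loop could look along these lines:

#define SCHED_ENQ_BURST 64
#define SCHED_DEQ_BURST 32

struct rte_mbuf *enq[SCHED_ENQ_BURST];
struct rte_mbuf *deq[SCHED_DEQ_BURST];

while (1) {
        /* Pull a larger burst from the RX ring and push all of it into the
         * scheduler, so the queues are allowed to build up some backlog. */
        uint16_t n_enq = rte_ring_sc_dequeue_burst(it.ring,
                (void **)enq, SCHED_ENQ_BURST);
        if (n_enq)
                rte_sched_port_enqueue(it.port, enq, n_enq);

        /* Dequeue a smaller burst: the scheduler only has to visit a few
         * of the active queues instead of hunting down every packet that
         * was just added. */
        int n_deq = rte_sched_port_dequeue(it.port, deq, SCHED_DEQ_BURST);

        struct rte_mbuf **t = deq;
        while (n_deq > 0) {
                uint16_t n_tx = rte_eth_tx_burst(it.portid, lcore, t, n_deq);
                n_deq -= n_tx;
                t += n_tx;
        }
}

When the ring is empty you can still do the zero-length rte_eth_tx_burst() call from your snippet to reclaim the completed mbufs.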
>         while (ret) {
>                 uint16_t n = rte_eth_tx_burst(it.portid, lcore, t, ret);
>                 /* we cannot drop the packets, so re-send;
>                  * update the number of packets still to be sent */
>                 ret -= n;
>                 t = &t[n];
>         }
> }
>
> I run this on a separate thread, with another one doing RX and feeding the
> packets to the ring. When I comment out the enqueue and dequeue part in
> the
> code (reducing it to simple l2fwd), I can forward the entire ~14 Mpps
> traffic, whilst with the scheduler enabled I can only reach ~5.4 Mpps at
> best. I've tried with a single pipe and with 4k pipes (used rand() to randomly
> distribute between pipes; everything else (class etc.) was set to 0), and it
> didn't make a difference. Is this expected? I'm running this on a Xeon
> E5-2630 0 @ 2.30GHz.
>
> I've used the following configuration:
>
> ; Port configuration
>
> [port]
> frame overhead = 24
> number of subports per port = 1
> number of pipes per subport = 1024
> queue sizes = 64 64 64 64
>
> ; Subport configuration
>
> [subport 0]
> tb rate = 1250000000; Bytes per second
> tb size = 1000000000; Bytes
> tc 0 rate = 1250000000; Bytes per second
> tc 1 rate = 1250000000; Bytes per second
> tc 2 rate = 1250000000; Bytes per second
> tc 3 rate = 1250000000; Bytes per second
> tc period = 10; Milliseconds
> tc oversubscription period = 1000; Milliseconds
>
> pipe 0-1024 = 0; These pipes are configured with pipe profile 0
>
> ; Pipe configuration
>
> [pipe profile 0]
> tb rate = 1250000000; Bytes per second
> tb size = 1000000000; Bytes
>
> tc 0 rate = 1250000000; Bytes per second
> tc 1 rate = 1250000000; Bytes per second
> tc 2 rate = 1250000000; Bytes per second
> tc 3 rate = 1250000000; Bytes per second
> tc period = 10; Milliseconds
>
> tc 0 oversubscription weight = 1
> tc 1 oversubscription weight = 1
> tc 2 oversubscription weight = 1
> tc 3 oversubscription weight = 1
>
> tc 0 wrr weights = 1 1 1 1
> tc 1 wrr weights = 1 1 1 1
> tc 2 wrr weights = 1 1 1 1
> tc 3 wrr weights = 1 1 1 1
>
> Regards,
>
> Zoltan
Regards,
Cristian