[dpdk-dev] Performance impact with QoS
cristian.dumitrescu at intel.com
Mon Nov 17 22:03:37 CET 2014
The QoS traffic manager has a large memory footprint, due to the large number of packet queues (e.g. 64K queues of 64 packets each) and the large tables (e.g. 4K pipes with one cache line of context per pipe), that far exceeds the amount of CPU cache physically available. A lot of data structures need to be brought into the L1 cache of the traffic manager core in order to take the scheduling decision: bitmap, pipe table entry, queue read/write pointers, queue elements, packet metadata (mbuf), etc. To minimize the penalties associated with the CPU pipeline stalling on memory accesses, all these data structures are prefetched.
So, the point I am trying to make is there are a lot of critical CPU resources involved: size of L1/L2 cache (per CPU core), size of L3 cache (shared by all CPU cores), bandwidth of L1/L2 cache (per core), bandwidth of L3 cache (shared by all CPU cores), number of outstanding prefetches (per CPU core), etc.
If you map the QoS traffic manager on the same core as packet I/O (i.e. the Poll Mode Driver RX/TX), my guess is that these two I/O intensive workloads will compete for the CPU resources listed above and will also impact each other by thrashing each other's data structures in and out of the L1/L2 cache. If you split them onto different CPU cores, their operation is faster and more predictable, as each one now has its own L1/L2 cache.
Did you try a CPU core chaining setup (through rte_rings) similar to the qos_sched application, like: RX -> (TM enqueue & dequeue) -> TX, or RX -> (TM enqueue & TM dequeue & TX)? I am sure you will find the right setup for your case by conducting similar experiments. Of course, the result also depends on which other workloads your application performs.
From: satish [mailto:nsatishbabu at gmail.com]
Sent: Monday, November 17, 2014 6:03 AM
To: dev at dpdk.org
Cc: Dumitrescu, Cristian
Subject: Re: Performance impact with QoS
Can someone please provide comments on the queries in the mail below?
On Mon, Nov 10, 2014 at 4:24 PM, satish <nsatishbabu at gmail.com> wrote:
I need comments on performance impact with DPDK-QoS.
We are working on developing an application based on DPDK.
Our application supports IPv4 forwarding, with and without QoS.
Without QoS, we are achieving almost full wire rate (bi-directional traffic) with 128, 256 and 512 byte packets.
But when we enable QoS, performance drops to half for 128 and 256 byte packets.
For 512 byte packets, we do not observe any drop even after enabling QoS (we achieve full wire rate).
The traffic used in both cases is the same (one stream, with a QoS match to the first queue in traffic class 0).
In our application, we use memory buffer pools to receive the packet bursts (no ring buffer is used).
The same buffers are used during packet processing and TX (enqueue and dequeue). All of the above is handled on the same core.
For normal forwarding (without QoS), we use rte_eth_tx_burst() for TX.
For forwarding with QoS, we use rte_sched_port_pkt_write(), rte_sched_port_enqueue() and rte_sched_port_dequeue()
before rte_eth_tx_burst().
We understand that the performance dip for 128 and 256 byte packets is because
more packets have to be processed per second than with 512 byte packets.
Can someone comment on the performance dip in my case with QoS enabled?
 Can this be because of inefficient use of the RTE calls for QoS?
 Is it poor buffer management?
 Any other comments?
To achieve good performance in the QoS case, is it necessary to use a worker thread (running on a different core) with a ring buffer?
Please provide your comments.
Thanks in advance.