[dpdk-dev] IXGBE throughput loss with 4+ cores

Wiles, Keith keith.wiles at intel.com
Tue Aug 28 23:09:21 CEST 2018



> On Aug 28, 2018, at 2:16 PM, Saber Rezvani <irsaber at zoho.com> wrote:
> 
> 
> 
> On 08/28/2018 11:39 PM, Wiles, Keith wrote:
>> Which version of Pktgen? I just pushed a patch in 3.5.3 to fix a performance problem.
> I use Pktgen version 3.0.0. It is OK as long as I have one core (10 Gb/s), but when I increase the number of cores (one core per queue) I lose some performance (roughly 8.5 Gb/s with 8 cores). In my scenario Pktgen shows it is generating at line rate, but receiving 8.5 Gb/s.
> Is it because of Pktgen???

Normally Pktgen can receive at line rate on 10G with 64-byte frames, which means Pktgen should not be the problem. You can verify that by looping a cable from one port to another on the Pktgen machine to create an external loopback. Then send whatever traffic you can from one port; you should receive those packets on the other port unless something is configured wrong.
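
For reference, a quick sketch of what "line rate with 64-byte frames" means in packets per second. The 20 bytes of per-frame overhead (preamble, start-of-frame delimiter, inter-frame gap) are standard Ethernet figures; the function and constant names are only illustrative.

```python
# Back-of-the-envelope line-rate arithmetic for 10 GbE.
LINK_BPS = 10_000_000_000  # 10 Gb/s
FRAME_BYTES = 64           # minimum Ethernet frame, including CRC
OVERHEAD_BYTES = 20        # 7 preamble + 1 start-of-frame + 12 inter-frame gap


def line_rate_pps(frame_bytes=FRAME_BYTES, link_bps=LINK_BPS):
    """Packets per second at full line rate for a given frame size."""
    bits_on_wire = (frame_bytes + OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


pps = line_rate_pps()
print(f"{pps / 1e6:.2f} Mpps")           # 14.88 Mpps
print(f"{1e9 / pps:.1f} ns per packet")  # 67.2 ns per packet
```

So a loopback test that holds roughly 14.88 Mpps on the receive side suggests the generator itself is keeping up.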

Please send me the command line for pktgen.


In pktgen, if you have this config, -m "[1-4:5-8].0", then you have 4 cores sending traffic and 4 cores receiving packets.

In this case the TX side will be sending packets from all 4 lcores to the same port. On the RX side you have 4 cores polling 4 RX queues. The RX queues are selected by RSS, which means the hash of the RX traffic's 5-tuple must divide the inbound packets across all 4 queues so that each core does the same amount of work. If the TX cores are sending only a single packet over and over (that is, one flow with the same 5-tuple), then only one RX queue will be used.
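
A toy illustration of the RSS point, as a sketch only: zlib.crc32 stands in for the NIC's real RSS hash and redirection table, so the queue numbers will not match hardware. The addresses, ports, and function names are made up for the example. What it shows is that identical 5-tuples always land on one queue, while varying a field such as the source port gives the hash something to spread.

```python
import socket
import struct
import zlib
from collections import Counter

NUM_RX_QUEUES = 4  # one queue per RX lcore, as in the mapping above
UDP = 17           # IP protocol number for UDP


def rx_queue_for(src_ip, dst_ip, src_port, dst_port, proto):
    """Map a 5-tuple to an RX queue index.

    Assumption: crc32 is only a stand-in for the hardware RSS hash.
    """
    key = struct.pack(
        "!4s4sHHB",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        src_port,
        dst_port,
        proto,
    )
    return zlib.crc32(key) % NUM_RX_QUEUES


def queue_load(flows):
    """Count how many packets each RX queue would receive."""
    return Counter(rx_queue_for(*flow) for flow in flows)


# The same packet sent 1000 times: every copy hashes to the same queue.
single_flow = [("10.0.0.1", "10.0.0.2", 1234, 5678, UDP)] * 1000

# 1000 packets that differ in source port: the hash can spread them out.
many_flows = [("10.0.0.1", "10.0.0.2", 1024 + i, 5678, UDP) for i in range(1000)]

print("single flow:", dict(queue_load(single_flow)))
print("many flows: ", dict(queue_load(many_flows)))
```

With the single flow, one RX core gets all the work and the others poll empty queues. Varying the addresses or ports in the generated traffic is the usual way to exercise all of the queues.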

I hope that makes sense.

>> 
>>> On Aug 28, 2018, at 12:05 PM, Saber Rezvani <irsaber at zoho.com> wrote:
>>> 
>>> 
>>> 
>>> On 08/28/2018 08:31 PM, Stephen Hemminger wrote:
>>>> On Tue, 28 Aug 2018 17:34:27 +0430
>>>> Saber Rezvani <irsaber at zoho.com> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> 
>>>>> I have run the multi_process/symmetric_mp example from the DPDK examples directory.
>>>>> For one process its throughput is line rate, but as I increase the
>>>>> number of cores I see a decrease in throughput. For example, if the number
>>>>> of queues is set to 4 and each queue is assigned to a single core, then the
>>>>> throughput is about 9.4 Gb/s; with 8 queues, the throughput
>>>>> is 8.5 Gb/s.
>>>>> 
>>>>> I have read the following, but it was not convincing.
>>>>> 
>>>>> http://mails.dpdk.org/archives/dev/2015-October/024960.html
>>>>> 
>>>>> 
>>>>> I am eagerly looking forward to hearing from you, all.
>>>>> 
>>>>> 
>>>>> Best wishes,
>>>>> 
>>>>> Saber
>>>>> 
>>>>> 
>>>> Not completely surprising. If you have more cores than packet line rate
>>>> then the number of packets returned for each call to rx_burst will be less.
>>>> With large number of cores, most of the time will be spent doing reads of
>>>> PCI registers for no packets!
>>> Indeed pktgen says it is generating traffic at line rate, but receiving less than 10 Gb/s. So, in that case there should be something that causes the reduction in throughput :(
>>> 
>>> 
>> Regards,
>> Keith
>> 
> 
> 
> 

Regards,
Keith


