[dpdk-dev] XL710 with i40e driver drops packets on RX even at small rates.
Zhang, Helin
helin.zhang at intel.com
Fri Jan 6 10:45:08 CET 2017
Very good to know that!
Congratulations!
/Helin
-----Original Message-----
From: Martin Weiser [mailto:martin.weiser at allegro-packets.com]
Sent: Friday, January 6, 2017 5:17 PM
To: dev at dpdk.org; Ilya Maximets <i.maximets at samsung.com>
Cc: Zhang, Helin <helin.zhang at intel.com>; Wu, Jingjing <jingjing.wu at intel.com>
Subject: Re: [dpdk-dev] XL710 with i40e driver drops packets on RX even at small rates.
Hello,
just to let you know, we were finally able to resolve the issue. It seems the affected boards had a firmware issue with PCIe Gen3 x8 operation.
When we forced the PCI slots to run at Gen2 x8 the issue disappeared for Test 1 and Test 2. Test 3 still produced missed packets, but probably only because of the reduced PCIe Gen2 x8 bandwidth.
We then found out that there is a BIOS/firmware update for these boards, issued by Supermicro in November ... unfortunately there are no change notes whatsoever.
But lo and behold, this update seems to include a fix for exactly this issue: the XL710 now works as expected with PCIe Gen3 x8.
Best regards,
Martin
On 04.01.17 13:33, Martin Weiser wrote:
> Hello,
>
> I have performed some more thorough testing on 3 different machines to
> illustrate the strange results with the XL710.
> Please note that all 3 systems were able to forward the traffic of
> Test 1 and Test 2 without packet loss when an 82599ES NIC was
> installed in the same PCI slot as the XL710 in the tests below.
>
> Here is the test setup and the test results:
>
>
> ## Test traffic
>
> In all tests the t-rex traffic generator was used to generate traffic
> on an XL710 card with the following parameters:
>
> ### Test 1
>
> ./t-rex-64 -f cap2/imix_1518.yaml -c 4 -d 60 -m 25 --flip
>
> This resulted in a 60-second run with ~1.21 Gbps of traffic and
> ~100000 packets per second on each of the two interfaces.
>
> ### Test 2
>
> ./t-rex-64 -f cap2/imix_1518.yaml -c 4 -d 60 -m 100 --flip
>
> This resulted in a 60-second run with ~4.85 Gbps of traffic and
> ~400000 packets per second on each of the two interfaces.
>
> ### Test 3
>
> ./t-rex-64 -f cap2/imix_1518.yaml -c 4 -d 60 -m 400 --flip
>
> This resulted in a 60-second run with ~19.43 Gbps of traffic and
> ~1600000 packets per second on each of the two interfaces.
>
>
>
> ## DPDK
>
> On all systems a vanilla DPDK v16.11 testpmd was used with the
> following parameters (PCI IDs differed between systems):
>
> ./build/app/testpmd -l 1,2 -w 0000:06:00.0 -w 0000:06:00.1 -- -i
>
>
>
> ## System 1
>
> * Board: Supermicro X10SDV-TP8F
> * CPU:
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 8
> On-line CPU(s) list: 0-7
> Thread(s) per core: 2
> Core(s) per socket: 4
> Socket(s): 1
> NUMA node(s): 1
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 86
> Model name: Intel(R) Xeon(R) CPU D-1518 @ 2.20GHz
> Stepping: 3
> CPU MHz: 800.250
> CPU max MHz: 2200.0000
> CPU min MHz: 800.0000
> BogoMIPS: 4399.58
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 256K
> L3 cache: 6144K
> NUMA node0 CPU(s): 0-7
> Flags: fpu vme de pse tsc msr pae mce cx8 apic
> sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
> ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs
> bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni
> pclmulqdq
> dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm
> pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
> xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt
> tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle
> avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc
> cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts
> * Memory channels: 2
> * Memory: 2 * 8192 MB DDR4 @ 2133 MHz
> * NIC firmware: FW 5.0 API 1.5 NVM 05.00.04 eetrack 80002505
> * i40e version: 1.4.25-k
> * OS: Ubuntu 16.04.1 LTS
> * Kernel: 4.4.0-57-generic
> * Kernel parameters: isolcpus=1,2,3,5,6,7 default_hugepagesz=1G
> hugepagesz=1G hugepages=1
>
> ### Test 1
>
> Mostly no packet loss. Sometimes ~10 of ~600000 packets were missed
> on each interface when testpmd was not started in interactive mode.
>
> ### Test 2
>
> 100-300 of ~24000000 packets missed on each interface.
>
> ### Test 3
>
> 4000-5000 of ~96000000 packets missed on each interface.
>
>
>
> ## System 2
>
> * Board: Supermicro X10SDV-7TP8F
> * CPU:
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 32
> On-line CPU(s) list: 0-31
> Thread(s) per core: 2
> Core(s) per socket: 16
> Socket(s): 1
> NUMA node(s): 1
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 86
> Model name: 06/56
> Stepping: 4
> CPU MHz: 1429.527
> CPU max MHz: 2300.0000
> CPU min MHz: 800.0000
> BogoMIPS: 3400.37
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 256K
> L3 cache: 24576K
> NUMA node0 CPU(s): 0-31
> Flags: fpu vme de pse tsc msr pae mce cx8 apic
> sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
> ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs
> bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni
> pclmulqdq
> dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm
> pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
> xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt
> tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle
> avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc
> cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
> * Memory channels: 2
> * Memory: 4 * 16384 MB DDR4 @ 2133 MHz
> * NIC firmware: FW 5.0 API 1.5 NVM 05.00.04 eetrack 80002505
> * i40e version: 1.4.25-k
> * OS: Ubuntu 16.04.1 LTS
> * Kernel: 4.4.0-57-generic
> * Kernel parameters:
> isolcpus=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
> default_hugepagesz=1G hugepagesz=1G hugepages=1
>
> ### Test 1
>
> Mostly no packet loss out of ~600000 packets on each interface.
>
> ### Test 2
>
> 400000-500000 of ~24000000 packets missed on each interface.
>
> ### Test 3
>
> 1200000-1400000 of ~96000000 packets missed on each interface.
>
>
>
> ## System 3
>
> * Board: Supermicro X9SRW-F
> * CPU:
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 12
> On-line CPU(s) list: 0-11
> Thread(s) per core: 2
> Core(s) per socket: 6
> Socket(s): 1
> NUMA node(s): 1
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 62
> Model name: Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
> Stepping: 4
> CPU MHz: 1200.253
> CPU max MHz: 3900.0000
> CPU min MHz: 1200.0000
> BogoMIPS: 7000.29
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 256K
> L3 cache: 12288K
> NUMA node0 CPU(s): 0-11
> Flags: fpu vme de pse tsc msr pae mce cx8 apic
> sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
> ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs
> bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni
> pclmulqdq
> dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca
> sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c
> rdrand lahf_lm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep
> erms xsaveopt dtherm arat pln pts
> * Memory channels: 4
> * Memory: 4 * 8192 MB DDR3 @ 1600 MHz
> * NIC firmware: FW 5.0 API 1.5 NVM 05.00.04 eetrack 80002537
> * i40e version: 1.4.25-k
> * OS: Ubuntu 16.04.1 LTS
> * Kernel: 4.4.0-57-generic
> * Kernel parameters: default_hugepagesz=1G hugepagesz=1G hugepages=1
> isolcpus=1-5,7-11
>
> ### Test 1
>
> No packets lost.
>
> ### Test 2
>
> No packets lost.
>
> ### Test 3
>
> No packets lost.
>
>
>
> Best regards,
> Martin
>
>
>
> On 03.01.17 13:18, Martin Weiser wrote:
>> Hello,
>>
>> we are also seeing this issue on one of our test systems, while it
>> does not occur on other test systems with the same DPDK version (we
>> tested 16.11 and current master).
>>
>> The system on which we can reproduce this issue also has an X552
>> ixgbe NIC which can forward the exact same traffic using the same
>> testpmd parameters without a problem.
>> Even if we install an 82599ES ixgbe NIC in the same PCI slot that the
>> XL710 was in, the 82599ES can forward the traffic without any drops.
>>
>> As in the issue reported by Ilya, all packet drops occur on the
>> testpmd side and are accounted as 'imissed'. Increasing the number of
>> rx descriptors only helps a little at low packet rates.
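>>
>> (For reference, larger rings can be requested directly on the testpmd
>> command line via the --rxd/--txd options, e.g.:
>>
>>   ./build/app/testpmd -l 1,2 -w 0000:06:00.0 -w 0000:06:00.1 -- \
>>     -i --rxd=4096 --txd=4096
>>
>> 4096 is just an example value here.)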
>>
>> Drops start occurring at pretty low packet rates like 100000 packets
>> per second.
>>
>> Any suggestions would be greatly appreciated.
>>
>> Best regards,
>> Martin
>>
>>
>>
>> On 22.08.16 14:06, Ilya Maximets wrote:
>>> Hello, All.
>>>
>>> I've run into a really bad situation with packet drops at small
>>> packet rates (~45 Kpps) while using an XL710 NIC with the i40e DPDK driver.
>>>
>>> The issue was found while testing a PHY-VM-PHY scenario with OVS and
>>> confirmed in a PHY-PHY scenario with testpmd.
>>>
>>> DPDK version 16.07 was used in all cases.
>>> XL710 firmware-version: f5.0.40043 a1.5 n5.04 e2505
>>>
>>> Test description (PHY-PHY):
>>>
>>> * The following cmdline was used:
>>>
>>> # n_desc=2048
>>> # ./testpmd -c 0xf -n 2 --socket-mem=8192,0 -w 0000:05:00.0 -v \
>>> -- --burst=32 --txd=${n_desc} --rxd=${n_desc} \
>>> --rxq=1 --txq=1 --nb-cores=1 \
>>> --eth-peer=0,a0:00:00:00:00:00 --forward-mode=mac
>>>
>>> * The DPDK-Pktgen application was used as a traffic generator.
>>>   A single flow was generated.
>>>
>>> Results:
>>>
>>> * Packet size: 128B, rate: 90% of 10Gbps (~7.5 Mpps):
>>>
>>> On the generator's side:
>>>
>>> Total counts:
>>> Tx : 759034368 packets
>>> Rx : 759033239 packets
>>> Lost : 1129 packets
>>>
>>> Average rates:
>>> Tx : 7590344 pps
>>> Rx : 7590332 pps
>>> Lost : 11 pps
>>>
>>> All of these dropped packets are RX-dropped on testpmd's side:
>>>
>>> +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
>>> RX-packets: 759033239 RX-dropped: 1129 RX-total: 759034368
>>> TX-packets: 759033239 TX-dropped: 0 TX-total: 759033239
>>>
>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>> At the same time, a 10G NIC with the IXGBE driver works perfectly,
>>> without any packet drops, in the same scenario.
>>>
>>> The situation is much worse in the PHY-VM-PHY scenario with OVS:
>>>
>>> * The testpmd application was used inside the guest to forward
>>>   incoming packets (almost the same cmdline as for PHY-PHY).
>>>
>>> * For packet size 256 B at a rate of 1% of 10Gbps (~45 Kpps):
>>>
>>> Total counts:
>>> Tx : 1358112 packets
>>> Rx : 1357990 packets
>>> Lost : 122 packets
>>>
>>> Average rates:
>>> Tx : 45270 pps
>>> Rx : 45266 pps
>>> Lost : 4 pps
>>>
>>> All 122 of these dropped packets can be found in the rx_dropped counter:
>>>
>>> # ovs-vsctl get interface dpdk0 statistics:rx_dropped
>>> 122
>>>
>>> And again, no issues with IXGBE in the exact same scenario.
>>>
>>>
>>> Results of my investigation:
>>>
>>> * I found that all of these packets are 'imissed'. This means that
>>>   the rx descriptor ring overflowed.
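>>>
>>>   The counter can be read per port through the generic stats API; a
>>>   minimal sketch (the helper name and printing are illustrative):
>>>
>>>   #include <inttypes.h>
>>>   #include <stdio.h>
>>>   #include <rte_ethdev.h>
>>>
>>>   /* Print how many packets the HW dropped because no rx descriptor
>>>    * was available when they arrived (the 'imissed' counter). */
>>>   static void
>>>   print_imissed(uint8_t port_id)
>>>   {
>>>           struct rte_eth_stats stats;
>>>
>>>           rte_eth_stats_get(port_id, &stats);
>>>           printf("port %u: imissed=%" PRIu64 "\n",
>>>                  port_id, stats.imissed);
>>>   }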
>>>
>>> * I've modified the i40e driver to check the real number of free
>>>   descriptors not yet filled by the NIC, and found that the HW fills
>>>   rx descriptors at an uneven rate. It looks like it fills them in
>>>   huge batches.
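>>>
>>>   A similar check is possible from the application side, assuming
>>>   the PMD implements rte_eth_rx_queue_count() (i40e does); a rough
>>>   sketch:
>>>
>>>   #include <rte_ethdev.h>
>>>
>>>   /* Approximate number of still-free rx descriptors on queue 0;
>>>    * nb_rxd is the configured ring size (e.g. 2048). A result near
>>>    * zero means the ring is about to overflow and 'imissed' will
>>>    * start counting. */
>>>   static int
>>>   rx_ring_free(uint8_t port_id, uint16_t nb_rxd)
>>>   {
>>>           /* Number of descriptors the NIC has already used. */
>>>           int used = rte_eth_rx_queue_count(port_id, 0);
>>>
>>>           return (int)nb_rxd - used;
>>>   }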
>>>
>>> * So, the root cause of the packet drops with the XL710 is the
>>>   somehow uneven rate at which the NIC fills the hw rx descriptors.
>>>   This leads to exhaustion of rx descriptors and packet drops by the
>>>   hardware. The 10G IXGBE NIC works more smoothly, and its driver is
>>>   able to refill the hw ring with rx descriptors in time.
>>>
>>> * The issue becomes worse with OVS because of the much larger
>>>   latencies between 'rte_eth_rx_burst()' calls.
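>>>
>>>   One way to see this from the application is to timestamp
>>>   successive 'rte_eth_rx_burst()' calls with the TSC; a sketch, with
>>>   the ~10 us threshold picked arbitrarily:
>>>
>>>   #include <rte_cycles.h>
>>>   #include <rte_ethdev.h>
>>>   #include <rte_mbuf.h>
>>>
>>>   static void
>>>   rx_loop(uint8_t port_id)
>>>   {
>>>           struct rte_mbuf *bufs[32];
>>>           const uint64_t thresh = rte_get_tsc_hz() / 100000; /* ~10 us */
>>>           uint64_t prev = rte_rdtsc();
>>>
>>>           for (;;) {
>>>                   uint64_t now = rte_rdtsc();
>>>
>>>                   if (now - prev > thresh) {
>>>                           /* Long gap: the NIC was on its own long
>>>                            * enough for the ring to fill up; count
>>>                            * or log it here. */
>>>                   }
>>>                   prev = now;
>>>
>>>                   uint16_t nb = rte_eth_rx_burst(port_id, 0, bufs, 32);
>>>                   /* ... forward or free the nb packets ... */
>>>                   (void)nb;
>>>           }
>>>   }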
>>>
>>> The easiest solution for this problem is to increase the number of RX
>>> descriptors. Increasing it up to 4096 eliminates the packet drops but
>>> decreases the performance a lot:
>>>
>>> For the OVS PHY-VM-PHY scenario by 10%
>>> For the OVS PHY-PHY scenario by 20%
>>> For the testpmd PHY-PHY scenario by 17% (22.1 Mpps --> 18.2 Mpps
>>> for 64B packets)
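>>>
>>> In an application the ring size is chosen at queue setup time (with
>>> testpmd it is just --rxd=4096); a sketch, assuming an already
>>> created 'mbuf_pool':
>>>
>>>   /* Ask for a 4096-entry rx ring instead of the default: more
>>>    * tolerance to bursty descriptor consumption, at the cost of the
>>>    * cache/memory footprint (hence the throughput loss above). */
>>>   int ret = rte_eth_rx_queue_setup(port_id, 0 /* queue */, 4096,
>>>                                    rte_eth_dev_socket_id(port_id),
>>>                                    NULL /* default rx conf */,
>>>                                    mbuf_pool);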
>>>
>>> As a result we have a trade-off between a zero drop rate at small
>>> packet rates and the higher maximum performance, which is very sad.
>>>
>>> Using 16B descriptors doesn't really help with performance.
>>> Upgrading the firmware from version 4.4 to 5.04 didn't help with the drops.
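>>>
>>> (For reference, the 16B descriptors are a build-time option; in the
>>> default config that is
>>>
>>>   CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC=y
>>>
>>> in config/common_base.)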
>>>
>>> Any thoughts? Can anyone reproduce this?
>>>
>>> Best regards, Ilya Maximets.