[dpdk-users] running multiple independent dpdk applications randomly locks up machines
stephen at networkplumber.org
Fri Aug 19 23:03:50 CEST 2016
On Fri, 19 Aug 2016 13:32:06 -0700
Zhongming Qu <zhongming at luminatewireless.com> wrote:
> As stated in the subject, running multiple dpdk applications (only one
> process per application) randomly locks up machines. Thanks in advance for
> any help.
> It is difficult to provide the exact set of information useful for
> debugging. Just listing as much info as possible in the hope of ringing
> a bell somewhere.
> System Configuration:
> - Motherboard: Supermicro X10SRi-F (BIOS upgraded to the latest version as
> of July 2016)
> - Intel Xeon E5-2667 v3 (Haswell), no NUMA
> - 64GB DRAM
> - Ubuntu 14.04 kernel 3.13.0-49-generic
> - DPDK 16.04
> - 1024 x 2M hugepages are reserved
> - 82599ES NIC (2 x 10G) at pci_addr 02:00.0 and 02:00.1. Both ports use the
> ixgbe_uio kernel driver and the ixgbe PMD.
> Use Scenario of DPDK Application:
> - Two single-process dpdk applications, A and B, need to run simultaneously.
> - It is made sure that A and B do not have any race conditions or memory
> issues, that is, apart from dpdk.
> - Each application uses 512 x 2M hugepages (half of the total reserved).
> - Each application binds to one port via `--pci-whitelist <pci_addr>`.
> - Use `-m 1024` and `--file-prefix <some_unique_id_per_pci_addr>`, as
> instructed by 19.2.3 in the Programmer's Guide (
> Description of Problem:
> - Starting and killing A and B repeatedly every 30 seconds has a
> chance of locking up the machine.
> - No kernel /var/log/syslog, no dmesg, nothing persistent, is available for
> debugging after a reboot of the frozen machine.
> - Looks like a kernel panic as it dumps some panic info to the serial
> console (not useful...) and the CapsLock and NumLock keys on a physically
> connected keyboard do not respond.
> - No particular sequence of operations of starting and killing A and B, so
> far, has been found to reliably lead to a lockup. The best effort of
> reproducing the lockup is a keep-trying-until-lockup approach.
> A Few Things Tried:
> - Via dumping logging to stderr and files, it is found that the lock up can
> happen during rte_eal_hugepage_init(), or after it, after the program is
> - It is made sure that rte_config.mem_config->memseg is properly
> initialized. That is, the total amount of memory reserved in the memseg is
> 512 x 2M hugepages.
> - Zeroing all hugepages when the hugefile is created and mapped, or
> immediately after memsegs are initialized (as the second call of
> map_all_hugepages() in rte_eal_hugepage_init()) does not fix the problem.
> - By default, hugefiles in /mnt/huge are not cleaned up when the
> applications are killed. Though, cleaning them up did not solve the problem
> Thanks very much for any input!
Obviously, two applications can't share the same queue.
Also, you need to give each application a different core mask; at least if you are using
poll mode like the DPDK examples.
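
As a rough illustration of the core mask point, here is a minimal Python sketch that
builds non-overlapping EAL arguments for the two applications. The core split
(0-3 for A, 4-7 for B) and the file prefixes "app_a" and "app_b" are assumptions
made up for this example; the PCI addresses and `-m 1024` come from the original
report. The helper names are hypothetical, not part of DPDK.

```python
def core_mask(cores):
    """Return the hex string for a bitmask with one bit set per lcore id."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("lcore ids must be non-negative")
        mask |= 1 << core
    return hex(mask)


def cores_overlap(cores_a, cores_b):
    """True if the two applications would be given at least one common lcore."""
    return bool(set(cores_a) & set(cores_b))


def eal_args(cores, pci_addr, file_prefix, mem_mb=1024):
    """Build the EAL argument list for one single-process application."""
    return [
        "-c", core_mask(cores),
        "-m", str(mem_mb),
        "--file-prefix", file_prefix,
        "--pci-whitelist", pci_addr,
    ]


if __name__ == "__main__":
    # Assumed core split; adjust to the actual CPU layout.
    cores_a, cores_b = [0, 1, 2, 3], [4, 5, 6, 7]
    if cores_overlap(cores_a, cores_b):
        raise SystemExit("applications A and B share an lcore")
    print(" ".join(eal_args(cores_a, "02:00.0", "app_a")))
    print(" ".join(eal_args(cores_b, "02:00.1", "app_b")))
```

Checking for overlap before launching is cheap, and it rules out the case where two
polling threads from different applications end up pinned to the same core.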
You might be better off having one primary DPDK process and two secondary processes.
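
A sketch of what the primary/secondary arrangement could look like on the command
line, again in Python and only as an illustration. It assumes one primary on core 0
and two secondaries on cores 1 and 2, all using a shared file prefix "shared" so the
secondaries can find the primary's configuration; those values and the helper name
are made up for this example. `--proc-type` and `--file-prefix` are the standard EAL
options.

```python
def multiprocess_eal_args(proc_type, cores, file_prefix):
    """Build EAL arguments for one process in a primary/secondary setup."""
    if proc_type not in ("primary", "secondary"):
        raise ValueError("proc_type must be 'primary' or 'secondary'")
    mask = 0
    for core in cores:
        mask |= 1 << core
    return [
        "-c", hex(mask),
        "--proc-type=" + proc_type,
        "--file-prefix", file_prefix,
    ]


if __name__ == "__main__":
    # The primary has to be started first; it sets up the hugepage memory.
    print(" ".join(multiprocess_eal_args("primary", [0], "shared")))
    # Each secondary attaches to that memory and gets its own core.
    print(" ".join(multiprocess_eal_args("secondary", [1], "shared")))
    print(" ".join(multiprocess_eal_args("secondary", [2], "shared")))
```

With this layout only the primary initializes hugepage memory, so the two workers no
longer each run their own hugepage setup on every restart.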