[dpdk-dev] [Bug 219] DPDK 18.11 builds with MLX4/MLX5 support but testpmd won't recognize the device

bugzilla at dpdk.org bugzilla at dpdk.org
Sun Mar 3 01:44:05 CET 2019


https://bugs.dpdk.org/show_bug.cgi?id=219

            Bug ID: 219
           Summary: DPDK 18.11 builds with MLX4/MLX5 support but testpmd
                    won't recognize the device
           Product: DPDK
           Version: 18.11
          Hardware: x86
                OS: Linux
            Status: CONFIRMED
          Severity: normal
          Priority: Normal
         Component: ethdev
          Assignee: dev at dpdk.org
          Reporter: debugnetiq1 at yahoo.ca
  Target Milestone: ---

For testing I built two versions of DPDK 18.11:
- one with static libs (CONFIG_RTE_BUILD_SHARED_LIB=n)
- one with shared libs (CONFIG_RTE_BUILD_SHARED_LIB=y)
In a nutshell, after building and verifying everything below, both fail:
- the static-libs build complains about missing shared glue libraries
  (even though DPDK is built with static libs by default)
- the shared-libs build complains that it cannot access the device
- incidentally, pktgen-dpdk, built against the same static DPDK build,
  reports the same issue

With the static-libs build, testpmd complains about not finding
librte_pmd_mlx4_glue.so.18.02.0:
# /opt/dpdk_install/dpdk-18.11/install/bin/testpmd \
>   -l 1-3 \
>   -n 4 \
>   -w aec9:00:02.0 \
>   --vdev="net_vdev_netvsc0,iface=eth1" \
>   -- --port-topology=chained \
>   --nb-cores 1 \
>   --forward-mode=txonly \
>   --eth-peer=0,00:0d:3a:53:13:b7 \
>   --stats-period 1
PMD: mlx4.c:947: mlx4_glue_init(): cannot load glue library:
librte_pmd_mlx4_glue.so.18.02.0: cannot open shared object file: No such file
or directory
PMD: mlx4.c:965: mlx4_glue_init(): cannot initialize PMD due to missing
run-time dependency on rdma-core libraries (libibverbs, libmlx4)
net_mlx5: mlx5.c:1712: mlx5_glue_init(): cannot load glue library:
librte_pmd_mlx5_glue.so.18.11.0: cannot open shared object file: No such file
or directory
net_mlx5: mlx5.c:1730: mlx5_glue_init(): cannot initialize PMD due to missing
run-time dependency on rdma-core libraries (libibverbs, libmlx5)
EAL: Detected 4 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Debug dataplane logs available - lower performance
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable
clock cycles !
net_vdev_netvsc: probably using routed NetVSC interface "eth1" (index 3)
rte_pmd_tap_probe(): Initializing pmd_tap for net_tap_vsc0 as dtap0
Set txonly packet forwarding mode
Warning: NUMA should be configured manually by using --port-numa-config and
--ring-numa-config parameters along with --numa.
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=163456, size=2176,
socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: 00:0D:3A:18:A1:73
Checking link statuses...
Done
No commandline core given, start packet forwarding
txonly packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support
enabled, MP allocation mode: native
Logical Core 2 (socket 0) forwards packets on 1 streams:
  RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=00:0D:3A:53:13:B7

  txonly packet forwarding packets/burst=32
  packet len=64 - nb packet segments=1
  nb forwarding cores=1 - nb forwarding ports=1
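
Side note on the glue error above: it suggests this testpmd binary was built
with CONFIG_RTE_LIBRTE_MLX4/MLX5_DLOPEN_DEPS=y, in which case the PMDs
dlopen() a "glue" library at run time instead of linking rdma-core directly.
A hedged check, assuming the glue objects were installed somewhere under the
DPDK install prefix; MLX4_GLUE_PATH / MLX5_GLUE_PATH are the documented way
to point the PMDs at a non-default directory:

# locate any glue libraries (also catches leftovers from older builds,
# e.g. the 18.02 version named in the error)
find / -name 'librte_pmd_mlx*_glue.so*' 2>/dev/null
# point the PMDs at the directory containing them (path below is an assumption)
export MLX4_GLUE_PATH=/opt/dpdk_install/dpdk-18.11/install/lib
export MLX5_GLUE_PATH=/opt/dpdk_install/dpdk-18.11/install/lib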


With the shared-libs build, testpmd complains about "mlx4_pci_probe(): cannot
access device":
# /opt/dpdk_install/dpdk-18.11/install/bin/testpmd \
>   -l 1-3 \
>   -d /opt/dpdk_install/dpdk-18.11/install/lib \
>   -n 4 \
>   -w aec9:00:02.0 \
>   --vdev="net_vdev_netvsc0,iface=eth1" \
>   -- --port-topology=chained \
>   --nb-cores 1 \
>   --forward-mode=txonly \
>   --eth-peer=0,00:0d:3a:53:13:b7 \
>   --stats-period 1
EAL: Detected 4 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Debug dataplane logs available - lower performance
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable
clock cycles !
EAL: PCI device aec9:00:02.0 on NUMA socket 0
EAL:   probe driver: 15b3:1004 net_mlx4
PMD: mlx4.c:564: mlx4_pci_probe(): cannot access device, is mlx4_ib loaded?
EAL: Requested device aec9:00:02.0 cannot be used
net_vdev_netvsc: probably using routed NetVSC interface "eth1" (index 3)
rte_pmd_tap_probe(): Initializing pmd_tap for net_tap_vsc0 as dtap0
Set txonly packet forwarding mode
Warning: NUMA should be configured manually by using --port-numa-config and
--ring-numa-config parameters along with --numa.
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=163456, size=2176,
socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: 00:0D:3A:18:A1:73
Checking link statuses...
Done
No commandline core given, start packet forwarding
txonly packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support
enabled, MP allocation mode: native
Logical Core 2 (socket 0) forwards packets on 1 streams:
  RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=00:0D:3A:53:13:B7
...

Nonetheless, despite the errors, both versions appear to start sending
packets, although I am not convinced this is really true:

Port statistics ====================================
  ######################## NIC statistics for port 0  ########################
  RX-packets: 0          RX-missed: 0          RX-bytes:  0
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 231072     TX-errors: 0          TX-bytes:  14788608

  Throughput (since last show)
  Rx-pps:            0
  Tx-pps:       201230
  ############################################################################

Incidentally, the "mlx4_pci_probe(): cannot access device" error is the same
one flagged by pktgen-dpdk (which, however, crashes), i.e. there is a common
MLX4-related problem in DPDK affecting both testpmd and pktgen:


# ./app/x86_64-native-linuxapp-gcc/pktgen -w aec9:00:02.0 -l 1-3  -n 4 -m 4096
--  -m [2-3].0 -l /var/tmp/pktgen.log -T

Copyright (c) <2010-2019>, Intel Corporation. All rights reserved. Powered by
DPDK
EAL: Detected 4 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Debug dataplane logs available - lower performance
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable
clock cycles !
EAL: PCI device aec9:00:02.0 on NUMA socket 0
EAL:   probe driver: 15b3:1004 net_mlx4
PMD: mlx4.c:564: mlx4_pci_probe(): cannot access device, is mlx4_ib loaded?
EAL: Requested device aec9:00:02.0 cannot be used
Lua 5.3.5  Copyright (C) 1994-2018 Lua.org, PUC-Rio

*** Copyright (c) <2010-2019>, Intel Corporation. All rights reserved.
*** Pktgen created by: Keith Wiles -- >>> Powered by DPDK <<<

!PANIC!: *** Did not find any ports to use ***
PANIC in pktgen_config_ports():
*** Did not find any ports to use ***6:
[./app/x86_64-native-linuxapp-gcc/pktgen_() [0x48726f]]
5: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fe75bf693d5]]
4: [./app/x86_64-native-linuxapp-gcc/pktgen_(main+0x630) [0x47efe0]]
3: [./app/x86_64-native-linuxapp-gcc/pktgen_(pktgen_config_ports+0x1611)
[0x4afcb1]]
2: [./app/x86_64-native-linuxapp-gcc/pktgen_(__rte_panic+0xb8) [0x469ec2]]
1: [./app/x86_64-native-linuxapp-gcc/pktgen_(rte_dump_stack+0x1a) [0x582baa]]
./app/x86_64-native-linuxapp-gcc/pktgen: line 7:  6970 Aborted                
$(dirname "$0")/pktgen_ "$@"




Here is what I did:
- installed Mellanox OFED 4.5.1 from "sources":
  wget http://www.mellanox.com/downloads/ofed/MLNX_OFED-4.5-1.0.1.0/MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6-x86_64.tgz \
    && tar -zxf MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6-x86_64.tgz
  cd MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6-x86_64
  ./mlnxofedinstall --dpdk --upstream-libs --add-kernel-support --enable-mlnx_tune
This builds and installs all userland OFED components; the kernel modules are
then installed with:
  cd /tmp/MLNX_OFED_LINUX-4.5-1.0.1.0-3.10.0-957.5.1.el7.x86_64/MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6-ext/RPMS \
    && yum install mlnx-ofa_kernel-modules-4.5-OFED.4.5.1.0.1.1.gb4fdfac.kver.3.10.0_957.5.1.el7.x86_64.x86_64.rpm
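
A quick sanity check of the userland side (hedged; exact paths depend on how
the OFED installer laid things out, /usr/lib64 is an assumption for RHEL 7):

# confirm the rdma-core libraries the PMDs need are actually installed
ls -l /usr/lib64/libibverbs.so* /usr/lib64/libmlx4.so* /usr/lib64/libmlx5.so*
# ibv_devinfo (from libibverbs-utils) should list the ConnectX-3 VF
ibv_devinfo | head -n 20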

Now building dpdk-18.11:
- download, untar, then enable CONFIG_RTE_LIBRTE_MLX4_PMD=y and
  CONFIG_RTE_LIBRTE_MLX5_PMD=y in ./config/common_base
export DPDK_DIR=/opt/dpdk_install/dpdk-18.11
cd $DPDK_DIR
export DPDK_BUILD=$DPDK_DIR/install
export RTE_SDK=$DPDK_DIR
export DPDK_TARGET=x86_64-native-linuxapp-gcc
export RTE_TARGET=x86_64-native-linuxapp-gcc

I enabled the following in config/common_base:
CONFIG_RTE_BUILD_SHARED_LIB=n
CONFIG_RTE_LIBRTE_MLX4_PMD=y
CONFIG_RTE_LIBRTE_MLX4_DEBUG=y
CONFIG_RTE_LIBRTE_MLX4_DLOPEN_DEPS=n
CONFIG_RTE_LIBRTE_MLX5_PMD=y
CONFIG_RTE_LIBRTE_MLX5_DEBUG=y
CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS=n

CONFIG_RTE_LOG_DP_LEVEL=RTE_LOG_DEBUG

make config T=$DPDK_TARGET
make install T=$DPDK_TARGET DESTDIR=install
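
Worth double-checking at this point that the common_base edits actually
propagated into the generated build configuration (hedged: depending on which
make invocation created the build tree, the generated config may live under
build/.config instead of $RTE_TARGET/.config):

# the effective options live in the per-target .config, not in common_base
grep -E 'MLX[45]_(PMD|DEBUG|DLOPEN_DEPS)|BUILD_SHARED_LIB' $RTE_SDK/$RTE_TARGET/.config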

The build completes fine; by default static libraries are generated under
install/lib, and I can see them:
# ls -l install/lib/*mlx*
-rw-r--r--. 1 root root 2126350 Mar  2 22:12 install/lib/librte_pmd_mlx4.a
-rw-r--r--. 1 root root 6613402 Mar  2 22:12 install/lib/librte_pmd_mlx5.a
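
To rule out a mix-up between the static and shared builds (a guess, given
that the static build still tries to dlopen an 18.02 glue object), it may
help to check what the installed testpmd binary actually expects at run time:

# a fully static build should not pull in any librte_* shared objects
ldd /opt/dpdk_install/dpdk-18.11/install/bin/testpmd | grep -E 'librte|mlx|ibverbs'
# if the binary was built with DLOPEN_DEPS=y, the glue soname it looks for
# is baked into it
strings /opt/dpdk_install/dpdk-18.11/install/bin/testpmd | grep glue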


# lspci -v -n
...
aec9:00:02.0 0200: 15b3:1004
        Subsystem: 15b3:61b0
        Flags: fast devsel, NUMA node 0
        Memory at fe0800000 (64-bit, prefetchable) [size=8M]
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [9c] MSI-X: Enable- Count=24 Masked-
        Capabilities: [40] Power Management version 0
        Kernel driver in use: vfio-pci
        Kernel modules: mlx4_core

# lsmod | grep mlx
mlx5_fpga_tools        14392  0
mlx5_ib               339996  0
ib_uverbs             125872  3 mlx5_ib,ib_ucm,rdma_ucm
mlx5_core             919535  2 mlx5_ib,mlx5_fpga_tools
mlxfw                  18227  1 mlx5_core
mlx4_ib               211832  0
ib_core               294554  10
rdma_cm,ib_cm,iw_cm,mlx4_ib,mlx5_ib,ib_ucm,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib
mlx4_en               146509  0
mlx4_core             360644  2 mlx4_en,mlx4_ib
mlx_compat             28730  15
rdma_cm,ib_cm,iw_cm,mlx4_en,mlx4_ib,mlx5_ib,mlx5_fpga_tools,ib_ucm,ib_core,ib_umad,ib_uverbs,mlx4_core,mlx5_core,rdma_ucm,ib_ipoib
devlink                48345  4 mlx4_en,mlx4_ib,mlx4_core,mlx5_core
ptp                    19231  3 hv_utils,mlx4_en,mlx5_core


Now testing with testpmd, but first some sanity checks.

# find /lib/modules/3.10.0-957.5.1.el7.x86_64/ -type f -name "*mlx*" | xargs ls
-l
-rwxr--r--. 1 root root   47688 Mar  2 17:34
/lib/modules/3.10.0-957.5.1.el7.x86_64/extra/mlnx-ofa_kernel/compat/mlx_compat.ko
-rwxr--r--. 1 root root  353296 Mar  2 17:34
/lib/modules/3.10.0-957.5.1.el7.x86_64/extra/mlnx-ofa_kernel/drivers/infiniband/hw/mlx4/mlx4_ib.ko
-rwxr--r--. 1 root root  554568 Mar  2 17:34
/lib/modules/3.10.0-957.5.1.el7.x86_64/extra/mlnx-ofa_kernel/drivers/infiniband/hw/mlx5/mlx5_ib.ko
-rwxr--r--. 1 root root  573648 Mar  2 17:34
/lib/modules/3.10.0-957.5.1.el7.x86_64/extra/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko
-rwxr--r--. 1 root root  255656 Mar  2 17:34
/lib/modules/3.10.0-957.5.1.el7.x86_64/extra/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_en.ko
-rwxr--r--. 1 root root 1433680 Mar  2 17:34
/lib/modules/3.10.0-957.5.1.el7.x86_64/extra/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
-rwxr--r--. 1 root root   25728 Mar  2 17:35
/lib/modules/3.10.0-957.5.1.el7.x86_64/extra/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx5/fpga/mlx5_fpga_tools.ko
-rwxr--r--. 1 root root   24728 Mar  2 17:35
/lib/modules/3.10.0-957.5.1.el7.x86_64/extra/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlxfw/mlxfw.ko


# cat /etc/modprobe.d/ofed_mlx4.conf
(all options commented out)

With mlx4_ib loaded there are 2 pairs of interfaces, eth0 + eth2 and
eth1 + eth3; each pair shares the same MAC:
# ip link show
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode
DEFAULT group default qlen 1000    link/ether 00:0d:3a:4d:49:98 brd
ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode
DEFAULT group default qlen 1000    link/ether 00:0d:3a:18:a1:73 brd
ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq state UP
mode DEFAULT group default qlen 1000    link/ether 00:0d:3a:4d:49:98 brd
ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq state UP
mode DEFAULT group default qlen 1000    link/ether 00:0d:3a:18:a1:73 brd
ff:ff:ff:ff:ff:ff

# lshw | less
     *-network:1
          description: Ethernet interface
          product: MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual
Function]
          vendor: Mellanox Technologies
          physical id: 2
          bus info: pci at aec9:00:02.0
          logical name: eth3
          version: 00
          serial: 00:0d:3a:18:a1:73
          width: 64 bits
          clock: 33MHz
          capabilities: pciexpress msix pm bus_master cap_list ethernet
physical fibre autonegotiation
          configuration: autonegotiation=on broadcast=yes driver=mlx4_en
driverversion=4.5-1.0.1 duplex=full firmware=2.41.7004 latency=0 link=yes
multicast=yes slave=yes
          resources: iomemory:f0-ef irq:0 memory:fe0800000-fe0ffffff


Using the second pair with DPDK (the 1st pair has the mgmt IP):

# dpdk-devbind.py --force --bind mlx4_core aec9:00:02.0
# dpdk-devbind.py -s
Network devices using DPDK-compatible driver
============================================
aec9:00:02.0 'MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual
Function] 1004' drv=vfio-pci unused=mlx4_core

Other Network devices
=====================
a3f2:00:02.0 'MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual
Function] 1004' unused=mlx4_core,vfio-pci
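
One note on the binding above (hedged, taken from the DPDK mlx4 guide rather
than from this setup): net_mlx4/net_mlx5 attach through libibverbs on top of
the kernel mlx4_core/mlx4_ib drivers, so the VF is normally left bound to
mlx4_core instead of vfio-pci. If the probe failure is driver-binding
related, a rebind sketch would be:

# return the VF to the kernel mlx4 driver and re-check the status
dpdk-devbind.py --bind=mlx4_core aec9:00:02.0
dpdk-devbind.py -s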


We have 4 cores and hugepages configured:
# cpu_layout.py
======================================================================
Core and Socket Information (as reported by '/sys/devices/system/cpu')
======================================================================
cores =  [0, 1]
sockets =  [0]
       Socket 0
       --------
Core 0 [0, 1]
Core 1 [2, 3]

# grep -i huge /proc/meminfo
AnonHugePages:     20480 kB
HugePages_Total:    2048
HugePages_Free:     2048
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

# mount | grep -i huge
cgroup on /sys/fs/cgroup/hugetlb type cgroup
(rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
hugetlbfs on /mnt/huge type hugetlbfs (rw,relatime,seclabel)
none on /mnt/huge_2mb type hugetlbfs (rw,relatime,seclabel,pagesize=2MB)
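
For reference, a sketch of how such a 2 MB pool is typically reserved
(assumption: run-time allocation via sysfs rather than a kernel boot
parameter):

# reserve 2048 x 2 MB hugepages on the single NUMA node
echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# the "No free hugepages reported in hugepages-1048576kB" EAL line is expected
# here, since no 1 GB pages are reserved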

