[dpdk-dev] Mellanox PMD failure w/DPDK-2.1.0 and MLNX_OFED-3.1-1.0.3

Olga Shern olgas at mellanox.com
Thu Oct 15 14:50:02 CEST 2015


Hi Bill,

Sorry it took me a while to reply.
We ran more tests and could not reproduce the issue.
I also checked the code, and it seems there are only two conditions under which RD creation fails:

1. The arguments we pass to the RD creation function are wrong. This is unlikely: the arguments come from the PMD code itself, so they would be wrong everywhere, yet the behavior is not deterministic. It works in most cases and fails only on your setup.

2. calloc() is failing. This is also unlikely.
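To make the two cases concrete, here is a hypothetical C model of a resource-domain constructor. `fake_create_res_domain()`, `setup_rx_queue_sketch()` and the `fake_*` types are invented for illustration and are not the MLNX_OFED or DPDK code. The model fails in exactly the two ways listed above (rejected arguments or a failed calloc()), and in both cases the caller only sees NULL. As far as we can tell from the DPDK 2.1 mlx4 PMD source, its RX queue setup behaves like the caller here: on a NULL return it sets ENOMEM itself before logging, so "Cannot allocate memory" in the log does not by itself prove that an allocation failed.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the verbs_exp.h types; not the real API. */
enum fake_thread_model { FAKE_THREAD_SAFE, FAKE_THREAD_SINGLE };
#define FAKE_RD_THREAD_MODEL   (1u << 0)
#define FAKE_RD_SUPPORTED_MASK FAKE_RD_THREAD_MODEL

struct fake_rd_init_attr {
	uint32_t comp_mask;
	enum fake_thread_model thread_model;
};

struct fake_res_domain {
	enum fake_thread_model thread_model;
};

/* Models the two failure paths. Either way the caller just gets NULL. */
static struct fake_res_domain *
fake_create_res_domain(const void *ctx, const struct fake_rd_init_attr *attr,
		       int simulate_oom)
{
	struct fake_res_domain *rd;

	/* Condition 1: arguments the library rejects. */
	if (ctx == NULL || attr == NULL ||
	    (attr->comp_mask & ~FAKE_RD_SUPPORTED_MASK) != 0)
		return NULL;
	/* Condition 2: allocation failure (forced here for testing). */
	rd = simulate_oom ? NULL : calloc(1, sizeof(*rd));
	if (rd == NULL)
		return NULL;
	rd->thread_model = attr->thread_model;
	return rd;
}

/* Caller in the style of the PMD: any NULL result is reported as ENOMEM,
 * whichever condition caused it. Returns 0 on success or an errno value. */
static int
setup_rx_queue_sketch(const void *ctx, const struct fake_rd_init_attr *attr,
		      int simulate_oom)
{
	struct fake_res_domain *rd =
		fake_create_res_domain(ctx, attr, simulate_oom);

	if (rd == NULL) {
		int ret = ENOMEM;

		fprintf(stderr, "RD creation failure: %s\n", strerror(ret));
		return ret;
	}
	free(rd);
	return 0;
}
```

Because both failure paths collapse into the same ENOMEM, the log line alone cannot tell them apart; a debug build or a standalone verbs test is needed to see which one is hit.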

There is a verbs application that uses accelerated verbs and resource domains: raw_ethernet_bw.
Example:
raw_ethernet_bw -d mlx4_0  -i 1  --client -E 00:00:00:00:01:02 --use_res_domain --verb_type=accl

One more suggestion: can you please compile the PMD with debug enabled? It may give more details.

Best Regards,
Olga

From: Bill O'Hara [mailto:billtohara at gmail.com]
Sent: Saturday, October 10, 2015 12:18 AM
To: Olga Shern <olgas at mellanox.com>
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] Mellanox PMD failure w/DPDK-2.1.0 and MLNX_OFED-3.1-1.0.3

Hi Olga

Thanks for the pointer towards the use of "accelerated verbs".

Yes, SR-IOV is enabled, and we run DPDK on the hypervisor on the probed VFs. That said, as far as I can see it also fails on the underlying PF: the log below shows "(VF: false)" for device mlx4_0, and the code fails in RD creation on it as well as on one of the VFs. I don't see any messages in dmesg that indicate errors at any point, but an extract is included below.

But here's perhaps the crux: switching off SR-IOV and running the new combination of DPDK and OFED against just a single PF also fails in exactly the same way (RD creation failure).

The old code continues to work. I will audit our code to make sure we're not missing something when using dpdk-2.1. In the meantime, do you have a minimal test that involves RD creation?

thanks
bill



// DPDK output for application run using dpdk-2.1 and ofed 3.1
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 3 on socket 0
EAL: Detected lcore 4 as core 4 on socket 0
EAL: Detected lcore 5 as core 5 on socket 0
EAL: Detected lcore 6 as core 0 on socket 0
EAL: Detected lcore 7 as core 1 on socket 0
EAL: Detected lcore 8 as core 2 on socket 0
EAL: Detected lcore 9 as core 3 on socket 0
EAL: Detected lcore 10 as core 4 on socket 0
EAL: Detected lcore 11 as core 5 on socket 0
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 12 lcore(s)
EAL: VFIO modules not all loaded, skip VFIO support...
EAL: Setting up physically contiguous memory...
EAL: Ask a virtual area of 0xe400000 bytes
EAL: Virtual area found at 0x7fffe6000000 (size = 0xe400000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7fffe5c00000 (size = 0x200000)
EAL: Ask a virtual area of 0x71800000 bytes
EAL: Virtual area found at 0x7fff74200000 (size = 0x71800000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7fff73e00000 (size = 0x200000)
EAL: Requesting 512 pages of size 2MB from socket 0
EAL: TSC frequency is ~2394453 KHz
EAL: Master lcore 0 is ready (tid=f7fe7940;cpuset=[0])
EAL: lcore 1 is ready (tid=e53fe700;cpuset=[1])
EAL: lcore 2 is ready (tid=e4bfd700;cpuset=[2])
EAL: lcore 3 is ready (tid=e43fc700;cpuset=[3])
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL:   probe driver: 15b3:1003 librte_pmd_mlx4
PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_0" (VF: false)
PMD: librte_pmd_mlx4: 1 port(s) detected
PMD: librte_pmd_mlx4: port 1 MAC address is f4:52:14:8f:16:80
EAL: PCI device 0000:01:00.1 on NUMA socket 0
EAL:   probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_1" (VF: true)
PMD: librte_pmd_mlx4: 1 port(s) detected
PMD: librte_pmd_mlx4: port 1 MAC address is b2:00:7c:2b:3f:47
EAL: PCI device 0000:01:00.2 on NUMA socket 0
EAL:   probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_2" (VF: true)
PMD: librte_pmd_mlx4: 1 port(s) detected
PMD: librte_pmd_mlx4: port 1 MAC address is 3a:3d:c7:e0:ed:5a
EAL: PCI device 0000:01:00.3 on NUMA socket 0
EAL:   probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_3" (VF: true)
PMD: librte_pmd_mlx4: 1 port(s) detected
PMD: librte_pmd_mlx4: port 1 MAC address is ee:6a:a6:79:24:4c
EAL: PCI device 0000:01:00.4 on NUMA socket 0
EAL:   probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_4" (VF: true)
PMD: librte_pmd_mlx4: 1 port(s) detected
PMD: librte_pmd_mlx4: port 1 MAC address is 8a:7a:30:00:46:33
EAL: PCI device 0000:01:00.5 on NUMA socket 0
EAL:   probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device 0000:01:00.6 on NUMA socket 0
EAL:   probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device 0000:01:00.7 on NUMA socket 0
EAL:   probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device 0000:01:01.0 on NUMA socket 0
EAL:   probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device 0000:01:01.1 on NUMA socket 0
EAL:   probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device 0000:01:01.2 on NUMA socket 0
EAL:   probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device 0000:01:01.3 on NUMA socket 0
EAL:   probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device 0000:01:01.4 on NUMA socket 0
EAL:   probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device 0000:01:01.5 on NUMA socket 0
EAL:   probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device 0000:01:01.6 on NUMA socket 0
EAL:   probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device 0000:01:01.7 on NUMA socket 0
EAL:   probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device 0000:01:02.0 on NUMA socket 0
EAL:   probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device 0000:04:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL:   Not managed by a supported kernel driver, skipped
PMD: librte_pmd_mlx4: 0xa50ca0: TX queues number update: 0 -> 1
PMD: librte_pmd_mlx4: 0xa50ca0: RX queues number update: 0 -> 1
PMD: librte_pmd_mlx4: 0xa50ca0: RD creation failure: Cannot allocate memory
<app> panic: rx queue setup failed: ENOMEM for port 0
...
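The "VF: true/false" flag in the log above lines up with the PCI device ID printed on each "probe driver:" line: 15b3:1003 (the PF at 0000:01:00.0, which becomes port 0) is reported as VF: false, and every 15b3:1004 function as VF: true. The sketch below reproduces that mapping from the log text. `mlx4_token_is_vf()` is a hypothetical helper, and treating 0x1003/0x1004 as the only relevant ConnectX-3 IDs is an assumption based solely on this log. It supports the point that port 0, where RD creation failed, is the PF rather than a VF.

```c
#include <stdio.h>

#define MLNX_VENDOR_ID      0x15b3u
#define CONNECTX3_PF_DEVICE 0x1003u /* log: "15b3:1003", VF: false */
#define CONNECTX3_VF_DEVICE 0x1004u /* log: "15b3:1004", VF: true */

/* Hypothetical helper: parses a "vendor:device" token such as "15b3:1004"
 * (as printed after "probe driver:" in the EAL log) and returns 1 for a
 * ConnectX-3 VF, 0 for the ConnectX-3 PF, and -1 for anything else. */
static int
mlx4_token_is_vf(const char *token)
{
	unsigned int vendor, device;

	if (sscanf(token, "%4x:%4x", &vendor, &device) != 2)
		return -1;
	if (vendor != MLNX_VENDOR_ID)
		return -1;
	if (device == CONNECTX3_PF_DEVICE)
		return 0;
	if (device == CONNECTX3_VF_DEVICE)
		return 1;
	return -1;
}
```

The Intel 8086:1521 function in the same log falls through to -1, matching the fact that the mlx4 PMD never claims it.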


// dmesg output related to mellanox, no extra messages are generated when running the app
[    6.028657] mlx4_core: Mellanox ConnectX core driver v3.1-1.0.3 (29 Sep 2015)
[    6.028667] mlx4_core: Initializing 0000:01:00.0
[    6.561768] cgroup: systemd-logind (534) created nested cgroup for controller "memory" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.
[    6.561771] cgroup: "memory" requires setting use_hierarchy to 1 on the root
[    6.781918] random: nonblocking pool is initialized
[   11.394134] mlx4_core: device is working in RoCE mode: Roce V1
[   11.394137] mlx4_core: gid_type 1 for UD QPs is not supported by the devicegid_type 0 was chosen instead
[   11.394138] mlx4_core: UD QP Gid type is: V1
[   13.084044] mlx4_core 0000:01:00.0: Enabling SR-IOV with 16 VFs
[   13.185148] pci 0000:01:00.1: [15b3:1004] type 00 class 0x028000
[   13.192291] mlx4_core: Initializing 0000:01:00.1
[   13.192324] mlx4_core 0000:01:00.1: enabling device (0000 -> 0002)
[   13.193194] mlx4_core 0000:01:00.1: Detected virtual function - running in slave mode
[   13.193215] mlx4_core 0000:01:00.1: PF is not ready - Deferring probe
[   13.193646] pci 0000:01:00.1: Driver mlx4_core requests probe deferral
[   13.193788] pci 0000:01:00.2: [15b3:1004] type 00 class 0x028000
[   13.200894] mlx4_core: Initializing 0000:01:00.2
[   13.200927] mlx4_core 0000:01:00.2: enabling device (0000 -> 0002)
[   13.201783] mlx4_core 0000:01:00.2: Detected virtual function - running in slave mode
[   13.201804] mlx4_core 0000:01:00.2: PF is not ready - Deferring probe
[   13.202229] pci 0000:01:00.2: Driver mlx4_core requests probe deferral
[   13.202363] pci 0000:01:00.3: [15b3:1004] type 00 class 0x028000
[   13.209468] mlx4_core: Initializing 0000:01:00.3
[   13.209498] mlx4_core 0000:01:00.3: enabling device (0000 -> 0002)
[   13.210378] mlx4_core 0000:01:00.3: Detected virtual function - running in slave mode
[   13.210398] mlx4_core 0000:01:00.3: PF is not ready - Deferring probe
[   13.210823] pci 0000:01:00.3: Driver mlx4_core requests probe deferral
[   13.210956] pci 0000:01:00.4: [15b3:1004] type 00 class 0x028000
[   13.218050] mlx4_core: Initializing 0000:01:00.4
[   13.218079] mlx4_core 0000:01:00.4: enabling device (0000 -> 0002)
[   13.218962] mlx4_core 0000:01:00.4: Detected virtual function - running in slave mode
[   13.218981] mlx4_core 0000:01:00.4: PF is not ready - Deferring probe
[   13.219407] pci 0000:01:00.4: Driver mlx4_core requests probe deferral
[   13.219541] pci 0000:01:00.5: [15b3:1004] type 00 class 0x028000
[   13.226628] mlx4_core: Initializing 0000:01:00.5
[   13.226658] mlx4_core 0000:01:00.5: enabling device (0000 -> 0002)
[   13.227487] mlx4_core 0000:01:00.5: Skipping virtual function:5
[   13.228041] pci 0000:01:00.6: [15b3:1004] type 00 class 0x028000
[   13.235149] mlx4_core: Initializing 0000:01:00.6
[   13.235178] mlx4_core 0000:01:00.6: enabling device (0000 -> 0002)
[   13.236005] mlx4_core 0000:01:00.6: Skipping virtual function:6
[   13.236558] pci 0000:01:00.7: [15b3:1004] type 00 class 0x028000
[   13.243666] mlx4_core: Initializing 0000:01:00.7
[   13.243696] mlx4_core 0000:01:00.7: enabling device (0000 -> 0002)
[   13.244525] mlx4_core 0000:01:00.7: Skipping virtual function:7
[   13.245074] pci 0000:01:01.0: [15b3:1004] type 00 class 0x028000
[   13.252175] mlx4_core: Initializing 0000:01:01.0
[   13.252206] mlx4_core 0000:01:01.0: enabling device (0000 -> 0002)
[   13.253038] mlx4_core 0000:01:01.0: Skipping virtual function:8
[   13.253592] pci 0000:01:01.1: [15b3:1004] type 00 class 0x028000
[   13.260693] mlx4_core: Initializing 0000:01:01.1
[   13.260725] mlx4_core 0000:01:01.1: enabling device (0000 -> 0002)
[   13.261555] mlx4_core 0000:01:01.1: Skipping virtual function:9
[   13.262106] pci 0000:01:01.2: [15b3:1004] type 00 class 0x028000
[   13.269201] mlx4_core: Initializing 0000:01:01.2
[   13.269232] mlx4_core 0000:01:01.2: enabling device (0000 -> 0002)
[   13.270062] mlx4_core 0000:01:01.2: Skipping virtual function:10
[   13.270615] pci 0000:01:01.3: [15b3:1004] type 00 class 0x028000
[   13.277699] mlx4_core: Initializing 0000:01:01.3
[   13.277731] mlx4_core 0000:01:01.3: enabling device (0000 -> 0002)
[   13.278573] mlx4_core 0000:01:01.3: Skipping virtual function:11
[   13.279129] pci 0000:01:01.4: [15b3:1004] type 00 class 0x028000
[   13.286210] mlx4_core: Initializing 0000:01:01.4
[   13.286242] mlx4_core 0000:01:01.4: enabling device (0000 -> 0002)
[   13.287070] mlx4_core 0000:01:01.4: Skipping virtual function:12
[   13.287622] pci 0000:01:01.5: [15b3:1004] type 00 class 0x028000
[   13.294705] mlx4_core: Initializing 0000:01:01.5
[   13.294736] mlx4_core 0000:01:01.5: enabling device (0000 -> 0002)
[   13.295566] mlx4_core 0000:01:01.5: Skipping virtual function:13
[   13.296120] pci 0000:01:01.6: [15b3:1004] type 00 class 0x028000
[   13.303210] mlx4_core: Initializing 0000:01:01.6
[   13.303241] mlx4_core 0000:01:01.6: enabling device (0000 -> 0002)
[   13.304072] mlx4_core 0000:01:01.6: Skipping virtual function:14
[   13.304624] pci 0000:01:01.7: [15b3:1004] type 00 class 0x028000
[   13.311716] mlx4_core: Initializing 0000:01:01.7
[   13.311747] mlx4_core 0000:01:01.7: enabling device (0000 -> 0002)
[   13.312575] mlx4_core 0000:01:01.7: Skipping virtual function:15
[   13.313133] pci 0000:01:02.0: [15b3:1004] type 00 class 0x028000
[   13.320222] mlx4_core: Initializing 0000:01:02.0
[   13.320254] mlx4_core 0000:01:02.0: enabling device (0000 -> 0002)
[   13.321089] mlx4_core 0000:01:02.0: Skipping virtual function:16
[   13.321522] mlx4_core 0000:01:00.0: Running in master mode
[   13.321582] mlx4_core 0000:01:00.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s
[   13.321583] mlx4_core 0000:01:00.0: PCIe link width is x8, device supports x8
[   13.323575] mlx4_core 0000:01:00.0: irq 46 for MSI/MSI-X
[   13.323578] mlx4_core 0000:01:00.0: irq 47 for MSI/MSI-X
[   13.323581] mlx4_core 0000:01:00.0: irq 48 for MSI/MSI-X
[   13.323583] mlx4_core 0000:01:00.0: irq 49 for MSI/MSI-X
[   13.323586] mlx4_core 0000:01:00.0: irq 50 for MSI/MSI-X
[   13.323588] mlx4_core 0000:01:00.0: irq 51 for MSI/MSI-X
[   13.323591] mlx4_core 0000:01:00.0: irq 52 for MSI/MSI-X
[   13.323593] mlx4_core 0000:01:00.0: irq 53 for MSI/MSI-X
[   13.323596] mlx4_core 0000:01:00.0: irq 54 for MSI/MSI-X
[   13.323598] mlx4_core 0000:01:00.0: irq 55 for MSI/MSI-X
[   13.323601] mlx4_core 0000:01:00.0: irq 56 for MSI/MSI-X
[   13.323604] mlx4_core 0000:01:00.0: irq 57 for MSI/MSI-X
[   13.323606] mlx4_core 0000:01:00.0: irq 58 for MSI/MSI-X
[   13.361642] mlx4_core: Initializing 0000:01:00.1
[   13.361676] mlx4_core 0000:01:00.1: enabling device (0000 -> 0002)
[   13.362546] mlx4_core 0000:01:00.1: Detected virtual function - running in slave mode
[   13.362576] mlx4_core 0000:01:00.1: Sending reset
[   13.362636] mlx4_core 0000:01:00.0: Received reset from slave:1
[   13.363170] mlx4_core 0000:01:00.1: Sending vhcr0
[   13.364464] mlx4_core 0000:01:00.1: HCA minimum page size:512
[   13.365084] mlx4_core 0000:01:00.1: Timestamping is not supported in slave mode
[   13.365086] mlx4_core: device is working in RoCE mode: Roce V1
[   13.365087] mlx4_core: gid_type 1 for UD QPs is not supported by the devicegid_type 0 was chosen instead
[   13.365090] mlx4_core: UD QP Gid type is: V1
[   13.366060] mlx4_core 0000:01:00.1: irq 59 for MSI/MSI-X
[   13.366064] mlx4_core 0000:01:00.1: irq 60 for MSI/MSI-X
[   13.366066] mlx4_core 0000:01:00.1: irq 61 for MSI/MSI-X
[   13.366069] mlx4_core 0000:01:00.1: irq 62 for MSI/MSI-X
[   13.366071] mlx4_core 0000:01:00.1: irq 63 for MSI/MSI-X
[   13.366074] mlx4_core 0000:01:00.1: irq 64 for MSI/MSI-X
[   13.366077] mlx4_core 0000:01:00.1: irq 65 for MSI/MSI-X
[   13.366079] mlx4_core 0000:01:00.1: irq 66 for MSI/MSI-X
[   13.366082] mlx4_core 0000:01:00.1: irq 67 for MSI/MSI-X
[   13.366084] mlx4_core 0000:01:00.1: irq 68 for MSI/MSI-X
[   13.366087] mlx4_core 0000:01:00.1: irq 69 for MSI/MSI-X
[   13.366090] mlx4_core 0000:01:00.1: irq 70 for MSI/MSI-X
[   13.366092] mlx4_core 0000:01:00.1: irq 71 for MSI/MSI-X
[   13.412102] mlx4_core: Initializing 0000:01:00.2
[   13.412136] mlx4_core 0000:01:00.2: enabling device (0000 -> 0002)
[   13.413013] mlx4_core 0000:01:00.2: Detected virtual function - running in slave mode
[   13.413047] mlx4_core 0000:01:00.2: Sending reset
[   13.413095] mlx4_core 0000:01:00.0: Received reset from slave:2
[   13.413539] mlx4_core 0000:01:00.2: Sending vhcr0
[   13.414861] mlx4_core 0000:01:00.2: HCA minimum page size:512
[   13.415455] mlx4_core 0000:01:00.2: Timestamping is not supported in slave mode
[   13.415456] mlx4_core: device is working in RoCE mode: Roce V1
[   13.415458] mlx4_core: gid_type 1 for UD QPs is not supported by the devicegid_type 0 was chosen instead
[   13.415458] mlx4_core: UD QP Gid type is: V1
[   13.416421] mlx4_core 0000:01:00.2: irq 72 for MSI/MSI-X
[   13.416424] mlx4_core 0000:01:00.2: irq 73 for MSI/MSI-X
[   13.416426] mlx4_core 0000:01:00.2: irq 74 for MSI/MSI-X
[   13.416429] mlx4_core 0000:01:00.2: irq 75 for MSI/MSI-X
[   13.416431] mlx4_core 0000:01:00.2: irq 76 for MSI/MSI-X
[   13.416434] mlx4_core 0000:01:00.2: irq 77 for MSI/MSI-X
[   13.416436] mlx4_core 0000:01:00.2: irq 78 for MSI/MSI-X
[   13.416439] mlx4_core 0000:01:00.2: irq 79 for MSI/MSI-X
[   13.416442] mlx4_core 0000:01:00.2: irq 80 for MSI/MSI-X
[   13.416444] mlx4_core 0000:01:00.2: irq 81 for MSI/MSI-X
[   13.416447] mlx4_core 0000:01:00.2: irq 82 for MSI/MSI-X
[   13.416449] mlx4_core 0000:01:00.2: irq 83 for MSI/MSI-X
[   13.416452] mlx4_core 0000:01:00.2: irq 84 for MSI/MSI-X
[   13.471412] mlx4_core: Initializing 0000:01:00.3
[   13.471444] mlx4_core 0000:01:00.3: enabling device (0000 -> 0002)
[   13.472320] mlx4_core 0000:01:00.3: Detected virtual function - running in slave mode
[   13.472352] mlx4_core 0000:01:00.3: Sending reset
[   13.472400] mlx4_core 0000:01:00.0: Received reset from slave:3
[   13.472842] mlx4_core 0000:01:00.3: Sending vhcr0
[   13.474183] mlx4_core 0000:01:00.3: HCA minimum page size:512
[   13.474798] mlx4_core 0000:01:00.3: Timestamping is not supported in slave mode
[   13.474800] mlx4_core: device is working in RoCE mode: Roce V1
[   13.474801] mlx4_core: gid_type 1 for UD QPs is not supported by the devicegid_type 0 was chosen instead
[   13.474803] mlx4_core: UD QP Gid type is: V1
[   13.475758] mlx4_core 0000:01:00.3: irq 85 for MSI/MSI-X
[   13.475761] mlx4_core 0000:01:00.3: irq 86 for MSI/MSI-X
[   13.475764] mlx4_core 0000:01:00.3: irq 87 for MSI/MSI-X
[   13.475766] mlx4_core 0000:01:00.3: irq 88 for MSI/MSI-X
[   13.475769] mlx4_core 0000:01:00.3: irq 89 for MSI/MSI-X
[   13.475772] mlx4_core 0000:01:00.3: irq 90 for MSI/MSI-X
[   13.475774] mlx4_core 0000:01:00.3: irq 91 for MSI/MSI-X
[   13.475777] mlx4_core 0000:01:00.3: irq 92 for MSI/MSI-X
[   13.475779] mlx4_core 0000:01:00.3: irq 93 for MSI/MSI-X
[   13.475782] mlx4_core 0000:01:00.3: irq 94 for MSI/MSI-X
[   13.475784] mlx4_core 0000:01:00.3: irq 95 for MSI/MSI-X
[   13.475787] mlx4_core 0000:01:00.3: irq 96 for MSI/MSI-X
[   13.475789] mlx4_core 0000:01:00.3: irq 97 for MSI/MSI-X
[   13.521463] mlx4_core: Initializing 0000:01:00.4
[   13.521494] mlx4_core 0000:01:00.4: enabling device (0000 -> 0002)
[   13.522370] mlx4_core 0000:01:00.4: Detected virtual function - running in slave mode
[   13.522401] mlx4_core 0000:01:00.4: Sending reset
[   13.522447] mlx4_core 0000:01:00.0: Received reset from slave:4
[   13.522894] mlx4_core 0000:01:00.4: Sending vhcr0
[   13.524209] mlx4_core 0000:01:00.4: HCA minimum page size:512
[   13.524788] mlx4_core 0000:01:00.4: Timestamping is not supported in slave mode
[   13.524790] mlx4_core: device is working in RoCE mode: Roce V1
[   13.524791] mlx4_core: gid_type 1 for UD QPs is not supported by the devicegid_type 0 was chosen instead
[   13.524792] mlx4_core: UD QP Gid type is: V1
[   13.525788] mlx4_core 0000:01:00.4: irq 98 for MSI/MSI-X
[   13.525791] mlx4_core 0000:01:00.4: irq 99 for MSI/MSI-X
[   13.525793] mlx4_core 0000:01:00.4: irq 100 for MSI/MSI-X
[   13.525796] mlx4_core 0000:01:00.4: irq 101 for MSI/MSI-X
[   13.525798] mlx4_core 0000:01:00.4: irq 102 for MSI/MSI-X
[   13.525801] mlx4_core 0000:01:00.4: irq 103 for MSI/MSI-X
[   13.525803] mlx4_core 0000:01:00.4: irq 104 for MSI/MSI-X
[   13.525806] mlx4_core 0000:01:00.4: irq 105 for MSI/MSI-X
[   13.525808] mlx4_core 0000:01:00.4: irq 106 for MSI/MSI-X
[   13.525811] mlx4_core 0000:01:00.4: irq 107 for MSI/MSI-X
[   13.525814] mlx4_core 0000:01:00.4: irq 108 for MSI/MSI-X
[   13.525816] mlx4_core 0000:01:00.4: irq 109 for MSI/MSI-X
[   13.525819] mlx4_core 0000:01:00.4: irq 110 for MSI/MSI-X
[   13.571366] mlx4_core: Initializing 0000:01:00.5
[   13.571397] mlx4_core 0000:01:00.5: enabling device (0000 -> 0002)
[   13.572227] mlx4_core 0000:01:00.5: Skipping virtual function:5
[   13.572664] mlx4_core: Initializing 0000:01:00.6
[   13.572696] mlx4_core 0000:01:00.6: enabling device (0000 -> 0002)
[   13.573534] mlx4_core 0000:01:00.6: Skipping virtual function:6
[   13.573972] mlx4_core: Initializing 0000:01:00.7
[   13.574002] mlx4_core 0000:01:00.7: enabling device (0000 -> 0002)
[   13.574837] mlx4_core 0000:01:00.7: Skipping virtual function:7
[   13.575260] mlx4_core: Initializing 0000:01:01.0
[   13.575289] mlx4_core 0000:01:01.0: enabling device (0000 -> 0002)
[   13.576119] mlx4_core 0000:01:01.0: Skipping virtual function:8
[   13.576543] mlx4_core: Initializing 0000:01:01.1
[   13.576574] mlx4_core 0000:01:01.1: enabling device (0000 -> 0002)
[   13.577405] mlx4_core 0000:01:01.1: Skipping virtual function:9
[   13.577832] mlx4_core: Initializing 0000:01:01.2
[   13.577859] mlx4_core 0000:01:01.2: enabling device (0000 -> 0002)
[   13.578699] mlx4_core 0000:01:01.2: Skipping virtual function:10
[   13.579125] mlx4_core: Initializing 0000:01:01.3
[   13.579152] mlx4_core 0000:01:01.3: enabling device (0000 -> 0002)
[   13.579981] mlx4_core 0000:01:01.3: Skipping virtual function:11
[   13.580404] mlx4_core: Initializing 0000:01:01.4
[   13.580435] mlx4_core 0000:01:01.4: enabling device (0000 -> 0002)
[   13.581265] mlx4_core 0000:01:01.4: Skipping virtual function:12
[   13.581690] mlx4_core: Initializing 0000:01:01.5
[   13.581719] mlx4_core 0000:01:01.5: enabling device (0000 -> 0002)
[   13.582548] mlx4_core 0000:01:01.5: Skipping virtual function:13
[   13.582971] mlx4_core: Initializing 0000:01:01.6
[   13.583001] mlx4_core 0000:01:01.6: enabling device (0000 -> 0002)
[   13.583831] mlx4_core 0000:01:01.6: Skipping virtual function:14
[   13.584254] mlx4_core: Initializing 0000:01:01.7
[   13.584284] mlx4_core 0000:01:01.7: enabling device (0000 -> 0002)
[   13.585115] mlx4_core 0000:01:01.7: Skipping virtual function:15
[   13.585541] mlx4_core: Initializing 0000:01:02.0
[   13.585572] mlx4_core 0000:01:02.0: enabling device (0000 -> 0002)
[   13.586403] mlx4_core 0000:01:02.0: Skipping virtual function:16
[   13.602383] mlx4_en: Mellanox ConnectX HCA Ethernet driver v3.1-1.0.3 (29 Sep 2015)
[   13.602533] mlx4_en 0000:01:00.0: registered PHC clock
[   13.602566] mlx4_en 0000:01:00.0: Activating port:1
[   13.612433] mlx4_en: 0000:01:00.0: Port 1: Using 96 TX rings
[   13.612436] mlx4_en: 0000:01:00.0: Port 1: Using 8 RX rings
[   13.612438] mlx4_en: 0000:01:00.0: Port 1:   frag:0 - size:1522 prefix:0 stride:1536
[   13.612581] mlx4_en: 0000:01:00.0: Port 1: Initializing port
[   13.614831] mlx4_en 0000:01:00.1: Activating port:1
[   13.614874] mlx4_en: 0000:01:00.1: Port 1: Assigned random MAC address b2:00:7c:2b:3f:47
[   13.625961] mlx4_core 0000:01:00.0 eth2: renamed from eth1
[   13.645392] systemd-udevd[1623]: renamed network interface eth1 to eth2
[   13.676195] mlx4_en: 0000:01:00.1: Port 1: Using 96 TX rings
[   13.676199] mlx4_en: 0000:01:00.1: Port 1: Using 8 RX rings
[   13.676201] mlx4_en: 0000:01:00.1: Port 1:   frag:0 - size:1522 prefix:0 stride:1536
[   13.676410] mlx4_en: 0000:01:00.1: Port 1: Initializing port
[   13.677547] mlx4_en 0000:01:00.2: Activating port:1
[   13.677591] mlx4_en: 0000:01:00.2: Port 1: Assigned random MAC address 3a:3d:c7:e0:ed:5a
[   13.725134] mlx4_en: 0000:01:00.2: Port 1: Using 96 TX rings
[   13.725136] mlx4_en: 0000:01:00.2: Port 1: Using 8 RX rings
[   13.725139] mlx4_en: 0000:01:00.2: Port 1:   frag:0 - size:1522 prefix:0 stride:1536
[   13.725347] mlx4_en: 0000:01:00.2: Port 1: Initializing port
[   13.726519] mlx4_en 0000:01:00.3: Activating port:1
[   13.726562] mlx4_en: 0000:01:00.3: Port 1: Assigned random MAC address ee:6a:a6:79:24:4c
[   13.730055] mlx4_en: eth2:   frag:0 - size:1522 prefix:0 stride:1536
[   13.806947] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready
[   13.807584] mlx4_core 0000:01:00.2 p3p3: renamed from eth3
[   13.832966] mlx4_en: 0000:01:00.3: Port 1: Using 96 TX rings
[   13.832970] mlx4_en: 0000:01:00.3: Port 1: Using 8 RX rings
[   13.832973] mlx4_en: 0000:01:00.3: Port 1:   frag:0 - size:1522 prefix:0 stride:1536
[   13.833164] mlx4_en: 0000:01:00.3: Port 1: Initializing port
[   13.833585] mlx4_core 0000:01:00.1 p3p2: renamed from eth1
[   13.833587] systemd-udevd[1623]: renamed network interface eth3 to p3p3
[   13.861599] systemd-udevd[1706]: renamed network interface eth1 to p3p2
[   13.863234] mlx4_en 0000:01:00.4: Activating port:1
[   13.863278] mlx4_en: 0000:01:00.4: Port 1: Assigned random MAC address 8a:7a:30:00:46:33
[   13.879891] mlx4_core 0000:01:00.3 p3p4: renamed from eth1
[   13.897695] systemd-udevd[1623]: renamed network interface eth1 to p3p4
[   13.898662] mlx4_en: p3p2:   frag:0 - size:1522 prefix:0 stride:1536
[   13.977528] mlx4_en: 0000:01:00.4: Port 1: Using 96 TX rings
[   13.977532] mlx4_en: 0000:01:00.4: Port 1: Using 8 RX rings
[   13.977535] mlx4_en: 0000:01:00.4: Port 1:   frag:0 - size:1522 prefix:0 stride:1536
[   13.978073] mlx4_en: 0000:01:00.4: Port 1: Initializing port
[   14.030310] IPv6: ADDRCONF(NETDEV_UP): p3p2: link is not ready
[   14.052471] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v3.1-1.0.3 (29 Sep 2015)
[   14.052905] mlx4_core 0000:01:00.0: mlx4_ib_add: allocated counter index 1 for port 1
[   14.055206] mlx4_core 0000:01:00.4 p3p5: renamed from eth1
[   14.058229] mlx4_core 0000:01:00.0: mlx4_ib: multi-function enabled
[   14.059539] mlx4_core 0000:01:00.0: mlx4_ib: initializing demux service for 128 qp1 clients
[   14.081807] systemd-udevd[1623]: renamed network interface eth1 to p3p5
[   14.082657] mlx4_core 0000:01:00.1: mlx4_ib_add: allocated counter index 18 for port 1
[   14.090157] mlx4_core 0000:01:00.1: mlx4_ib: multi-function enabled
[   14.090160] mlx4_core 0000:01:00.1: mlx4_ib: operating in qp1 tunnel mode
[   14.092727] mlx4_core 0000:01:00.2: mlx4_ib_add: allocated counter index 19 for port 1
[   14.103712] mlx4_core 0000:01:00.2: mlx4_ib: multi-function enabled
[   14.103715] mlx4_core 0000:01:00.2: mlx4_ib: operating in qp1 tunnel mode
[   14.104441] mlx4_core 0000:01:00.3: mlx4_ib_add: allocated counter index 20 for port 1
[   14.110327] mlx4_core 0000:01:00.3: mlx4_ib: multi-function enabled
[   14.110330] mlx4_core 0000:01:00.3: mlx4_ib: operating in qp1 tunnel mode
[   14.111063] mlx4_core 0000:01:00.4: mlx4_ib_add: allocated counter index 21 for port 1
[   14.119068] mlx4_core 0000:01:00.4: mlx4_ib: multi-function enabled
[   14.119071] mlx4_core 0000:01:00.4: mlx4_ib: operating in qp1 tunnel mode
[   14.764446] init: plymouth-upstart-bridge main process ended, respawning
[   16.188261] mlx4_en: eth2: Link Up
[   16.188288] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[   16.188291] mlx4_en: p3p2: Link Up
[   16.188321] mlx4_en: p3p3: Link Up
[   16.188335] mlx4_en: p3p4: Link Up
[   16.188339] IPv6: ADDRCONF(NETDEV_CHANGE): p3p2: link becomes ready
[   16.188351] mlx4_en: p3p5: Link Up
[  421.285141] Bits 55-60 of /proc/PID/pagemap entries are about to stop being page-shift some time soon. See the linux/Documentation/vm/pagemap.txt for details.
[26236.560789] mlx4_en: p3p3:   frag:0 - size:1522 prefix:0 stride:1536
[26236.667849] mlx4_en: p3p4:   frag:0 - size:1522 prefix:0 stride:1536
[26236.782208] mlx4_en: p3p5:   frag:0 - size:1522 prefix:0 stride:1536

// devices as seen by linux
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 54:a0:50:85:79:87 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.174/24 brd 192.168.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::56a0:50ff:fe85:7987/64 scope link
       valid_lft forever preferred_lft forever
3: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether f4:52:14:8f:16:80 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.2/24 brd 10.10.10.255 scope global eth2
       valid_lft forever preferred_lft forever
    inet6 fe80::f652:14ff:fe8f:1680/64 scope link
       valid_lft forever preferred_lft forever
4: p3p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b2:00:7c:2b:3f:47 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.3/24 brd 10.10.10.255 scope global p3p2
       valid_lft forever preferred_lft forever
    inet6 fe80::b000:7cff:fe2b:3f47/64 scope link
       valid_lft forever preferred_lft forever
5: p3p3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 3a:3d:c7:e0:ed:5a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::383d:c7ff:fee0:ed5a/64 scope link
       valid_lft forever preferred_lft forever
6: p3p4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ee:6a:a6:79:24:4c brd ff:ff:ff:ff:ff:ff
    inet6 fe80::ec6a:a6ff:fe79:244c/64 scope link
       valid_lft forever preferred_lft forever
7: p3p5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 8a:7a:30:00:46:33 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::887a:30ff:fe00:4633/64 scope link
       valid_lft forever preferred_lft forever

// our code enumerating DPDK-capable ports
# ./listdevices
Eth device info {
            port: 0
            driver name: librte_pmd_mlx4
            mac address: F4:52:14:8F:16:80
            PCI device: 0000:01:00.0
}
Eth device info {
            port: 1
            driver name: librte_pmd_mlx4
            mac address: B2:00:7C:2B:3F:47
            PCI device: 0000:01:00.1
}
Eth device info {
            port: 2
            driver name: librte_pmd_mlx4
            mac address: 3A:3D:C7:E0:ED:5A
            PCI device: 0000:01:00.2
}
Eth device info {
            port: 3
            driver name: librte_pmd_mlx4
            mac address: EE:6A:A6:79:24:4C
            PCI device: 0000:01:00.3
}
Eth device info {
            port: 4
            driver name: librte_pmd_mlx4
            mac address: 8A:7A:30:00:46:33
            PCI device: 0000:01:00.4
}


On Thu, Oct 8, 2015 at 3:27 PM, Olga Shern <olgas at mellanox.com> wrote:
Hi Bill,

Starting from DPDK 2.1, the ConnectX-3 PMD is based on "accelerated verbs"; ibv_exp_create_res_domain comes from this new API.
Just to make sure I understand what you are doing: you have enabled SR-IOV and you are running DPDK on the hypervisor on the probed VFs that you created, right?
We did test this combination (dpdk2.1 and ofed3.1-3) on the hypervisor on the PF and also in a VM on a VF, but I didn't try running DPDK on the VFs on the hypervisor. I will check this.
Meanwhile, can you please send the application's output at startup? Do you see any errors in dmesg?

Best Regards,
Olga

From: Bill O'Hara [mailto:billtohara at gmail.com]
Sent: Thursday, October 08, 2015 11:55 PM
To: Olga Shern
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] Mellanox PMD failure w/DPDK-2.1.0 and MLNX_OFED-3.1-1.0.3

Olga

If it's at all helpful: linking our code against dpdk-2.0 and (statically) against the matching custom-built libibverbs that we used with it works on those machines. There is of course no call to ibv_exp_create_res_domain() in that version of the library, but it at least confirms basic operation of the upgraded OFED and firmware on those boxes.

Is there anything else we can check or confirm for you?

thanks
bill


On Thu, Oct 8, 2015 at 9:06 AM, Bill O'Hara <billtohara at gmail.com> wrote:
Hi Olga

Firmware is version 2.35.5100. Configuration details below.

Thanks for any hints.
bill

root:~# cat /etc/modprobe.d/mlx4_core.conf
options mlx4_core port_type_array=2,2 num_vfs=16 probe_vf=4

root:~# ibstat
CA 'mlx4_0'
CA type: MT4099
Number of ports: 1
Firmware version: 2.35.5100
Hardware version: 1
Node GUID: 0xf4521403008f1680
System image GUID: 0xf4521403008f1683
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x0c010000
Port GUID: 0xf65214fffe8f1680
Link layer: Ethernet
CA 'mlx4_1'
CA type: MT4100
Number of ports: 1
Firmware version: 2.35.5100
Hardware version: 1
Node GUID: 0x00140500c2d3b05f
System image GUID: 0xf4521403008f1683
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x0c010000
Port GUID: 0xfc9739fffe1272c3
Link layer: Ethernet
CA 'mlx4_2'
CA type: MT4100
Number of ports: 1
Firmware version: 2.35.5100
Hardware version: 1
Node GUID: 0x00140500b90af10c
System image GUID: 0xf4521403008f1683
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x0c010000
Port GUID: 0x20ecbbfffeefb934
Link layer: Ethernet
CA 'mlx4_3'
CA type: MT4100
Number of ports: 1
Firmware version: 2.35.5100
Hardware version: 1
Node GUID: 0x001405009661e607
System image GUID: 0xf4521403008f1683
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x0c010000
Port GUID: 0xf4c8e6fffe5abc89
Link layer: Ethernet
CA 'mlx4_4'
CA type: MT4100
Number of ports: 1
Firmware version: 2.35.5100
Hardware version: 1
Node GUID: 0x00140500bd09e128
System image GUID: 0xf4521403008f1683
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x0c010000
Port GUID: 0x5828e1fffe34f919
Link layer: Ethernet
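The modprobe options above explain the split seen in both logs: with num_vfs=16 and probe_vf=4, mlx4_core binds only the first four VFs on the host (0000:01:00.1 to 0000:01:00.4), dmesg reports "Skipping virtual function" for VFs 5 to 16, and the DPDK probe of those addresses ends in "cannot access device, is mlx4_ib loaded?". The sketch below maps a VF number to its PCI address and back. The first-VF offset of 1 and stride of 1 are assumptions read off this particular log (they are device-specific SR-IOV values, not general PCI rules), and all three function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Values from /etc/modprobe.d/mlx4_core.conf in this thread. */
#define NUM_VFS  16
#define PROBE_VF 4

/* Writes the PCI address of VF number vf (1-based). Assumes the layout
 * observed in this log: PF at 0000:01:00.0, first VF offset 1, stride 1. */
static void
vf_pci_address(int vf, char *buf, size_t len)
{
	int devfn = 0 + 1 + (vf - 1) * 1; /* PF devfn + offset + index * stride */

	snprintf(buf, len, "0000:01:%02x.%d", devfn >> 3, devfn & 7);
}

/* Returns the VF number for a PCI address string, or 0 if it is not one of
 * the NUM_VFS VFs under this layout (e.g. the PF itself). */
static int
vf_number_from_address(const char *addr)
{
	char buf[16];
	int vf;

	for (vf = 1; vf <= NUM_VFS; vf++) {
		vf_pci_address(vf, buf, sizeof(buf));
		if (strcmp(buf, addr) == 0)
			return vf;
	}
	return 0;
}

/* 1 if mlx4_core binds this VF on the host; later VFs are left unbound. */
static int
vf_probed_on_host(int vf)
{
	return vf >= 1 && vf <= NUM_VFS && vf <= PROBE_VF;
}
```

Applying `vf_number_from_address()` to the PCI addresses in the DPDK log gives VFs 1 to 4 for the devices that probed successfully and VFs 5 to 16 for those that printed "cannot access device", so those warnings are expected with this configuration and unrelated to the RD failure.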

On Thu, Oct 8, 2015 at 2:03 AM, Olga Shern <olgas at mellanox.com> wrote:
Hi Bill,

Can you please check which firmware version is installed on your ConnectX-3?

Thanks



-------- Original message --------
From: Olga Shern
Date:08/10/2015 7:55 AM (GMT+00:00)
To: Bill O'Hara, dev at dpdk.org
Subject: RE: [dpdk-dev] Mellanox PMD failure w/DPDK-2.1.0 and MLNX_OFED-3.1-1.0.3

Hi Bill,

There shouldn’t be any problem with what you are doing.
We are checking this now.

Best Regards,
Olga

-----Original Message-----
From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bill O'Hara
Sent: Thursday, October 08, 2015 6:05 AM
To: dev at dpdk.org
Subject: [dpdk-dev] Mellanox PMD failure w/DPDK-2.1.0 and MLNX_OFED-3.1-1.0.3

Hello

I wonder if anyone can suggest why previously working DPDK code fails in the Mellanox PMD in dpdk-2.1.0, seemingly because it cannot create a "resource domain" via ibv_exp_create_res_domain(). I must admit I haven't seen that verb before, and it appears to return NULL with no error message.

The DPDK log gives these hints:

PMD: librte_pmd_mlx4: 0xa4fc20: TX queues number update: 0 -> 1
PMD: librte_pmd_mlx4: 0xa4fc20: RX queues number update: 0 -> 1
PMD: librte_pmd_mlx4: 0xa4fc20: RD creation failure: Cannot allocate memory

I'm using dpdk-2.1.0 and MLNX_OFED_LINUX-3.1-1.0.3 on Ubuntu 14.04 with a
ConnectX-3 card.

thanks
bill
