[dpdk-users] Crashes in KNI with pipeline application(DPAA2 PMD)

Heena Sirwani heena.sirwani at nevisnetworks.com
Thu Mar 8 13:14:15 CET 2018


Hi All,
I am using DPDK version 16.07-r0 on an NXP (Freescale) processor with DPAA2 (non-NUMA).
I am trying to use KNI with a pipeline application in which the application reads packets from a physical interface using the DPAA2 PMD, sends them to KNI, reads packets back from KNI, and sends those packets to the physical interface.
Initially the same mempool was used for both paths: RX in the DPAA2 PMD (received packets inserted into rx_q) and TX in KNI (alloc_q and tx_q). With that setup, the KNI kernel thread crashed at memcpy in kni_net_rx_normal() while receiving packets. The trace is as follows:
[76730.135473] Unable to handle kernel paging request at virtual address fffe80835e400100
[76730.143386] pgd = ffff8082f2966000
[76730.146787] [fffe80835e400100] *pgd=0000000000000000
[76730.151748] Internal error: Oops: 96000004 [#1] SMP
[76730.156617] Modules linked in: rte_kni(O)
[76730.160626] CPU: 3 PID: 3335 Comm: kni_single Tainted: G           O    4.1.35-rt41 #1
[76730.168536] Hardware name: Freescale Layerscape 2088a RDB Board (DT)
[76730.174881] task: ffff80830e92cd40 ti: ffff8082f2b6c000 task.ti: ffff8082f2b6c000
[76730.182363] PC is at memcpy+0x68/0x180
[76730.186139] LR is at kni_net_rx_normal+0x294/0x308 [rte_kni]
[76730.191789] pc : [<ffff800000323e68>] lr : [<ffff7ffffc02fbdc>] pstate: 20000145
[76730.199177] sp : ffff8082f2b6fc70
[76730.202481] x29: ffff8082f2b6fc70 x28: ffff80830eaa07c0
[76730.207791] x27: 0000000000000000 x26: ffff800074c3cb00
[76730.213101] x25: ffff80830df62081 x24: ffff8082f2b6fce0
[76730.218410] x23: ffff80830eaa0000 x22: 0000000000000000
[76730.223719] x21: 000000000000003c x20: 0000000000000001
[76730.229028] x19: fffe80835e400100 x18: ffff7c01cbcc7f20
[76730.234338] x17: 0000000000000004 x16: ffff80830f00dbe8
[76730.239647] x15: 0000800000000000 x14: ffff800000c97000
[76730.244956] x13: dead000000000100 x12: ffff8000008af2a0
[76730.250265] x11: 0000000000000005 x10: 00000000010074a0
[76730.255574] x9 : ffff8082f2b6fcd0 x8 : 0000000000000000
[76730.260883] x7 : 0000000000000080 x6 : ffff8082f0d5d802
[76730.266192] x5 : 000000000000003c x4 : 0000000000000000
[76730.271501] x3 : 0000000000000030 x2 : 000000000000003c
[76730.276809] x1 : fffe80835e400100 x0 : ffff8082f0d5d802
[76730.282118]
[76730.283599] Process kni_single (pid: 3335, stack limit = 0xffff8082f2b6c028)
[76730.290640] Stack: (0xffff8082f2b6fc70 to 0xffff8082f2b70000)
[76730.296378] fc60:                                     f2b6fde0 ffff8082 fc03066c ffff7fff
[76730.304549] fc80: 0eaa07c0 ffff8083 f3384a78 ffff8082 000003e8 00000000 f3384a40 ffff8082
[76730.312719] fca0: f3384a50 ffff8082 00000000 00000000 00000000 00000000 00000000 00000000
[76730.320888] fcc0: 00000000 00000000 00000000 00000000 00000008 00000000 0e92cd40 00000001
[76730.329058] fce0: afb62081 0000ffff 000fd914 ffff8000 f2b6fda8 ffff8082 f2b6fda8 ffff8082
[76730.337228] fd00: f2b6fda8 ffff8082 1ff55cc0 ffff8083 f2b6fd40 ffff8082 000fd9b0 ffff8000
[76730.345397] fd20: f2b6fda8 ffff8082 1ff55cc0 ffff8083 00000000 00000000 00000140 00000000
[76730.353567] fd40: f2b6fd60 ffff8082 0089b5d8 ffff8000 00749f80 00000001 1ff55cc0 ffff8083
[76730.361737] fd60: f2b6fde0 ffff8082 0089b670 ffff8000 f3384a78 ffff8082 f3384a78 ffff8082
[76730.369906] fd80: 00000000 00000000 f3384a40 ffff8082 f3384a50 ffff8082 f2b6fde0 ffff8082
[76730.378076] fda0: 00000140 00000000 00000000 00000000 00000200 dead0000 00749f80 00000001
[76730.386245] fdc0: 1ff55cc0 ffff8083 000fd0f4 ffff8000 0e92cd40 ffff8083 ffffffff ffff8082
[76730.394415] fde0: f2b6fdf0 ffff8082 fc02e754 ffff7fff f2b6fe30 ffff8082 000ccadc ffff8000
[76730.402585] fe00: f2970600 ffff8082 00d390e0 ffff8000 f3384a40 ffff8082 fc02e708 ffff7fff
[76730.410754] fe20: 00000000 00000000 fc02e708 ffff7fff 00000000 00000000 00085d40 ffff8000
[76730.418924] fe40: 000cca14 ffff8000 f2970600 ffff8082 00000000 00000000 00000000 00000000
[76730.427093] fe60: f2b6fea0 ffff8082 00000000 00000000 000cca14 ffff8000 f3384a40 ffff8082
[76730.435263] fe80: 00000000 00000000 00000000 00000000 f2b6fe90 ffff8082 f2b6fe90 ffff8082
[76730.443432] fea0: 00000000 00000000 00000000 ffff8000 f2b6feb0 ffff8082 f2b6feb0 ffff8082
[76730.451602] fec0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[76730.459771] fee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[76730.467941] ff00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[76730.476110] ff20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[76730.484279] ff40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[76730.492449] ff60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[76730.500618] ff80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[76730.508788] ffa0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[76730.516957] ffc0: 00000000 00000000 00000005 00000000 00000000 00000000 00000000 00000000
[76730.525126] ffe0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[76730.533295] Call trace:
[76730.535734] [<ffff800000323e68>] memcpy+0x68/0x180
[76730.540548] [<ffff7ffffc03066c>] kni_net_rx+0x14/0x28 [rte_kni]
[76730.546490] [<ffff7ffffc02e754>] kni_thread_single+0x4c/0xa0 [rte_kni]
[76730.553011] [<ffff8000000ccadc>] kthread+0xc8/0xdc
[76730.557795] Code: 54000140 7100807f 54000080 540000ab (a8c12027)
[76730.563891] ---[ end trace 5377f2c149d2d3ab ]---
[76800.750221] device vEth0 left promiscuous mode
[76814.250457] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[76814.258565] pgd = ffff800000df9000
[76814.261958] [00000000] *pgd=000000838f006003, *pud=000000838f007003, *pmd=000000838f008003, *pte=0060000006000707
[76814.272235] Internal error: Oops: 96000004 [#2] SMP
[76814.277105] Modules linked in: rte_kni(O)
[76814.281115] CPU: 0 PID: 3333 Comm: eal-intr-thread Tainted: G      D    O    4.1.35-rt41 #1
[76814.289458] Hardware name: Freescale Layerscape 2088a RDB Board (DT)
[76814.295804] task: ffff80830eba8840 ti: ffff8082f0cf4000 task.ti: ffff8082f0cf4000
[76814.303283] PC is at exit_creds+0x18/0x70
[76814.307287] LR is at __put_task_struct+0x44/0x120
[76814.311982] pc : [<ffff8000000ce8ec>] lr : [<ffff8000000af4b0>] pstate: 80000145
[76814.319370] sp : ffff8082f0cf7b10
[76814.322675] x29: ffff8082f0cf7b10 x28: 0000000000040006
[76814.327986] x27: 0000000000040005 x26: ffff8082f3384a40
[76814.333293] x25: ffff800000d1dec0 x24: ffff8082f297eb10
[76814.338601] x23: ffff80830eec00d8 x22: ffff7ffffc03cd28
[76814.343910] x21: ffff8082f30f7c48 x20: 000000000000000b
[76814.349218] x19: ffff80830e92cd40 x18: 000000000000003f
[76814.354528] x17: 000000000000000c x16: 000000000000000c
[76814.359836] x15: 0000000000001000 x14: ffff80830e7c1880
[76814.365145] x13: 4000000000000000 x12: ffffffffffffffff
[76814.370454] x11: 0000000000000140 x10: ffff80831ff3baf8
[76814.375763] x9 : 00000000000000c0 x8 : ffff7bffc0000000
[76814.381073] x7 : ffff800000d2e3c0 x6 : 0000000000000040
[76814.386383] x5 : 0000000000000000 x4 : 0000000000000000
[76814.391691] x3 : 0000000000000001 x2 : ffff80830e92cd58
[76814.397000] x1 : 0000000000000000 x0 : 0000000000000000
[76814.402307]
[76814.403788] Process eal-intr-thread (pid: 3333, stack limit = 0xffff8082f0cf4028)
[76814.411263] Stack: (0xffff8082f0cf7b10 to 0xffff8082f0cf8000)
[76814.417002] 7b00:                                     f0cf7b30 ffff8082 000af4b0 ffff8000
[76814.425172] 7b20: 0e92cd40 ffff8083 000af4a8 ffff8000 f0cf7b50 ffff8082 000cd01c ffff8000
[76814.433343] 7b40: 0e92cd40 ffff8083 0000000b 00000000 f0cf7b80 ffff8082 fc02e6f0 ffff7fff
[76814.441513] 7b60: f297eb00 ffff8082 00000008 00000000 f30f7c48 ffff8082 f297eb10 ffff8082
[76814.449683] 7b80: f0cf7be0 ffff8082 001973bc ffff8000 f297eb00 ffff8082 00000008 00000000
[76814.457854] 7ba0: f30f7c48 ffff8082 0e8e73e0 ffff8083 0eec00d8 ffff8083 f297eb10 ffff8082
[76814.466024] 7bc0: f0cf7df8 ffff8082 418004fc 00000000 00040005 00000000 00000000 00000000
[76814.474195] 7be0: f0cf7c20 ffff8082 0019754c ffff8000 0eba9198 ffff8083 00000000 00000000
[76814.482365] 7c00: 00d39000 ffff8000 0eba8840 ffff8083 f31c8e40 ffff8082 00c96000 ffff8000
[76814.490535] 7c20: f0cf7c30 ffff8082 000cb11c ffff8000 f0cf7c60 ffff8082 000b45c8 ffff8000
[76814.498706] 7c40: 0eba8840 ffff8083 00000000 00000000 f0cf7cd0 ffff8082 00000001 00000000
[76814.506876] 7c60: f0cf7ce0 ffff8082 000b4d0c ffff8000 0e83d6c0 ffff8083 00000000 00000000
[76814.515047] 7c80: f0cf7e18 ffff8082 f33d0cc8 ffff8082 0e83d6c0 ffff8083 00c97df4 ffff8000
[76814.523217] 7ca0: f0cf7df8 ffff8082 418004fc 00000000 00040005 00000000 00040006 00000000
[76814.531387] 7cc0: f0cf7cd0 ffff8082 000bc1c8 ffff8000 f0cf7d10 ffff8082 000beb80 ffff8000
[76814.539557] 7ce0: f0cf7d10 ffff8082 000bed78 ffff8000 f0cf4000 ffff8082 f33d04c0 ffff8082
[76814.547728] 7d00: f0cf7e18 ffff8082 0089b984 ffff8000 f0cf7d80 ffff8082 00088fc8 ffff8000
[76814.555898] 7d20: b65851ec 0000ffff f0cf7df8 ffff8082 fffffffc ffffffff b65851f0 0000ffff
[76814.564068] 7d40: 80000000 00000000 00000015 00000000 f0cf7ec0 ffff8082 00000016 00000000
[76814.572238] 7d60: 008a6000 ffff8000 f0cf4000 ffff8082 0e3e5640 ffff8083 00c960c0 ffff8000
[76814.580408] 7d80: f0cf7ea0 ffff8082 00089628 ffff8000 00000009 00000000 00000002 00000000
[76814.588578] 7da0: ffffffff ffffffff b65851f0 0000ffff 80000000 00000000 00000015 00000000
[76814.596748] 7dc0: 0000011a 00000000 00000016 00000000 008a6000 ffff8000 f0cf4000 ffff8082
[76814.604918] 7de0: 00000002 00000000 ffffffff ffffffff 953fe730 0000ffff 00000015 00000000
[76814.613088] 7e00: 0000011a 00000000 00000016 00000000 008a6000 ffff8000 00000009 00000000
[76814.621259] 7e20: 00000000 00000000 00000000 00000000 008a6000 ffff8000 f0cf4000 ffff8082
[76814.629429] 7e40: 7a97ea00 ffff8000 00000001 00000000 0eba8840 ffff8083 000d58ec ffff8000
[76814.637599] 7e60: 00000100 dead0000 00000200 dead0000 953fe6f0 0000ffff 00085db0 ffff8000
[76814.645770] 7e80: 00000000 00000000 00000002 00000000 ffffffff ffffffff b65851f0 0000ffff
[76814.653940] 7ea0: 953fe6f0 0000ffff 00085c9c ffff8000 00000000 00000000 0e3e5640 ffff8083
[76814.662110] 7ec0: fffffffc ffffffff 953fe730 0000ffff 00000002 00000000 ffffffff ffffffff
[76814.670280] 7ee0: 00000000 00000000 00000008 00000000 00000000 00000000 00000000 00000000
[76814.678450] 7f00: 00000016 00000000 00000000 00000000 01010101 01010101 00000008 00000000
[76814.686620] 7f20: 953ff020 0000ffff 00000000 00000000 b64c0020 0000ffff 00000014 00000000
[76814.694789] 7f40: 005b66a8 00000000 b6584f30 0000ffff fffffffb ffffffff ffffffff 00000000
[76814.702959] 7f60: 00000002 00000000 0000004d 00000000 00000000 00000000 953fe730 0000ffff
[76814.711129] 7f80: 0084d000 00000000 0000004d 00000000 00000000 00000000 00000001 00000000
[76814.719299] 7fa0: 953ff020 0000ffff 953fe6f0 0000ffff b65851cc 0000ffff 953fe6f0 0000ffff
[76814.727470] 7fc0: b65851f0 0000ffff 80000000 00000000 0000004d 00000000 ffffffff ffffffff
[76814.735640] 7fe0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[76814.743809] Call trace:
[76814.746246] [<ffff8000000ce8ec>] exit_creds+0x18/0x70
[76814.751290] [<ffff8000000af4b0>] __put_task_struct+0x44/0x120
[76814.757030] [<ffff8000000cd01c>] kthread_stop+0x88/0x9c
[76814.762282] [<ffff7ffffc02e6f0>] kni_release+0x124/0x13c [rte_kni]
[76814.768455] [<ffff8000001973bc>] __fput+0x8c/0x1c8
[76814.773237] [<ffff80000019754c>] ____fput+0xc/0x14
[76814.778020] [<ffff8000000cb11c>] task_work_run+0xb4/0xec
[76814.783325] [<ffff8000000b45c8>] do_exit+0x2c4/0x994
[76814.788281] [<ffff8000000b4d0c>] do_group_exit+0x44/0xdc
[76814.793585] [<ffff8000000bed78>] get_signal+0x2b4/0x50c
[76814.798803] [<ffff800000088fc8>] do_signal+0x78/0x4f0
[76814.803847] [<ffff800000089628>] do_notify_resume+0x60/0x68
[76814.809412] Code: f9000bf3 aa0003f3 f9426c01 f9426800 (b9400021)
[76814.815513] ---[ end trace 5377f2c149d2d3ac ]---
[76814.820124] Fixing recursive fault but reboot is needed!
The mempool used the default settings, i.e. buffer_size = 2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM (256), pool_size = 32 * 1024, cache_size = 256. I guess the crash is related to address translation.
After this we used separate mempools for RX (PMD to KNI) and TX (KNI to PMD), i.e. we created the KNI context with a different mempool (since this is a non-NUMA architecture, this should be OK?).
With this change there was no crash in kni_net_rx_normal(), but there is an rte_panic: the mbuf refcnt == 0 assertion fails when allocating pktmbufs from the KNI context mempool for the alloc_q.
KNI: Allocating mbufs into alloc_q for KNI
PANIC in rte_mbuf_raw_alloc():
line 1162       assert "rte_mbuf_refcnt_read(m) == 0" failed
1: [./utm-datapath(rte_dump_stack+0x20) [0x49711c]]
Aborted
I have checked whether the PMD frees any pktmbufs with __rte_mbuf_raw_free() or rte_mempool_put() (which return a buffer to the pool without decrementing the reference count), but all frees are done with rte_pktmbuf_free(). I seem to have run out of debugging options.
Any help will be truly appreciated.
Thanks,
Heena
 

