memif thread race condition on memif.disconnect()
Stephen Hemminger
stephen at networkplumber.org
Mon Oct 30 20:18:28 CET 2023
On Wed, 11 Oct 2023 19:57:56 +0000
"Bly, Mike" <mbly at ciena.com> wrote:
> Hello,
>
> We have run into a timing issue between threads when using the memif
> interface type and need some guidance.
>
> Our application has a DPDK-based process operating (among other
> things) a memif server interface. The problem is exposed when this
> memif interface receives a memif.disconnect message from the remote
> client while it is in the middle of an rte_eth_rx_burst() on the same
> interface. Because the IRQ message handling runs on its own thread,
> separate from the DPDK worker thread doing the rx_burst, this results
> in a crash; the backtraces are shared below. How does one put guard
> rails in place to gracefully exit the rx-burst when a disconnect
> occurs? Or, how do we properly modify the code so that responding to
> the disconnect callback is deferred until the rx-burst operation has
> completed?
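
A minimal sketch of what such a guard rail could look like, assuming the
disconnect path could be made to call a quiesce step before it frees the
rings and regions (which, per the backtraces below, is not what the current
driver does). All names here (port_active, rx_in_progress, safe_rx_burst,
quiesce_port) are hypothetical:

#include <stdatomic.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_pause.h>

static atomic_bool port_active;      /* cleared by the disconnect path */
static atomic_bool rx_in_progress;   /* set by the worker around each burst */

/* Worker lcore: only poll while the port is marked active. */
static inline uint16_t
safe_rx_burst(uint16_t port_id, uint16_t queue_id,
              struct rte_mbuf **pkts, uint16_t nb_pkts)
{
        uint16_t n = 0;

        if (!atomic_load(&port_active))
                return 0;
        atomic_store(&rx_in_progress, true);
        /* Re-check after publishing, so quiesce_port() cannot miss us. */
        if (atomic_load(&port_active))
                n = rte_eth_rx_burst(port_id, queue_id, pkts, nb_pkts);
        atomic_store(&rx_in_progress, false);
        return n;
}

/* Disconnect path: stop new bursts, then wait out any burst in flight. */
static void
quiesce_port(void)
{
        atomic_store(&port_active, false);
        while (atomic_load(&rx_in_progress))
                rte_pause();
        /* Only now is it safe to unmap the memif rings and regions. */
}

The handshake only closes the window if the shared memory is released after
quiesce_port() returns; with the stock driver the free happens inside
memif_disconnect() on the interrupt thread, so some driver-side hook is
still needed.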
>
> We are utilizing DPDK 21.11.2. I have diff'd dpdk-stable:22.11.3 in
> ./drivers/net/memif, but I do not see anything obvious that would
> address this. I did a similar diff for dpdk:23.07, but do not see
> anything obvious there either.
>
> -Mike
>
> (gdb) thread 1
> [Switching to thread 1 (Thread 0x7f17e2813600 (LWP 470))]
> #0  0x00007f17e374d225 in eth_memif_rx (queue=0x1189023b00, bufs=0x7f17e28100e8, nb_pkts=32)
>     at ../git/drivers/net/memif/rte_eth_memif.c:338
> 338                     last_slot = __atomic_load_n(&ring->head, __ATOMIC_ACQUIRE);
> (gdb) bt
> #0  0x00007f17e374d225 in eth_memif_rx (queue=0x1189023b00, bufs=0x7f17e28100e8, nb_pkts=32)
>     at ../git/drivers/net/memif/rte_eth_memif.c:338
> #1  0x000000000047e6fb in rte_eth_rx_burst (nb_pkts=32, rx_pkts=0x7f17e28100e8, queue_id=0, port_id=<optimized out>)
>     at /usr/include/rte_ethdev.h:5368
> #2  pmd_main_loop () at ../git/swfw/api/src/swfwPmd.c:1086
> #3  0x000000000047f309 in pmd_launch_one_lcore (dummy=<optimized out>) at ../git/my_process.c:1157
> #4  0x00007f17f7070e7c in eal_thread_loop (arg=<optimized out>) at ../git/lib/eal/linux/eal_thread.c:146
> #5  0x00007f17f4c3da72 in start_thread (arg=<optimized out>) at pthread_create.c:442
> #6  0x00007f17f4cbf930 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
> (gdb) l
> 333             ring_size = 1 << mq->log2_ring_size;
> 334             mask = ring_size - 1;
> 335
> 336             if (type == MEMIF_RING_C2S) {
> 337                     cur_slot = mq->last_head;
> 338                     last_slot = __atomic_load_n(&ring->head, __ATOMIC_ACQUIRE);
> 339             } else {
> 340                     cur_slot = mq->last_tail;
> 341                     last_slot = __atomic_load_n(&ring->tail, __ATOMIC_ACQUIRE);
> 342             }
> (gdb) p ring->head
> Cannot access memory at address 0x7f17d8e58006
>
> (gdb) thread 19
> [Switching to thread 19 (Thread 0x7f17f0804600 (LWP 468))]
> #0  0x00007f17f4caf97b in __GI___close (fd=494) at ../sysdeps/unix/sysv/linux/close.c:27
> 27        return SYSCALL_CANCEL (close, fd);
> (gdb) bt
> #0  0x00007f17f4caf97b in __GI___close (fd=494) at ../sysdeps/unix/sysv/linux/close.c:27
> #1  0x00007f17e374f01f in memif_free_regions (dev=dev@entry=0x7f17f727f000 <rte_eth_devices+99072>)
>     at ../git/drivers/net/memif/rte_eth_memif.c:882
> #2  0x00007f17e37475d0 in memif_disconnect (dev=0x7f17f727f000 <rte_eth_devices+99072>)
>     at ../git/drivers/net/memif/memif_socket.c:623
> #3  0x00007f17f7091bd2 in eal_intr_process_interrupts (nfds=<optimized out>, events=<optimized out>)
>     at ../git/lib/eal/linux/eal_interrupts.c:1026
> #4  eal_intr_handle_interrupts (totalfds=<optimized out>, pfd=20) at ../git/lib/eal/linux/eal_interrupts.c:1100
> #5  eal_intr_thread_main (arg=<optimized out>) at ../git/lib/eal/linux/eal_interrupts.c:1172
> #6  0x00007f17f4c3da72 in start_thread (arg=<optimized out>) at pthread_create.c:442
> #7  0x00007f17f4cbf930 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
>
I don't think the memif maintainer has been very active.
One possibility would be for the memif driver to support the removal
event interrupt. This would require both driver and application changes.
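
As a rough illustration of the application side of that suggestion, assuming
the memif PMD were extended to raise RTE_ETH_EVENT_INTR_RMV (for example via
rte_eth_dev_callback_process()) from memif_disconnect() before it frees the
shared-memory regions; the callback and the port_ok flag below are
hypothetical names:

#include <stdatomic.h>
#include <rte_common.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

static atomic_bool port_ok[RTE_MAX_ETHPORTS];

/* Runs in the EAL interrupt thread when the driver raises the event. */
static int
memif_rmv_cb(uint16_t port_id, enum rte_eth_event_type event,
             void *cb_arg, void *ret_param)
{
        RTE_SET_USED(event);
        RTE_SET_USED(cb_arg);
        RTE_SET_USED(ret_param);
        /* Stop the datapath before the device is torn down. */
        atomic_store(&port_ok[port_id], false);
        return 0;
}

static void
setup_rmv_handling(uint16_t port_id)
{
        atomic_store(&port_ok[port_id], true);
        rte_eth_dev_callback_register(port_id, RTE_ETH_EVENT_INTR_RMV,
                                      memif_rmv_cb, NULL);
}

/* Worker loop: skip ports whose backing memory may already be gone. */
static inline uint16_t
guarded_rx_burst(uint16_t port_id, uint16_t queue_id,
                 struct rte_mbuf **pkts, uint16_t nb_pkts)
{
        if (!atomic_load(&port_ok[port_id]))
                return 0;
        return rte_eth_rx_burst(port_id, queue_id, pkts, nb_pkts);
}

Even with the event in place, the driver would still have to wait for the
datapath to drain (or defer the region unmap) after signalling it, otherwise
the same window remains for a burst that is already in flight.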