[dpdk-dev] [Bug 56] crash when freeing memory with no mlx5 device attached

bugzilla at dpdk.org bugzilla at dpdk.org
Wed May 30 15:39:45 CEST 2018


https://dpdk.org/tracker/show_bug.cgi?id=56

            Bug ID: 56
           Summary: crash when freeing memory with no mlx5 device attached
           Product: DPDK
           Version: 18.05
          Hardware: All
                OS: All
            Status: CONFIRMED
          Severity: critical
          Priority: Normal
         Component: other
          Assignee: dev at dpdk.org
          Reporter: david.marchand at 6wind.com
  Target Milestone: ---

This crash occurs when a memory free event reaches the mlx5 callback
before any mlx5 device has been initialised.

Looking at the code, the mlx5 driver always registers a memory event callback:

RTE_INIT(rte_mlx5_pmd_init);
static void
rte_mlx5_pmd_init(void)
{
...
        rte_mem_event_callback_register("MLX5_MEM_EVENT_CB",
                                        mlx5_mr_mem_event_cb, NULL);
}

When invoked, this callback tries to take a lock:

void
mlx5_mr_mem_event_cb(enum rte_mem_event event_type, const void *addr,
                     size_t len, void *arg __rte_unused)
{
        struct priv *priv;
        struct mlx5_dev_list *dev_list = &mlx5_shared_data->mem_event_cb_list;

        switch (event_type) {
        case RTE_MEM_EVENT_FREE:
                rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
                /* Iterate all the existing mlx5 devices. */

But this lock is not initialised unless an mlx5 device has been probed, since
its initialisation is done in mlx5_prepare_shared_data(), which is only called
from mlx5_pci_probe().
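
To make the ordering problem concrete, here is a minimal standalone sketch
that does not use DPDK at all. Every name in it (toy_rwlock, shared_data,
model_probe, model_mem_event_cb) is hypothetical and only mimics the shape of
the driver code quoted above. It assumes the shared state is simply absent
(a NULL pointer) until the first probe, and it shows one possible guard in the
callback; this is an illustration, not necessarily how the driver should be
fixed.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy write lock standing in for rte_rwlock_t: 0 = free, -1 = write-locked. */
struct toy_rwlock {
        atomic_int cnt;
};

static void
toy_write_lock(struct toy_rwlock *l)
{
        int expected = 0;

        /* Spin until the counter moves from 0 (free) to -1 (writer). */
        while (!atomic_compare_exchange_weak(&l->cnt, &expected, -1))
                expected = 0;
}

static void
toy_write_unlock(struct toy_rwlock *l)
{
        atomic_store(&l->cnt, 0);
}

/* Hypothetical stand-in for the driver's shared data, not the real layout. */
struct shared_data {
        struct toy_rwlock mem_event_rwlock;
        int ndev; /* number of probed devices */
};

/* Assumption: stays NULL until the first device is probed. */
static struct shared_data *shared;

enum mem_event { MEM_EVENT_ALLOC, MEM_EVENT_FREE };

/* Models the probe path: the only place where the lock is initialised. */
static int
model_probe(void)
{
        if (shared == NULL) {
                struct shared_data *sd = calloc(1, sizeof(*sd));

                if (sd == NULL)
                        return -1;
                atomic_init(&sd->mem_event_rwlock.cnt, 0);
                shared = sd;
        }
        shared->ndev++;
        return 0;
}

/*
 * Models the callback that is registered unconditionally at startup.
 * Returns the number of devices visited, or -1 when no device has been
 * probed yet. Without the NULL check, the lock call below would
 * dereference a NULL pointer, which is the kind of access the backtrace
 * in this report points at.
 */
static int
model_mem_event_cb(enum mem_event type)
{
        int visited;

        if (type != MEM_EVENT_FREE)
                return 0;
        if (shared == NULL) /* guard: nothing to iterate before probe */
                return -1;
        toy_write_lock(&shared->mem_event_rwlock);
        visited = shared->ndev; /* stands in for iterating the device list */
        toy_write_unlock(&shared->mem_event_rwlock);
        return visited;
}
```

In this model, calling model_mem_event_cb(MEM_EVENT_FREE) before
model_probe() returns early instead of touching the lock; after one probe it
takes the lock and visits one device.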


Reproducing the issue is not straightforward, so I forced an allocation
followed by a free in the testpmd code to make sure a free event would be
triggered:

root at ubuntu1604:~/dpdk# git diff
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 35cf266..79c9531 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2772,6 +2772,8 @@ main(int argc, char** argv)
        }
 #endif

+       rte_free(rte_malloc(NULL, 10000000, 0));
+
 #ifdef RTE_LIBRTE_CMDLINE
        if (strlen(cmdline_filename) != 0)
                cmdline_read_from_file(cmdline_filename);


Then:

root at ubuntu1604:~/dpdk# LD_LIBRARY_PATH=/root/rdma-core/build/lib ./build/app/testpmd --log-level .*,8 -c 0x6 -- -i --total-num-mbufs 2048
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 0 on socket 0
EAL: Detected lcore 2 as core 0 on socket 0
...
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: Calling mem event callback 'MLX5_MEM_EVENT_CB:(nil)'EAL: request:
mp_malloc_sync
EAL: Heap on socket 0 was expanded by 90MB
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=2048, size=2176,
socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: alloc_pages_on_heap(): couldn't allocate physically contiguous space
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: Calling mem event callback 'MLX5_MEM_EVENT_CB:(nil)'EAL: request:
mp_malloc_sync
EAL: Heap on socket 0 was expanded by 8MB
Done
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: Calling mem event callback 'MLX5_MEM_EVENT_CB:(nil)'EAL: request:
mp_malloc_sync
EAL: Heap on socket 0 was expanded by 10MB
EAL: Calling mem event callback 'MLX5_MEM_EVENT_CB:(nil)'Segmentation fault
(core dumped)


root at ubuntu1604:~/dpdk# gdb ./build/app/testpmd core
...
Core was generated by `./build/app/testpmd --log-level .*,8 -c 0x6 -- -i
--total-num-mbufs 2048'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  rte_rwlock_write_lock (rwl=<optimized out>) at
/root/dpdk/build/include/generic/rte_rwlock.h:103
103                     x = rwl->cnt;
[Current thread is 1 (Thread 0x7f1871022c00 (LWP 5732))]
(gdb) bt
#0  rte_rwlock_write_lock (rwl=<optimized out>) at
/root/dpdk/build/include/generic/rte_rwlock.h:103
#1  mlx5_mr_mem_event_cb (event_type=RTE_MEM_EVENT_FREE, addr=0x7f1474a00000,
len=10485760, arg=<optimized out>) at /root/dpdk/drivers/net/mlx5/mlx5_mr.c:884
#2  0x000000000054ae86 in eal_memalloc_mem_event_notify ()
#3  0x0000000000558994 in malloc_heap_free ()
#4  0x000000000055445f in rte_free ()
#5  0x0000000000477231 in main ()
