[dpdk-dev] [dpdk-stable] [PATCH v2] bus/pci: fix driver detach clear

Matan Azrad matan at mellanox.com
Wed Nov 20 14:44:40 CET 2019


Hi David

From: David Marchand
> On Wed, Nov 20, 2019 at 10:48 AM Matan Azrad <matan at mellanox.com>
> wrote:
> >
> > When a rte_device is unplugged, the driver should be detached from the
> > device.
> >
> > The PCI driver detach operation wrongly did not clear the driver pointer
> > in the device structure, which leaves the device in the probed state
> > from the EAL point of view.
> >
> > For example, when a device is removed twice using rte_dev_remove, it
> > causes a crash in EAL.
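A simplified model of what the patch description proposes: detach clears the driver pointer so that a second detach sees an unprobed device and is rejected. The types and function below (model_driver, model_device, model_detach) are hypothetical stand-ins, not the real rte_driver/rte_device structures or EAL code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for rte_driver/rte_device. */
struct model_driver {
    const char *name;
};

struct model_device {
    const char *name;
    const struct model_driver *driver; /* non-NULL means "probed" */
};

/* Detach as the patch proposes: tear down, then clear the driver
 * pointer so the device no longer looks probed. */
static int model_detach(struct model_device *dev)
{
    if (dev->driver == NULL)
        return -1;          /* not probed: nothing to detach */
    /* a driver-specific remove callback would run here */
    dev->driver = NULL;     /* the extra step the patch adds */
    return 0;
}
```

Note that this guard only helps while the device object is still allocated. As the rest of the thread points out, the PCI bus frees the whole rte_pci_device on unplug, so a second call would read freed memory before it ever reaches the driver check.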
> 
> I can see a crash when using port detach in testpmd with a virtio pci device.
> 
> testpmd> port attach 0000:07:00.0
> Attaching a new port...
> EAL: PCI device 0000:07:00.0 on NUMA socket -1
> EAL:   Invalid NUMA socket, default to 0
> EAL:   probe driver: 1af4:1041 net_virtio
> Port 1 is attached. Now total ports is 2 Done
> testpmd> port close 1
> Closing ports...
> EAL: Releasing pci mapped resource for 0000:07:00.0
> EAL: Calling pci_unmap_resource for 0000:07:00.0 at 0x2200006000 Done
> testpmd> port detach 1
> Removing a device...
> 
> Breakpoint 1, local_dev_remove (dev=0x1de64b0) at
> /root/dpdk/lib/librte_eal/common/eal_common_dev.c:315
> 315        if (dev->bus->unplug == NULL) {
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.17-292.el7.x86_64 libgcc-4.8.5-39.el7.x86_64
> libpcap-1.5.3-11.el7.x86_64 numactl-libs-2.0.12-3.el7.x86_64
> (gdb) p *dev
> $1 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x1cf8078
> "0000:07:00.0", driver = 0x16c68f0 <rte_virtio_pmd+16>, bus =
> 0x16b2640 <rte_pci_bus>, numa_node = 0, devargs = 0x1cf8060}
> (gdb) c
> Continuing.
> Device of port 1 is detached
> Now total ports is 1
> Done
> 
> 
> On the first detach, the pci bus frees the rte_pci_device which embeds the
> rte_device object.
> 
> static int
> pci_unplug(struct rte_device *dev)
> {
>         struct rte_pci_device *pdev;
>         int ret;
> 
>         pdev = RTE_DEV_TO_PCI(dev);
>         ret = rte_pci_detach_dev(pdev);
>         if (ret == 0) {
>                 rte_pci_remove_device(pdev);
>                 rte_devargs_remove(dev->devargs);
>                 free(pdev);
>         }
>         return ret;
> }
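To see why the generic pointer dies with the PCI object, here is a minimal model of the embedding, assuming RTE_DEV_TO_PCI is a container_of-style conversion (pointer minus the offset of the embedded member). All names here are hypothetical; only the layout idea is taken from the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct model_device {
    const char *name;
};

/* Simplified model: the generic device is embedded in the bus-specific one. */
struct model_pci_device {
    int id;                      /* placeholder for PCI-specific fields */
    struct model_device device;  /* embedded generic device */
};

/* container_of-style conversion, modelled on what RTE_DEV_TO_PCI is
 * assumed to do: step back from the member to the enclosing struct. */
#define MODEL_DEV_TO_PCI(ptr) \
    ((struct model_pci_device *)((char *)(ptr) - \
        offsetof(struct model_pci_device, device)))

static struct model_pci_device *model_pci_alloc(const char *name)
{
    struct model_pci_device *pdev = calloc(1, sizeof(*pdev));
    if (pdev != NULL)
        pdev->device.name = name;
    return pdev;
}
```

Because &pdev->device lies inside the pdev allocation, free(pdev) in pci_unplug also ends the lifetime of the rte_device that callers such as testpmd may still be holding.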
> 
> 
> 
> testpmd> port detach 1
> Removing a device...
> 
> Breakpoint 1, local_dev_remove (dev=0x1de64b0) at
> /root/dpdk/lib/librte_eal/common/eal_common_dev.c:315
> 315        if (dev->bus->unplug == NULL) {
> (gdb) p *dev
> $2 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0xa <Address 0xa out
> of bounds>, driver = 0x0, bus = 0x4637, numa_node = 1, devargs =
> 0x40000002e040018}
> (gdb) c
> Continuing.
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x00000000007c1ddd in local_dev_remove (dev=0x1de64b0) at
> /root/dpdk/lib/librte_eal/common/eal_common_dev.c:315
> 315        if (dev->bus->unplug == NULL) {
> 
> 
> On the second detach, testpmd passes the same rte_device pointer it
> extracts from rte_eth_devices, but the malloc'd location has been reused
> (with a watchpoint on the location, I found the reuse happens somewhere
> around rte_mp_request_sync/opendir()), and then *crunch* on dev->bus.
> 
> 
> From my pov:
> - testpmd is wrongly reusing a pointer coming from rte_eth_devices[],
> without caring about the port state (this is what your second patch fixes),
> - testpmd is directly kicking pointers in rte_eth_devices[] (setting
> ->device = NULL for its own logic), which is bad too,
> - this patch just hides the reuse of a freed pointer,
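A sketch of the caller-side alternative suggested in the list above: the application checks its own port state and drops its reference once removal succeeds, so a repeated detach is refused instead of dereferencing freed memory. The port table, states and functions are hypothetical and do not reflect testpmd's actual data structures; model_unplug only mimics the fact that pci_unplug() frees the device.

```c
#include <assert.h>
#include <stdlib.h>

struct model_device {
    const char *name;
};

/* Hypothetical caller-side bookkeeping for one port. */
enum model_port_state { PORT_UNUSED, PORT_ATTACHED };

struct model_port {
    enum model_port_state state;
    struct model_device *dev;   /* owned by the bus; valid only while attached */
};

/* Stand-in for a bus unplug that frees the device, as pci_unplug() does. */
static int model_unplug(struct model_device *dev)
{
    free(dev);
    return 0;
}

static int model_port_detach(struct model_port *port)
{
    if (port->state != PORT_ATTACHED || port->dev == NULL)
        return -1;              /* refuse: never touch a possibly freed pointer */
    int ret = model_unplug(port->dev);
    if (ret == 0) {
        port->dev = NULL;       /* drop our reference: the object is gone */
        port->state = PORT_UNUSED;
    }
    return ret;
}
```

The design choice is that the owner of the stale pointer (the application) is the one that forgets it; clearing a field inside the freed object, as the patch did, cannot protect a later reader.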

Yes, you are right.

This patch is not needed, since the rte_device is freed on removal.

Thanks.

> 
> 
> --
> David Marchand
