[dpdk-dev] [PATCH v10 7/8] igb_uio: fix unexpected remove issue for hotplug
Ferruh Yigit
ferruh.yigit at intel.com
Thu Sep 27 17:07:19 CEST 2018
On 8/17/2018 11:48 AM, Jeff Guo wrote:
> When a device is hotplugged out, the PCI resource is released in the
> kernel, the UIO file descriptor will disappear and the irq will be
> released. After this, a kernel crash will be caused if the igb uio driver
> tries to access or release these resources.
>
> And more, uio_remove will be called unexpectedly before uio_release
> when device be hotpluggged out, the uio_remove procedure will
> free resources that are required by uio_release. This will later affect the
> usage of interrupt as there is no way to disable the interrupt which is
> defined in uio_release.
>
> To prevent this, the hotplug removal needs to be identified and processed
> accordingly in igb uio driver.
>
> This patch proposes the addition of enum rte_udev_state in the
> rte_uio_pci_dev struct. This will store the state of the uio device as one
> of the following: probed/opened/released/removed.
>
> This patch also checks the kobject's remove_uevent_sent state to detect if
> the removal status is hotplug-out. Once a hotplug-out is detected, it will
> call uio_release and set the uio status to "removed". After that, uio will
> check the status in the uio_release function. If uio has already been
> removed, it will only free the dirty uio resource.
>
> Signed-off-by: Jeff Guo <jia.guo at intel.com>
> Acked-by: Shaopeng He <shaopeng.he at intel.com>
<...>
> @@ -331,20 +351,35 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
>
> /* enable interrupts */
> err = igbuio_pci_enable_interrupts(udev);
> - mutex_unlock(&udev->lock);
> if (err) {
> dev_err(&dev->dev, "Enable interrupt fails\n");
> + pci_clear_master(dev);
Why pci_clear_master required here.
btw, some part of this patch conflicts with [1], which removes mutes and use
atomic refcnt operations, but introducing state seems needs mutex.
[1]
igb_uio: fix refcount if open returns error
https://patches.dpdk.org/patch/44732/
> + mutex_unlock(&udev->lock);
> return err;
> }
> + udev->state = RTE_UDEV_OPENNED;
> + mutex_unlock(&udev->lock);
> return 0;
> }
>
> +/**
> + * This gets called while closing uio device file.
> + */
> static int
> igbuio_pci_release(struct uio_info *info, struct inode *inode)
> {
> struct rte_uio_pci_dev *udev = info->priv;
> struct pci_dev *dev = udev->pdev;
>
> + if (udev->state == RTE_UDEV_REMOVED) {
> + mutex_destroy(&udev->lock);
> + igbuio_pci_release_iomem(&udev->info);
> + pci_disable_device(dev);
> + pci_set_drvdata(dev, NULL);
> + kfree(udev);
> + return 0;
This branch taken when pci_remove called before pci_release.
- At this stage is "dev" valid, since pci_remove() called?
- In this path uio_unregister_device() is missing, who unregisters uio?
- sysfs_remove_group() also missing, it is not clear if it is forgotten or left
out, what do you think move common part of pci_remove into new function and call
both in pci_remove and here?
And as a logic, can we make pci_remove clear everything, instead of doing some
cleanup here. Like:
pci_remove:
- calls pci_release
- instead of return keeps doing pci_remove work
- set state to REMOVED
pci_release:
- if state is REMOVED, return without doing nothing
btw, even after uio_unregister_device() how pci_release called?
It can help to share crash backtrace in commit log, to describe problem in more
detail.
More information about the dev
mailing list