[dpdk-dev] [PATCH v10 7/8] igb_uio: fix unexpected remove issue for hotplug

Jeff Guo jia.guo at intel.com
Thu Oct 18 07:51:49 CEST 2018


hi, ferruh

On 9/27/2018 11:07 PM, Ferruh Yigit wrote:
> On 8/17/2018 11:48 AM, Jeff Guo wrote:
>> When a device is hotplugged out, the PCI resource is released in the
>> kernel, the UIO file descriptor will disappear and the irq will be
>> released. After this, a kernel crash will be caused if the igb uio driver
>> tries to access or release these resources.
>>
>> And more, uio_remove will be called unexpectedly before uio_release
>> when device be hotpluggged out, the uio_remove procedure will
>> free resources that are required by uio_release. This will later affect the
>> usage of interrupt as there is no way to disable the interrupt which is
>> defined in uio_release.
>>
>> To prevent this, the hotplug removal needs to be identified and processed
>> accordingly in igb uio driver.
>>
>> This patch proposes the addition of enum rte_udev_state in the
>> rte_uio_pci_dev struct. This will store the state of the uio device as one
>> of the following: probed/opened/released/removed.
>>
>> This patch also checks the kobject's remove_uevent_sent state to detect if
>> the removal status is hotplug-out. Once a hotplug-out is detected, it will
>> call uio_release and set the uio status to "removed". After that, uio will
>> check the status in the uio_release function. If uio has already been
>> removed, it will only free the dirty uio resource.
>>
>> Signed-off-by: Jeff Guo <jia.guo at intel.com>
>> Acked-by: Shaopeng He <shaopeng.he at intel.com>
> <...>
>
>> @@ -331,20 +351,35 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
>>   
>>   	/* enable interrupts */
>>   	err = igbuio_pci_enable_interrupts(udev);
>> -	mutex_unlock(&udev->lock);
>>   	if (err) {
>>   		dev_err(&dev->dev, "Enable interrupt fails\n");
>> +		pci_clear_master(dev);
> Why pci_clear_master required here.


It is because set master is before interrupt enabling, if enable 
interrupt fails should clear master before return i think.

Anyway it is not belong to this patch perspective, it could be separated 
to another one.


> btw, some part of this patch conflicts with [1], which removes mutes and use
> atomic refcnt operations, but introducing state seems needs mutex.
>
> [1]
> igb_uio: fix refcount if open returns error
> https://patches.dpdk.org/patch/44732/


yes, i see and will rework for that if need.


>> +		mutex_unlock(&udev->lock);
>>   		return err;
>>   	}
>> +	udev->state = RTE_UDEV_OPENNED;
>> +	mutex_unlock(&udev->lock);
>>   	return 0;
>>   }
>>   
>> +/**
>> + * This gets called while closing uio device file.
>> + */
>>   static int
>>   igbuio_pci_release(struct uio_info *info, struct inode *inode)
>>   {
>>   	struct rte_uio_pci_dev *udev = info->priv;
>>   	struct pci_dev *dev = udev->pdev;
>>   
>> +	if (udev->state == RTE_UDEV_REMOVED) {
>> +		mutex_destroy(&udev->lock);
>> +		igbuio_pci_release_iomem(&udev->info);
>> +		pci_disable_device(dev);
>> +		pci_set_drvdata(dev, NULL);
>> +		kfree(udev);
>> +		return 0;
> This branch taken when pci_remove called before pci_release.
> - At this stage is "dev" valid, since pci_remove() called?
> - In this path uio_unregister_device() is missing, who unregisters uio?
> - sysfs_remove_group() also missing, it is not clear if it is forgotten or left
> out, what do you think move common part of pci_remove into new function and call
> both in pci_remove and here?


It is not forgotten but specific left out, since the if uio remove 
before uio release it will cause kernel error, which is double free the

already-free irq issue when uio unregister device.


> And as a logic, can we make pci_remove clear everything, instead of doing some
> cleanup here. Like:
> pci_remove:
> - calls pci_release
> - instead of return keeps doing pci_remove work
> - set state to REMOVED
>
> pci_release:
> - if state is REMOVED, return without doing nothing


I think the logic you said here is make sense, just make release and 
remove more focus their own work.


>
> btw, even after uio_unregister_device() how pci_release called?


  The consequence of igb uio removal is that, igb uio remove be called, 
then igb uio release be called when detaching device, then if quit app 
it will call pci remove.


>
> It can help to share crash backtrace in commit log, to describe problem in more
> detail.


I will do that. And i think the 2 thing need to fix is that, one is the 
double free irq issue, the other one is give the chance to disable 
interrupt when uio remove be called before uio release. I check again 
and find that just add release before remove could both fix these

issues, so please review my coming update patch. Thanks.




More information about the dev mailing list