[dpdk-dev] [PATCH v2 1/4] bus/pci: handle device hot unplug
Guo, Jia
jia.guo at intel.com
Tue Jun 26 17:30:00 CEST 2018
hi, gaetan,
thanks for your review, see comment as bellow
On 6/22/2018 8:59 PM, Gaëtan Rivet wrote:
> Hi Jeff,
>
> Sorry, I followed this development from afar,
> I have a remark regarding this API, I think it can be made simpler.
> Details below.
>
> On Fri, Jun 22, 2018 at 07:51:05PM +0800, Jeff Guo wrote:
>> When a hardware device is removed physically or the software disables
>> it, the hot unplug occur. App need to call ether dev API to detach the
>> device, to unplug the device at the bus level and make access to the device
>> invalid. But the problem is that, the removal of the device from the
>> software lists is not going to be instantaneous, at this time if the data
>> path still read/write the device, it will cause MMIO error and result of
>> the app crash out. So a hot unplug handle mechanism need to guaranty app
>> will not crash out when hot unplug device.
>>
>> To handle device hot unplug is bus-specific behavior, this patch introduces
>> a bus ops so that each kind of bus can implement its own logic. Further,
>> this patch implements the ops for PCI bus: remap a dummy memory to avoid
>> bus read/write error.
>>
>> Signed-off-by: Jeff Guo <jia.guo at intel.com>
>> ---
>> v2->v1(v21):
>> refind commit log
>> ---
>> drivers/bus/pci/pci_common.c | 65 +++++++++++++++++++++++++++++++++
>> drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++
>> drivers/bus/pci/private.h | 12 ++++++
>> lib/librte_eal/common/include/rte_bus.h | 16 ++++++++
>> 4 files changed, 126 insertions(+)
>>
>> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
>> index 7215aae..74d9aa8 100644
>> --- a/drivers/bus/pci/pci_common.c
>> +++ b/drivers/bus/pci/pci_common.c
>> @@ -472,6 +472,70 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
>> return NULL;
>> }
>>
>> +static struct rte_pci_device *
>> +pci_find_device_by_addr(void *failure_addr)
>> +{
>> + struct rte_pci_device *pdev = NULL;
>> + int i;
>> +
>> + FOREACH_DEVICE_ON_PCIBUS(pdev) {
>> + for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
>> + if ((uint64_t)(uintptr_t)failure_addr >=
>> + (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
>> + (uint64_t)(uintptr_t)failure_addr <
>> + (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
>> + pdev->mem_resource[i].len) {
>> + RTE_LOG(ERR, EAL, "Failure address "
>> + "%16.16"PRIx64" belongs to "
>> + "device %s!\n",
>> + (uint64_t)(uintptr_t)failure_addr,
>> + pdev->device.name);
>> + return pdev;
>> + }
>> + }
>> + }
>> + return NULL;
>> +}
> You define here a new bus ops that takes either an rte_device or an
> arbitrary address as input.
>
> In the uev handler that would call this ops afterward, you similarly try
> to find either a bus using the device name, or then iterate over all
> buses and try to find one able to handle the error.
>
> This seems redundant and prone to ambiguity: should one check that the
> device address is actually linked with the provided address? If not, is
> it an improper call or a special case? This is unclear.
>
> Note: I haven't followed the previous discussion, maybe the
> dual dev_addr + failure_addr is warranted in the API here,
> if so why not. Otherwise it just seems redundant:
> the dev addr will never be within a physical BAR mapping,
> and for all buses / drivers not using physical mappings,
> the addr is only meaningful as a cue to find an internal
> resource.
>
> You can use the bus->find_device() to iterate over buses, and design
> your bus ops such that when provided with an addr, would do whatever it
> needs internally to find a relevant resource and either handle the
> error, or return that the error was not handled.
if bus->find_device() can make think more simpler, why not? i will check
that and modify it.
> Something like that:
>
> /* new bus ops: */
> /* If generic error codes are defined as part of the API,
> >0 should mean that the sigbus was not handled,
> <0 that an error occured but that one should stop trying,
> 0 that everything is ok.
> */
> int (*handle_sigbus)(void *addr);
>
> /* new rte_bus API: */
>
> static int
> bus_handle_sigbus(const struct rte_bus *bus,
> const void *addr)
> {
> /* If additional error codes are defined as part of the API,
> negative values should stop the iteration.
> In this case, rte_errno would need to be set as well.
> */
> return !(bus->handle_sigbus && bus->handle_sigbus(addr) <= 0);
> }
>
> int
> rte_bus_sigbus_handler(void *addr)
> {
> struct rte_bus *bus;
> int old_errno = rte_errno;
>
> rte_errno = 0;
> bus = rte_bus_find(NULL, bus_handle_sigbus, addr);
> if (bus == NULL) {
> /* ERROR: no bus could handle the error. */
> RTE_LOG(ERR, EAL, "No bus was able to handle the error");
> return -1;
> } else if {rte_errno != 0) {
> /* ERROR: a generic sigbus handling error. */
> RTE_LOG(ERR, EAL, "Say what the error is");
> return -1;
> }
> rte_errno = old_errno;
> return 0;
> }
>
> Which would afterward be implemented, for example in PCI bus:
>
> static rte_pci_device *
> pci_find_device_by_addr(void *addr)
> {
> struct rte_pci_device *pdev;
>
> FOREACH_DEVICE_ON_PCIBUS(pdev)
> if (&pdev->device == addr ||
> /* addr within mappings of pdev */)
> return pdev;
> return NULL;
> }
>
> static int
> pci_handle_sigbus(void *addr)
> {
> static rte_pci_device *pdev;
>
> pdev = pci_find_device_by_addr(addr);
> if (pdev == NULL)
> return -1;
> /* Here, handle uio remap if needed. */
> }
>
> -------------------------
>
> - Leave the bus iteration and check within rte_bus. Centralize the call
> in a tighter rte_bus API, do not use directly your OPS from your other
> EAL facility.
>
> - You have left several error messages for signaling success (!), or
> even simply that a bus could not handle a specific address. This is
> bad. An error should only appear on error. Otherwise, all of this can
> easily be traced using a debugger, so I don't think it's necessary
> to leave it at a DEBUG level.
>
> But in any case, remove all ERR-level messages for success.
make sense. i will check which debug message is no need at least.
> <...>
>
>> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
>> index eb9eded..6a5609f 100644
>> --- a/lib/librte_eal/common/include/rte_bus.h
>> +++ b/lib/librte_eal/common/include/rte_bus.h
>> @@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
>> typedef int (*rte_bus_parse_t)(const char *name, void *addr);
>>
>> /**
>> + * Implementation a specific hot unplug handler, which is responsible
>> + * for handle the failure when hot unplug the device, guaranty the system
>> + * would not hung in the case.
>> + * @param dev
>> + * Pointer of the device structure.
>> + *
>> + * @return
>> + * 0 on success.
>> + * !0 on error.
>> + */
>> +typedef int (*rte_bus_handle_hot_unplug_t)(struct rte_device *dev,
>> + void *dev_addr);
>> +
> I don't like the name of the OPS.
> The documentation evokes only "the failure".
> So is it a handle for any and all error possibly happening to a device?
> If so, where is the input to describe the error?
> If it is only meant to handle SIGBUS, because it is a very specific
> error state only meant to happen on certain parts of the bus (the queue
> mappings, if relevant), then it makes sense to only have an arbitrary
> address as context for handling.
>
> But then, it needs to be called as such. The expected failure to be
> handled should be explicit in the name of the ops, and the documentation
> should be more precise about what a bus developper should do with the
> input.
I agree with your point of let the name more explicit, but i think here
maybe we should spit it into two ops, the one is hotplug_handler, the
other is sigbus_handler, because there are
2 path that both, data path and control path, they are also need to call
remap function when detect the hot remove event,even there are no sigbus
happen.
>> +/**
>> * Bus scan policies
>> */
>> enum rte_bus_scan_mode {
>> @@ -209,6 +223,8 @@ struct rte_bus {
>> rte_bus_plug_t plug; /**< Probe single device for drivers */
>> rte_bus_unplug_t unplug; /**< Remove single device from driver */
>> rte_bus_parse_t parse; /**< Parse a device name */
>> + rte_bus_handle_hot_unplug_t handle_hot_unplug; /**< handle hot unplug
>> + device event */
> The new ops should be added at the end of the structure.
ok.
> Regards,
More information about the dev
mailing list