[dpdk-dev] [PATCH 3/3] app/testpmd: fix callback issue for hot-unplug

Matan Azrad matan at mellanox.com
Sun Nov 11 08:31:48 CET 2018



From: Jeff Guo 
> On 11/9/2018 1:24 PM, Matan Azrad wrote:
> >
> >   From: Jeff Guo
> >> On 11/8/2018 5:35 PM, Matan Azrad wrote:
> >>> From: Jeff Guo
> >>>> On 11/8/2018 3:28 PM, Matan Azrad wrote:
> >>>>> From: Ananyev, Konstantin
> >>>>>>> -----Original Message-----
> >>>>>>> From: Guo, Jia
> >>>>>>> Sent: Wednesday, November 7, 2018 7:30 AM
> >>>>>>> To: Matan Azrad <matan at mellanox.com>; Ananyev, Konstantin
> >>>>>>> <konstantin.ananyev at intel.com>; Burakov, Anatoly
> >>>>>>> <anatoly.burakov at intel.com>; Thomas Monjalon
> >>>>>> <thomas at monjalon.net>;
> >>>>>>> Iremonger, Bernard <bernard.iremonger at intel.com>; Wu, Jingjing
> >>>>>>> <jingjing.wu at intel.com>; Lu, Wenzhuo <wenzhuo.lu at intel.com>
> >>>>>>> Cc: Yigit, Ferruh <ferruh.yigit at intel.com>; dev at dpdk.org; Zhang,
> >>>>>>> Helin <helin.zhang at intel.com>; He, Shaopeng
> >>>> <shaopeng.he at intel.com>
> >>>>>>> Subject: Re: [PATCH 3/3] app/testpmd: fix callback issue for
> >>>>>>> hot-unplug
> >>>>>>>
> >>>>>>> matan
> >>>>>>>
> >>>>>>> On 11/6/2018 2:36 PM, Matan Azrad wrote:
> >>>>>>>> Hi Jeff
> >>>>>>>>
> >>>>>>>>      From: Jeff Guo <jia.guo at intel.com>
> >>>>>>>>> Before detach device when device be hot-unplugged, the failure
> >>>>>>>>> process in user space and kernel space both need to be
> >>>>>>>>> finished, such as eal interrupt callback need to be inactive
> >>>>>>>>> before the callback be unregistered when device is being
> >>>>>>>>> cleaned. This patch add rte alarm for device detaching, with
> >>>>>>>>> that it could finish interrupt callback soon and give time to
> >>>>>>>>> let the failure process done
> >>>>>> before detaching.
> >>>>>>>>> Fixes: 2049c5113fe8 ("app/testpmd: use hotplug failure
> >>>>>>>>> handler")
> >>>>>>>>> Signed-off-by: Jeff Guo <jia.guo at intel.com>
> >>>>>>>>> ---
> >>>>>>>>>      app/test-pmd/testpmd.c | 13 ++++++++++++-
> >>>>>>>>>      1 file changed, 12 insertions(+), 1 deletion(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> >>>>>>>>> index 9c0edca..9c673cf 100644
> >>>>>>>>> --- a/app/test-pmd/testpmd.c
> >>>>>>>>> +++ b/app/test-pmd/testpmd.c
> >>>>>>>>> @@ -2620,7 +2620,18 @@ eth_dev_event_callback(const char
> >>>>>>>>> *device_name, enum rte_dev_event_type type,
> >>>>>>>>>      				device_name);
> >>>>>>>>>      			return;
> >>>>>>>>>      		}
> >>>>>>>>> -		rmv_event_callback((void *)(intptr_t)port_id);
> >>>>>>>>> +		/*
> >>>>>>>>> +		 * Before detach device, the hot-unplug failure
> >>>> process in
> >>>>>>>>> +		 * user space and kernel space both need to be
> >>>> finished,
> >>>>>>>>> +		 * such as eal interrupt callback need to be inactive
> >>>> before
> >>>>>>>>> +		 * the callback be unregistered when device is being
> >>>> cleaned.
> >>>>>>>>> +		 * So finished interrupt callback soon here and give
> >>>> time to
> >>>>>>>>> +		 * let the work done before detaching.
> >>>>>>>>> +		 */
> >>>>>>>>> +		if (rte_eal_alarm_set(100000,
> >>>>>>>>> +				rmv_event_callback, (void
> >>>>>>>>> *)(intptr_t)port_id))
> >>>>>>>>> +			RTE_LOG(ERR, EAL,
> >>>>>>>>> +				"Could not set up deferred device
> >>>>>>>> It looks me strange to use callback and alarm to remove a device:
> >>>>>>>> Why not to use callback and that is it?
> >>>>>>>>
> >>>>>>>> I think that it's better to let the EAL to detach the device
> >>>>>>>> after all the
> >>>>>> callbacks were done and not to do it by the user callback.
> >>>>>>>> So the application\callback owners just need to clean its
> >>>>>>>> resources with understanding that after the callback the
> >>>>>>>> device(and the callback
> >>>>>>> itself) will be detached by the EAL.
> >>>>>>>
> >>>>>>>
> >>>>>>> Firstly, at the currently framework and solution, such as
> >>>>>>> callback for RTE_ETH_EVENT_INTR_RMV, still need to use the
> >>>>>>> deferred device
> >>>>>> removal,
> >>>>>>> we tend to give the control of detaching device to the
> >>>>>>> application, and the whole process is located on the user's
> >>>>>>> callback. Notify app to detach device by callback but make it
> >>>>>>> deferred,
> >> i think it is fine.
> >>>>> But the device must be detached in remove event, why not to do it
> >>>>> in
> >> EAL?
> >>>> I think it because of before detached the device, application
> >>>> should be stop the forwarding at first, then stop the device, then
> >>>> close
> >>>>
> >>>> the device, finally call eal unplug API to detach device. If eal
> >>>> directly detach device at the first step, there will be mountain
> >>>> user space error need to handle, so that is one reason why need to
> >>>> provider the remove notification to app, and let app to process it.
> >>> This is why the EAL need to detach the device only after all the
> >>> user
> >> callbacks were done.
> >>
> >>
> >> If i correctly got your meaning, you suppose to let eal to mandatory
> >> detach device but not app, app just need to stop/close port, right?
> > Yes, the app should stop,close,clean its own resources of the removed
> > device, Then, EAL to detach the device.
> >
> >> If so, it will need to modify rmv_event_callback by not invoke the
> >> detaching func and add some detaching logic to hotplug framework in eal.
> >>
> > rmv_event_callback is using by other hotplug mechanism (ETHDEV RMV
> event), so you need to use another one\ add parameter to it.
> > And yes, you need to detach the device from EAL, should be simple.
> 
> 
> I think rmv_event_callback is original use for other hotplug event (ETHDEV
> RMV event), but it still use the common hotplug mechanism(app callback and
> app detach),
> 
> so i think it will still need to face this callback issue and you could check that
> eth_event_callback also use the rte alarm to detach device.

The ETHDEV layer cannot know if more than one ethdev port is associated to the rte_device,
So, it is not correct to detach a rte_device by the ETHDEV layer. Hence, the application should decide in the ETHDEV event case.

But, in the EAL event case this is different as I explained before.

Moreover, I think that the ethdev RMV event should be deprecated one day, after the EAL hot-plug mechanism will be stable. 

> 
> so my suggestion is that, you maybe propose a good idea but let we keep on
> current mechanism until we come to a final good solution agreement, before
> that, just let
> 
> it functional.
> 

I still suggest to fix the issue by an EAL detaching.

> 
> >> It is hardly say better or worse but this new propose need to discuss
> >> more in public. And it might be better to use another specific patch to
> handler it.
> >> What do you or other guys think so?
> > Since you are fixing issue here, it can be done by a fix series.
> >
> > Other feedbacks are welcome all the time 😊
> >
> >>
> >>>>>> It is also unclear to me my we need an alarm here.
> >>>>>> First (probably wrong) impression we just try to hide some
> >>>>>> synchronization Problem by introducing delay.
> >>>>> Looks like, the issue is that the callback function memory will be
> >>>>> removed
> >>>> from the function itself (by the detach call), no?
> >>>>
> >>>>
> >>>> Answer here for both Konstantin and Matan.
> >>>>
> >>>> Yes, i think matan is right, the interrupt callback will be destroy
> >>>> in the app callback itself, the sequence is that as below
> >>>>
> >>>> hot-unplug interrupt -> interrupt callback -> app callback(return
> >>>> to finish interrupt callback, delay detaching) -> detach
> >>>> device(unregister interrupt
> >>>> callback)
> >>>>
> >>>>
> >>>>>> Konstantin
> >>>>>>
> >>>>>>> Secondly, the vfio is different with igb_uio for hot-unplug, it
> >>>>>>> register/unregister hotplug interrupt callback for each device,
> >>>>>>> so need to make  the callback done before unregister the callback.
> >>>>>>>
> >>>>>>> So I think it should be considerate as an workaround here,
> >>>>>>> before we find a better way.
> >>>>>>>
> >>>>>>>
> >>>>>>>>> removal\n");
> >>>>>>>>>      		break;
> >>>>>>>>>      	case RTE_DEV_EVENT_ADD:
> >>>>>>>>>      		RTE_LOG(ERR, EAL, "The device: %s has been
> >>>> added!\n",
> >>>>>>>>> --
> >>>>>>>>> 2.7.4


More information about the dev mailing list