[dpdk-dev] [PATCH v12 6/7] eal: add failure handle mechanism for hot-unplug

Ananyev, Konstantin konstantin.ananyev at intel.com
Tue Oct 2 18:00:58 CEST 2018



> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Tuesday, October 2, 2018 4:54 PM
> To: Guo, Jia <jia.guo at intel.com>; stephen at networkplumber.org; Richardson, Bruce <bruce.richardson at intel.com>; Yigit, Ferruh
> <ferruh.yigit at intel.com>; Ananyev, Konstantin <konstantin.ananyev at intel.com>; gaetan.rivet at 6wind.com; Wu, Jingjing
> <jingjing.wu at intel.com>; thomas at monjalon.net; motih at mellanox.com; matan at mellanox.com; Van Haaren, Harry
> <harry.van.haaren at intel.com>; Zhang, Qi Z <qi.z.zhang at intel.com>; He, Shaopeng <shaopeng.he at intel.com>; Iremonger, Bernard
> <bernard.iremonger at intel.com>; arybchenko at solarflare.com; Lu, Wenzhuo <wenzhuo.lu at intel.com>; jerin.jacob at caviumnetworks.com
> Cc: jblunck at infradead.org; shreyansh.jain at nxp.com; dev at dpdk.org; Zhang, Helin <helin.zhang at intel.com>
> Subject: Re: [PATCH v12 6/7] eal: add failure handle mechanism for hot-unplug
> 
> On 02-Oct-18 1:35 PM, Jeff Guo wrote:
> > The mechanism registers the SIGBUS handler once the device event
> > monitor is enabled. When a SIGBUS is caught, it checks the fault
> > address and handles the memory failure of the corresponding device
> > by invoking the hot-unplug handler. This can prevent the application
> > from crashing when a device is hot-unplugged.
> >
> > With this patch, users can call the newly added APIs below to
> > enable/disable the device hotplug handling mechanism. Note that these
> > functions currently implement only the hot-unplug handler; other
> > hotplug handlers, such as one for hotplug binding, could be added in
> > the future if needed:
> >    - rte_dev_hotplug_handle_enable
> >    - rte_dev_hotplug_handle_disable
> >
> > Signed-off-by: Jeff Guo <jia.guo at intel.com>
> > ---
> 
> <snip>
> 
> > +static void sigbus_handler(int signum, siginfo_t *info,
> > +				void *ctx __rte_unused)
> > +{
> > +	int ret;
> > +
> > +	RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
> > +		(int)pthread_self(), info->si_addr);
> > +
> > +	rte_spinlock_lock(&failure_handle_lock);
> > +	ret = rte_bus_sigbus_handler(info->si_addr);
> > +	rte_spinlock_unlock(&failure_handle_lock);
> > +	if (ret == -1) {
> > +		rte_exit(EXIT_FAILURE,
> > +			 "Failed to handle SIGBUS for hot-unplug, "
> > +			 "(rte_errno: %s)!", strerror(rte_errno));
> 
> Do we really want to exit the application on sigbus handle failure?

I'd say yes :)
What else can we do in such a situation, other than die gracefully?
Konstantin
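
For reference, here is a minimal sketch of how the "generic SIGBUS" branch
could forward the signal to the previously installed handler. It is not
taken from the patch: chain_to_old_handler() and the two demo handlers are
hypothetical names, and only sigbus_action_old mirrors a variable in the
quoted code. It assumes POSIX sigaction semantics. On Linux/glibc SIG_IGN is
a non-NULL value, so a plain non-NULL test on sa_handler is not enough to
decide that the old handler is callable.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Previous SIGBUS disposition, as saved by sigaction() at registration. */
static struct sigaction sigbus_action_old;

/* Counter used only by the demo handlers below (hypothetical). */
static volatile sig_atomic_t chained_calls;

/* Hypothetical "old" handlers standing in for an application's own ones. */
static void demo_old_handler(int signum)
{
	(void)signum;
	chained_calls++;
}

static void demo_old_siginfo_handler(int signum, siginfo_t *info, void *ctx)
{
	(void)signum; (void)info; (void)ctx;
	chained_calls++;
}

/* Forward a SIGBUS that is not ours to whatever was installed before. */
static void chain_to_old_handler(int signum, siginfo_t *info, void *ctx)
{
	if (sigbus_action_old.sa_flags & SA_SIGINFO) {
		/* Old handler was a three-argument sa_sigaction handler. */
		if (sigbus_action_old.sa_sigaction != NULL) {
			sigbus_action_old.sa_sigaction(signum, info, ctx);
			return;
		}
	} else if (sigbus_action_old.sa_handler == SIG_IGN) {
		/* Signal was ignored before; keep ignoring it. */
		return;
	} else if (sigbus_action_old.sa_handler != SIG_DFL) {
		/* Old handler was a plain one-argument handler. */
		sigbus_action_old.sa_handler(signum);
		return;
	}

	/* No usable old handler: restore the default action and re-raise. */
	signal(signum, SIG_DFL);
	raise(signum);
}
```

With something like this, the last branch ends the process through the
default SIGBUS action, which is one way to read "die gracefully" here.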

> 
> > +	} else if (ret == 1) {
> > +		if (sigbus_action_old.sa_handler)
> > +			(*(sigbus_action_old.sa_handler))(signum);
> > +		else
> > +			rte_exit(EXIT_FAILURE,
> > +				 "Failed to handle generic SIGBUS!");
> > +	}
> > +
> > +	RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hot-unplug!\n");
> 
> Again, does this all need to be with INFO log level? IMO it should be DEBUG.
> 
> > +}
> > +
> > +static int cmp_dev_name(const struct rte_device *dev,
> > +	const void *_name)
> > +{
> > +	const char *name = _name;
> > +
> > +	return strcmp(dev->name, name);
> > +}
> > +
> >   static int
> 
> <snip>
> 
> >
> >   int __rte_experimental
> > @@ -220,5 +320,67 @@ rte_dev_event_monitor_stop(void)
> >   	close(intr_handle.fd);
> >   	intr_handle.fd = -1;
> >   	monitor_started = false;
> > +
> >   	return 0;
> 
> This looks like an unintended change.
> 
> >   }
> > +
> > +int __rte_experimental
> > +rte_dev_sigbus_handler_register(void)
> > +{
> > +	sigset_t mask;
> > +	struct sigaction action;
> > +
> 
> <snip>
> 
> > --- a/lib/librte_eal/rte_eal_version.map
> > +++ b/lib/librte_eal/rte_eal_version.map
> > @@ -281,6 +281,8 @@ EXPERIMENTAL {
> >   	rte_dev_event_callback_unregister;
> >   	rte_dev_event_monitor_start;
> >   	rte_dev_event_monitor_stop;
> > +	rte_dev_hotplug_handle_enable;
> > +	rte_dev_hotplug_handle_disable;
> 
> Nitpicking - disable should be above enable, as E follows D in the alphabet :)
> 
> >   	rte_dev_iterator_init;
> >   	rte_dev_iterator_next;
> >   	rte_devargs_add;
> >
> 
> 
> --
> Thanks,
> Anatoly
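
The fault-address check described in the commit message could be sketched
roughly as below. This is an illustration under assumptions, not the code
from the patch: struct dev_mem_range, find_dev_by_fault_addr() and
classify_sigbus() are hypothetical, and the real lookup goes through
rte_bus_sigbus_handler(), whose internals are not shown in this thread. The
return convention (0 for a device fault, 1 for a generic SIGBUS) follows how
the quoted handler interprets the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one device's mapped memory region. */
struct dev_mem_range {
	const char *dev_name;	/* device name, e.g. a PCI address string */
	uintptr_t start;	/* first mapped address */
	size_t len;		/* length of the mapping in bytes */
};

/* Return the range containing addr, or NULL if no device owns it. */
static const struct dev_mem_range *
find_dev_by_fault_addr(const struct dev_mem_range *ranges, size_t n,
		       const void *addr)
{
	uintptr_t a = (uintptr_t)addr;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Subtract first so start + len cannot overflow. */
		if (a >= ranges[i].start && a - ranges[i].start < ranges[i].len)
			return &ranges[i];
	}
	return NULL;
}

/* 0: fault belongs to a known device; 1: generic SIGBUS, not ours. */
static int
classify_sigbus(const struct dev_mem_range *ranges, size_t n,
		const void *fault_addr)
{
	return find_dev_by_fault_addr(ranges, n, fault_addr) != NULL ? 0 : 1;
}
```

A handler would call classify_sigbus() with info->si_addr and only invoke
the hot-unplug path when the result is 0.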

