[dpdk-dev] [PATCH 04/22] ethdev: enable hotplug on multi-process

Zhang, Qi Z qi.z.zhang at intel.com
Tue Jun 19 05:22:53 CEST 2018



> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Monday, June 18, 2018 4:18 PM
> To: Zhang, Qi Z <qi.z.zhang at intel.com>; thomas at monjalon.net
> Cc: Ananyev, Konstantin <konstantin.ananyev at intel.com>; dev at dpdk.org;
> Richardson, Bruce <bruce.richardson at intel.com>; Yigit, Ferruh
> <ferruh.yigit at intel.com>; Shelton, Benjamin H
> <benjamin.h.shelton at intel.com>; Vangati, Narender
> <narender.vangati at intel.com>
> Subject: Re: [PATCH 04/22] ethdev: enable hotplug on multi-process
> 
> On 07-Jun-18 1:38 PM, Qi Zhang wrote:
> > The patch introduce the solution to handle different hotplug cases in
> > multi-process situation, it include below scenario:
> >
> > 1. Attach a share device from primary
> > 2. Detach a share device from primary
> > 3. Attach a share device from secondary
> > 4. Detach a share device from secondary
> > 5. Attach a private device from secondary
> > 6. Detach a private device from secondary
> > 7. Detach a share device from secondary privately
> > 8. Attach a share device from secondary privately
> >
> > In primary-secondary process model, we assume device is shared by default.
> > that means attach or detach a device on any process will broadcast to
> > all other processes through mp channel then device information will be
> > synchronized on all processes.
> >
> > Any failure during attaching process will cause inconsistent status
> > between processes, so proper rollback action should be considered.
> > Also it is not safe to detach a share device when other process still
> > use it, so a handshake mechanism is introduced, it will be implemented
> > in following separate patch.
> >
> > Scenario for Case 1, 2:
> >
> > attach device
> > a) primary attach the new device if failed goto h).
> > b) primary send attach sync request to all secondary.
> > c) secondary receive request and attach device and send reply.
> > d) primary check the reply if all success go to i).
> > e) primary send attach rollback sync request to all secondary.
> > f) secondary receive the request and detach device and send reply.
> > g) primary receive the reply and detach device as rollback action.
> > h) attach fail
> > i) attach success
> >
> > detach device
> > a) primary perform pre-detach check, if device is locked, goto i).
> > b) primary send pre-detach sync request to all secondary.
> > c) secondary perform pre-detach check and send reply.
> > d) primary check the reply if any fail goto i).
> > e) primary send detach sync request to all secondary
> > f) secondary detach the device and send reply (assume no fail)
> > g) primary detach the device.
> > h) detach success
> > i) detach failed
> >
> > Case 3, 4:
> > This will be implemented in following patch.
> 
> If these will be implemented in following patch, why spend half the commit
> message talking about it? :) 

Sorry, I didn't get your point about spending half the commit message on it :)
This patch gives an overview and also implements cases 1, 2, 5, 6, 7 and 8.

For cases 3 and 4, only the four lines below describe them:

3. Attach a share device from secondary.
4. Detach a share device from secondary.
Case 3, 4:
This will be implemented in following patch.

> This commit doesn't implement secondary
> process functionality at all, so the commit message should probably be
> reworded to only include primary process logic, no?

OK, I will reword it to make the patch's scope clear, as described above.
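
For reference, the attach flow for cases 1 and 2 (steps a to i in the quoted commit message) can be sketched as a toy model. This is only an illustration under stated assumptions: `struct mp_state`, `NB_SECONDARY`, `secondary_will_fail` and `attach_shared_from_primary` are hypothetical names, the IPC round trips are replaced by plain loops, and the primary's own attach is assumed to succeed. It is not the code from the patch.

```c
#include <stdbool.h>

#define NB_SECONDARY 2 /* hypothetical number of secondary processes */

/* Toy model of per-process device state; not a DPDK structure. */
struct mp_state {
	bool primary_attached;
	bool secondary_attached[NB_SECONDARY];
	bool secondary_will_fail[NB_SECONDARY]; /* knob to simulate a failure */
};

/* Returns 0 on success, -1 if the attach was rolled back. */
static int
attach_shared_from_primary(struct mp_state *s)
{
	int i;
	bool failed = false;

	/* a) primary attaches first (assumed to succeed in this model) */
	s->primary_attached = true;

	/* b)-d) each secondary attaches and replies; primary checks replies */
	for (i = 0; i < NB_SECONDARY; i++) {
		if (s->secondary_will_fail[i])
			failed = true;
		else
			s->secondary_attached[i] = true;
	}
	if (!failed)
		return 0; /* i) attach success */

	/* e)-f) rollback request: every secondary detaches */
	for (i = 0; i < NB_SECONDARY; i++)
		s->secondary_attached[i] = false;

	/* g) primary detaches as the last rollback step */
	s->primary_attached = false;
	return -1; /* h) attach fail */
}
```

The point of the ordering is that after a failed attach no process is left holding the device, so all processes stay consistent.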

> 
> >
> > Case 5, 6:
> > Secondary process can attach private device which only visible to itself,
> > in this case no IPC is involved, primary process is not allowed to have
> > private device so far.
> >
> > Case 7, 8:
> > Secondary process can also temporally to detach a share device "privately"
> > then attach it back later, this action also not impact other processes.
> >
> > APIs chenages:
> 
> Multiple typos - "chenages", "temporally", "allowd", etc.

Thanks

> 
> >
> > rte_eth_dev_attach and rte_eth_dev_detach are extended to support
> > share device attach/detach in primary-secondary process model, it will
> > be called in case 1,2,3,4.
> >
> > New API rte_eth_dev_attach_private and rte_eth_dev_detach_private are
> > introduced to cover case 5,6,7,8, this API can only be invoked in secondary
> > process.
> >
> > Signed-off-by: Qi Zhang <qi.z.zhang at intel.com>
> > ---
> 
> <snip>
> 
> >   	rte_eal_mcfg_complete();
> >
> > +	if (rte_eth_dev_mp_init()) {
> > +		rte_eal_init_alert("rte_eth_dev_mp_init() failed\n");
> > +		rte_errno = ENOEXEC;
> > +		return -1;
> > +	}
> > +
> 
> Why is this done after the end of init? rte_eal_mcfg_complete() makes it
> so that secondaries can initialize, at that point all initialization
> should have been finished. I would expect this to be called after
> (before?) bus probe, since this is device-related.

OK, I will move it earlier in init.

> 
> >   	return fctret;
> >   }
> >
> > diff --git a/lib/librte_ethdev/Makefile b/lib/librte_ethdev/Makefile
> > index c2f2f7d82..04e93f337 100644
> > --- a/lib/librte_ethdev/Makefile
> > +++ b/lib/librte_ethdev/Makefile
> > @@ -19,6 +19,7 @@ EXPORT_MAP := rte_ethdev_version.map
> >   LIBABIVER := 9
> >
> 
> <snip>
> 
> > +	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
> > +
> > +		/**
> > +		 * If secondary process, we just send request to primary
> > +		 * to start the process.
> > +		 */
> > +		req.t = REQ_TYPE_ATTACH;
> > +		strlcpy(req.devargs, devargs, MAX_DEV_ARGS_LEN);
> > +
> > +		ret = rte_eth_dev_request_to_primary(&req);
> > +		if (ret) {
> > +			ethdev_log(ERR, "Failed to send device attach request to
> primary\n");
> 
> The log message is a little misleading. It can be that secondary has
> failed to send request. It can also be that it succeeded, but the attach
> itself has failed. I think a better message would be "attach request has
> failed" or something to that effect.

The return value of rte_eth_dev_request_to_primary only indicates a communication
failure (the message could not be sent, or no reply arrived in time),
not a failure of the attach/detach itself (that is reported in req->result).
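
A minimal sketch of that distinction, so that each failure can get its own log message. Only the `result` field comes from this thread; `struct mp_req_sketch`, `enum req_outcome` and `classify_request` are hypothetical names used for illustration.

```c
/* Hypothetical, reduced request layout: only 'result' is from the thread. */
struct mp_req_sketch {
	int result; /* outcome of the attach/detach on the peer process */
};

enum req_outcome {
	REQ_OK,          /* message delivered and operation succeeded */
	REQ_COMM_FAILED, /* message not sent, or no reply in time */
	REQ_OP_FAILED    /* message delivered, but attach/detach failed */
};

/*
 * send_ret stands for the return value of the request function,
 * req->result for the status filled in from the peer's reply.
 */
static enum req_outcome
classify_request(int send_ret, const struct mp_req_sketch *req)
{
	if (send_ret != 0)
		return REQ_COMM_FAILED; /* reply content is not valid here */
	if (req->result != 0)
		return REQ_OP_FAILED;
	return REQ_OK;
}
```

With the two cases separated like this, "failed to send request" is only logged for REQ_COMM_FAILED, and the caller can report REQ_OP_FAILED with a different message.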

> 
> > +			return ret;
> > +		}
> > +
> > +		*port_id = req.port_id;
> > +		return req.result;
> > +	}
> > +
> > +	ret = do_eth_dev_attach(devargs, port_id);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* send attach request to secondary */
> > +	req.t = REQ_TYPE_ATTACH;
> > +	strlcpy(req.devargs, devargs, MAX_DEV_ARGS_LEN);
> > +	req.port_id = *port_id;
> > +	ret = rte_eth_dev_request_to_secondary(&req);
> > +	if (ret) {
> > +		ethdev_log(ERR, "Failed to send device attach request to
> secondary\n");
> 
> Same as above - log message can/might be misleading. There are a few
> other places where similar log message is present, those should be
> corrected too.

Same as above

> 
> > +		goto rollback;
> > +	}
> > +
> > +	if (req.result)
> > +		goto rollback;
> > +
> > +	return 0;
> 
> <snip>
> 
> > +{
> > +	uint32_t dev_flags;
> > +
> > +	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> > +		return -ENOTSUP;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > +
> > +	dev_flags = rte_eth_devices[port_id].data->dev_flags;
> > +	if (dev_flags & RTE_ETH_DEV_BONDED_SLAVE) {
> > +		ethdev_log(ERR,
> > +			"Port %" PRIu16 " is bonded, cannot detach", port_id);
> > +		return -ENOTSUP;
> > +	}
> 
> Do we have to do a similar check for failsafe devices?

This just keeps the same logic as before; I guess it could be fixed in a separate patch.

> 
> > +
> > +	return do_eth_dev_detach(port_id);
> > +}
> > +
> >   static int
> >   rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t
> nb_queues)
> >   {
> > diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> > index 36e3984ea..bb03d613b 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> 
> <snip>
> 
> >   /**
> > + * Attach a private Ethernet device specified by arguments.
> > + * A private device is invisible to other process.
> > + * Can only be invoked in secondary process.
> > + *
> > + * @param devargs
> > + *  A pointer to a strings array describing the new device
> > + *  to be attached. The strings should be a pci address like
> > + *  '0000:01:00.0' or virtual device name like 'net_pcap0'.
> > + * @param port_id
> > + *  A pointer to a port identifier actually attached.
> > + * @return
> > + *  0 on success and port_id is filled, negative on error
> > + */
> > +int rte_eth_dev_attach_private(const char *devargs, uint16_t *port_id);
> 
> New API's should be marked as __rte_experimental.

OK

> 
> > +
> > +/**
> >    * Detach a Ethernet device specified by port identifier.
> >    * This function must be called when the device is in the
> >    * closed state.
> > + * In multi-process mode, it will sync with other process
> > + * to detach the device.
> >    *
> >    * @param port_id
> >    *   The port identifier of the device to detach.
> > @@ -1490,6 +1511,22 @@ int rte_eth_dev_attach(const char *devargs,
> uint16_t *port_id);
> 
> <snip>
> 
> > + * Detach a Ethernet device in current process.
> > + *
> > + * @param port_id
> > + *   The port identifier of the device to detach.
> > + * @param devname
> > + *   A pointer to a buffer that will be filled with the device name.
> > + *   This buffer must be at least RTE_DEV_NAME_MAX_LEN long.
> > + * @return
> > + *  0 on success and devname is filled, negative on error
> > + */
> > +int do_eth_dev_detach(uint16_t port_id);
> > +
> 
> Why is this made part of an external API? You should have a separate,
> private header file for these.

OK, will add to ethdev_private.h in v2.

> 
> >   #ifdef __cplusplus
> >   }
> >   #endif
> > diff --git a/lib/librte_ethdev/rte_ethdev_mp.c
> b/lib/librte_ethdev/rte_ethdev_mp.c
> > new file mode 100644
> > index 000000000..8ede8151d
> > --- /dev/null
> > +++ b/lib/librte_ethdev/rte_ethdev_mp.c
> > @@ -0,0 +1,195 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2010-2018 Intel Corporation
> > + */
> > +
> > +#include "rte_ethdev_driver.h"
> > +#include "rte_ethdev_mp.h"
> > +
> > +static int detach_on_secondary(uint16_t port_id)
> 
> <snip>
> 
> > +	free(da.args);
> > +	return 0;
> > +}
> > +
> > +static int handle_secondary_request(const struct rte_mp_msg *msg, const
> void *peer)
> > +{
> > +	(void)msg;
> > +	(void)(peer);
> > +	return -ENOTSUP;
> 
> Please either mark arguments as __rte_unused, or use RTE_SET_USED(blah)
> macro. Same in other similar places.

OK.
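
For illustration, the two suggested styles could look roughly like this. The macro definitions below are stand-ins so the sketch compiles without DPDK headers (in DPDK they are provided by rte_common.h); the handler names and the `const void *` parameter types are hypothetical simplifications of the stubs in the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-ins for the DPDK macros, only so this sketch builds on its own. */
#ifndef __rte_unused
#define __rte_unused __attribute__((__unused__))
#endif
#ifndef RTE_SET_USED
#define RTE_SET_USED(x) (void)(x)
#endif

/* Style 1: mark the unused parameters in the signature. */
static int
stub_handler_attr(const void *msg __rte_unused,
		  const void *peer __rte_unused)
{
	return -ENOTSUP;
}

/* Style 2: consume the parameters explicitly in the body. */
static int
stub_handler_macro(const void *msg, const void *peer)
{
	RTE_SET_USED(msg);
	RTE_SET_USED(peer);
	return -ENOTSUP;
}
```

Both variants behave the same at run time; the difference is only in how the intent is expressed to the compiler and the reader.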

> 
> > +}
> > +
> > +static int handle_primary_response(const struct rte_mp_msg *msg, const
> void *peer)
> > +{
> > +	(void)msg;
> > +	(void)(peer);
> > +	return -ENOTSUP;
> > +}
> > +
> > +static int handle_primary_request(const struct rte_mp_msg *msg, const
> void *peer)
> > +{
> > +	const struct eth_dev_mp_req *req =
> > +		(const struct eth_dev_mp_req *)msg->param;
> 
> <snip>
> 
> > +	case REQ_TYPE_DETACH:
> > +	case REQ_TYPE_ATTACH_ROLLBACK:
> > +		ret = detach_on_secondary(req->port_id);
> > +		break;
> > +	default:
> > +		ret = -EINVAL;
> > +	}
> > +
> > +	strcpy(mp_resp.name, ETH_DEV_MP_ACTION_REQUEST);
> 
> Here and in other places: rte_strlcpy?

OK
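
As a sketch of the idea behind the bounded copy (not the DPDK helper itself), the copy can take the destination size and report truncation. `copy_bounded` is a hypothetical helper built on standard snprintf, chosen here so the example does not depend on strlcpy being available in libc.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy src into dst without writing more than dst_len bytes.
 * dst is always NUL-terminated when dst_len > 0.
 * Returns 0 if the whole string fit, -1 if it was truncated.
 */
static int
copy_bounded(char *dst, size_t dst_len, const char *src)
{
	int n = snprintf(dst, dst_len, "%s", src);

	if (n < 0 || (size_t)n >= dst_len)
		return -1;
	return 0;
}
```

Unlike strcpy, an overlong source cannot overflow the fixed-size name field; the caller also learns that the name was cut.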

Thanks!
Qi
> 
> --
> Thanks,
> Anatoly

