[dpdk-dev] [PATCH v10 03/11] net/failsafe: add fail-safe PMD

Ferruh Yigit ferruh.yigit at intel.com
Mon Jul 17 15:56:54 CEST 2017


On 7/15/2017 6:57 PM, Gaetan Rivet wrote:
> Introduce the fail-safe poll mode driver initialization and enable its
> build infrastructure.
> 
> This PMD allows for applications to benefit from true hot-plugging
> support without having to implement it.
> 
> It intercepts and manages Ethernet device removal events issued by
> slave PMDs and re-initializes them transparently when brought back.
> It also allows defining a contingency to the removal of a device, by
> designating a fail-over device that will take on transmitting operations
> if the preferred device is removed.
> 
> Applications only see a fail-safe instance, without caring for
> underlying activity ensuring their continued operations.

All PMD in a single patch is hard to review, I am sure some details
missed during the review, but taking account the histroy of the PMD I
accept this as it is, but I will rely on your support to fix issues in
the future.

> 
> Signed-off-by: Gaetan Rivet <gaetan.rivet at 6wind.com>
> Acked-by: Olga Shern <olgas at mellanox.com>

<...>

> +Usage example
> +~~~~~~~~~~~~~
> +
> +This section shows some example of using **testpmd** with a fail-safe PMD.
> +
> +#. Request huge pages:
> +
> +   .. code-block:: console
> +
> +      echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

I think this is extra for usage sample, if you want there is generic
guide [1] that you ca reference.

[1]
http://dpdk.org/doc/guides/nics/build_and_test.html

> +
> +#. Start testpmd. The slave device should be blacklisted from normal EAL
> +   operations to avoid probing it twice when in PCI blacklist mode.
> +
> +   .. code-block:: console
> +
> +      $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \
> +         -w 'net_failsafe0,mac=de:ad:be:ef:01:02,dev(84:00.0),dev(net_ring0)'
> +         -b 84:00.0 -b 00:04.0 -- -i

Do you think does it make sense to stress sub-device shouldn't be probed
by EAL, I believe it is not clear above.

> +
> +   Note that PCI blacklist mode is the default PCI operating mode. In this
> +   configuration, the fail-safe cannot proceed with its slaves if they have
> +   been probed beforehand.
> +
> +#. Alternatively, it can be used alongside any other device in whitelist mode.
> +
> +   .. code-block:: console
> +
> +      $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \
> +         -w 'net_failsafe0,mac=de:ad:be:ef:01:02,dev(84:00.0),dev(net_ring0)'
> +         -w 81:00.0 -- -i
> +
<...>

> +[Features]
> +Link status          = Y

> +MTU update           = Y
> +Promiscuous mode     = Y
> +Allmulticast mode    = Y

> +VLAN filter          = Y
> +Packet type parsing  = Y

I am not sure how to document some of these features, because they
depends on sub-device capability. I guess if sub-device doesn't support
packet type parsing, this feature won't be supported?

> +Basic stats          = Y

> +Stats per queue      = Y
> +Unicast MAC filter   = Y
> +Queue start/stop     = Y
> +Jumbo frame          = Y
> +Multicast MAC filter = Y

Is above ones supported by PMD, I don't see them unless I miss something.

+ "Flow Control" seems supported.

<...>

> +# This lib depends upon:
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_eal
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_ether
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_kvargs
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_mbuf

DEPDIRS-y no more used, can be removed.

> +
> +# Basic CFLAGS:
> +CFLAGS += -std=gnu99 -Wall -Wextra

-Wall should be coming from $(WERROR_FLAGS), no need to add here.

And are you sure about gnu99, mlx drivers tends to enforce a standard
and they updated to c11, do you want to do same here.

> +CFLAGS += -O3
> +CFLAGS += -I.
> +CFLAGS += -D_DEFAULT_SOURCE
> +CFLAGS += -D_XOPEN_SOURCE=700

Is there a reason for these variables, or are these copy-paste?

> +CFLAGS += $(WERROR_FLAGS)
> +CFLAGS += -Wno-strict-prototypes
> +CFLAGS += -pedantic -DPEDANTIC

Again, just question, is pedantic mode intentional, or copy-paste?

<...>

> +static int
> +fs_eth_dev_create(struct rte_vdev_device *vdev)
> +{
> +	struct rte_eth_dev *dev;
> +	struct ether_addr *mac;
> +	struct fs_priv *priv;
> +	struct sub_device *sdev;
> +	const char *params;
> +	unsigned int socket_id;
> +	uint8_t i;
> +	int ret;
> +
> +	dev = NULL;
> +	priv = NULL;
> +	params = rte_vdev_device_args(vdev);
> +	socket_id = rte_socket_id();
> +	INFO("Creating fail-safe device on NUMA socket %u",
> +	     socket_id);

No line break required.

> +	dev = rte_eth_vdev_allocate(vdev, sizeof(*priv));
> +	if (dev == NULL) {
> +		ERROR("Unable to allocate rte_eth_dev");
> +		return -1;
> +	}
> +	priv = dev->data->dev_private;
> +	PRIV(dev)->dev = dev;

Altough this is valid, what about?

priv = PRIV(dev);
priv->dev = dev;

> +	dev->dev_ops = &failsafe_ops;
> +	TAILQ_INIT(&dev->link_intr_cbs);

Not required, rte_eth_dev_allocate() initializes this.

> +	dev->data->dev_flags = 0x0;

Not required to set zero, dev->data already memset to zero.

> +	dev->data->mac_addrs = &PRIV(dev)->mac_addrs[0];
> +	dev->data->dev_link = eth_link;
> +	PRIV(dev)->nb_mac_addr = 1;
> +	dev->rx_pkt_burst = (eth_rx_burst_t)&failsafe_rx_burst;
> +	dev->tx_pkt_burst = (eth_tx_burst_t)&failsafe_tx_burst;
> +	if (params == NULL) {

I would prefer input control as first thing in the function, before
allocating the device.

> +		ERROR("This PMD requires sub-devices, none provided");
> +		goto free_dev;
> +	}
> +	ret = fs_sub_device_create(dev, params);

This function looks like just allocates memory for sub devices, does it
make sense to rename it as fs_sub_device_alloc()?

> +	if (ret) {
> +		ERROR("Could not allocate sub_devices");
> +		goto free_dev;
> +	}
<...>

> +free_args:
> +	failsafe_args_free(dev);
> +free_subs:
> +	fs_sub_device_free(dev);
> +free_dev:
> +	rte_eth_dev_release_port(dev);

Device private data should be freed.

> +	return -1;
> +}
<...>

> +static int
> +rte_pmd_failsafe_probe(struct rte_vdev_device *vdev)
> +{
> +	const char *name;
> +
> +	name = rte_vdev_device_name(vdev);
> +	if (vdev == NULL)
> +		return -EINVAL;

I think you don't need this check, if name is NULL, probe shouldn't be
called, same for remove().

> +	INFO("Initializing " FAILSAFE_DRIVER_NAME " for %s",
> +			name);

Line break not required.

<...>

> +RTE_PMD_REGISTER_VDEV(net_failsafe, failsafe_drv);
> +RTE_PMD_REGISTER_ALIAS(net_failsafe, eth_failsafe);

I belive alias is not required for new PMDs, this is for backward
compability for old drivers.


<...>

> +int
> +failsafe_args_parse(struct rte_eth_dev *dev, const char *params)
> +{
> +	struct fs_priv *priv;
> +	char mut_params[DEVARGS_MAXLEN] = "";

Out of curiosity, what does "mut" stands for?

> +	struct rte_kvargs *kvlist = NULL;
> +	unsigned int arg_count;
> +	size_t n;
> +	int ret;
> +
> +	if (dev == NULL || params == NULL)
> +		return -EINVAL;

This check looks like redundant.

> +	priv = PRIV(dev);
> +	ret = 0;
> +	priv->subs_tx = FAILSAFE_MAX_ETHPORTS;
> +	/* default parameters */
> +	mac_from_arg = 0;

This is global value, I believe it is better to set default value where
variable defined with a comment. Here is easy to miss the default value.

> +	n = snprintf(mut_params, sizeof(mut_params), "%s", params);
> +	if (n >= sizeof(mut_params)) {
> +		ERROR("Parameter string too long (>=%zu)",
> +				sizeof(mut_params));
> +		return -ENOMEM;
> +	}
> +	ret = fs_parse_sub_devices(fs_parse_device_param,
> +				   dev, params);

Why the device argument is not defined as dev=xxx, instead of current
dev(xxx).

"dev=xxx" will be compatible with rest of the argument usage, and it
will be possible to use kvargs to parse it, which will make this code
simpler I believe.

What is the reason of using different syntax?

> +	if (ret < 0)
> +		return ret;
> +	ret = fs_remove_sub_devices_definition(mut_params);
> +	if (ret < 0)
> +		return ret;
> +	if (strnlen(mut_params, sizeof(mut_params)) > 0) {
> +		kvlist = rte_kvargs_parse(mut_params,
> +				pmd_failsafe_init_parameters);
> +		if (kvlist == NULL) {
> +			ERROR("Error parsing parameters, usage:\n"
> +				PMD_FAILSAFE_PARAM_STRING);
> +			return -1;
> +		}
> +		/* MAC addr */
> +		arg_count = rte_kvargs_count(kvlist,
> +				PMD_FAILSAFE_MAC_KVARG);
> +		if (arg_count == 1) {
> +			ret = rte_kvargs_process(kvlist,
> +					PMD_FAILSAFE_MAC_KVARG,
> +					&fs_get_mac_addr_arg,
> +					&dev->data->mac_addrs[0]);
> +			if (ret < 0)
> +				goto free_kvlist;
> +			mac_from_arg = 1;
> +		}

Is ignoring the case mac defined more than once intentional?

> +	}
> +free_kvlist:
> +	rte_kvargs_free(kvlist);
> +	return ret;
> +}
> +

<...>

> +static int
> +fs_count_device(struct rte_eth_dev *dev, const char *param,
> +		uint8_t head __rte_unused)
> +{
> +	size_t b = 0;
> +
> +	while  (param[b] != '(' &&
> +		param[b] != '\0')
> +		b++;
> +	if (strncmp(param, "dev", b) &&
> +	    strncmp(param, "exec", b)) {

I believe param "exec" will be introduced in further patches?

> +		ERROR("Unrecognized device type: %.*s", (int)b, param);
> +		return -EINVAL;
> +	}
> +	PRIV(dev)->subs_tail += 1;
> +	return 0;
> +}
> +

<...>

> +static int
> +fs_bus_init(struct rte_eth_dev *dev)
> +{
> +	struct sub_device *sdev;
> +	struct rte_devargs *da;
> +	uint8_t i;
> +	int ret;
> +
> +	FOREACH_SUBDEV(sdev, i, dev) {

Can FOREACH_SUBDEV_ST(..., DEV_PARSED) be used here?

And what do you think renaming "FOREACH_SUBDEV_ST" to
"FOREACH_SUBDEV_STATE"?

> +		if (sdev->state != DEV_PARSED)
> +			continue;
> +		da = &sdev->devargs;
> +		ret = rte_eal_hotplug_add(da->bus->name,
> +					  da->name,
> +					  da->args);

<...>

> +	/*
> +	 * We only update TX_SUBDEV if we are not started.
> +	 * If a sub_device is emitting, we will switch the TX_SUBDEV to the
> +	 * preferred port only upon starting it, so that the switch is smoother.
> +	 */
> +	if (PREFERRED_SUBDEV(dev)->state >= DEV_PROBED) 

Can you please document concept of the "prefered sub device" in
documentation?

> +		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev) &&
> +		    (TX_SUBDEV(dev) == NULL ||
> +		     (TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED))) {
> +			DEBUG("Switching tx_dev to preferred sub_device");
> +			PRIV(dev)->subs_tx = 0;
> +		}

<...>

> +static struct rte_eth_dev_info default_infos = {
> +	.driver_name = pmd_failsafe_driver_name,

This should be dev->device->driver->name, but already overwriiten by
rte_eth_dev_info_get() so you can drop this.

> +	/* Max possible number of elements */

<...>

> +	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
> +		DEBUG("Closing sub_device %d", i);
> +		rte_eth_dev_close(PORT_ID(sdev));
> +		sdev->state = DEV_ACTIVE - 1;

Should it be better to set state to DEV_PROBED? Instead of calculation.

> +	}
> +	fs_dev_free_queues(dev);
> +}
> +

<...>

> +static void
> +fs_stats_get(struct rte_eth_dev *dev,
> +	     struct rte_eth_stats *stats)
> +{
> +	memset(stats, 0, sizeof(*stats));

memset not required, done by API

> +	if (TX_SUBDEV(dev) == NULL)
> +		return;
> +	rte_eth_stats_get(PORT_ID(TX_SUBDEV(dev)), stats);
> +}
> +

<...>

> +		sdev = TX_SUBDEV(dev);
> +		rte_eth_dev_info_get(PORT_ID(sdev), &PRIV(dev)->infos);
> +		PRIV(dev)->infos.rx_offload_capa = rx_offload_capa;

Is intention &= ?

> +		PRIV(dev)->infos.tx_offload_capa &=
> +					default_infos.tx_offload_capa;
> +		PRIV(dev)->infos.flow_type_rss_offloads &=
> +					default_infos.flow_type_rss_offloads;
> +	}
> +	rte_memcpy(infos, &PRIV(dev)->infos, sizeof(*infos));
> +}

<...>

> +	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
> +		DEBUG("Calling rte_eth_dev_set_mtu on sub_device %d", i);
> +		ret = rte_eth_dev_set_mtu(PORT_ID(sdev), mtu);
> +		if (ret) {
> +			ERROR("Operation rte_eth_dev_set_mtu failed for sub_device %d"
> +			      " with error %d", i, ret);

You can prefer to not break the log message.

> +			return ret;
> +		}
> +	}
> +	return 0;
> +}

<...>

> +
> +#define FAILSAFE_PLUGIN_DEFAULT_TIMEOUT_MS 2000

Is this related to next patches in the set?

<...>

> +enum dev_state {
> +	DEV_UNDEFINED = 0,

Setting value not required.

> +	DEV_PARSED,
> +	DEV_PROBED,
> +	DEV_ACTIVE,
> +	DEV_STARTED,
> +};

<...>



More information about the dev mailing list