[dpdk-dev] [PATCH v10 03/11] net/failsafe: add fail-safe PMD
Ferruh Yigit
ferruh.yigit at intel.com
Mon Jul 17 15:56:54 CEST 2017
On 7/15/2017 6:57 PM, Gaetan Rivet wrote:
> Introduce the fail-safe poll mode driver initialization and enable its
> build infrastructure.
>
> This PMD allows for applications to benefit from true hot-plugging
> support without having to implement it.
>
> It intercepts and manages Ethernet device removal events issued by
> slave PMDs and re-initializes them transparently when brought back.
> It also allows defining a contingency to the removal of a device, by
> designating a fail-over device that will take on transmitting operations
> if the preferred device is removed.
>
> Applications only see a fail-safe instance, without caring for
> underlying activity ensuring their continued operations.
All PMD in a single patch is hard to review, I am sure some details
missed during the review, but taking account the histroy of the PMD I
accept this as it is, but I will rely on your support to fix issues in
the future.
>
> Signed-off-by: Gaetan Rivet <gaetan.rivet at 6wind.com>
> Acked-by: Olga Shern <olgas at mellanox.com>
<...>
> +Usage example
> +~~~~~~~~~~~~~
> +
> +This section shows some example of using **testpmd** with a fail-safe PMD.
> +
> +#. Request huge pages:
> +
> + .. code-block:: console
> +
> + echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
I think this is extra for usage sample, if you want there is generic
guide [1] that you ca reference.
[1]
http://dpdk.org/doc/guides/nics/build_and_test.html
> +
> +#. Start testpmd. The slave device should be blacklisted from normal EAL
> + operations to avoid probing it twice when in PCI blacklist mode.
> +
> + .. code-block:: console
> +
> + $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \
> + -w 'net_failsafe0,mac=de:ad:be:ef:01:02,dev(84:00.0),dev(net_ring0)'
> + -b 84:00.0 -b 00:04.0 -- -i
Do you think does it make sense to stress sub-device shouldn't be probed
by EAL, I believe it is not clear above.
> +
> + Note that PCI blacklist mode is the default PCI operating mode. In this
> + configuration, the fail-safe cannot proceed with its slaves if they have
> + been probed beforehand.
> +
> +#. Alternatively, it can be used alongside any other device in whitelist mode.
> +
> + .. code-block:: console
> +
> + $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \
> + -w 'net_failsafe0,mac=de:ad:be:ef:01:02,dev(84:00.0),dev(net_ring0)'
> + -w 81:00.0 -- -i
> +
<...>
> +[Features]
> +Link status = Y
> +MTU update = Y
> +Promiscuous mode = Y
> +Allmulticast mode = Y
> +VLAN filter = Y
> +Packet type parsing = Y
I am not sure how to document some of these features, because they
depends on sub-device capability. I guess if sub-device doesn't support
packet type parsing, this feature won't be supported?
> +Basic stats = Y
> +Stats per queue = Y
> +Unicast MAC filter = Y
> +Queue start/stop = Y
> +Jumbo frame = Y
> +Multicast MAC filter = Y
Is above ones supported by PMD, I don't see them unless I miss something.
+ "Flow Control" seems supported.
<...>
> +# This lib depends upon:
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_eal
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_ether
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_kvargs
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_mbuf
DEPDIRS-y no more used, can be removed.
> +
> +# Basic CFLAGS:
> +CFLAGS += -std=gnu99 -Wall -Wextra
-Wall should be coming from $(WERROR_FLAGS), no need to add here.
And are you sure about gnu99, mlx drivers tends to enforce a standard
and they updated to c11, do you want to do same here.
> +CFLAGS += -O3
> +CFLAGS += -I.
> +CFLAGS += -D_DEFAULT_SOURCE
> +CFLAGS += -D_XOPEN_SOURCE=700
Is there a reason for these variables, or are these copy-paste?
> +CFLAGS += $(WERROR_FLAGS)
> +CFLAGS += -Wno-strict-prototypes
> +CFLAGS += -pedantic -DPEDANTIC
Again, just question, is pedantic mode intentional, or copy-paste?
<...>
> +static int
> +fs_eth_dev_create(struct rte_vdev_device *vdev)
> +{
> + struct rte_eth_dev *dev;
> + struct ether_addr *mac;
> + struct fs_priv *priv;
> + struct sub_device *sdev;
> + const char *params;
> + unsigned int socket_id;
> + uint8_t i;
> + int ret;
> +
> + dev = NULL;
> + priv = NULL;
> + params = rte_vdev_device_args(vdev);
> + socket_id = rte_socket_id();
> + INFO("Creating fail-safe device on NUMA socket %u",
> + socket_id);
No line break required.
> + dev = rte_eth_vdev_allocate(vdev, sizeof(*priv));
> + if (dev == NULL) {
> + ERROR("Unable to allocate rte_eth_dev");
> + return -1;
> + }
> + priv = dev->data->dev_private;
> + PRIV(dev)->dev = dev;
Altough this is valid, what about?
priv = PRIV(dev);
priv->dev = dev;
> + dev->dev_ops = &failsafe_ops;
> + TAILQ_INIT(&dev->link_intr_cbs);
Not required, rte_eth_dev_allocate() initializes this.
> + dev->data->dev_flags = 0x0;
Not required to set zero, dev->data already memset to zero.
> + dev->data->mac_addrs = &PRIV(dev)->mac_addrs[0];
> + dev->data->dev_link = eth_link;
> + PRIV(dev)->nb_mac_addr = 1;
> + dev->rx_pkt_burst = (eth_rx_burst_t)&failsafe_rx_burst;
> + dev->tx_pkt_burst = (eth_tx_burst_t)&failsafe_tx_burst;
> + if (params == NULL) {
I would prefer input control as first thing in the function, before
allocating the device.
> + ERROR("This PMD requires sub-devices, none provided");
> + goto free_dev;
> + }
> + ret = fs_sub_device_create(dev, params);
This function looks like just allocates memory for sub devices, does it
make sense to rename it as fs_sub_device_alloc()?
> + if (ret) {
> + ERROR("Could not allocate sub_devices");
> + goto free_dev;
> + }
<...>
> +free_args:
> + failsafe_args_free(dev);
> +free_subs:
> + fs_sub_device_free(dev);
> +free_dev:
> + rte_eth_dev_release_port(dev);
Device private data should be freed.
> + return -1;
> +}
<...>
> +static int
> +rte_pmd_failsafe_probe(struct rte_vdev_device *vdev)
> +{
> + const char *name;
> +
> + name = rte_vdev_device_name(vdev);
> + if (vdev == NULL)
> + return -EINVAL;
I think you don't need this check, if name is NULL, probe shouldn't be
called, same for remove().
> + INFO("Initializing " FAILSAFE_DRIVER_NAME " for %s",
> + name);
Line break not required.
<...>
> +RTE_PMD_REGISTER_VDEV(net_failsafe, failsafe_drv);
> +RTE_PMD_REGISTER_ALIAS(net_failsafe, eth_failsafe);
I belive alias is not required for new PMDs, this is for backward
compability for old drivers.
<...>
> +int
> +failsafe_args_parse(struct rte_eth_dev *dev, const char *params)
> +{
> + struct fs_priv *priv;
> + char mut_params[DEVARGS_MAXLEN] = "";
Out of curiosity, what does "mut" stands for?
> + struct rte_kvargs *kvlist = NULL;
> + unsigned int arg_count;
> + size_t n;
> + int ret;
> +
> + if (dev == NULL || params == NULL)
> + return -EINVAL;
This check looks like redundant.
> + priv = PRIV(dev);
> + ret = 0;
> + priv->subs_tx = FAILSAFE_MAX_ETHPORTS;
> + /* default parameters */
> + mac_from_arg = 0;
This is global value, I believe it is better to set default value where
variable defined with a comment. Here is easy to miss the default value.
> + n = snprintf(mut_params, sizeof(mut_params), "%s", params);
> + if (n >= sizeof(mut_params)) {
> + ERROR("Parameter string too long (>=%zu)",
> + sizeof(mut_params));
> + return -ENOMEM;
> + }
> + ret = fs_parse_sub_devices(fs_parse_device_param,
> + dev, params);
Why the device argument is not defined as dev=xxx, instead of current
dev(xxx).
"dev=xxx" will be compatible with rest of the argument usage, and it
will be possible to use kvargs to parse it, which will make this code
simpler I believe.
What is the reason of using different syntax?
> + if (ret < 0)
> + return ret;
> + ret = fs_remove_sub_devices_definition(mut_params);
> + if (ret < 0)
> + return ret;
> + if (strnlen(mut_params, sizeof(mut_params)) > 0) {
> + kvlist = rte_kvargs_parse(mut_params,
> + pmd_failsafe_init_parameters);
> + if (kvlist == NULL) {
> + ERROR("Error parsing parameters, usage:\n"
> + PMD_FAILSAFE_PARAM_STRING);
> + return -1;
> + }
> + /* MAC addr */
> + arg_count = rte_kvargs_count(kvlist,
> + PMD_FAILSAFE_MAC_KVARG);
> + if (arg_count == 1) {
> + ret = rte_kvargs_process(kvlist,
> + PMD_FAILSAFE_MAC_KVARG,
> + &fs_get_mac_addr_arg,
> + &dev->data->mac_addrs[0]);
> + if (ret < 0)
> + goto free_kvlist;
> + mac_from_arg = 1;
> + }
Is ignoring the case mac defined more than once intentional?
> + }
> +free_kvlist:
> + rte_kvargs_free(kvlist);
> + return ret;
> +}
> +
<...>
> +static int
> +fs_count_device(struct rte_eth_dev *dev, const char *param,
> + uint8_t head __rte_unused)
> +{
> + size_t b = 0;
> +
> + while (param[b] != '(' &&
> + param[b] != '\0')
> + b++;
> + if (strncmp(param, "dev", b) &&
> + strncmp(param, "exec", b)) {
I believe param "exec" will be introduced in further patches?
> + ERROR("Unrecognized device type: %.*s", (int)b, param);
> + return -EINVAL;
> + }
> + PRIV(dev)->subs_tail += 1;
> + return 0;
> +}
> +
<...>
> +static int
> +fs_bus_init(struct rte_eth_dev *dev)
> +{
> + struct sub_device *sdev;
> + struct rte_devargs *da;
> + uint8_t i;
> + int ret;
> +
> + FOREACH_SUBDEV(sdev, i, dev) {
Can FOREACH_SUBDEV_ST(..., DEV_PARSED) be used here?
And what do you think renaming "FOREACH_SUBDEV_ST" to
"FOREACH_SUBDEV_STATE"?
> + if (sdev->state != DEV_PARSED)
> + continue;
> + da = &sdev->devargs;
> + ret = rte_eal_hotplug_add(da->bus->name,
> + da->name,
> + da->args);
<...>
> + /*
> + * We only update TX_SUBDEV if we are not started.
> + * If a sub_device is emitting, we will switch the TX_SUBDEV to the
> + * preferred port only upon starting it, so that the switch is smoother.
> + */
> + if (PREFERRED_SUBDEV(dev)->state >= DEV_PROBED)
Can you please document concept of the "prefered sub device" in
documentation?
> + if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev) &&
> + (TX_SUBDEV(dev) == NULL ||
> + (TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED))) {
> + DEBUG("Switching tx_dev to preferred sub_device");
> + PRIV(dev)->subs_tx = 0;
> + }
<...>
> +static struct rte_eth_dev_info default_infos = {
> + .driver_name = pmd_failsafe_driver_name,
This should be dev->device->driver->name, but already overwriiten by
rte_eth_dev_info_get() so you can drop this.
> + /* Max possible number of elements */
<...>
> + FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
> + DEBUG("Closing sub_device %d", i);
> + rte_eth_dev_close(PORT_ID(sdev));
> + sdev->state = DEV_ACTIVE - 1;
Should it be better to set state to DEV_PROBED? Instead of calculation.
> + }
> + fs_dev_free_queues(dev);
> +}
> +
<...>
> +static void
> +fs_stats_get(struct rte_eth_dev *dev,
> + struct rte_eth_stats *stats)
> +{
> + memset(stats, 0, sizeof(*stats));
memset not required, done by API
> + if (TX_SUBDEV(dev) == NULL)
> + return;
> + rte_eth_stats_get(PORT_ID(TX_SUBDEV(dev)), stats);
> +}
> +
<...>
> + sdev = TX_SUBDEV(dev);
> + rte_eth_dev_info_get(PORT_ID(sdev), &PRIV(dev)->infos);
> + PRIV(dev)->infos.rx_offload_capa = rx_offload_capa;
Is intention &= ?
> + PRIV(dev)->infos.tx_offload_capa &=
> + default_infos.tx_offload_capa;
> + PRIV(dev)->infos.flow_type_rss_offloads &=
> + default_infos.flow_type_rss_offloads;
> + }
> + rte_memcpy(infos, &PRIV(dev)->infos, sizeof(*infos));
> +}
<...>
> + FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
> + DEBUG("Calling rte_eth_dev_set_mtu on sub_device %d", i);
> + ret = rte_eth_dev_set_mtu(PORT_ID(sdev), mtu);
> + if (ret) {
> + ERROR("Operation rte_eth_dev_set_mtu failed for sub_device %d"
> + " with error %d", i, ret);
You can prefer to not break the log message.
> + return ret;
> + }
> + }
> + return 0;
> +}
<...>
> +
> +#define FAILSAFE_PLUGIN_DEFAULT_TIMEOUT_MS 2000
Is this related to next patches in the set?
<...>
> +enum dev_state {
> + DEV_UNDEFINED = 0,
Setting value not required.
> + DEV_PARSED,
> + DEV_PROBED,
> + DEV_ACTIVE,
> + DEV_STARTED,
> +};
<...>
More information about the dev
mailing list