[dpdk-dev] [PATCH v2 3/3] net/ifcvf: add ifcvf driver

Ferruh Yigit ferruh.yigit at intel.com
Thu Mar 22 09:51:20 CET 2018


On 3/21/2018 1:21 PM, Xiao Wang wrote:
> ifcvf driver uses vdev as a control domain to manage ifc VFs that belong
> to it. It registers vDPA device ops to vhost lib to enable these VFs to be
> used as vhost data path accelerator.
> 
> Live migration feature is supported by ifc VF and this driver enables
> it based on vhost lib.
> 
> Because vDPA driver needs to set up MSI-X vector to interrupt the guest,
> only vfio-pci is supported currently.
> 
> Signed-off-by: Xiao Wang <xiao.w.wang at intel.com>
> Signed-off-by: Rosen Xu <rosen.xu at intel.com>
> ---
> v2:
> - Rebase on Zhihong's vDPA v3 patch set.
> ---
>  config/common_base                      |    6 +
>  config/common_linuxapp                  |    1 +
>  drivers/net/Makefile                    |    1 +
>  drivers/net/ifcvf/Makefile              |   40 +
>  drivers/net/ifcvf/base/ifcvf.c          |  329 ++++++++
>  drivers/net/ifcvf/base/ifcvf.h          |  156 ++++
>  drivers/net/ifcvf/base/ifcvf_osdep.h    |   52 ++
>  drivers/net/ifcvf/ifcvf_ethdev.c        | 1240 +++++++++++++++++++++++++++++++
>  drivers/net/ifcvf/rte_ifcvf_version.map |    4 +
>  mk/rte.app.mk                           |    1 +

Need .ini file to represent driver features.
Also it is good to add driver documentation and a note into release note to
announce new driver.

>  10 files changed, 1830 insertions(+)
>  create mode 100644 drivers/net/ifcvf/Makefile
>  create mode 100644 drivers/net/ifcvf/base/ifcvf.c
>  create mode 100644 drivers/net/ifcvf/base/ifcvf.h
>  create mode 100644 drivers/net/ifcvf/base/ifcvf_osdep.h
>  create mode 100644 drivers/net/ifcvf/ifcvf_ethdev.c
>  create mode 100644 drivers/net/ifcvf/rte_ifcvf_version.map
> 
> diff --git a/config/common_base b/config/common_base
> index ad03cf433..06fce1ebf 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -791,6 +791,12 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
>  #
>  CONFIG_RTE_LIBRTE_PMD_VHOST=n
>  
> +#
> +# Compile IFCVF driver
> +# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
> +#
> +CONFIG_RTE_LIBRTE_IFCVF=n
> +
>  #
>  # Compile the test application
>  #
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index ff98f2355..358d00468 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -15,6 +15,7 @@ CONFIG_RTE_LIBRTE_PMD_KNI=y
>  CONFIG_RTE_LIBRTE_VHOST=y
>  CONFIG_RTE_LIBRTE_VHOST_NUMA=y
>  CONFIG_RTE_LIBRTE_PMD_VHOST=y
> +CONFIG_RTE_LIBRTE_IFCVF=y

Current syntax for PMD config options:
Virtual ones: CONFIG_RTE_LIBRTE_PMD_XXX
Physical ones: CONFIG_RTE_LIBRTE_XXX_PMD

Virtual / Physical difference most probably not done intentionally but that is
what it is right now.

Is "PMD" not added intentionally to the config option?

And what is the config time dependency of the driver, I assume VHOST is one of
them but are there more?

>  CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
>  CONFIG_RTE_LIBRTE_PMD_TAP=y
>  CONFIG_RTE_LIBRTE_AVP_PMD=y
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index e1127326b..496acf2d2 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -53,6 +53,7 @@ endif # $(CONFIG_RTE_LIBRTE_SCHED)
>  
>  ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
> +DIRS-$(CONFIG_RTE_LIBRTE_IFCVF) += ifcvf

Since this is mainly vpda driver, does it make sense to put it under
drivers/net/virtio/vpda/ifcvf

When there are more vpda driver they can go into drivers/net/virtio/vpda/*

Combining with below not registering ethdev comment, virtual driver can register
itself as vpda_ifcvf:
RTE_PMD_REGISTER_VDEV(vpda_ifcvf, ifcvf_drv);

>  endif # $(CONFIG_RTE_LIBRTE_VHOST)
>  
>  ifeq ($(CONFIG_RTE_LIBRTE_MRVL_PMD),y)
> diff --git a/drivers/net/ifcvf/Makefile b/drivers/net/ifcvf/Makefile
> new file mode 100644
> index 000000000..f3670cdf2
> --- /dev/null
> +++ b/drivers/net/ifcvf/Makefile
> @@ -0,0 +1,40 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2018 Intel Corporation
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_ifcvf.a
> +
> +LDLIBS += -lpthread
> +LDLIBS += -lrte_eal -lrte_mempool -lrte_pci
> +LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs -lrte_vhost
> +LDLIBS += -lrte_bus_vdev -lrte_bus_pci
> +
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS)
> +CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
> +CFLAGS += -I$(RTE_SDK)/drivers/bus/pci/linux
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
> +
> +#
> +# Add extra flags for base driver source files to disable warnings in them
> +#
> +BASE_DRIVER_OBJS=$(sort $(patsubst %.c,%.o,$(notdir $(wildcard $(SRCDIR)/base/*.c))))
> +$(foreach obj, $(BASE_DRIVER_OBJS), $(eval CFLAGS_$(obj)+=$(CFLAGS_BASE_DRIVER)))

It seems no CFLAGS_BASE_DRIVER defined yet, above lines can be removed for now.

> +
> +VPATH += $(SRCDIR)/base
> +
> +EXPORT_MAP := rte_ifcvf_version.map
> +
> +LIBABIVER := 1
> +
> +#
> +# all source are stored in SRCS-y
> +#
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += ifcvf_ethdev.c
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += ifcvf.c

Is it intentionally used "RTE_LIBRTE_PMD_VHOST" because of dependency or typo?

> +
> +include $(RTE_SDK)/mk/rte.lib.mk
<...>

> +static int
> +eth_dev_ifcvf_create(struct rte_vdev_device *dev,
> +		struct rte_pci_addr *pci_addr, int devices)
> +{
> +	const char *name = rte_vdev_device_name(dev);
> +	struct rte_eth_dev *eth_dev = NULL;
> +	struct ether_addr *eth_addr = NULL;
> +	struct ifcvf_internal *internal = NULL;
> +	struct internal_list *list = NULL;
> +	struct rte_eth_dev_data *data = NULL;
> +	struct rte_pci_addr pf_addr = *pci_addr;
> +	int i;
> +
> +	list = rte_zmalloc_socket(name, sizeof(*list), 0,
> +			dev->device.numa_node);
> +	if (list == NULL)
> +		goto error;
> +
> +	/* reserve an ethdev entry */
> +	eth_dev = rte_eth_vdev_allocate(dev, sizeof(*internal));

Is this eth_dev used at all? It looks like it is only used for its private data,
if so can it be possible to use something like:

struct ifdev {
	void *private;
	struct rte_device *dev;
}

allocate memory for private and add this struct to the list, this may save
ethdev overhead.

But I can see dev_start() and dev_stop() are used, not sure if they are the
reason of the ethdev.

> +	if (eth_dev == NULL)
> +		goto error;
> +
> +	eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0,
> +			dev->device.numa_node);
> +	if (eth_addr == NULL)
> +		goto error;
> +
> +	*eth_addr = base_eth_addr;
> +	eth_addr->addr_bytes[5] = eth_dev->data->port_id;
> +
> +	internal = eth_dev->data->dev_private;
> +	internal->dev_name = strdup(name);

Need to free this later and on error paths

> +	if (internal->dev_name == NULL)
> +		goto error;
> +
> +	internal->eng_addr.pci_addr = *pci_addr;
> +	for (i = 0; i < devices; i++) {
> +		pf_addr.domain = pci_addr->domain;
> +		pf_addr.bus = pci_addr->bus;
> +		pf_addr.devid = pci_addr->devid + (i + 1) / 8;
> +		pf_addr.function = pci_addr->function + (i + 1) % 8;
> +		internal->vf_info[i].pdev.addr = pf_addr;
> +		rte_spinlock_init(&internal->vf_info[i].lock);
> +	}
> +	internal->max_devices = devices;

is it max_devices or number of devices?

<...>

> +/*
> + * If this vdev is created by user, then ifcvf will be taken by

created by user?

> + * this vdev.
> + */
> +static int
> +ifcvf_take_over(struct rte_pci_addr *pci_addr, int num)
> +{
> +	uint16_t port_id;
> +	int i, ret;
> +	char devname[RTE_DEV_NAME_MAX_LEN];
> +	struct rte_pci_addr vf_addr = *pci_addr;
> +
> +	for (i = 0; i < num; i++) {
> +		vf_addr.function += i % 8;
> +		vf_addr.devid += i / 8;
> +		rte_pci_device_name(&vf_addr, devname, RTE_DEV_NAME_MAX_LEN);
> +		ret = rte_eth_dev_get_port_by_name(devname, &port_id);

Who probed this device at first place?

> +		if (ret == 0) {
> +			rte_eth_dev_close(port_id);
> +			if (rte_eth_dev_detach(port_id, devname) < 0)

This will call the driver remov() also will remove device from device list, is
it OK?

> +				return -1;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +rte_ifcvf_probe(struct rte_vdev_device *dev)
> +{
> +	struct rte_kvargs *kvlist = NULL;
> +	int ret = 0;
> +	struct rte_pci_addr pci_addr;
> +	int devices;

devices can't be negative, and according open_int() it is uint16_t, it is
possible to pick an unsigned storage type for it.

<...>

> +static int
> +rte_ifcvf_remove(struct rte_vdev_device *dev)
> +{
> +	const char *name;
> +	struct rte_eth_dev *eth_dev = NULL;
> +
> +	name = rte_vdev_device_name(dev);
> +	RTE_LOG(INFO, PMD, "Un-Initializing ifcvf for %s\n", name);
> +
> +	/* find an ethdev entry */
> +	eth_dev = rte_eth_dev_allocated(name);
> +	if (eth_dev == NULL)
> +		return -ENODEV;
> +
> +	eth_dev_close(eth_dev);
> +	rte_free(eth_dev->data);
> +	rte_eth_dev_release_port(eth_dev);

This does memset(ethdev->data, ..), so should be called before rte_free(data)

> +
> +	return 0;
> +}
> +
> +static struct rte_vdev_driver ifcvf_drv = {
> +	.probe = rte_ifcvf_probe,
> +	.remove = rte_ifcvf_remove,
> +};
> +
> +RTE_PMD_REGISTER_VDEV(net_ifcvf, ifcvf_drv);
> +RTE_PMD_REGISTER_ALIAS(net_ifcvf, eth_ifcvf);

Alias for backport support, not needed for new drivers.

> +RTE_PMD_REGISTER_PARAM_STRING(net_ifcvf,
> +	"bdf=<bdf> "
> +	"devices=<int>");

Above says:
  #define ETH_IFCVF_DEVICES_ARG	"int"

Is argument "int" or "devices"? Using macro here helps preventing errors.

> diff --git a/drivers/net/ifcvf/rte_ifcvf_version.map b/drivers/net/ifcvf/rte_ifcvf_version.map
> new file mode 100644
> index 000000000..33d237913
> --- /dev/null
> +++ b/drivers/net/ifcvf/rte_ifcvf_version.map
> @@ -0,0 +1,4 @@
> +EXPERIMENTAL {

Please put release version here.

<...>


More information about the dev mailing list