[dpdk-dev] [PATCH v4] net/failsafe: add Rx interrupts

Gaëtan Rivet gaetan.rivet at 6wind.com
Fri Jan 19 15:11:52 CET 2018


Hi Moti,

This patch is pretty big. It would have helped the review to have it split
into smaller patches.

Overall, I wholly support adding Rx interrupt support, and I think it is
interesting to have done it using rte_service.

I am entirely unfamiliar with rte_service, however, so I will take your
word that it works as intended, and I hope you will help fix any issues
found afterward. I will only comment on logic and coding style.

On Fri, Jan 19, 2018 at 11:32:24AM +0200, Moti Haimovsky wrote:
> This patch adds support for registering and waiting for Rx
> interrupts in failsafe PMD. This allows applications to wait
> for Rx events from the PMD using the DPDK rte_epoll subsystem.
> The failsafe PMD presents to the application a facade of a single
> device to be handled by the application while internally it manages
> several devices on behalf of the application including packets
> transmission and reception.
> The Proposed failsafe Rx interrupt scheme follows this approach.
> The failsafe PMD will present the application with a single set of Rx
> interrupt vectors representing the failsafe Rx queues, while internally
> it will serve as an interrupt proxy for its subdevices.
> This will allow applications to wait for Rx traffic from the failsafe
> PMD by registering and waiting for Rx events from its Rx queues.
> In order to support this the following is suggested:
>   * Every Rx queue in the failsafe (virtual) device will be assigned a
>     Linux event file descriptor (efd) and an enable_interrupts flag.
>   * The failsafe PMD will fill in its rte_intr_handle structure with
>     the Rx efds assigned previously and register them with the EAL.
>   * The failsafe driver will create a private epoll fd (epfd) and will
>     allocate enough space to handle all the Rx events from all its
>     subdevices.
>   * Acting as an application,
>     for each Rx queue in each active subdevice the failsafe will:
>       o Register the Rx queue with the EAL.
>       o Pass the EAL the failsafe private epoll fd as the epfd to
>         register the Rx queue event on.
>       o Pass the EAL, as a parameter, the pointer to the failsafe Rx
>         queue that handles this Rx queue.
>       o Using the DPDK service callbacks, the failsafe PMD will launch
>         an Rx proxy service that will Wait on the epoll fd for Rx events
>         from the sub-devices.
>       o For each Rx event received the proxy service will
>          - Retrieve the pointer to failsafe Rx queue that handles this
>            subdevice Rx queue from the user info returned by the EAL.
>          - Trigger a failsafe Rx event on that queue by writing to the
>            event fd unless interrupts are disabled for that queue.
>   * The failsafe pmd will also implement the rx_queue_intr_enable and
>     rx_queue_intr_disable routines that will enable and disable Rx
>     interrupts respectively on both on the failsafe and its subdevices.
> 
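For readers following the design, the proxy scheme described above can be modeled with plain Linux epoll(7) and eventfd(2) instead of the EAL wrappers (rte_epoll_wait() and rte_eth_dev_rx_intr_ctl_q()). This is a minimal, hypothetical sketch, not the patch's code: struct proxy_rxq stands in for the failsafe struct rxq, struct proxy_binding is an invented per-sub-queue record passed as epoll user data, and the sketch drains the sub-device counter itself so that the level-triggered epoll does not fire again for the same event.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical stand-in for the failsafe struct rxq. */
struct proxy_rxq {
	int event_fd;      /* eventfd the application waits on */
	int enable_events; /* set/cleared by rx_queue_intr_enable/disable */
};

/* Hypothetical record registered as epoll user data, one per
 * sub-device Rx queue. */
struct proxy_binding {
	int sub_fd;            /* sub-device Rx queue eventfd */
	struct proxy_rxq *rxq; /* failsafe queue proxying it */
};

/* Register one sub-device queue on the private epoll fd. */
static int
proxy_add(int epfd, struct proxy_binding *b)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.ptr = b };

	return epoll_ctl(epfd, EPOLL_CTL_ADD, b->sub_fd, &ev);
}

/* One pass of the proxy service: wait for sub-device events and
 * forward each one to its failsafe queue if events are enabled there.
 * Returns the number of events forwarded, or -1 on error. */
static int
proxy_once(int epfd, int timeout_ms)
{
	struct epoll_event events[16];
	uint64_t cnt, one = 1;
	int i, n, forwarded = 0;

	n = epoll_wait(epfd, events, 16, timeout_ms);
	if (n < 0)
		return -1;
	for (i = 0; i < n; i++) {
		struct proxy_binding *b = events[i].data.ptr;

		/* Acknowledge the sub-device event so it does not re-fire. */
		if (read(b->sub_fd, &cnt, sizeof(cnt)) != sizeof(cnt))
			continue;
		if (b->rxq->enable_events == 0 || b->rxq->event_fd < 0)
			continue;
		if (write(b->rxq->event_fd, &one, sizeof(one)) != sizeof(one))
			return -1;
		forwarded++;
	}
	return forwarded;
}
```

The application side then only ever sees the failsafe eventfd; whether a sub-device event reaches it depends solely on enable_events, which mirrors the check in fs_rx_event_proxy_service().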

Were you able to measure the latency introduced by the proxy?
At normal reception rates (for example, ~9 Mpps on a single core with a
10 Gbps port), do we lose packets by using Rx interrupts (with or without
the fail-safe in between)?

> Signed-off-by: Moti Haimovsky <motih at mellanox.com>
> 
> Conflicts:
>         drivers/net/failsafe/failsafe_ops.c
>         drivers/net/failsafe/failsafe_private.h

These lines should be removed from the commit log.

> ---
> V4:
> Fixed merge conflicts found during integration with other failsafe patches
> (See cover letter).
> 
> V3:
> Fixed build failures in FreeBSD10.3_64
> 
> V2:
> Modifications according to inputs from Stephen Hemminger:
> * Removed unneeded (void *) casting.
> Fixed coding style warning.
> ---
> 
>  doc/guides/nics/features/failsafe.ini   |   1 +
>  drivers/net/failsafe/Makefile           |   1 +
>  drivers/net/failsafe/failsafe.c         |   4 +
>  drivers/net/failsafe/failsafe_ether.c   |   1 +
>  drivers/net/failsafe/failsafe_intr.c    | 597 ++++++++++++++++++++++++++++++++
>  drivers/net/failsafe/failsafe_ops.c     |  28 ++
>  drivers/net/failsafe/failsafe_private.h |  44 +++
>  7 files changed, 676 insertions(+)
>  create mode 100644 drivers/net/failsafe/failsafe_intr.c
> 
> diff --git a/doc/guides/nics/features/failsafe.ini b/doc/guides/nics/features/failsafe.ini
> index a42e344..39ee579 100644
> --- a/doc/guides/nics/features/failsafe.ini
> +++ b/doc/guides/nics/features/failsafe.ini
> @@ -6,6 +6,7 @@
>  [Features]
>  Link status          = Y
>  Link status event    = Y
> +Rx interrupt         = Y
>  MTU update           = Y
>  Jumbo frame          = Y
>  Promiscuous mode     = Y
> diff --git a/drivers/net/failsafe/Makefile b/drivers/net/failsafe/Makefile
> index ea2a8fe..91a734b 100644
> --- a/drivers/net/failsafe/Makefile
> +++ b/drivers/net/failsafe/Makefile
> @@ -46,6 +46,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_ops.c
>  SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_rxtx.c
>  SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_ether.c
>  SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_flow.c
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_intr.c
>  
>  # No exported include files
>  
> diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c
> index a1e1c7a..621944f 100644
> --- a/drivers/net/failsafe/failsafe.c
> +++ b/drivers/net/failsafe/failsafe.c
> @@ -251,6 +251,10 @@
>                  mac->addr_bytes[2], mac->addr_bytes[3],
>                  mac->addr_bytes[4], mac->addr_bytes[5]);
>          dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;
> +        PRIV(dev)->intr_handle = (struct rte_intr_handle){
> +                .fd = -1,
> +                .type = RTE_INTR_HANDLE_EXT,
> +        };
>          return 0;
>  free_args:
>          failsafe_args_free(dev);
> diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
> index e9b0cfe..643f3d6 100644
> --- a/drivers/net/failsafe/failsafe_ether.c
> +++ b/drivers/net/failsafe/failsafe_ether.c
> @@ -283,6 +283,7 @@
>                  return;
>          switch (sdev->state) {
>          case DEV_STARTED:
> +                failsafe_rx_intr_uninstall_subdevice(sdev);
>                  rte_eth_dev_stop(PORT_ID(sdev));
>                  sdev->state = DEV_ACTIVE;
>                  /* fallthrough */
> diff --git a/drivers/net/failsafe/failsafe_intr.c b/drivers/net/failsafe/failsafe_intr.c
> new file mode 100644
> index 0000000..4d42810
> --- /dev/null
> +++ b/drivers/net/failsafe/failsafe_intr.c
> @@ -0,0 +1,597 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright 2017 Mellanox
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of the copyright holder nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +/**
> + * @file
> + * Interrupts handling for failsafe driver.
> + */
> +
> +#include <sys/epoll.h>
> +#include <unistd.h>
> +
> +#include <rte_alarm.h>
> +#include <rte_config.h>
> +#include <rte_errno.h>
> +#include <rte_ethdev.h>
> +#include <rte_interrupts.h>
> +#include <rte_io.h>
> +#include <rte_service_component.h>
> +
> +#include "failsafe_private.h"
> +
> +#define NUM_RX_PROXIES (FAILSAFE_MAX_ETHPORTS * RTE_MAX_RXTX_INTR_VEC_ID)
> +
> +/**
> + * Install failsafe Rx event proxy service.
> + * The Rx event proxy is the service that listens to Rx events from the
> + * subdevices and triggers failsafe Rx events accordingly.
> + *
> + * @param priv
> + *   Pointer to failsafe private structure.
> + * @return
> + *   0 on success, negative errno value otherwise.
> + */
> +static int
> +fs_rx_event_proxy_service(void *data)
> +{
> +        struct fs_priv *priv = data;
> +        struct rxq *rxq;
> +        struct rte_epoll_event *events = priv->rxp.evec;
> +        uint64_t u64 = 1;
> +        int i, n, rc = 0;
> +
> +        n = rte_epoll_wait(priv->rxp.efd, events, NUM_RX_PROXIES, -1);
> +        for (i = 0; i < n; i++) {
> +                rxq = events[i].epdata.data;
> +                if (rxq->enable_events && rxq->event_fd != -1) {
> +                        if (write(rxq->event_fd, &u64, sizeof(u64)) !=
> +                            sizeof(u64)) {
> +                                ERROR("failed to proxy Rx event to socket %d",

Failed should be capitalized.

> +                                       rxq->event_fd);
> +                                rc = -EIO;
> +                        }
> +                }
> +        }
> +        return rc;
> +}
> +
> +/**
> + * Uninstall failsafe Rx event proxy service.
> + *
> + * @param priv
> + *   Pointer to failsafe private structure.
> + */
> +static void
> +fs_rx_event_proxy_service_uninstall(struct fs_priv *priv)
> +{
> +        /* Unregister the event service. */
> +        switch (priv->rxp.sstate) {
> +        case SS_RUNNING:
> +                rte_service_map_lcore_set(priv->rxp.sid, priv->rxp.scid, 0);
> +                /* fall through */
> +        case SS_READY:
> +                rte_service_runstate_set(priv->rxp.sid, 0);
> +                rte_service_set_stats_enable(priv->rxp.sid, 0);
> +                rte_service_component_runstate_set(priv->rxp.sid, 0);
> +                /* fall through */
> +        case SS_REGISTERED:
> +                rte_service_component_unregister(priv->rxp.sid);
> +                /* fall through */
> +        default:
> +                break;
> +        }
> +}
> +
> +/**
> + * Install the failsafe Rx event proxy service.
> + *
> + * @param priv
> + *   Pointer to failsafe private structure.
> + * @return
> + *   0 on success, negative errno value otherwise.
> + */
> +static int
> +fs_rx_event_proxy_service_install(struct fs_priv *priv)
> +{
> +        struct rte_service_spec service;
> +        int32_t num_service_cores = rte_service_lcore_count();
> +        int ret = 0;
> +
> +        if (!num_service_cores) {

It would be better to check explicitly against 0.
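The explicit-comparison requests throughout this review follow the DPDK coding style, which discourages `!` on values that are not booleans. A minimal sketch of the two integer cases in this patch, written as hypothetical standalone helpers (check_service_cores() and has_rx_efds() do not exist in the patch):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Service-core check from fs_rx_event_proxy_service_install(),
 * written with an explicit comparison instead of !num_service_cores. */
static int
check_service_cores(int32_t num_service_cores)
{
	if (num_service_cores == 0)
		return -ENOTSUP;
	return 0;
}

/* Counter test from fs_rx_intr_vec_install(): compare the unsigned
 * count against 0 rather than testing !count. */
static int
has_rx_efds(unsigned int count)
{
	return count != 0;
}
```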

> +                ERROR("Failed to install Rx interrupts, "
> +                      "no service core found");
> +                return -ENOTSUP;
> +        }
> +        /* prepare service info */
> +        memset(&service, 0, sizeof(struct rte_service_spec));
> +        snprintf(service.name, sizeof(service.name), "%s_Rx_service",
> +                 priv->dev->device->name);

You might want to use the eth_dev name here.
ConnectX-3 exposes two physical ports behind the same PCI ID. This PCI
ID is used as rte_device->name, which would result here in the same
rte_service name for both. I don't know whether rte_service resolves
such name conflicts.

In any case, use the eth_dev name; it _should_ be unique.
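In the patch this would amount to formatting priv->dev->data->name (the ethdev name) instead of priv->dev->device->name. A minimal sketch with a mocked device, assuming a 32-byte service name buffer (SKETCH_SERVICE_NAME_MAX, struct mock_eth_dev and make_service_name() are hypothetical). It also rejects truncation, since a truncated name could collide again:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_SERVICE_NAME_MAX 32 /* assumed size of the service name */

/* Mock of the two names being contrasted. */
struct mock_eth_dev {
	const char *rte_device_name; /* e.g. a PCI address shared by ports */
	const char *eth_dev_name;    /* per-port ethdev name, expected unique */
};

/* Build the service name from the ethdev name.
 * Returns 0 on success, -1 if the result would be truncated. */
static int
make_service_name(char *buf, size_t len, const struct mock_eth_dev *dev)
{
	int n = snprintf(buf, len, "%s_Rx_service", dev->eth_dev_name);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

Two ports sharing one rte_device name still get distinct service names this way.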

> +        service.socket_id = priv->dev->data->numa_node;
> +        service.callback = fs_rx_event_proxy_service;
> +        service.callback_userdata = (void *)priv;
> +
> +        if (priv->rxp.sstate == SS_NO_SREVICE) {

Typo: SREVICE should be SERVICE.

> +                uint32_t service_core_list[num_service_cores];
> +
> +                /* get a service core to work with */
> +                ret = rte_service_lcore_list(service_core_list,
> +                                             num_service_cores);
> +                if (ret <= 0) {
> +                        ERROR("Failed to install Rx interrupts, "
> +                              "service core list empty or corrupted");
> +                        return -ENOTSUP;
> +                }
> +                priv->rxp.scid = service_core_list[0];
> +                ret = rte_service_lcore_add(priv->rxp.scid);
> +                if (ret && ret != -EALREADY) {
> +                        ERROR("Failed adding service core");
> +                        return ret;
> +                }
> +                /* service core may be in "stopped" state, start it */
> +                ret = rte_service_lcore_start(priv->rxp.scid);
> +                if (ret && (ret != -EALREADY)) {
> +                        ERROR("Failed to install Rx interrupts, "
> +                              "service core not started");
> +                        return ret;
> +                }
> +                /* register our service */
> +                int32_t ret = rte_service_component_register(&service,
> +                                                             &priv->rxp.sid);
> +                if (ret) {
> +                        ERROR("service register() failed");
> +                        return -ENOEXEC;
> +                }
> +                priv->rxp.sstate = SS_REGISTERED;
> +                /* run the service */
> +                ret = rte_service_component_runstate_set(priv->rxp.sid, 1);
> +                if (ret < 0) {
> +                        ERROR("Failed Setting component runstate\n");
> +                        return ret;
> +                }
> +                ret = rte_service_set_stats_enable(priv->rxp.sid, 1);
> +                if (ret < 0) {
> +                        ERROR("Failed enabling stats\n");
> +                        return ret;
> +                }
> +                ret = rte_service_runstate_set(priv->rxp.sid, 1);
> +                if (ret < 0) {
> +                        ERROR("Failed to run service\n");
> +                        return ret;
> +                }
> +                priv->rxp.sstate = SS_READY;
> +                /* map the service with the service core */
> +                ret = rte_service_map_lcore_set(priv->rxp.sid,
> +                                                priv->rxp.scid, 1);
> +                if (ret) {
> +                        ERROR("Failed to install Rx interrupts, "
> +                              "could not map service core");
> +                        return ret;
> +                }
> +                priv->rxp.sstate = SS_RUNNING;
> +        }
> +        return 0;
> +}
> +
> +/**
> + * Install failsafe Rx event proxy subsystem.
> + * This is the way the failsafe PMD generates Rx events on behalf of its
> + * subdevices.
> + *
> + * @param priv
> + *   Pointer to failsafe private structure.
> + * @return
> + *   0 on success, negative errno value otherwise and rte_errno is set.
> + */
> +static int
> +fs_rx_event_proxy_install(struct fs_priv *priv)
> +{
> +        int rc = 0;
> +
> +        /* create the epoll to wait on for Rx events form subdevices */

-->        /* create the epoll to wait on for Rx events from subdevices */

> +        priv->rxp.efd = epoll_create1(0);
> +        if (priv->rxp.efd < 0) {
> +                rte_errno = errno;
> +                ERROR("failed to create epoll,"
> +                      " Rx interrupts will not be supported");

Failed should be capitalized.

> +                return -rte_errno;
> +        }
> +        /* allocate memory for receiving the Rx events from the subdevices. */
> +        priv->rxp.evec = calloc(NUM_RX_PROXIES, sizeof(*priv->rxp.evec));
> +        if (priv->rxp.evec == NULL) {
> +                ERROR("failed to allocate memory for event vectors,"
> +                      " Rx interrupts will not be supported");

Same here.

> +                rc = -ENOMEM;
> +                goto error;
> +        }
> +        if (fs_rx_event_proxy_service_install(priv) < 0) {
> +                rc = -rte_errno;
> +                goto error;
> +        }
> +        return 0;
> +error:
> +        if (priv->rxp.efd >= 0)
> +                close(priv->rxp.efd);
> +        if (priv->rxp.evec)

Check against NULL.

> +                free(priv->rxp.evec);
> +        rte_errno = -rc;
> +        return rc;
> +}
> +
> +/**
> + * Uninstall failsafe Rx event proxy.
> + *
> + * @param priv
> + *   Pointer to failsafe private structure.
> + */
> +static void
> +fs_rx_event_proxy_uninstall(struct fs_priv *priv)
> +{
> +        fs_rx_event_proxy_service_uninstall(priv);
> +        if (priv->rxp.evec) {

Check against NULL.
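A minimal sketch of this cleanup path with explicit NULL and descriptor checks. struct rx_proxy and rx_proxy_release() are hypothetical reductions of the patch's rxp state and fs_rx_event_proxy_uninstall(). Since free(NULL) is a no-op, the pointer check mostly serves readability and the reset to NULL; the descriptor test uses >= 0 because 0 is a valid fd:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical subset of the patch's rxp state. */
struct rx_proxy {
	int efd;    /* private epoll fd, -1 when absent */
	void *evec; /* event vector, NULL when absent */
};

/* Release both resources; safe to call more than once. */
static void
rx_proxy_release(struct rx_proxy *rxp)
{
	if (rxp->evec != NULL) {
		free(rxp->evec);
		rxp->evec = NULL;
	}
	/* fd 0 is valid, so test >= 0 rather than > 0. */
	if (rxp->efd >= 0) {
		close(rxp->efd);
		rxp->efd = -1;
	}
}
```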

> +                free(priv->rxp.evec);
> +                priv->rxp.evec = NULL;
> +        }
> +        if (priv->rxp.efd > 0) {
> +                close(priv->rxp.efd);
> +                priv->rxp.efd = -1;
> +        }
> +}
> +
> +/**
> + * Uninstall failsafe interrupt vector.
> + *
> + * @param priv
> + *   Pointer to failsafe private structure.
> + */
> +static void
> +fs_rx_intr_vec_uninstall(struct fs_priv *priv)
> +{
> +        struct rte_intr_handle *intr_handle = &priv->intr_handle;
> +
> +        if (intr_handle->intr_vec) {

Check against NULL.

> +                free(intr_handle->intr_vec);
> +                intr_handle->intr_vec = NULL;
> +        }
> +        intr_handle->nb_efd = 0;
> +}
> +/**
> + * Installs failsafe interrupt vector to be registered with EAL later on.
> + *
> + * @param priv
> + *   Pointer to failsafe private structure.
> + *
> + * @return
> + *   0 on success, negative errno value otherwise and rte_errno is set.
> + */
> +static int
> +fs_rx_intr_vec_install(struct fs_priv *priv)
> +{
> +        unsigned int i;
> +        unsigned int rxqs_n = priv->dev->data->nb_rx_queues;
> +        unsigned int n = RTE_MIN(rxqs_n, (uint32_t)RTE_MAX_RXTX_INTR_VEC_ID);
> +        unsigned int count = 0;
> +        struct rte_intr_handle *intr_handle = &priv->intr_handle;
> +
> +        /* Allocate the interrupt vector of the failsafe Rx proxy interrupts */
> +        intr_handle->intr_vec = malloc(n * sizeof(intr_handle->intr_vec[0]));
> +        if (intr_handle->intr_vec == NULL) {
> +                fs_rx_intr_vec_uninstall(priv);
> +                rte_errno = ENOMEM;
> +                ERROR("failed to allocate memory for interrupt vector,"
> +                      " Rx interrupts will not be supported");

Same here: "failed" should be capitalized.

> +                return -rte_errno;
> +        }
> +        for (i = 0; i < n; i++) {
> +                struct rxq *rxq = priv->dev->data->rx_queues[i];
> +
> +                /* Skip queues that cannot request interrupts. */
> +                if (!rxq || rxq->event_fd < 0) {
> +                        /* Use invalid intr_vec[] index to disable entry. */
> +                        intr_handle->intr_vec[i] =
> +                                RTE_INTR_VEC_RXTX_OFFSET +
> +                                RTE_MAX_RXTX_INTR_VEC_ID;
> +                        continue;
> +                }
> +                if (count >= RTE_MAX_RXTX_INTR_VEC_ID) {
> +                        rte_errno = E2BIG;
> +                        ERROR("too many Rx queues for interrupt vector size"
> +                              " (%d), Rx interrupts cannot be enabled",

"Too" should be capitalized.

> +                              RTE_MAX_RXTX_INTR_VEC_ID);
> +                        fs_rx_intr_vec_uninstall(priv);
> +                        return -rte_errno;
> +                }
> +                intr_handle->intr_vec[i] = RTE_INTR_VEC_RXTX_OFFSET + count;
> +                intr_handle->efds[count] = rxq->event_fd;
> +                count++;
> +        }
> +        if (!count)

It would be better to compare against 0.

> +                fs_rx_intr_vec_uninstall(priv);
> +        else
> +                intr_handle->nb_efd = count;
> +        return 0;
> +}
> +
> +/**
> + * RX Interrupt control per subdevice.
> + *
> + * @param sdev
> + *   Pointer to sub-device structure.
> + * @param op
> + *   The operation be performed for the vector.
> + *   Operation type of {RTE_INTR_EVENT_ADD, RTE_INTR_EVENT_DEL}.
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +static int
> +failsafe_eth_rx_intr_ctl_subdevice(struct sub_device *sdev, int op)
> +{
> +        struct rte_eth_dev *dev;
> +        struct rte_eth_dev *fsdev;
> +        int epfd;
> +        uint16_t pid;
> +        uint16_t qid;
> +        struct rxq *fsrxq;
> +        int rc;
> +        int ret = 0;
> +
> +        if (sdev == NULL || (ETH(sdev) == NULL) ||
> +            sdev->fs_dev == NULL || (PRIV(sdev->fs_dev) == NULL)) {
> +                ERROR("Called with invalid arguments");
> +                return -EINVAL;
> +        }
> +        dev = ETH(sdev);
> +        fsdev = sdev->fs_dev;
> +        epfd = PRIV(sdev->fs_dev)->rxp.efd;
> +        pid = PORT_ID(sdev);
> +
> +        if (epfd <= 0) {
> +                if (op == RTE_INTR_EVENT_ADD) {
> +                        ERROR("proxy events are not initialized");

Proxy should be capitalized here.

> +                        return -EBADFD;
> +                } else {
> +                        return 0;
> +                }
> +        }
> +        if (dev->data->nb_rx_queues > fsdev->data->nb_rx_queues) {
> +                ERROR("subdevice has too many queues,"
> +                      " Interrupts will not be enabled");
> +                        return -E2BIG;
> +        }
> +        for (qid = 0; qid < dev->data->nb_rx_queues; qid++) {
> +                fsrxq = fsdev->data->rx_queues[qid];
> +                rc = rte_eth_dev_rx_intr_ctl_q(pid, qid, epfd,
> +                                               op, (void *)fsrxq);
> +                if (rc) {
> +                        ERROR("rte_eth_dev_rx_intr_ctl_q failed for "
> +                              "port %d  queue %d, epfd %d, error %d",
> +                              pid, qid, epfd, rc);
> +                        ret = rc;
> +                }
> +        }
> +        return ret;
> +}
> +
> +/**
> + * Install Rx interrupts subsystem for a subdevice.
> + * This is a support for dynamically adding subdevices.

So does this work with Matan's patch for capturing an ethdev?
Have you tested capturing ports with the following configurations:

port \ conf
failsafe rx intr       on   |   on   |  off  |  off
ethdev   rx intr       on   |   off  |  on   |  off
                                         |
        .--------------------------------'
       (_
         `-> and how should this configuration work?

> + *
> + * @param sdev
> + *   Pointer to subdevice structure.
> + *
> + * @return
> + *   0 on success, negative errno value otherwise and rte_errno is set.
> + */
> +int failsafe_rx_intr_install_subdevice(struct sub_device *sdev)
> +{
> +        int rc;
> +        int qid;
> +        struct rte_eth_dev *fsdev = sdev->fs_dev;
> +        struct rxq **rxq = (struct rxq **)fsdev->data->rx_queues;
> +        const struct rte_intr_conf *const intr_conf =
> +                                &ETH(sdev)->data->dev_conf.intr_conf;
> +
> +        if (!intr_conf->rxq)

Explicit comparison please.

> +                return 0;
> +        rc = failsafe_eth_rx_intr_ctl_subdevice(sdev, RTE_INTR_EVENT_ADD);
> +        if (rc)
> +                return rc;
> +        /* enable interrupts on already-enabled queues */
> +        for (qid = 0; qid < ETH(sdev)->data->nb_rx_queues; qid++) {
> +                if (rxq[qid]->enable_events) {
> +                        int ret = rte_eth_dev_rx_intr_enable(PORT_ID(sdev),
> +                                                             qid);
> +                        if (ret && (ret != -ENOTSUP)) {
> +                                ERROR("Failed to enable interrupts on "
> +                                      "port %d queue %d", PORT_ID(sdev), qid);
> +                                rc = ret;
> +                        }
> +                }
> +        }
> +        return rc;
> +}
> +
> +/**
> + * Uninstall Rx interrupts subsystem for a subdevice.
> + * This is a support for dynamically removing subdevices.
> + *
> + * @param sdev
> + *   Pointer to subdevice structure.
> + *
> + * @return
> + *   0 on success, negative errno value otherwise and rte_errno is set.
> + */
> +void failsafe_rx_intr_uninstall_subdevice(struct sub_device *sdev)
> +{
> +        int qid;
> +
> +        for (qid = 0; qid < ETH(sdev)->data->nb_rx_queues; qid++)
> +                rte_eth_dev_rx_intr_disable(PORT_ID(sdev), qid);
> +        failsafe_eth_rx_intr_ctl_subdevice(sdev, RTE_INTR_EVENT_DEL);
> +}
> +
> +/**
> + * Uninstall failsafe Rx interrupts subsystem.
> + *
> + * @param priv
> + *   Pointer to private structure.
> + *
> + * @return
> + *   0 on success, negative errno value otherwise and rte_errno is set.
> + */
> +void
> +failsafe_rx_intr_uninstall(struct rte_eth_dev *dev)
> +{
> +        struct fs_priv *priv = PRIV(dev);
> +        struct rte_intr_handle *intr_handle = &priv->intr_handle;
> +
> +        dev->intr_handle = NULL;
> +        rte_intr_free_epoll_fd(intr_handle);
> +        fs_rx_event_proxy_uninstall(priv);
> +        if (intr_handle->intr_vec) {

Needs an explicit comparison with NULL.

> +                free(intr_handle->intr_vec);
> +                intr_handle->intr_vec = NULL;
> +        }
> +        intr_handle->nb_efd = 0;
> +}
> +
> +/**
> + * Install failsafe Rx interrupts subsystem.
> + *
> + * @param priv
> + *   Pointer to private structure.
> + *
> + * @return
> + *   0 on success, negative errno value otherwise and rte_errno is set.
> + */
> +int
> +failsafe_rx_intr_install(struct rte_eth_dev *dev)
> +{
> +        struct fs_priv *priv = PRIV(dev);
> +        const struct rte_intr_conf *const intr_conf =
> +                        &priv->dev->data->dev_conf.intr_conf;
> +
> +        if (!intr_conf->rxq || priv->intr_handle.intr_vec != NULL)

Needs an explicit comparison.

> +                return 0;
> +        if (fs_rx_intr_vec_install(priv) < 0)
> +                return -rte_errno;
> +        if (fs_rx_event_proxy_install(priv) < 0) {
> +                fs_rx_intr_vec_uninstall(priv);
> +                return -rte_errno;
> +        }
> +        priv->intr_handle.efd_counter_size = sizeof(uint64_t);
> +        dev->intr_handle = &priv->intr_handle;
> +        return 0;
> +}
> +
> +
> +/**
> + * DPDK callback for Rx queue interrupt disable.
> + *
> + * @param dev
> + *   Pointer to Ethernet device structure.
> + * @param idx
> + *   Rx queue index.
> + *
> + * @return
> + *   0 on success, negative errno value otherwise and rte_errno is set.
> + */
> +int
> +failsafe_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx)
> +{
> +        struct rxq *rxq = dev->data->rx_queues[idx];
> +        struct sub_device *sdev;
> +        uint64_t u64;
> +        uint8_t i;
> +        int rc = 0;
> +        int ret;
> +
> +        if (!rxq || rxq->event_fd <= 0) {

Needs an explicit comparison.

> +                rte_errno = EINVAL;
> +                return -rte_errno;
> +        }
> +        rxq->enable_events = 0;
> +        FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> +                ret = rte_eth_dev_rx_intr_disable(PORT_ID(sdev), idx);
> +                if ((ret != -ENODEV) && !fs_err(sdev, ret))
> +                        rc = ret;
> +        }
> +        /* Clear pending events */
> +        while (read(rxq->event_fd, &u64, sizeof(uint64_t)) >  0)
> +                ;
> +        if (rc)
> +                rte_errno = -rc;
> +        return rc;
> +}
> +
> +/**
> + * DPDK callback for Rx queue interrupt enable.
> + *
> + * @param dev
> + *   Pointer to Ethernet device structure.
> + * @param idx
> + *   Rx queue index.
> + *
> + * @return
> + *   0 on success, negative errno value otherwise and rte_errno is set.
> + */
> +int
> +failsafe_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx)
> +{
> +        struct rxq *rxq = dev->data->rx_queues[idx];
> +        struct sub_device *sdev;
> +        uint8_t i;
> +        int rc = 0;
> +        int ret;
> +
> +        if (!rxq || rxq->event_fd <= 0) {
> +                rte_errno = EINVAL;
> +                return -rte_errno;
> +        }
> +        /* Let the proxy service run. */
> +        if (PRIV(dev)->rxp.sstate != SS_RUNNING) {
> +                ERROR("failsafe interrupt services are not running");
> +                rte_errno = EAGAIN;
> +                return -rte_errno;
> +        }
> +        rxq->enable_events = 1;
> +        FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> +                ret = rte_eth_dev_rx_intr_enable(PORT_ID(sdev), idx);
> +                if ((ret != -ENODEV) && !fs_err(sdev, ret))
> +                        rc = ret;
> +        }
> +        if (rc) {
> +                failsafe_rx_intr_disable(dev, idx);
> +                rte_errno = -rc;
> +        }
> +        return rc;
> +}
> diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
> index 0976745..b5b4eab 100644
> --- a/drivers/net/failsafe/failsafe_ops.c
> +++ b/drivers/net/failsafe/failsafe_ops.c
> @@ -32,6 +32,7 @@
>   */
>  
>  #include <stdint.h>
> +#include <unistd.h>
>  
>  #include <rte_debug.h>
>  #include <rte_atomic.h>
> @@ -160,6 +161,10 @@
>          uint8_t i;
>          int ret;
>  
> +        ret = failsafe_rx_intr_install(dev);
> +        if (ret)
> +                return ret;
> +
>          FOREACH_SUBDEV(sdev, i, dev) {
>                  if (sdev->state != DEV_ACTIVE)
>                          continue;
> @@ -170,6 +175,11 @@
>                                  continue;
>                          return ret;
>                  }
> +                ret = failsafe_rx_intr_install_subdevice(sdev);
> +                if (ret) {
> +                        rte_eth_dev_stop(PORT_ID(sdev));
> +                        return ret;
> +                }
>                  sdev->state = DEV_STARTED;
>          }
>          if (PRIV(dev)->state < DEV_STARTED)
> @@ -186,9 +196,11 @@
>  
>          PRIV(dev)->state = DEV_STARTED - 1;
>          FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_STARTED) {
> +                failsafe_rx_intr_uninstall_subdevice(sdev);
>                  rte_eth_dev_stop(PORT_ID(sdev));
>                  sdev->state = DEV_STARTED - 1;
>          }
> +        failsafe_rx_intr_uninstall(dev);
>  }
>  
>  static int
> @@ -259,6 +271,8 @@
>          if (queue == NULL)
>                  return;
>          rxq = queue;
> +        if (rxq->event_fd > 0)
> +                close(rxq->event_fd);
>          dev = rxq->priv->dev;
>          FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
>                  SUBOPS(sdev, rx_queue_release)
> @@ -275,6 +289,14 @@
>                  const struct rte_eth_rxconf *rx_conf,
>                  struct rte_mempool *mb_pool)
>  {
> +        /*
> +         * Fake MSIX interrupts causing rte_intr_efd_enable to
> +         * allocate an eventfd for us.
> +         */

I'm a bit skeptical about this.
It seems like subverting the API for your own ends.

The preferred way would be to extend the API, for example by introducing
a specific type that would request an additional eventfd allocation.

The implementation would have been very simple, and much cleaner.

It seems too late now, but that should be done instead of keeping this.
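For reference, the allocation such a request would wrap is small. A minimal sketch using eventfd(2) directly; fs_rxq_alloc_event_fd() and fs_rxq_drain() are hypothetical helpers, not existing DPDK API. eventfd(2) is Linux-specific, so a real version would still have to sit behind the EAL abstraction for the FreeBSD build mentioned in the V3 notes. EFD_NONBLOCK matters because failsafe_rx_intr_disable() drains the fd in a read() loop that would otherwise block once the counter is empty:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: allocate the per-queue eventfd without a fake
 * VFIO MSI-X intr_handle. Returns 0 or a negative errno value. */
static int
fs_rxq_alloc_event_fd(int *event_fd)
{
	int fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);

	if (fd < 0)
		return -errno;
	*event_fd = fd;
	return 0;
}

/* Drain pending events, as failsafe_rx_intr_disable() does. */
static void
fs_rxq_drain(int event_fd)
{
	uint64_t u64;

	while (read(event_fd, &u64, sizeof(u64)) > 0)
		;
}
```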

> +        struct rte_intr_handle intr_handle = {
> +                .type = RTE_INTR_HANDLE_VFIO_MSIX,
> +                .efds = {-1, },
> +        };
>          struct sub_device *sdev;
>          struct rxq *rxq;
>          uint8_t i;
> @@ -300,6 +322,10 @@
>          rxq->info.nb_desc = nb_rx_desc;
>          rxq->priv = PRIV(dev);
>          rxq->sdev = PRIV(dev)->subs;
> +        ret = rte_intr_efd_enable(&intr_handle, 1);
> +        if (ret < 0)
> +                return ret;
> +        rxq->event_fd = intr_handle.efds[0];
>          dev->data->rx_queues[rx_queue_id] = rxq;
>          FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
>                  ret = rte_eth_rx_queue_setup(PORT_ID(sdev),
> @@ -781,4 +807,6 @@
>          .mac_addr_add = fs_mac_addr_add,
>          .mac_addr_set = fs_mac_addr_set,
>          .filter_ctrl = fs_filter_ctrl,
> +        .rx_queue_intr_enable = failsafe_rx_intr_enable,
> +        .rx_queue_intr_disable = failsafe_rx_intr_disable,

You should add those two ops between ".tx_queue_release" and
".flow_ctrl_get", to keep the same order as in struct eth_dev_ops.

Additionally, it would be better to implement those two ops as static
functions within this file (like all other eth_dev_ops), calling your
Rx queue interrupt implementations from there.

>  };
> diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
> index b377046..d7617c4 100644
> --- a/drivers/net/failsafe/failsafe_private.h
> +++ b/drivers/net/failsafe/failsafe_private.h
> @@ -40,6 +40,7 @@
>  #include <rte_dev.h>
>  #include <rte_ethdev.h>
>  #include <rte_devargs.h>
> +#include <rte_interrupts.h>
>  
>  #define FAILSAFE_DRIVER_NAME "Fail-safe PMD"
>  #define FAILSAFE_OWNER_NAME "Fail-safe"
> @@ -61,6 +62,13 @@
>  
>  #define DEVARGS_MAXLEN 4096
>  
> +enum rxp_service_state {
> +        SS_NO_SREVICE = 0,

Typo: SREVICE should be SERVICE.

> +        SS_REGISTERED,
> +        SS_READY,
> +        SS_RUNNING,
> +};
> +
>  /* TYPES */
>  
>  struct rxq {
> @@ -69,10 +77,25 @@ struct rxq {
>          /* next sub_device to poll */
>          struct sub_device *sdev;
>          unsigned int socket_id;
> +        int event_fd;
> +        unsigned int enable_events:1;
>          struct rte_eth_rxq_info info;
>          rte_atomic64_t refcnt[];
>  };
>  
> +struct rx_proxy {
> +        /* epoll file descriptor */
> +        int efd;
> +        /* event vector to be used by epoll */
> +        struct rte_epoll_event *evec;
> +        /* rte service id */
> +        uint32_t sid;
> +        /* service core id */
> +        uint32_t scid;
> +        enum rxp_service_state sstate;
> +
> +};
> +
>  struct txq {
>          struct fs_priv *priv;
>          uint16_t qid;
> @@ -147,6 +170,7 @@ struct fs_priv {
>          /* current capabilities */
>          struct rte_eth_dev_info infos;
>          struct rte_eth_dev_owner my_owner; /* Unique owner. */
> +        struct rte_intr_handle intr_handle; /* Port interrupt handle. */
>          /*
>           * Fail-safe state machine.
>           * This level will be tracking state of the EAL and eth
> @@ -159,8 +183,28 @@ struct fs_priv {
>          unsigned int pending_alarm:1; /* An alarm is pending */
>          /* flow isolation state */
>          int flow_isolated:1;
> +        /*
> +         * Rx interrupts/events proxy.
> +         * The PMD issues Rx events to the EAL on behalf of its subdevices,
> +         *  it does that by registering event queues to the EAL. Each such
> +         *  queue represents a failsafe Rx queue. A PMD service thread listens
> +         *  to all the Rx events of all the failsafe subdevices.
> +         *  When an Rx event is issued by a subdevice Rx queue it will be
> +         *  caught by the service and delivered by it to the appropriate
> +         *  failsafe event queue.
> +         */
> +        struct rx_proxy rxp;

Not very important, but can you put this before the :1 bitfields?

>  };
>  
> +/* FAILSAFE_INTR */
> +
> +int failsafe_rx_intr_install(struct rte_eth_dev *dev);
> +void failsafe_rx_intr_uninstall(struct rte_eth_dev *dev);
> +int failsafe_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx);
> +int failsafe_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx);
> +int failsafe_rx_intr_install_subdevice(struct sub_device *sdev);
> +void failsafe_rx_intr_uninstall_subdevice(struct sub_device *sdev);
> +
>  /* MISC */
>  
>  int failsafe_hotplug_alarm_install(struct rte_eth_dev *dev);

Overall good quality code, thanks!

-- 
Gaëtan Rivet
6WIND

