[dpdk-dev] [PATCH v6 4/8] eal/linux: add per rx queue interrupt handling based on VFIO

Thomas Monjalon thomas.monjalon at 6wind.com
Fri Feb 27 15:13:21 CET 2015


Hi Cunming,

First, sorry to have to say that, but it is not easy to read discussions
where quote marks are not used. I re-insert them for clarity.

Comments below.

2015-02-27 12:22, Liang, Cunming:
> From: David Marchand [mailto:david.marchand at 6wind.com]
> Sent: Friday, February 27, 2015 6:34 PM
> 
> > I am not really comfortable with this api.
> > 
> > This is just creating something on top of the standard epoll api with
> > limitations. In the end, we could just use an external lib that does this
> > already.
> 
> [Liang, Cunming] Not really, I think. We try to protect the data inside
> ‘rte_intr_handle’, it doesn’t expect user to understand the things defined
> inside ‘rte_intr_handle’.
> It’s better typedef ‘rte_intr_handle’ as a raw integer ID, having a function
> to get it from a ethdev. Then all the interrupt api is around it.
> It provides the common pci NIC devices rxtx interrupt processing approach.
> For the limitations, we can fix it step by step.
> 
> > So ok, this will work for your limited use case, but this will not be
> > really useful for anything else.
> > Not sure it has its place in eal, this is more an example to me.
> 
> [Liang, Cunming] ‘limited use case’ do you means only for rxtx ?
> It don’t expect to provide a generic event mechanism (like libev/libevent
> does), but a simple way to allow PMD work with DMA interrupt. It mainly
> abstract for rx interrupt purpose. I appreciate if you could help to list
> more useful cases.

You don't expect to provide a generic event mechanism but application
developpers could need to wait for many events at once, not only Rx ones.
That's why it's better to provide only the needed parts to use something
generic like libevent.
And we should avoid reinventing the wheel.

> > > +static void
> > > +eal_intr_process_rxtx_interrupts(struct rte_intr_handle *intr_handle,
> > > +                                struct epoll_event *events,
> > > +                                uint32_t *vec, int nfds)
> > > +{
> > > +       int i, bytes_read;
> > > +       union rte_intr_read_buffer buf;
> > > +       int fd;
> > > +
> > > +       for (i = 0; i < nfds; i++) {
> > > +               /* set the length to be read for different handle type */
> > > +               switch (intr_handle->type) {
> > > +               case RTE_INTR_HANDLE_UIO:
> > > +                       bytes_read = sizeof(buf.uio_intr_count);
> > > +                       break;
> > > +               case RTE_INTR_HANDLE_ALARM:
> > > +                       bytes_read = sizeof(buf.timerfd_num);
> > > +                       break;
> > > +#ifdef VFIO_PRESENT
> > > +               case RTE_INTR_HANDLE_VFIO_MSIX:
> > > +               case RTE_INTR_HANDLE_VFIO_MSI:
> > > +               case RTE_INTR_HANDLE_VFIO_LEGACY:
> > > +                       bytes_read = sizeof(buf.vfio_intr_count);
> > > +                       break;
> > > +#endif
> > > +               default:
> > > +                       bytes_read = 1;
> > > +                       break;
> > > +               }
> > > +
> > > +               /**
> > > +               * read out to clear the ready-to-be-read flag
> > > +               * for epoll_wait.
> > > +               */
> > > +               vec[i] = events[i].data.u32;
> > > +               assert(vec[i] < VFIO_MAX_RXTX_INTR_ID);
> > > +
> > > +               fd = intr_handle->efds[vec[i]];
> > > +               bytes_read = read(fd, &buf, bytes_read);
> > > +               if (bytes_read < 0)
> > > +                       RTE_LOG(ERR, EAL, "Error reading from file "
> > > +                               "descriptor %d: %s\n", fd, strerror(errno));
> > > +               else if (bytes_read == 0)
> > > +                       RTE_LOG(ERR, EAL, "Read nothing from file "
> > > +                               "descriptor %d\n", fd);
> > > +       }
> > > +}
> > 
> > Why unconditionnally read ?
> > You are absorbing events from the application if the application gave you
> > an external epfd and populated it with its own fds.
> 
> [Liang, Cunming] The vector number was checked. If an external epfd
> populated some event carry fd rather than a data.u32 but the value
> inside the valid range, it considers as a valid vector number. No matter
> the read success or not, it always notify the event. Do you have any
> suggestion used here to check the condition ?
> 
> > > +static int init_tls_epfd(void)
> > > +{
> > > +       int pfd = epoll_create(1);
> > > +       if (pfd < 0) {
> > > +               RTE_LOG(ERR, EAL,
> > > +                       "Cannot create epoll instance\n");
> > > +               return -1;
> > > +       }
> > > +       return pfd;
> > > +}
> > > +
> > > +int
> > > +rte_intr_rx_wait(struct rte_intr_handle *intr_handle, int epfd,
> > > +                uint32_t *vec, uint16_t num)
> > > +{
> > 
> > In the end, this "rx" does not mean anything to eal.
> 
> [Liang, Cunming] That’s a good point. I tried to remove ‘rx’ and use a
> generic word here. ‘rte_intr_wait’ looks like too generic,
> ‘rte_intr_epfd_wait’ looks not abstract with bsd.
> As the function only serves for rxtx vector, so using the rx prefix.
> Which name do you prefer ?

You should understand that you are trying to wrongly replace a generic lib.
The best name is probably /dev/null.

> > > +#define MAX_EVENTS      8
> > > +       struct epoll_event events[MAX_EVENTS];
> > > +       int ret, nfds = 0;
> > > +
> > > +       if (!intr_handle || !vec) {
> > > +               RTE_LOG(ERR, EAL, "invalid input parameter\n");
> > > +               return -1;
> > > +       }
> > > +
> > > +       if (intr_handle->type != RTE_INTR_HANDLE_VFIO_MSIX) {
> > > +               RTE_LOG(ERR, EAL, "intr type should be VFIO_MSIX\n");
> > > +               return -1;
> > > +       }
> > > +
> > > +       if (epfd == RTE_EPOLL_FD_ANY) {
> > > +               /* using per thread epoll fd */
> > > +               if (unlikely(RTE_PER_LCORE(_epfd) == -1))
> > > +                       RTE_PER_LCORE(_epfd) = init_tls_epfd();
> > > +               epfd = RTE_PER_LCORE(_epfd);
> > > +       }
> > 
> > Rather than testing every time, this should be set by the caller,
> > i.e. epfd is always valid.
> > If application does not want to create a epfd, then it calls
> > rte_intr_rx_wait with RTE_EPOLL_FD_ANY (this name is not well chosen)
> > that is a macro wrapped to RTE_PER_LCORE(_epfd).
> 
> [Liang, Cunming] It sounds good to me. As we don’t expect to expose
> *rte_per_lcore__epfd* as an public symbol, so will define rte_epfd()
> instread.
> Within rte_epfd(), if RTE_PER_LCORE(_epfd) not assigned, then
> init_tls_epfd() once.
> 
> > init_tls_epfd() should be called only once at init time.
> > No need to check every time.
> 
> [Liang, Cunming] As it probably not need per thread epfd at all.
> So I prefer to create it when it real needed as above I mentioned.

> > > +       do {
> > > +               ret = epoll_wait(epfd, events,
> > > +                                RTE_MIN(num, MAX_EVENTS),
> > > +                                EAL_INTR_EPOLL_WAIT_FOREVER);
> > > +               if (unlikely(ret < 0)) {
> > > +                       /* epoll_wait fail */
> > > +                       RTE_LOG(ERR, EAL, "epoll_wait returns with fail\n");
> > > +                       return -1;
> > > +               } else if (ret > 0) {
> > > +                       /* epoll_wait has at least one fd ready to read */
> > > +                       eal_intr_process_rxtx_interrupts(intr_handle, events,
> > > +                                                        vec, ret);
> > > +                       num -= ret;
> > > +                       vec += ret;
> > > +                       nfds += ret;
> > > +               } else if (nfds > 0)
> > > +                       break;
> > > +       } while (num > 0);
> > > +
> > > +       return nfds;
> > > +}
> >
> > You are blocking unless all fds have been set, so you are serialising
> > all events.
> 
> [Liang, Cunming] I’m not sure fully got your point. If any event arrives,
> it gets back. Do you means if no fds added in, it’s always blocking.
> You expect to have a timeout return ?

> > >                         RTE_LOG(ERR, EAL, "  cannot get IRQ info, "
> > > 
> > > -                                       "error %i (%s)\n", errno, strerror(errno));
> > > +                               "error %i (%s)\n", errno, strerror(errno));
> > 
> > Garbage, this has nothing to do with the patch.
> 
> [Liang, Cunming] It’s for line number exceed 80 margin complain.

The title of the patch is "add per rx queue interrupt handling based on VFIO".
So this kind of modification is a garbage.
Sorry, I won't play the game "idem / the same" below ;)

> > > -                       if (internal_config.vfio_intr_mode != RTE_INTR_MODE_NONE) {
> > > +                       if (internal_config.vfio_intr_mode !=
> > > +                           RTE_INTR_MODE_NONE) {
> > >                                 RTE_LOG(ERR, EAL,
> > > -                                               "  interrupt vector does not support eventfd!\n");
> > > +                                       "  interrupt vector "
> > > +                                       "does not support eventfd!\n");> > 
> > 
> > Idem.
> 
> [Liang, Cunming] The same.

> > > -                                       "error %i (%s)\n", errno, strerror(errno));
> > > +                               "error %i (%s)\n", errno, strerror(errno));
> > 
> > Idem.
> 
> [Liang, Cunming] The same.

> > >                 dev->intr_handle.vfio_dev_fd = vfio_dev_fd;
> > > -
> > 
> > Idem.
> 
> [Liang, Cunming] Accept.



More information about the dev mailing list