[dpdk-dev] [PATCH v6 4/8] eal/linux: add per rx queue interrupt handling based on VFIO

Liang, Cunming cunming.liang at intel.com
Sat Feb 28 02:45:52 CET 2015


Thanks Thomas.
It's my fault that directly reply David's mail, haven't notice his mail isn't in a plain text mode.

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Friday, February 27, 2015 10:13 PM
> To: Liang, Cunming
> Cc: David Marchand; dev at dpdk.org; Stephen Hemminger; Zhou, Danny
> Subject: Re: [PATCH v6 4/8] eal/linux: add per rx queue interrupt handling based
> on VFIO
> 
> Hi Cunming,
> 
> First, sorry to have to say that, but it is not easy to read discussions
> where quote marks are not used. I re-insert them for clarity.
> 
> Comments below.
> 
> 2015-02-27 12:22, Liang, Cunming:
> > From: David Marchand [mailto:david.marchand at 6wind.com]
> > Sent: Friday, February 27, 2015 6:34 PM
> >
> > > I am not really comfortable with this api.
> > >
> > > This is just creating something on top of the standard epoll api with
> > > limitations. In the end, we could just use an external lib that does this
> > > already.
> >
> > [Liang, Cunming] Not really, I think. We try to protect the data inside
> > ‘rte_intr_handle’, it doesn’t expect user to understand the things defined
> > inside ‘rte_intr_handle’.
> > It’s better typedef ‘rte_intr_handle’ as a raw integer ID, having a function
> > to get it from a ethdev. Then all the interrupt api is around it.
> > It provides the common pci NIC devices rxtx interrupt processing approach.
> > For the limitations, we can fix it step by step.
> >
> > > So ok, this will work for your limited use case, but this will not be
> > > really useful for anything else.
> > > Not sure it has its place in eal, this is more an example to me.
> >
> > [Liang, Cunming] ‘limited use case’ do you means only for rxtx ?
> > It don’t expect to provide a generic event mechanism (like libev/libevent
> > does), but a simple way to allow PMD work with DMA interrupt. It mainly
> > abstract for rx interrupt purpose. I appreciate if you could help to list
> > more useful cases.
> 
> You don't expect to provide a generic event mechanism but application
> developpers could need to wait for many events at once, not only Rx ones.
> That's why it's better to provide only the needed parts to use something
> generic like libevent.
> And we should avoid reinventing the wheel.
[LCM] Ok, I get you. I have a simple proposal to allow either RX event or other events can be handled in rte_intr_wait().
For the input data 'epoll_data', instead of using 'u32', let's keep use 'int fd'.
If the most significant bit is 0, event[n] stands for a fd. If it's 1, event[0]&0xFFFF stands for a vector number.
So during 'rte_intr_set', it get 16bit vector number and encode it as a 32bit int with the most significant bit 1.
Then on 'rte_intr_wait', only process the data.fd with the most significant bit 1. And bypass the user fd.
'rte_intr_wait(struct rte_intr_handle *intr_handle, int epfd, int *event, uint16_t num)'.
As user already can assign an epfd, so they can add any normal event fd into the epfd. Make sense ?
> 
> > > > +static void
> > > > +eal_intr_process_rxtx_interrupts(struct rte_intr_handle *intr_handle,
> > > > +                                struct epoll_event *events,
> > > > +                                uint32_t *vec, int nfds)
> > > > +{
> > > > +       int i, bytes_read;
> > > > +       union rte_intr_read_buffer buf;
> > > > +       int fd;
> > > > +
> > > > +       for (i = 0; i < nfds; i++) {
> > > > +               /* set the length to be read for different handle type */
> > > > +               switch (intr_handle->type) {
> > > > +               case RTE_INTR_HANDLE_UIO:
> > > > +                       bytes_read = sizeof(buf.uio_intr_count);
> > > > +                       break;
> > > > +               case RTE_INTR_HANDLE_ALARM:
> > > > +                       bytes_read = sizeof(buf.timerfd_num);
> > > > +                       break;
> > > > +#ifdef VFIO_PRESENT
> > > > +               case RTE_INTR_HANDLE_VFIO_MSIX:
> > > > +               case RTE_INTR_HANDLE_VFIO_MSI:
> > > > +               case RTE_INTR_HANDLE_VFIO_LEGACY:
> > > > +                       bytes_read = sizeof(buf.vfio_intr_count);
> > > > +                       break;
> > > > +#endif
> > > > +               default:
> > > > +                       bytes_read = 1;
> > > > +                       break;
> > > > +               }
> > > > +
> > > > +               /**
> > > > +               * read out to clear the ready-to-be-read flag
> > > > +               * for epoll_wait.
> > > > +               */
> > > > +               vec[i] = events[i].data.u32;
> > > > +               assert(vec[i] < VFIO_MAX_RXTX_INTR_ID);
> > > > +
> > > > +               fd = intr_handle->efds[vec[i]];
> > > > +               bytes_read = read(fd, &buf, bytes_read);
> > > > +               if (bytes_read < 0)
> > > > +                       RTE_LOG(ERR, EAL, "Error reading from file "
> > > > +                               "descriptor %d: %s\n", fd, strerror(errno));
> > > > +               else if (bytes_read == 0)
> > > > +                       RTE_LOG(ERR, EAL, "Read nothing from file "
> > > > +                               "descriptor %d\n", fd);
> > > > +       }
> > > > +}
> > >
> > > Why unconditionnally read ?
> > > You are absorbing events from the application if the application gave you
> > > an external epfd and populated it with its own fds.
> >
> > [Liang, Cunming] The vector number was checked. If an external epfd
> > populated some event carry fd rather than a data.u32 but the value
> > inside the valid range, it considers as a valid vector number. No matter
> > the read success or not, it always notify the event. Do you have any
> > suggestion used here to check the condition ?
> >
> > > > +static int init_tls_epfd(void)
> > > > +{
> > > > +       int pfd = epoll_create(1);
> > > > +       if (pfd < 0) {
> > > > +               RTE_LOG(ERR, EAL,
> > > > +                       "Cannot create epoll instance\n");
> > > > +               return -1;
> > > > +       }
> > > > +       return pfd;
> > > > +}
> > > > +
> > > > +int
> > > > +rte_intr_rx_wait(struct rte_intr_handle *intr_handle, int epfd,
> > > > +                uint32_t *vec, uint16_t num)
> > > > +{
> > >
> > > In the end, this "rx" does not mean anything to eal.
> >
> > [Liang, Cunming] That’s a good point. I tried to remove ‘rx’ and use a
> > generic word here. ‘rte_intr_wait’ looks like too generic,
> > ‘rte_intr_epfd_wait’ looks not abstract with bsd.
> > As the function only serves for rxtx vector, so using the rx prefix.
> > Which name do you prefer ?
> 
> You should understand that you are trying to wrongly replace a generic lib.
> The best name is probably /dev/null.
[Liang, Cunming] If allowing other user fd added into the epfd instance, I feel ok to rename it to rte_intr_wait().
> 
> > > > +#define MAX_EVENTS      8
> > > > +       struct epoll_event events[MAX_EVENTS];
> > > > +       int ret, nfds = 0;
> > > > +
> > > > +       if (!intr_handle || !vec) {
> > > > +               RTE_LOG(ERR, EAL, "invalid input parameter\n");
> > > > +               return -1;
> > > > +       }
> > > > +
> > > > +       if (intr_handle->type != RTE_INTR_HANDLE_VFIO_MSIX) {
> > > > +               RTE_LOG(ERR, EAL, "intr type should be VFIO_MSIX\n");
> > > > +               return -1;
> > > > +       }
> > > > +
> > > > +       if (epfd == RTE_EPOLL_FD_ANY) {
> > > > +               /* using per thread epoll fd */
> > > > +               if (unlikely(RTE_PER_LCORE(_epfd) == -1))
> > > > +                       RTE_PER_LCORE(_epfd) = init_tls_epfd();
> > > > +               epfd = RTE_PER_LCORE(_epfd);
> > > > +       }
> > >
> > > Rather than testing every time, this should be set by the caller,
> > > i.e. epfd is always valid.
> > > If application does not want to create a epfd, then it calls
> > > rte_intr_rx_wait with RTE_EPOLL_FD_ANY (this name is not well chosen)
> > > that is a macro wrapped to RTE_PER_LCORE(_epfd).
> >
> > [Liang, Cunming] It sounds good to me. As we don’t expect to expose
> > *rte_per_lcore__epfd* as an public symbol, so will define rte_epfd()
> > instread.
> > Within rte_epfd(), if RTE_PER_LCORE(_epfd) not assigned, then
> > init_tls_epfd() once.
> >
> > > init_tls_epfd() should be called only once at init time.
> > > No need to check every time.
> >
> > [Liang, Cunming] As it probably not need per thread epfd at all.
> > So I prefer to create it when it real needed as above I mentioned.
> 
> > > > +       do {
> > > > +               ret = epoll_wait(epfd, events,
> > > > +                                RTE_MIN(num, MAX_EVENTS),
> > > > +                                EAL_INTR_EPOLL_WAIT_FOREVER);
> > > > +               if (unlikely(ret < 0)) {
> > > > +                       /* epoll_wait fail */
> > > > +                       RTE_LOG(ERR, EAL, "epoll_wait returns with fail\n");
> > > > +                       return -1;
> > > > +               } else if (ret > 0) {
> > > > +                       /* epoll_wait has at least one fd ready to read */
> > > > +                       eal_intr_process_rxtx_interrupts(intr_handle, events,
> > > > +                                                        vec, ret);
> > > > +                       num -= ret;
> > > > +                       vec += ret;
> > > > +                       nfds += ret;
> > > > +               } else if (nfds > 0)
> > > > +                       break;
> > > > +       } while (num > 0);
> > > > +
> > > > +       return nfds;
> > > > +}
> > >
> > > You are blocking unless all fds have been set, so you are serialising
> > > all events.
> >
> > [Liang, Cunming] I’m not sure fully got your point. If any event arrives,
> > it gets back. Do you means if no fds added in, it’s always blocking.
> > You expect to have a timeout return ?
> 
> > > >                         RTE_LOG(ERR, EAL, "  cannot get IRQ info, "
> > > >
> > > > -                                       "error %i (%s)\n", errno, strerror(errno));
> > > > +                               "error %i (%s)\n", errno, strerror(errno));
> > >
> > > Garbage, this has nothing to do with the patch.
> >
> > [Liang, Cunming] It’s for line number exceed 80 margin complain.
> 
> The title of the patch is "add per rx queue interrupt handling based on VFIO".
> So this kind of modification is a garbage.
> Sorry, I won't play the game "idem / the same" below ;)
[Liang, Cunming] That's actually a question in my mind.
For example, there's one line change in diff file. It usually has several line code around it.
If these line code can't pass code checkpatch rule, what's the correct way to handle it?
Do you suggest having a patch first to correct the format, and then follows the patch to submit the formal content.

> 
> > > > -                       if (internal_config.vfio_intr_mode != RTE_INTR_MODE_NONE)
> {
> > > > +                       if (internal_config.vfio_intr_mode !=
> > > > +                           RTE_INTR_MODE_NONE) {
> > > >                                 RTE_LOG(ERR, EAL,
> > > > -                                               "  interrupt vector does not support eventfd!\n");
> > > > +                                       "  interrupt vector "
> > > > +                                       "does not support eventfd!\n");> >
> > >
> > > Idem.
> >
> > [Liang, Cunming] The same.
> 
> > > > -                                       "error %i (%s)\n", errno, strerror(errno));
> > > > +                               "error %i (%s)\n", errno, strerror(errno));
> > >
> > > Idem.
> >
> > [Liang, Cunming] The same.
> 
> > > >                 dev->intr_handle.vfio_dev_fd = vfio_dev_fd;
> > > > -
> > >
> > > Idem.
> >
> > [Liang, Cunming] Accept.



More information about the dev mailing list