[dpdk-dev] MSI-X vector #1 seems to be stalled sometimes after VF reset (ixgbe)
ruslan at purestorage.com
Tue Jan 10 03:42:10 CET 2017
Attached are two patches, interrupts_excerpt.patch and dpdk_vfreset.patch. The discussion below relates to a slightly modified version of the DPDK 16.07 library.
1. We use a single-shot interrupt mechanism for the RX queue (vector #1, intr_handle.efds file descriptor).
When we receive the first interrupt, we start a polling thread. When the polling thread becomes idle again, we re-enable interrupts.
(to enable interrupts, we use rte_eth_dev_rx_intr_enable, queue_id = 0)
2. We enable interrupts right away for mailbox and adapter-reset notifications (vector #0, intr_handle.fd file descriptor)
(to enable interrupts, we use rte_eth_dev_rx_intr_enable, queue_id = UINT16_MAX which we reserved for non-RX interrupts)
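To make points 1 and 2 concrete, here is a minimal software model of the flow, not the code from the patch. All names (NON_RX_QUEUE_ID, rx_ctx, rx_on_interrupt, rx_on_poll, queue_to_vector) are hypothetical, the "1 + queue_id" vector mapping is an assumption that matches our single RX queue on vector #1, and irq_armed only stands in for the place where rte_eth_dev_rx_intr_enable would be called.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: queue id we reserve for the non-RX (mailbox/reset) vector. */
#define NON_RX_QUEUE_ID UINT16_MAX

enum rx_mode { RX_WAIT_IRQ, RX_POLLING };

struct rx_ctx {
    enum rx_mode mode;
    bool irq_armed;       /* models "vector #1 is enabled" */
    unsigned idle_polls;  /* consecutive polls that returned no packets */
};

/* Assumed mapping: vector #0 for non-RX events, vector 1 + q for RX queue q. */
static unsigned queue_to_vector(uint16_t queue_id)
{
    return queue_id == NON_RX_QUEUE_ID ? 0u : 1u + queue_id;
}

/* Single-shot: the interrupt fires once, then we switch to polling. */
static void rx_on_interrupt(struct rx_ctx *c)
{
    c->irq_armed = false;
    c->mode = RX_POLLING;
    c->idle_polls = 0;
}

/*
 * Called after each poll with the number of packets received.
 * Returns true when the poller went idle and the interrupt was re-armed;
 * in the real code this is where rte_eth_dev_rx_intr_enable(port, 0)
 * would be called.
 */
static bool rx_on_poll(struct rx_ctx *c, unsigned nb_rx, unsigned idle_limit)
{
    if (nb_rx > 0) {
        c->idle_polls = 0;
        return false;
    }
    if (++c->idle_polls < idle_limit)
        return false;
    c->mode = RX_WAIT_IRQ;
    c->irq_armed = true;
    return true;
}
```

In this model the mailbox vector is simply never disarmed, which corresponds to enabling it right away with queue_id = UINT16_MAX.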
3. Changes related to interrupt setup and enabling/disabling are in interrupt_excerpt.patch
Changes: Writing to the register already seems to imply OR semantics when enabling interrupts, so it does not seem to be necessary to read the previous value of the register first (especially now that we have two vectors and want to avoid any race between reading and writing the register). Also, rte_intr_enable would write the same configuration to VFIO again, which does not seem to be necessary. Could you confirm and/or clarify that?
For disabling an interrupt, it seems we have to use a different register.
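The semantics we are assuming can be sketched as a small software model. This is not hardware access and not the patch itself: irq_mask_model, model_write_set and model_write_clear are hypothetical names, and the behaviour (writing 1 bits to the enable register sets them, writing 1 bits to a separate disable register clears them, 0 bits are ignored) is exactly the assumption we would like confirmed.

```c
#include <stdint.h>

/* Software model of a set/clear interrupt mask register pair. */
struct irq_mask_model {
    uint32_t mask;  /* currently enabled vectors, one bit per vector */
};

/* Assumed "enable" register: written 1 bits are ORed into the mask. */
static void model_write_set(struct irq_mask_model *m, uint32_t bits)
{
    m->mask |= bits;
}

/* Assumed "disable" register: written 1 bits are cleared from the mask. */
static void model_write_clear(struct irq_mask_model *m, uint32_t bits)
{
    m->mask &= ~bits;
}
```

If the hardware behaves like this model, enabling vector #1 with a plain write of (1 << 1) cannot disturb vector #0, so no read-modify-write is needed and two threads touching different vectors do not race.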
4. Changes related to resetting devices are in dpdk_vfreset.patch
We used an unofficial patch from http://dpdk.org/dev/patchwork/patch/14009/ as the model. Our patch does pretty much the same thing, but maintains a state machine for our convenience, so that we can have the loop outside the reset function.
5. We see an intermittent stall of interrupt vector #1 when links are toggled. Vector #0 still seems to work fine, because we are still able to get mailbox interrupts (when the adapter is reset).
Our current suspicion is that it may have something to do with the adapter reset handling in the unofficial patch (which, in turn, relies on the dev_stop/dev_start functions). It appears that vector #1 (RX) interrupts stall intermittently only after an adapter reset takes place.
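The idea of driving the reset from an outer loop can be illustrated with a pure transition function. This is only a sketch under assumptions: the states and events below are hypothetical and are not the ones in dpdk_vfreset.patch, and the caller is expected to perform the actual stop, wait and start actions and feed the result back as an event.

```c
/* Hypothetical reset states; the actual patch may use different ones. */
enum vf_reset_state {
    VF_RESET_IDLE,      /* normal operation */
    VF_RESET_STOPPING,  /* reset notified, device being stopped */
    VF_RESET_WAIT_PF,   /* waiting for the PF to come back */
    VF_RESET_STARTING   /* device being restarted */
};

/* Hypothetical events reported by the outer loop. */
enum vf_reset_event {
    VF_EV_RESET_NOTIFIED,
    VF_EV_STOPPED,
    VF_EV_PF_READY,
    VF_EV_START_OK,
    VF_EV_START_FAILED
};

/* One step of the state machine; unrelated events leave the state as is. */
static enum vf_reset_state
vf_reset_next(enum vf_reset_state s, enum vf_reset_event ev)
{
    switch (s) {
    case VF_RESET_IDLE:
        return ev == VF_EV_RESET_NOTIFIED ? VF_RESET_STOPPING : s;
    case VF_RESET_STOPPING:
        return ev == VF_EV_STOPPED ? VF_RESET_WAIT_PF : s;
    case VF_RESET_WAIT_PF:
        return ev == VF_EV_PF_READY ? VF_RESET_STARTING : s;
    case VF_RESET_STARTING:
        if (ev == VF_EV_START_OK)
            return VF_RESET_IDLE;
        if (ev == VF_EV_START_FAILED)
            return VF_RESET_WAIT_PF;  /* retry once the PF is ready again */
        return s;
    }
    return s;
}
```

Because each call only advances one step, the loop that waits and retries stays in the application rather than inside the reset function.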
Please give your advice / suggestions.