[dpdk-dev] [Bug 388] ixgbe: link state race condition can occur when starting a fiber port

bugzilla at dpdk.org bugzilla at dpdk.org
Sat Feb 1 00:20:22 CET 2020


            Bug ID: 388
           Summary: ixgbe: link state race condition can occur when
                    starting a fiber port
           Product: DPDK
           Version: 19.08
          Hardware: x86
                OS: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: ethdev
          Assignee: dev at dpdk.org
          Reporter: mgsmith at netgate.com
  Target Milestone: ---

Created attachment 81
  --> https://bugs.dpdk.org/attachment.cgi?id=81&action=edit


If the link is down when ports on an SFP+ X552 (device ID 0x15ac) are started,
a race condition can occur that prevents them from working when the link peer
becomes available and the link comes up.

If 2 ports are started individually with some time in between them, the issue
is not observed. The race condition seems to occur only when one port is
started and then the other is started immediately afterwards (e.g. via script
or control plane programmatically applying configuration).

Steps to reproduce:

1. Install FD.IO VPP packages (available at
https://packagecloud.io/fdio/release - vpp, vpp-lib, vpp-plugins needed) on a
CentOS 7 system with X552 SFP+ devices attached.
2. If the X552 ports are bound to the kernel ixgbe driver, take them
administratively down so that VPP can take over management, via '[sudo] ifdown <interface>'.
3. Start VPP with '[sudo] systemctl start vpp'.
4. Create a text file commands.txt containing API commands to start the ports:
echo 'sw_interface_set_flags sw_if_index 1 admin-up
sw_interface_set_flags sw_if_index 2 admin-up' > commands.txt
5. Remove the SFP+ cables from the X552 ports so that link will not be
established when they are brought up.
6. Run commands to start both ports in rapid succession with '[sudo]
vpp_api_test in commands.txt'
7. Check the link state by running '[sudo] vppctl show hardware-interface'. The
link speed should be displayed as "Unknown" and the link state should be
displayed as "no carrier".
8. Connect an SFP+ cable between the two ports.
9. Check the link state again. One port may show that it is up, along with its
link speed. The other still reports Unknown/no carrier.

Actual results:

The second port started reports that its link is down and never recovers, even
if the port is stopped and restarted.

Expected results:

The second port reports that its link is up and can forward and receive packets.

Build date and hardware:

Observed in DPDK 19.08 (VPP 20.01). The current DPDK master branch appears to
have the same issue.
Observed on a Xeon-D 1537 SoC with 2 copper i350 ports and 2 SFP+ X552 ports.

Additional information:

I attached gdb and found that when rte_eth_link_get_nowait() was called for the
affected port, ixgbe_dev_link_update_share() returned before checking the link
state, because the IXGBE_FLAG_NEED_LINK_CONFIG flag was set in the device's
struct ixgbe_interrupt. Further investigation showed that the following
sequence of events occurred:

1. ixgbe_dev_link_update_share() sets the IXGBE_FLAG_NEED_LINK_CONFIG flag and
schedules ixgbe_dev_setup_link_alarm_handler() to run after 10us.
2. ixgbe_dev_start() is executed and cancels the execution of
ixgbe_dev_setup_link_alarm_handler() before it runs.
3. Since ixgbe_dev_setup_link_alarm_handler() is where the
IXGBE_FLAG_NEED_LINK_CONFIG flag would normally be cleared and its execution
was cancelled, the flag remains set. All subsequent calls to
ixgbe_dev_link_update_share() return early and never actually check the link
state again.

The attached patch seems to fix the issue.

