[dpdk-dev] [Bug 551] LACP failover with 802.3ad bond mode 4 takes long time

bugzilla at dpdk.org bugzilla at dpdk.org
Fri Oct 9 20:43:15 CEST 2020


https://bugs.dpdk.org/show_bug.cgi?id=551

            Bug ID: 551
           Summary: LACP failover with 802.3ad bond mode 4 takes long time
           Product: DPDK
           Version: 20.11
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: major
          Priority: Normal
         Component: ethdev
          Assignee: dev at dpdk.org
          Reporter: kiran.kn80 at gmail.com
  Target Milestone: ---

When one slave of an 802.3ad (mode 4) bond is disabled, the switchover takes
almost 6 seconds, which is not acceptable for telco deployments. We need
sub-second switchover time, as on Linux.

Tested with a Juniper QFX switch.

The cause is that the system ID changes (to that of the other slave device)
when one of the active slaves goes down. This triggers re-negotiation, which
takes a long time to converge.

Is the system ID expected to be different for each link? Shouldn't it be the
same for all links?

As the gdb output below shows, the actor system ID of slave 1 is
{0xac, 0x1f, 0x6b, 0x8d, 0xd7, 0xbd} and the actor system ID of slave 0 is
{0xac, 0x1f, 0x6b, 0x8d, 0xd7, 0xbc}.

Because of this, the system ID changes when the active slave goes down.
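
As a quick check of the observation above, the two actor system IDs can be compared octet by octet. This is a minimal Python sketch that uses only the two addresses quoted in this report; the helper names `format_mac` and `differing_octets` are made up for illustration and are not part of DPDK.

```python
# Actor system IDs as quoted in this report, one per bond slave.
SYSTEM_ID_A = bytes([0xAC, 0x1F, 0x6B, 0x8D, 0xD7, 0xBD])
SYSTEM_ID_B = bytes([0xAC, 0x1F, 0x6B, 0x8D, 0xD7, 0xBC])


def format_mac(addr: bytes) -> str:
    """Render a 6-byte address in the colon-separated form the switch prints."""
    return ":".join(f"{octet:02x}" for octet in addr)


def differing_octets(a: bytes, b: bytes) -> list:
    """Return the indexes of the octets at which two addresses differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    print(format_mac(SYSTEM_ID_A), format_mac(SYSTEM_ID_B))
    print("differing octet indexes:", differing_octets(SYSTEM_ID_A, SYSTEM_ID_B))
```

In 802.3ad the system ID is part of what identifies an aggregation to the partner, so links that are meant to stay in one aggregate would be expected to show no differing octets here.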

I have shown this to Declan Doherty <declan.doherty at intel.com>.

----- Logs from DPDK application ----- 
Breakpoint 1, rx_machine (internals=0x11409edf00, slave_id=1, lacp=0x13ae8b4be)
    at /root/contrail/third_party/dpdk/drivers/net/bonding/rte_eth_bond_8023ad.c:326
326     /root/contrail/third_party/dpdk/drivers/net/bonding/rte_eth_bond_8023ad.c: No such file or directory.
(gdb) p/x *port
$2 = {actor_state = 0x3f,
  actor = {system_priority = 0xffff,
    system = {addr_bytes = {0xac, 0x1f, 0x6b, 0x8d, 0xd7, 0xbd}},
    key = 0x2100, port_priority = 0xff00, port_number = 0x200},
  partner_state = 0x3f,
  partner = {system_priority = 0x7f00,
    system = {addr_bytes = {0x5c, 0x45, 0x27, 0x49, 0x64, 0x8c}},
    key = 0x1500, port_priority = 0x7f00, port_number = 0x1000},
  sm_flags = 0x202, selected = 0x2, forced_rx_flags = 0x1,
  current_while_timer = 0x23e52ecad117e2, periodic_timer = 0x23e52d82f5c5a2,
  wait_while_timer = 0x23dae7bc8488e2, tx_machine_timer = 0x23e52d4e8213a1,
  tx_marker_timer = 0x0, aggregator_port_id = 0x0,
  mbuf_pool = 0x113f5db580, rx_ring = 0x113fa00bc0, tx_ring = 0x113fa00980,
  rx_marker_timer = 0x0, warning_timer = 0x23db47b16ff07d,
  warnings_to_show = 0x10, slow_pool = 0x0}

Breakpoint 1, rx_machine (internals=0x11409edf00, slave_id=0, lacp=0x13d7ce87e)
    at /root/contrail/third_party/dpdk/drivers/net/bonding/rte_eth_bond_8023ad.c:326
326     in /root/contrail/third_party/dpdk/drivers/net/bonding/rte_eth_bond_8023ad.c
(gdb) p/x *port
$3 = {actor_state = 0x8f,
  actor = {system_priority = 0xffff,
    system = {addr_bytes = {0xac, 0x1f, 0x6b, 0x8d, 0xd7, 0xbc}},
    key = 0x2100, port_priority = 0xff00, port_number = 0x100},
  partner_state = 0x3f,
  partner = {system_priority = 0x7f00,
    system = {addr_bytes = {0x5c, 0x45, 0x27, 0x49, 0x64, 0x8c}},
    key = 0x1500, port_priority = 0x7f00, port_number = 0x1100},
  sm_flags = 0x202, selected = 0x2, forced_rx_flags = 0x1,
  current_while_timer = 0x23e537ad7f62bc, periodic_timer = 0x23e5369a1fcadb,
  wait_while_timer = 0x23da0985aca4f4, tx_machine_timer = 0x23e53665ac40d8,
  tx_marker_timer = 0x0, aggregator_port_id = 0x0,
  mbuf_pool = 0x114022b800, rx_ring = 0x1140607600, tx_ring = 0x11406073c0,
  rx_marker_timer = 0x0, warning_timer = 0x23e536a73cfc50,
  warnings_to_show = 0x10, slow_pool = 0x0}

---------
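
A note on reading the gdb values: the 16-bit fields (key, priorities, port numbers) appear to be printed in network byte order on a little-endian host, so they look byte-swapped next to the decimal values the switch shows. The Python sketch below swaps them back. The field values are copied from the slave_id=1 dump; the byte-order interpretation is an assumption based on how well the swapped values line up with the switch output, and `swap16` and `decode_fields` are illustrative names, not DPDK functions.

```python
def swap16(value: int) -> int:
    """Swap the two bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


# Raw 16-bit fields from the slave_id=1 gdb dump, exactly as printed by p/x.
GDB_FIELDS = {
    "actor.key": 0x2100,
    "actor.port_priority": 0xFF00,
    "actor.port_number": 0x0200,
    "partner.system_priority": 0x7F00,
    "partner.key": 0x1500,
    "partner.port_priority": 0x7F00,
    "partner.port_number": 0x1000,
}


def decode_fields(fields: dict) -> dict:
    """Byte-swap every raw field so it can be read as a plain integer."""
    return {name: swap16(raw) for name, raw in fields.items()}


if __name__ == "__main__":
    for name, value in decode_fields(GDB_FIELDS).items():
        print(f"{name:26s} {value}")
```

Read this way, the dump gives actor key 33, port priority 255 and port number 2, and partner system priority 127, key 21 and port number 16, which are the values the switch reports for xe-0/0/23.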


---- Logs from Juniper QFX switch ----
= =

root at a6-qfx1# run show lacp interfaces ae20
Aggregated interface: ae20
    LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
      xe-0/0/23      Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      xe-0/0/23    Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active
      xe-0/0/20      Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      xe-0/0/20    Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active
    LACP protocol:        Receive State  Transmit State          Mux State
      xe-0/0/23                 Current   Fast periodic Collecting distributing
      xe-0/0/20                 Current   Fast periodic Collecting distributing

[edit]
root at a6-qfx1# run show interfaces ae20 extensive | find LACP
    LACP info:        Role     System             System      Port    Port    Port
                             priority          identifier  priority  number     key
      xe-0/0/20.0    Actor        127  5c:45:27:49:64:8c       127      17      21
      xe-0/0/20.0  Partner      65535  ac:1f:6b:8d:d7:bc       255       1      33
      xe-0/0/23.0    Actor        127  5c:45:27:49:64:8c       127      16      21
      xe-0/0/23.0  Partner      65535  ac:1f:6b:8d:d7:bc       255       2      33
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/20.0          1541355     1441147            0            0
      xe-0/0/23.0          1727402     1601884            0            0
    Marker Statistics:   Marker Rx     Resp Tx   Unknown Rx   Illegal Rx
      xe-0/0/20.0                0           0            0            0
      xe-0/0/23.0                0           0            0            0
    Protocol eth-switch, MTU: 9216, Generation: 164, Route table: 0
      Flags: None


05:28:19.157776  In LACPv1, length 110
        Actor Information TLV (0x01), length 20
          System ac:1f:6b:8d:d7:bc, System Priority 65535, Key 33, Port 2, Port Priority 255
          State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing]
        Partner Information TLV (0x02), length 20
          System 5c:45:27:49:64:8c, System Priority 127, Key 21, Port 16, Port Priority 127
          State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing]
        Collector Information TLV (0x03), length 16
          Max Delay 0
        Terminator TLV (0x00), length 0


[edit]
root at a6-qfx1#

[edit]
root at a6-qfx1# set interfaces xe-0/0/20 disable

root at a6-qfx1# commit
commit complete

[edit]
root at a6-qfx1# run show lacp interfaces ae20
Aggregated interface: ae20
    LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
      xe-0/0/23      Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      xe-0/0/23    Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active
      xe-0/0/20      Actor    No   Yes    No   No   No   Yes     Fast    Active
      xe-0/0/20    Partner    No   Yes    No   No   No   Yes     Fast   Passive
    LACP protocol:        Receive State  Transmit State          Mux State
      xe-0/0/23                 Current   Fast periodic Collecting distributing
      xe-0/0/20           Port disabled     No periodic           Detached

[edit]
root at a6-qfx1# run show interfaces ae20 extensive | find LACP
    LACP info:        Role     System             System      Port    Port    Port
                             priority          identifier  priority  number     key
      xe-0/0/20.0    Actor        127  5c:45:27:49:64:8c       127      17      21
      xe-0/0/20.0  Partner      65535  ac:1f:6b:8d:d7:bd         1      17      33
      xe-0/0/23.0    Actor        127  5c:45:27:49:64:8c       127      16      21
      xe-0/0/23.0  Partner      65535  ac:1f:6b:8d:d7:bd       255       2      33  =>>> notice the change in Linux system-id “:bc to :bd”
    LACP Statistics:       LACP Rx     LACP Tx   Unknown Rx   Illegal Rx
      xe-0/0/20.0          1541397     1441186            0            0
      xe-0/0/23.0          1727471     1601948            0            0
    Marker Statistics:   Marker Rx     Resp Tx   Unknown Rx   Illegal Rx
      xe-0/0/20.0                0           0            0            0
      xe-0/0/23.0                0           0            0            0
    Protocol eth-switch, MTU: 9216, Generation: 164, Route table: 0
      Flags: None

[edit]
root at a6-qfx1#

05:28:20.306105  In LACPv1, length 110
        Actor Information TLV (0x01), length 20
          System ac:1f:6b:8d:d7:bd, System Priority 65535, Key 33, Port 2, Port Priority 255
          State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting]
        Partner Information TLV (0x02), length 20
          System 5c:45:27:49:64:8c, System Priority 127, Key 21, Port 16, Port Priority 127
          State Flags [Activity, Timeout, Aggregation, Synchronization]
        Collector Information TLV (0x03), length 16
          Max Delay 0
        Terminator TLV (0x00), length 0

= =
--------
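
The State Flags lists in the two captured LACPDUs can be mapped to the one-byte LACP state field to make the change easier to see. The Python sketch below assumes the standard 802.3ad bit layout (bit 0 Activity up to bit 7 Expired); the flag lists are copied from the captures above, and `encode_state`, `decode_state` and `dropped_flags` are illustrative helpers, not taken from DPDK or tcpdump.

```python
# LACP state bits, least significant bit first (standard 802.3ad layout).
LACP_STATE_BITS = (
    "Activity", "Timeout", "Aggregation", "Synchronization",
    "Collecting", "Distributing", "Defaulted", "Expired",
)

# Flag lists as shown in the two captures on xe-0/0/23.
ACTOR_FLAGS_BEFORE = ["Activity", "Timeout", "Aggregation", "Synchronization",
                      "Collecting", "Distributing"]
ACTOR_FLAGS_AFTER = ["Activity", "Timeout", "Aggregation", "Synchronization",
                     "Collecting"]
PARTNER_FLAGS_AFTER = ["Activity", "Timeout", "Aggregation", "Synchronization"]


def encode_state(names) -> int:
    """Build the state byte from a list of flag names."""
    state = 0
    for name in names:
        state |= 1 << LACP_STATE_BITS.index(name)
    return state


def decode_state(state: int) -> list:
    """List the flag names that are set in a state byte."""
    return [name for bit, name in enumerate(LACP_STATE_BITS) if state & (1 << bit)]


def dropped_flags(before, after) -> list:
    """Flags present in the first list but missing from the second."""
    return [name for name in before if name not in after]


if __name__ == "__main__":
    print(hex(encode_state(ACTOR_FLAGS_BEFORE)), "->", hex(encode_state(ACTOR_FLAGS_AFTER)))
    print("actor dropped:", dropped_flags(ACTOR_FLAGS_BEFORE, ACTOR_FLAGS_AFTER))
    print("gdb actor_state 0x8f:", decode_state(0x8F))
```

With this mapping the actor state in the captures goes from 0x3f to 0x1f (Distributing cleared) once the system ID changes, and the partner state in the second capture is 0x0f. The same helper applied to the actor_state = 0x8f seen in gdb for slave 0 decodes to Activity, Timeout, Aggregation, Synchronization and Expired.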


