CONTAP-447334: ONTAP ports in LACP ifgrp flap repeatedly
Issue
- After upgrading to ONTAP 9.16.1 or later, the following events are seen repeatedly in the EMS log:
[node-01: clock: net.ifgrp.lacp.link.inactive:error]: ifgrp a0a, port e0c has transitioned to an inactive state. The interface group is in a degraded state.
[node-01: vifmgr: vifmgr.portdown:notice]: A link down event was received on node node-01, port a0a.
[node-01: vifmgr: vifmgr.portdown:notice]: A link down event was received on node node-01, port a0a-10.
- A packet trace shows that ONTAP generally sends LACP PDUs at the correct tempo, with a 1-second gap between packets.
- Note: ONTAP-side packet traces should be taken on the individual physical ports to ensure the outbound LACP traffic is captured.
- This tempo holds steady for 10 seconds.
- Then ONTAP deviates from that correct tempo.
- There is a 2.6 second gap between the previous ONTAP packet and the next ONTAP packet.
- Then the correct 1 second tempo resumes for 10 seconds.
- Then the aberrant 2.6 second gap occurs, and the cycle repeats.
- As time goes on, even larger gaps are occasionally observed.
- This occurs only when the switch is configured for LACP short timeout (commonly called fast timeout in switch vendor documentation).
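
The tempo pattern described above can be checked against a packet trace with a short script. The following is a minimal sketch, not a NetApp tool: it assumes the capture timestamps (in seconds) of the LACP PDUs sent by ONTAP on one physical port have already been exported into a list. The function name find_lacp_gaps and the tolerance parameter are hypothetical. The two constants are the IEEE 802.1AX values for short timeout: a 1-second transmit interval, and a 3-second window after which the partner stops considering the port active.

```python
from typing import List, Tuple

# IEEE 802.1AX values that apply when the partner requests short timeout.
FAST_PERIODIC_TIME = 1.0  # expected seconds between LACP PDUs
SHORT_TIMEOUT_TIME = 3.0  # seconds without a PDU before the partner expires the port


def find_lacp_gaps(
    timestamps: List[float], tolerance: float = 0.5
) -> List[Tuple[float, float, bool]]:
    """Return the gaps between consecutive LACP PDUs that exceed the 1-second tempo.

    timestamps: capture times (seconds) of outbound LACP PDUs on one port,
                in ascending order.
    tolerance:  hypothetical allowance for normal jitter, in seconds.

    Each result is (time_of_previous_pdu, gap_seconds, reaches_short_timeout).
    """
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        delta = curr - prev
        if delta > FAST_PERIODIC_TIME + tolerance:
            # Flag gaps long enough for a short-timeout partner to expire the port.
            gaps.append((prev, round(delta, 3), delta >= SHORT_TIMEOUT_TIME))
    return gaps
```

For example, passing ten PDUs spaced 1 second apart followed by one PDU 2.6 seconds later reports a single 2.6-second gap that does not yet reach the 3-second window. Larger gaps, such as those observed as time goes on, would be flagged as reaching it, which is consistent with the issue appearing only when the switch uses short timeout.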