ANDU paused due to the failure of cluster management LIF migration
Applies to
- ONTAP 9
- Automated Non-disruptive Upgrade (ANDU)
Issue
- ANDU is paused with the following events:
[Node-01: upgrademgr: upgrademgr.update.pausedErr:debug]: The automated update of the cluster has been paused due to the following reason: Node "Node-02": Error: {Failed to migrate data LIFs from node "Node-01".}, Action: {Migrate all of the data LIFs using the "network interface migrate-all -node Node-01" command.}.
[Node-01: notifyd: callhome.andu.pausederr:alert]: params: {'epoch': '68XXXXXd-5XX6-4XX6-a068-6XXXXXXXXXb5', 'subject': 'AUTOMATED NDU PAUSED ON NODE: Node-02'}
Sat Jan 18 10:24:44 +0530 [Node-01: vifmgr: vifmgr.lifs.noredundancy:alert]: No redundancy in the failover configuration for 2 LIFs assigned to node "Node-01". LIFs: xxxx:Node-01_mgmt1, XXXX:cluster_mgmt
Sat Jan 18 10:24:44 +0530 [Node-01: vifmgr: vifmgr.lif.subnetMisconfig:error]: LIFs in subnet 10.254.xx.xx/23 of IPspace "Default" are configured on ports in multiple broadcast domains: Default, Default-3
- The failover targets for the data LIFs are properly defined, and the data LIFs migrate successfully.
- The VIFMGR log shows that Node-02 reports an undefined (undef) status for the node management LIFs of both nodes as well as the cluster management LIF:
00000004.000035f3 02a0312c Thu Sep 28 2023 11:16:40 -07:00 [kern_vifmgr:info:6907] [0x80ac8fa00] [anon-ns::table_to_vifmgr_log] 1013 8 - undef active 4294967295 10.20.225.72 255.255.255.128 - - - Node-02_mgmt1 local-only up 0 mgmt true false - false 101 8 Default-1 - - - fc07ebda-11c9-11ee-ac23-d039eaa8067f - - - up - true - - - - - - 4294967295 - - - - 4 - 1687957225 true Node-02 -
00000004.000035f4 02a0312c Thu Sep 28 2023 11:16:40 -07:00 [kern_vifmgr:info:6907] [0x80ac8fa00] [anon-ns::table_to_vifmgr_log] 1022 1 - undef active 4294967295 10.20.225.71 255.255.255.128 - - - Node-01_mgmt1 local-only up 0 mgmt true false - false 101 1 Default - - - 5c586a2d-11c9-11ee-bcea-d039eaa7fe69 - - - up - true - - - - - - 4294967295 - - - - 4 - 1687957352 true Node-01 -
00000004.000035f7 02a0312c Thu Sep 28 2023 11:16:40 -07:00 [kern_vifmgr:info:6907] [0x80ac8fa00] [anon-ns::table_to_vifmgr_log] 1025 1 - undef active 4294967295 10.20.225.70 255.255.255.128 - - - cluster_mgmt broadcast-domain-wide up 0 mgmt true false - false 101 1 Default - - 5e998d12-11c9-11ee-bcea-d039eaa7fe69 a10dc3f5-11c9-11ee-bcea-d039eaa7fe69 - - - up - true - - - - - - 4294967295 - - - false 4 - - false Node1-01 -
- Examination of the broadcast domains shows that the e0M port for Node-02 is not in the expected Default broadcast domain:
IPspace Name              Cluster      Default      Default      Default
Layer 2 Broadcast Domain  Cluster      Default      Default-1    SVM
Broadcast Domain ID       1            2            3            4
Configured MTU            9000         1500         1500         1500
Ports                     Node-01:e0a  Node-01:e0M  Node-02:e0M  Node-01:a0a
                          Node-01:e0b
                          Node-02:e0a
                          Node-02:e0b
- As such, the cluster management LIF could not be migrated to Node-02 during the ANDU process, causing the ANDU to pause.
- In another scenario, the e0M port of Node-02 was not in any broadcast domain, because the node was joined after the cluster was initially set up:
IPspace Name              Cluster      Default      Default
Layer 2 Broadcast Domain  Cluster      Default-1    SVM
Broadcast Domain ID       1            2            3
Configured MTU            9000         1500         1500
Ports                     Node-01:e0a  Node-01:e0M  Node-01:a0a
                          Node-01:e0b
                          Node-02:e0a
                          Node-02:e0b  <No entry for Node-02>
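
The check behind both scenarios can be sketched in a few lines of Python. This is an illustrative sketch only, not an ONTAP tool: the port lists are transcribed by hand from the first broadcast domain table above, and the names BROADCAST_DOMAINS, nodes_missing_from_domain and find_domain are hypothetical. It assumes that the cluster management LIF can only fail over to a node that has a port in the LIF's own broadcast domain.

```python
from typing import Dict, List, Optional, Set, Tuple

# (IPspace, broadcast domain) -> member ports.
# Transcribed by hand from the first scenario; not read from a live cluster.
BROADCAST_DOMAINS: Dict[Tuple[str, str], Set[str]] = {
    ("Cluster", "Cluster"): {"Node-01:e0a", "Node-01:e0b",
                             "Node-02:e0a", "Node-02:e0b"},
    ("Default", "Default"): {"Node-01:e0M"},
    ("Default", "Default-1"): {"Node-02:e0M"},
    ("Default", "SVM"): {"Node-01:a0a"},
}


def nodes_missing_from_domain(domains: Dict[Tuple[str, str], Set[str]],
                              ipspace: str,
                              domain: str,
                              nodes: List[str]) -> List[str]:
    """Return the nodes that have no port in the given broadcast domain.

    Assumption: a LIF cannot be migrated to such a node, because the node
    offers no failover target in the LIF's broadcast domain.
    """
    ports = domains.get((ipspace, domain), set())
    nodes_with_port = {port.split(":", 1)[0] for port in ports}
    return sorted(node for node in nodes if node not in nodes_with_port)


def find_domain(domains: Dict[Tuple[str, str], Set[str]],
                port: str) -> Optional[Tuple[str, str]]:
    """Return the (IPspace, broadcast domain) holding the port, or None."""
    for key, ports in domains.items():
        if port in ports:
            return key
    return None


if __name__ == "__main__":
    cluster_nodes = ["Node-01", "Node-02"]
    # cluster_mgmt lives in IPspace "Default", broadcast domain "Default".
    print(nodes_missing_from_domain(BROADCAST_DOMAINS, "Default", "Default",
                                    cluster_nodes))
    # Where did Node-02's e0M port actually end up?
    print(find_domain(BROADCAST_DOMAINS, "Node-02:e0M"))
```

With the first scenario's data, the first call reports Node-02 as having no port in the Default broadcast domain, and the second call shows that Node-02:e0M sits in Default-1 instead. For the second scenario, removing Node-02:e0M from the mapping makes find_domain return None, which corresponds to the port belonging to no broadcast domain.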