Cluster join fails due to vifmgr process failure
Applies to
- ONTAP 9.0
- ONTAP 9.1
Joining a new node to an ONTAP 9.0 or 9.1 cluster may fail with the following error message:
Updating LIF Manager ........................Error: Failed to create Default Broadcast domain. Timeout: Operation "vifmgr_broadcast_domain_perform_cluster_join_iterator::create_imp()" took longer than 25 seconds to complete.
To confirm the root cause, check the following:
1. Confirm that the join failed on the update-default-broadcast-domain subtask:
cluster::*> set -privilege diagnostic
cluster::*> debug cluster-join show
Task ID              SubTask ID                      Status     Tries Failures
-------------------- ------------------------------- ---------- ----- --------
pre-setup            check-unused-cluster-ports      success    20    0
pre-setup            mtu-check                       success    20    0
pre-setup            ping-local                      success    20    0
pre-setup            ping-remote                     success    20    0
pre-setup            mtu-subnet-test                 success    20    0
pre-setup            rpc-check                       success    20    0
pre-setup            capability-check                success    20    0
network-setup        check-node-mgmt-mtu             success    20    0
network-setup        rename-lifs-nodeuuid            success    20    0
network-setup        relabel-lifs                    success    20    0
network-setup        ping-local2                     success    20    0
network-setup        limit-check                     success    20    0
node-check           check-for-mroot                 success    20    0
node-check           ha-mode-check                   success    20    0
node-check           sfo-partner-check               success    20    0
node-check           platform-check                  success    20    0
node-check           license-check                   success    20    0
node-check           join-switchless                 success    20    0
node-check           get-node-time                   success    20    0
node-check           get-node-name                   success    20    0
node-check           check-node-name                 success    20    0
node-check           resolve-aggr-names              success    20    0
node-check           cluster-ha-check                success    20    0
system-initialize    system-initialize               success    20    0
cluster-join         join-site-list                  success    20    0
cluster-join         wait-for-rdb-online             success    20    0
cluster-join         wait-for-rdb-databases          success    20    0
cluster-join         create_cluster_version_entries  success    20    0
cluster-join         upload-capability               success    20    0
system-startup       system-startup                  success    20    0
check-cluster-apps   vldb                            success    20    0
check-cluster-apps   lifmgr                          success    20    0
check-cluster-apps   bcom                            success    20    0
vldb-update          register-aggregates             success    20    0
vldb-update          register-volumes                success    20    0
vifmgr-update        update-default-broadcast-domain failure    20    1
nonshared-clus-setup nonshared-clus-setup            -          0     0
miscellaneous        rename-lifs-nodename            -          0     0
miscellaneous        get-location                    -          0     0
miscellaneous        register-mgwd-dsmfp-service     -          0     0
miscellaneous        file-replication                -          0     0
miscellaneous        subscribe-host-based-keys       -          0     0
miscellaneous        subscribe-systemshell-ssh-keys  -          0     0
miscellaneous        motd-and-banner-join            -          0     0
miscellaneous        nvfail-setup                    -          0     0
miscellaneous        node-http-config                -          0     0
miscellaneous        upload-licenses-v2              -          0     0
miscellaneous        remove-precluster-cert          -          0     0
miscellaneous        bandwidth-check                 -          0     0
miscellaneous        dummy-task                      -          0     0
finished             finished                        failure    20    2
2. Search /mroot/etc/log/mlog/vifmgr.log for RDB messages labeled 'fg_update_join' in which the RW-transaction hold time (in seconds) is large and keeps increasing:
[src/rdb/ 2916 (0x8127f0100)]: RW-transaction TID <16,17042,17042> held by client 1022 for 81795 seconds (created: 1995072s, now: 2076867s) (label 'fg_update_join').
[src/rdb/ 2916 (0x811c0ec00)]: RW-transaction TID <16,17042,17042> held by client 1022 for 81805 seconds (created: 1995072s, now: 2076877s) (label 'fg_update_join').
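Both checks can also be run against saved text. The following Python sketch is illustrative only and is not a NetApp tool. It assumes the `debug cluster-join show` output and an excerpt of vifmgr.log have already been copied into plain text. Several parts are hypothetical choices made for this example: the helper names `failed_subtasks` and `stuck_transactions`, the regular expression, and the 300-second `min_seconds` threshold. The article does not define a numeric threshold for "large".

```python
import re

# Matches RDB hold-time messages like the ones quoted from vifmgr.log, e.g.
# "... TID <16,17042,17042> held by client 1022 for 81795 seconds ... (label 'fg_update_join')."
HOLD_RE = re.compile(
    r"TID <(?P<tid>[\d,]+)> held by client (?P<client>\d+) "
    r"for (?P<secs>\d+) seconds.*\(label '(?P<label>[^']+)'\)"
)


def failed_subtasks(show_output):
    """Return (task, subtask, failures) for every row of saved
    `debug cluster-join show` output whose Status column is "failure"."""
    failed = []
    for line in show_output.splitlines():
        fields = line.split()
        # Data rows have exactly five fields: Task ID, SubTask ID, Status,
        # Tries, Failures. The header and the dashed rule are skipped here.
        if len(fields) != 5 or fields[2] not in ("success", "failure", "-"):
            continue
        task, subtask, status, _tries, failures = fields
        if status == "failure":
            failed.append((task, subtask, int(failures)))
    return failed


def stuck_transactions(log_text, min_seconds=300):
    """Return the TIDs of 'fg_update_join' RW-transactions whose reported
    hold time strictly increases across messages and reaches min_seconds.
    min_seconds is an arbitrary illustrative threshold, not a NetApp value."""
    samples = {}
    for line in log_text.splitlines():
        m = HOLD_RE.search(line)
        if m and m.group("label") == "fg_update_join":
            samples.setdefault(m.group("tid"), []).append(int(m.group("secs")))
    stuck = []
    for tid, secs in samples.items():
        increasing = len(secs) >= 2 and all(a < b for a, b in zip(secs, secs[1:]))
        if increasing and secs[-1] >= min_seconds:
            stuck.append(tid)
    return stuck


# Excerpts copied from the outputs shown in this article.
SAMPLE_SHOW = """\
Task ID SubTask ID Status Tries Failures
-------------------- ------------------------- ---------- ----- -------
vldb-update register-volumes success 20 0
vifmgr-update update-default-broadcast-domain failure 20 1
nonshared-clus-setup nonshared-clus-setup - 0 0
finished finished failure 20 2
"""

SAMPLE_LOG = """\
[src/rdb/ 2916 (0x8127f0100)]: RW-transaction TID <16,17042,17042> held by client 1022 for 81795 seconds (created: 1995072s, now: 2076867s) (label 'fg_update_join').
[src/rdb/ 2916 (0x811c0ec00)]: RW-transaction TID <16,17042,17042> held by client 1022 for 81805 seconds (created: 1995072s, now: 2076877s) (label 'fg_update_join').
"""

if __name__ == "__main__":
    print(failed_subtasks(SAMPLE_SHOW))
    # [('vifmgr-update', 'update-default-broadcast-domain', 1), ('finished', 'finished', 2)]
    print(stuck_transactions(SAMPLE_LOG))
    # ['16,17042,17042']
```

In the sample output the overall `finished` row also reports `failure`, so the row that identifies this issue is `update-default-broadcast-domain` under `vifmgr-update`. The log check only flags a transaction when at least two messages for the same TID show a strictly growing hold time. This mirrors the "large and keeps increasing" condition above.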