ANDU ONTAP upgrade paused on node due to out-of-sync test SMBC relationships on source cluster
Applies to
- ONTAP 9.11.x
- SMBC
- Consistency Groups
- FC
- ANDU process
- ESXi host
Issue
- The ONTAP upgrade failed with the error below:
cluster::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
cluster-01     cluster-02     true     Connected to cluster-02, Partial giveback
cluster-02     cluster-01     true     Connected to cluster-01. Waiting for cluster applications to come online on the local node. Offline applications: scsi blade.
- The upgrade on Node2 from 9.11.1P5 to 9.11.1P8 completed, but Node1 did not fully give back Node2. Only the controller and root aggregates were given back to Node2. The data aggregates stayed on Node1 because the cluster applications on Node2 did not come online: the SCSI blade was offline on Node2.
- All the FCP LIFs on Node2 were down because the vserver on Node2 was stuck in the initializing state.
- Because Node1 was holding the data aggregates of Node2, ANDU on Node1 was paused.
- Aborting the vserver initialization allowed a complete giveback of Node2. However, the vserver went back to the initializing state on Node2 and the FC LIFs on Node2 were still down.
- Once the complete giveback of Node2 was done, Node2 took over Node1 to complete the upgrade, after which Node1 was upgraded from 9.11.1P5 to 9.11.1P8.
- However, the vservers on both nodes went into the initializing state and the FCP LIFs were operationally down on both nodes.
- We could see "SAN SMBC cache to be initialized" errors in the bcomd logs, which indicated that scsit_san_asa_table wasn't populated as expected.
From node 1, repeated errors:
00000018.0180bd01 070c61f6 Sat Jun 03 2023 09:34:24 +02:00 [kern_bcomd:info:6792] 0x8114ed600: 8503e8000174b44d: INFO: SAN::KACOMM::KADISPATCH: src/ka_communication/kaDispatch.cc:dispatch:953 did: 40c4a - command dispatch to node cluster-02 result: (408/9) BCOMKA internal error: operation on non-empty resource
From node 2, repeated errors:
Sat Jun 03 2023 09:33:28 +02:00 [kern_bcomd:info:6705] 0x80a035f00: 8303e90000000007: ERR: SAN::VSERVER::WORKSPACE: src/bcomd/vsWorkspace.cc:scsitCacheVolumes:1219 SCSIT asa cache verification failed: entry doesn't exist
Sat Jun 03 2023 09:33:28 +02:00 [kern_bcomd:info:6705] 0x80a035f00: 8303e90000000007: ERR: SAN::VSERVER::WORKSPACE: src/bcomd/vsWorkspace.cc:get_zrto_relationships:1272 returning: 408/92 - Internal error. Waiting for the SAN SMBC cache to be initialized.
- Test SMBC relationships created on this source cluster were in the out-of-sync state.
- The half-configured test SMBC configuration caused all ESXi hosts to go down, resulting in a full outage.
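To check whether another system shows the same symptom, the bcomd messages quoted above can be counted in a saved log. The following is a minimal sketch, not a supported tool: the function name, the signature labels, and the idea of feeding it lines from an exported bcomd log are assumptions made for illustration. Only the matched message strings come from the excerpts in this article.

```python
from collections import Counter

# Message fragments copied from the bcomd excerpts in this article.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "ka_dispatch_error": "BCOMKA internal error",
    "asa_cache_missing": "SCSIT asa cache verification failed",
    "smbc_cache_wait": "Waiting for the SAN SMBC cache to be initialized",
}

# The three log lines quoted in this article, used here as sample input.
SAMPLE_LINES = [
    "00000018.0180bd01 070c61f6 Sat Jun 03 2023 09:34:24 +02:00 [kern_bcomd:info:6792] 0x8114ed600: 8503e8000174b44d: INFO: SAN::KACOMM::KADISPATCH: src/ka_communication/kaDispatch.cc:dispatch:953 did: 40c4a - command dispatch to node cluster-02 result: (408/9) BCOMKA internal error: operation on non-empty resource",
    "Sat Jun 03 2023 09:33:28 +02:00 [kern_bcomd:info:6705] 0x80a035f00: 8303e90000000007: ERR: SAN::VSERVER::WORKSPACE: src/bcomd/vsWorkspace.cc:scsitCacheVolumes:1219 SCSIT asa cache verification failed: entry doesn't exist",
    "Sat Jun 03 2023 09:33:28 +02:00 [kern_bcomd:info:6705] 0x80a035f00: 8303e90000000007: ERR: SAN::VSERVER::WORKSPACE: src/bcomd/vsWorkspace.cc:get_zrto_relationships:1272 returning: 408/92 - Internal error. Waiting for the SAN SMBC cache to be initialized.",
]


def count_signatures(lines):
    """Count how many lines contain each known message fragment."""
    counts = Counter()
    for line in lines:
        for label, fragment in SIGNATURES.items():
            if fragment in line:
                counts[label] += 1
    return counts


if __name__ == "__main__":
    for label, hits in count_signatures(SAMPLE_LINES).items():
        print(f"{label}: {hits}")
```

To scan a real log, pass an open file object instead of SAMPLE_LINES, since the function accepts any iterable of strings. Repeated hits for all three fragments would match the pattern described in this article, but they are not proof of the same root cause.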