NetApp Knowledge Base

ANDU ONTAP upgrade paused on node due to out-of-sync test SMBC relationships on source cluster

Category:
fas-systems
Specialty:
san

Applies to

  • ONTAP 9.11.x
  • SnapMirror Business Continuity (SMBC)
  • Consistency Groups
  • FC
  • ANDU process
  • ESXi host

Issue

  • The ONTAP upgrade failed, and storage failover showed the following state:

cluster::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
cluster-01     cluster-02     true     Connected to cluster-02, Partial giveback
cluster-02     cluster-01     true     Connected to cluster-01. Waiting
                                       for cluster applications to
                                       come online on the local node.
                                       Offline applications: scsi blade.
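
The output above can also be checked programmatically. The following is a minimal Python sketch, not a NetApp tool: it assumes the text of `storage failover show` has already been captured as a string, and the function names `parse_failover_show` and `needs_attention` are hypothetical. It flags rows whose state description mentions a partial giveback or offline applications, the two conditions seen in this case.

```python
def parse_failover_show(text):
    """Parse captured 'storage failover show' text into a list of row dicts."""
    rows = []
    in_body = False
    for line in text.splitlines():
        if not in_body:
            # Data rows start after the dashed separator line.
            in_body = line.strip().startswith("---")
            continue
        if not line.strip():
            continue
        if line[0].isspace() and rows:
            # Wrapped state description: append it to the previous row.
            rows[-1]["state"] += " " + " ".join(line.split())
            continue
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # not a data row (for example, a trailing summary line)
        node, partner, possible, state = parts
        rows.append({
            "node": node,
            "partner": partner,
            "takeover_possible": possible == "true",
            "state": " ".join(state.split()),  # collapse runs of spaces
        })
    return rows


def needs_attention(row):
    """True if the state shows a partial giveback or offline applications."""
    state = row["state"].lower()
    return "partial giveback" in state or "offline applications" in state


# Sample taken from the output shown in this article.
SAMPLE = """\
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
cluster-01     cluster-02     true     Connected to cluster-02, Partial giveback
cluster-02     cluster-01     true     Connected to cluster-01. Waiting
                                       for cluster applications to
                                       come online on the local node.
                                       Offline applications: scsi blade.
"""

if __name__ == "__main__":
    for row in parse_failover_show(SAMPLE):
        if needs_attention(row):
            print(row["node"], "->", row["state"])
```

The parser joins wrapped description lines onto the preceding row, so it handles the state text whether it arrives wrapped across several lines or on one long line.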

  • The upgrade of Node2 from 9.11.1P5 to 9.11.1P8 completed, but Node1 did not give Node2 back completely. Only the controller and root aggregate were given back to Node2; the data aggregates remained on Node1 because the cluster applications on Node2 did not come online while the SCSI blade was offline on Node2.
  • All FCP LIFs on Node2 were down because the vserver on Node2 was stuck in the initializing state.
  • Because Node1 was still holding Node2's data aggregates, ANDU paused on Node1.
  • Aborting the vserver initialization allowed a complete giveback to Node2. However, the vserver returned to the initializing state on Node2, and the FC LIFs on Node2 remained down.
  • After the complete giveback, Node2 took over Node1 so that the upgrade could continue, and Node1 was then upgraded from 9.11.1P5 to 9.11.1P8.
  • However, the vservers on both nodes then entered the initializing state, and the FCP LIFs were operationally down on both nodes.
  • The bcomd logs showed repeated "Waiting for the SAN SMBC cache to be initialized" errors, which indicated that scsit_san_asa_table was not populated as expected.

Repeated errors from Node1:
00000018.0180bd01 070c61f6 Sat Jun 03 2023 09:34:24 +02:00 [kern_bcomd:info:6792] 0x8114ed600: 8503e8000174b44d: INFO: SAN::KACOMM::KADISPATCH: src/ka_communication/kaDispatch.cc:dispatch:953 did: 40c4a - command dispatch to node cluster-02 result: (408/9) BCOMKA internal error: operation on non-empty resource

Repeated errors from Node2:
Sat Jun 03 2023 09:33:28 +02:00 [kern_bcomd:info:6705] 0x80a035f00: 8303e90000000007: ERR: SAN::VSERVER::WORKSPACE: src/bcomd/vsWorkspace.cc:scsitCacheVolumes:1219 SCSIT asa cache verification failed: entry doesn't exist
Sat Jun 03 2023 09:33:28 +02:00 [kern_bcomd:info:6705] 0x80a035f00: 8303e90000000007: ERR: SAN::VSERVER::WORKSPACE: src/bcomd/vsWorkspace.cc:get_zrto_relationships:1272 returning: 408/92 - Internal error. Waiting for the SAN SMBC cache to be initialized.
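
To confirm that a system is hitting the same signatures, the bcomd log text can be scanned for the three messages quoted above. The sketch below is illustrative only: it assumes the log content is already available as a string (how it is collected is left open), and the names `SIGNATURES`, `count_signatures`, and `smbc_cache_stuck` are hypothetical labels, not ONTAP identifiers.

```python
from collections import Counter

# Message fragments copied from the log excerpts in this article.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "bcomka_internal": "BCOMKA internal error",
    "asa_cache_missing": "SCSIT asa cache verification failed",
    "smbc_cache_wait": "Waiting for the SAN SMBC cache to be initialized",
}


def count_signatures(log_text):
    """Count how many kern_bcomd lines contain each known signature."""
    counts = Counter()
    for line in log_text.splitlines():
        if "kern_bcomd" not in line:
            continue  # ignore lines from other processes
        for label, fragment in SIGNATURES.items():
            if fragment in line:
                counts[label] += 1
    return counts


def smbc_cache_stuck(counts):
    """True if the SMBC cache initialization error appears at least once."""
    return counts["smbc_cache_wait"] > 0


# Sample lines taken verbatim from the excerpts in this article.
SAMPLE_LOG = """\
00000018.0180bd01 070c61f6 Sat Jun 03 2023 09:34:24 +02:00 [kern_bcomd:info:6792] 0x8114ed600: 8503e8000174b44d: INFO: SAN::KACOMM::KADISPATCH: src/ka_communication/kaDispatch.cc:dispatch:953 did: 40c4a - command dispatch to node cluster-02 result: (408/9) BCOMKA internal error: operation on non-empty resource
Sat Jun 03 2023 09:33:28 +02:00 [kern_bcomd:info:6705] 0x80a035f00: 8303e90000000007: ERR: SAN::VSERVER::WORKSPACE: src/bcomd/vsWorkspace.cc:scsitCacheVolumes:1219 SCSIT asa cache verification failed: entry doesn't exist
Sat Jun 03 2023 09:33:28 +02:00 [kern_bcomd:info:6705] 0x80a035f00: 8303e90000000007: ERR: SAN::VSERVER::WORKSPACE: src/bcomd/vsWorkspace.cc:get_zrto_relationships:1272 returning: 408/92 - Internal error. Waiting for the SAN SMBC cache to be initialized.
"""

if __name__ == "__main__":
    counts = count_signatures(SAMPLE_LOG)
    for label, n in sorted(counts.items()):
        print(label, n)
    print("SMBC cache stuck:", smbc_cache_stuck(counts))
```

Matching on fixed message fragments rather than on the full line format keeps the check independent of timestamps, process IDs, and source line numbers, which can differ between nodes and releases.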

  • The test SMBC relationships created on this source cluster were in an out-of-sync state.
  • The half-configured test SMBC configuration caused all ESXi hosts to go down, resulting in a full outage.

 

