NetApp Knowledge Base

IO interrupted during takeover

Category:
fas-systems
Specialty:
HW
Last Updated:
1/24/2025, 12:56:22 AM

Applies to

  • FAS8200

Issue

  • During a storage failover takeover, all data LIFs were removed from the healthy node (node-01 in the log below) and re-hosted on it about one minute later.
  • This resulted in an I/O interruption of about one minute.
EMS log from the healthy node:
Tue Nov 19 19:10:13 [node-01: cf_worker: cf.misc.operatorTakeover:notice]: Failover monitor: takeover initiated by operator
Tue Nov 19 19:10:13 [node-01: cf_slowTimeout: cf.fsm.nfo.shtdwnReqIC:debug]: A node sent a shutdown request over interconnect.
Tue Nov 19 19:10:17 [node-01: kltp: clam.heartbeat.state.change:info]: Heartbeats to node (name=node-02, ID=1000) are Failing.
Tue Nov 19 19:10:49 [node-01: cf_worker: cf.hwassist.notifyCfgSuccess:debug]: params: {'hwtype': 'SP'}
Tue Nov 19 19:10:52 [node-01: cf_main: cf.fsm.partnerNotResponding:notice]: Failover monitor: partner not responding
 
Tue Nov 19 19:11:13 [node-01: kltp: clam.node.avail.change:debug]: The availability status of node (name=node-02, ID=1000) changed from Available to Unavailable.
Tue Nov 19 19:11:13 [node-01: kltp: clam.received.quorum:debug]: Local node received a quorum update from Cluster node (name=node-01, ID=1001).
Tue Nov 19 19:11:13 [node-01: ThreadHandlerun: clam.takeover.disallowed:info]: CLAM on node node-01 (ID=1001) cannot proceed with a takeover operation of partner node node-02 (ID=1000) because HA disallowed takeover. Reason :Already in takeover mode
Tue Nov 19 19:11:13 [node-01: ThreadHandlerun: clam.update.partner.state:info]: CLAM on node (ID=1001) updated failover state of partner (ID=1000) to to-inhibit.
Tue Nov 19 19:11:13 [node-01: clam_bg: clam.quorum.epoch:debug]: CLAM set the quorum epoch on local node to 121.
Tue Nov 19 19:11:13 [node-01: ThreadHandlerun: qmm.acn.event.recvd:debug]: params: {'hostname': 'node-02', 'nvram_id': '0x20141d99', 'clam_id': '1000', 'attribs': '0x8'}
Tue Nov 19 19:11:13 [node-01: ThreadHandlerun: qmm.acn.event.recvd:debug]: params: {'hostname': 'node-01', 'nvram_id': '0x2013e2b8', 'clam_id': '1001', 'attribs': '0xa'}
Tue Nov 19 19:11:13 [node-01: qmm_thread: qmm.ssg.grp.update:debug]:
Tue Nov 19 19:11:13 [node-01: ctran_core_0: ctran.acn.received:debug]: CTRAN has received an ACN with value 0x1.
Tue Nov 19 19:11:13 [node-01: ctran_core_0: ctran.api.state.change:debug]: CTRAN subsystem's API is now not ready.
Tue Nov 19 19:11:13 [node-01: token_mgr_admin: token.node.out.of.quorum:notice]: All token references from node (ID - 5e0d0ae4-9d2e-11eb-bc79-d039ea2a14e9) are dropped because the node went out of quorum.
Tue Nov 19 19:11:13 [node-01: mt_thread: mct.channel.destroy:notice]: Mirror Cache transport destroyed channel 'MNT-aggr1'.
Tue Nov 19 19:11:13 [node-01: ctran_core_0: ctran.jpc.assigned:info]: Cluster node (name=node-01, ID=1001) is the Join Proposal Coordinator (JPC) node.
Tue Nov 19 19:11:13 [node-01: mt_thread: mct.channel.destroy:notice]: Mirror Cache transport destroyed channel 'MNT-node-01_aggr0'.
Tue Nov 19 19:11:13 [node-01: mt_thread: mct.channel.destroy:notice]: Mirror Cache transport destroyed channel 'MNT-aggr2'.
Tue Nov 19 19:11:13 [node-01: mt_thread: mct.channel.destroy:notice]: Mirror Cache transport destroyed channel 'MNT-node-02_aggr0'.
Tue Nov 19 19:11:13 [node-01: ctran_core_0: ctran.api.state.change:debug]: CTRAN subsystem's API is now ready.
Tue Nov 19 19:11:13 [node-01: rastrace_dump: rastrace.dump.saved:debug]: A RAS trace dump for module SMN instance 0 was stored in /etc/log/rastrace/SMN_0_20241119_19:11:13:377368.dmp.
Tue Nov 19 19:11:17 [node-01: kltp: clam.master.epoch.change:debug]: CLAM changed the epoch of the local node's master from 120 to 77.
Tue Nov 19 19:11:22 [node-01: kltp: clam.master.epoch.change:debug]: CLAM changed the epoch of the local node's master from 77 to 121
Tue Nov 19 19:11:31 [node-01: vifmgr: vifmgr.lifBeingRemoved:notice]: LIF data_lif1_node1 (on virtual server 3), IP address xx.xxx.xx.81, is being removed from node node-01, port a2a.
Tue Nov 19 19:11:31 [node-01: vifmgr: vifmgr.lifBeingRemoved:notice]: LIF data_lif1_node2 (on virtual server 3), IP address xx.xxx.xx.82, is being removed from node node-01, port a2a.
Tue Nov 19 19:11:31 [node-01: vifmgr: vifmgr.lifBeingRemoved:notice]: LIF data_lif2_node1 (on virtual server 4), IP address xx.xxx.xx.87, is being removed from node node-01, port a2a.
Tue Nov 19 19:11:31 [node-01: vifmgr: vifmgr.lifBeingRemoved:notice]: LIF data_lif2_node2 (on virtual server 4), IP address xx.xxx.xx.88, is being removed from node node-01, port a2a.
Tue Nov 19 19:11:31 [node-01: vifmgr: vifmgr.lifBeingRemoved:notice]: LIF data_lif3_node1 (on virtual server 5), IP address xx.xxx.xx.91, is being removed from node node-01, port a2a.
Tue Nov 19 19:11:31 [node-01: vifmgr: vifmgr.lifBeingRemoved:notice]: LIF data_lif3_node2 (on virtual server 5), IP address xx.xxx.xx.92, is being removed from node node-01, port a2a.
 
Tue Nov 19 19:12:11 [node-01: cf_takeover: cf.fm.takeoverStarted:notice]: Failover monitor: takeover started
 
Tue Nov 19 19:12:16 [node-01: cf_takeover: cf.fm.takeoverComplete:notice]: Failover monitor: takeover completed
Tue Nov 19 19:12:16 [node-01: cf_takeover: cf.fm.takeoverDuration:info]: Failover monitor: takeover duration time is 5 seconds.
 
Tue Nov 19 19:12:24 [node-01: wafl_spcd_main: monitor.volumes.one.ok:debug]: Volume vol0 is OK.
Tue Nov 19 19:12:24 [node-01: wafl_spcd_main: monitor.volumes.one.ok:debug]: Aggregate node-02_aggr0 is OK.
Tue Nov 19 19:12:24 [node-01: vifmgr: vifmgr.lifsuccessfullymoved:notice]: LIF data_lif1_node1 (on virtual server 3), IP address xx.xxx.xx.81, is now hosted on node node-01, port a2a.
Tue Nov 19 19:12:24 [node-01: vifmgr: vifmgr.lifsuccessfullymoved:notice]: LIF data_lif1_node2 (on virtual server 3), IP address xx.xxx.xx.82, is now hosted on node node-01, port a2a.
Tue Nov 19 19:12:24 [node-01: vifmgr: vifmgr.lifsuccessfullymoved:notice]: LIF data_lif2_node1 (on virtual server 4), IP address xx.xxx.xx.87, is now hosted on node node-01, port a2a.
Tue Nov 19 19:12:24 [node-01: vifmgr: vifmgr.lifsuccessfullymoved:notice]: LIF data_lif2_node2 (on virtual server 4), IP address xx.xxx.xx.88, is now hosted on node node-01, port a2a.
Tue Nov 19 19:12:24 [node-01: vifmgr: vifmgr.lifsuccessfullymoved:notice]: LIF data_lif3_node1 (on virtual server 5), IP address xx.xxx.xx.91, is now hosted on node node-01, port a2a.
Tue Nov 19 19:12:24 [node-01: vifmgr: vifmgr.lifsuccessfullymoved:notice]: LIF data_lif3_node2 (on virtual server 5), IP address xx.xxx.xx.92, is now hosted on node node-01, port a2a.
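The length of the LIF outage can be read directly from the vifmgr events in the log above. The following is a minimal Python sketch, not a NetApp tool: the function name `lif_outage_seconds`, the regular expression, and the assumed year (2024, taken from the RAS trace dump file name) are illustrative assumptions. It pairs each `vifmgr.lifBeingRemoved` event with the matching `vifmgr.lifsuccessfullymoved` event for the same LIF.

```python
import re
from datetime import datetime

# Two lines copied from the EMS log above (IP addresses are masked in the source).
EMS_SAMPLE = """\
Tue Nov 19 19:11:31 [node-01: vifmgr: vifmgr.lifBeingRemoved:notice]: LIF data_lif1_node1 (on virtual server 3), IP address xx.xxx.xx.81, is being removed from node node-01, port a2a.
Tue Nov 19 19:12:24 [node-01: vifmgr: vifmgr.lifsuccessfullymoved:notice]: LIF data_lif1_node1 (on virtual server 3), IP address xx.xxx.xx.81, is now hosted on node node-01, port a2a.
"""

# Hypothetical pattern for the vifmgr lines shown in this article only.
LINE_RE = re.compile(
    r"^\w{3} (?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) "
    r"\[[^\]]*?(?P<event>vifmgr\.\w+):\w+\]: LIF (?P<lif>\S+) "
)


def lif_outage_seconds(log_text, year=2024):
    """Return {lif_name: seconds between removal and re-hosting}."""
    removed, outages = {}, {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        # EMS lines carry no year, so one is supplied (assumption: 2024).
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        if m["event"] == "vifmgr.lifBeingRemoved":
            removed[m["lif"]] = ts
        elif m["event"] == "vifmgr.lifsuccessfullymoved" and m["lif"] in removed:
            outages[m["lif"]] = (ts - removed.pop(m["lif"])).total_seconds()
    return outages


print(lif_outage_seconds(EMS_SAMPLE))  # {'data_lif1_node1': 53.0}
```

For the sample lines, the LIF is unavailable for 53 seconds (19:11:31 to 19:12:24), which matches the roughly one-minute interruption described in this article.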
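  • The LIF removal was caused by a delayed takeover: in the EMS log, the operator initiated the takeover at 19:10:13, but the takeover did not start until 19:12:11. The delay was in turn caused by a slow shutdown of the node being taken over (node-02). Console output from that node: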
node-0102::*> takeover -ofnode node-02
  (storage failover takeover)
 
node-0102::*> Terminated
.
Uptime: 22d23h2m9s
System rebooting...
BIOS Version: 11.9
Portions Copyright (C) 2014-2018 NetApp, Inc. All Rights Reserved.
 
Initializing System Memory ...
Loading Device Drivers ...
Configuring Devices ...
 
CPU = 1 Processor(s) Detected.
  Intel(R) Xeon(R) CPU D-1587 @ 1.70GHz (CPU 0)
  CPUID: 0x00050664. Cores per Processor = 16
131072 MB System RAM Installed.
SATA (AHCI) Device: SV9MST6D120GLM41NP
 
Boot Loader version 6.0.10 
Copyright (C) 2000-2003 Broadcom Corporation.
Portions Copyright (C) 2002-2020 NetApp, Inc. All Rights Reserved.
 
Starting AUTOBOOT press Ctrl-C to abort...
Loading X86_64/freebsd/image2/kernel:0x200000/1091846 0x30b000/10937088 0xf79300/3783464 0x1314e28/13716632 0x200200/1016 Entry at 0xffffffff8030b000
Loading X86_64/freebsd/image2/platform.ko:0x202a000/4412224 0x245f340/596200 
Starting program at 0xffffffff8030b000
NetApp Data ONTAP 9.7P12
arc4random: no preloaded entropy cache
ena_rss_init_default_deferred() [TID:100000]: No devclass ena
 
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
Copyright (C) 1992-2021 NetApp.
All rights reserved.
*******************************
*                             *
* Press Ctrl-C for Boot Menu. *
*                             *
*******************************
cryptomod_fips: Executing Crypto FIPS Self Tests.
cryptomod_fips: Crypto FIPS self-test: 'CPU COMPATIBILITY' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 ECB, AES-256 ECB' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 CBC, AES-256 CBC' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 GCM, AES-256 GCM' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 CCM' passed.
cryptomod_fips: Crypto FIPS self-test: 'CTR_DRBG' passed.
cryptomod_fips: Crypto FIPS self-test: 'KDF' passed.
cryptomod_fips: Crypto FIPS self-test: 'SHA1, SHA256, SHA512' passed.
cryptomod_fips: Crypto FIPS self-test: 'HMAC-SHA1, HMAC-SHA256, HMAC-SHA512' passed.
cryptomod_fips: Crypto FIPS self-test: 'PBKDF2' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-XTS 128, AES-XTS 256' passed.
cryptomod_fips: Crypto FIPS self-test: 'Self-integrity' passed.
Tue Nov 19 10:13:18 2024 [nv2flash.restage.progress:NOTICE]: ReStage is not needed because the flash has no data.
 
Nov 19 19:14:03 Power outage protection flash de-staging: 17 cycle
Pensando Offload Driver, ver 1.0.1-E-31
Pensando Ethernet NIC Driver, ver: 1.0.1-E-39
ionic_rdma ver 1.0.1-E-39 : Pensando RoCE HCA driver
***OS2SP configured successfully***Nov 19 19:15:20 [node-02:fal_nvme.partition.status:notice]: Partition 0-1 with capacity 894 GiB status: rewarm.
Nov 19 19:15:20 [node-02:fal_nvme.partition.status:notice]: Partition 0-1 with capacity 894 GiB status: rewarm.
Reservation conflict found on this node's disks! 
Local System ID: 538189209
Press Ctrl-C for Maintenance menu to release disks.
Nov 19 19:15:38 [node-02:cf.fmns.skipped.disk:notice]: While releasing the reservations in "Waiting For Giveback" state Failover Monitor Node State(fmns) module skipped the disk 0d.22.21 that is owned by 538189209 and reserved by 538174136.
pnso provider init started.
pnso init failed in pnso_init() : 35
hwo: Node is using hardware provider : 1.
 
sk_allocate_memory: large allocation, bzero 7160 MB in  906 ms
cryptomod_fips: Executing Crypto FIPS Self Tests.
cryptomod_fips: Crypto FIPS self-test: 'CPU COMPATIBILITY' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 ECB, AES-256 ECB' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 CBC, AES-256 CBC' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 GCM, AES-256 GCM' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-128 CCM' passed.
cryptomod_fips: Crypto FIPS self-test: 'CTR_DRBG' passed.
cryptomod_fips: Crypto FIPS self-test: 'KDF' passed.
cryptomod_fips: Crypto FIPS self-test: 'SHA1, SHA256, SHA512' passed.
cryptomod_fips: Crypto FIPS self-test: 'HMAC-SHA1, HMAC-SHA256, HMAC-SHA512' passed.
cryptomod_fips: Crypto FIPS self-test: 'PBKDF2' passed.
cryptomod_fips: Crypto FIPS self-test: 'AES-XTS 128, AES-XTS 256' passed.
cryptomod_fips: Crypto FIPS self-test: 'Self-integrity' passed.
Nov 19 19:15:43 [node-02:raid.autoPart.disabled:ALERT]: Disk auto-partitioning is disabled on this system: the system needs a minimum of 8 usable internal hard disks.
Nov 19 19:15:43 [node-02:callhome.raid.adp.disabled:ALERT]: Disk auto-partitioning is disabled on this system: ADP DISABLED.
Disk reservations have been released
Waiting for giveback...(Press Ctrl-C to abort wait)
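To quantify the delayed takeover described in this section, the failover monitor events in the EMS log can be compared. The sketch below is an illustration under stated assumptions, not a NetApp utility: the helper name `first_event_times`, the regular expression, and the assumed year (2024) are hypothetical. It measures the gap between `cf.misc.operatorTakeover` and `cf.fm.takeoverStarted`, and the length of the takeover itself.

```python
import re
from datetime import datetime

# Three failover monitor lines copied from the EMS log in this article.
EMS_SAMPLE = """\
Tue Nov 19 19:10:13 [node-01: cf_worker: cf.misc.operatorTakeover:notice]: Failover monitor: takeover initiated by operator
Tue Nov 19 19:12:11 [node-01: cf_takeover: cf.fm.takeoverStarted:notice]: Failover monitor: takeover started
Tue Nov 19 19:12:16 [node-01: cf_takeover: cf.fm.takeoverComplete:notice]: Failover monitor: takeover completed
"""

# Hypothetical pattern: "<day> <timestamp> [<node>: <thread>: <event>:<severity>]".
EVENT_RE = re.compile(
    r"^\w{3} (\w{3} +\d+ \d{2}:\d{2}:\d{2}) \[[^:\]]+: [^:\]]+: ([\w.]+):\w+\]"
)


def first_event_times(log_text, year=2024):
    """Map each EMS event name to the time of its first occurrence."""
    times = {}
    for line in log_text.splitlines():
        m = EVENT_RE.match(line)
        if m:
            # EMS lines carry no year, so one is supplied (assumption: 2024).
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            times.setdefault(m.group(2), ts)
    return times


times = first_event_times(EMS_SAMPLE)
delay = (times["cf.fm.takeoverStarted"] - times["cf.misc.operatorTakeover"]).total_seconds()
duration = (times["cf.fm.takeoverComplete"] - times["cf.fm.takeoverStarted"]).total_seconds()
print(delay, duration)  # 118.0 5.0
```

With the sample lines, the takeover starts 118 seconds after the operator request, while the takeover itself takes 5 seconds, consistent with the `cf.fm.takeoverDuration` event in the log.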

 

