System panics while processing storage failover giveback
Applies to
- PANIC during giveback
- FAS8200
Issue
- The system boots to the Waiting for Giveback state. After the giveback is performed, the system encounters a failure on port e1a, which results in a PANIC, and the node returns to the Waiting for Giveback state.
[2023-04-29 11:24:26.764] Disk reservations have been released
[2023-04-29 11:24:38.256] Waiting for giveback...(Press Ctrl-C to abort wait)Continuing boot...
[2023-04-29 11:25:18.647] Apr 29 11:25:18 [node2:cf.fm.discardNvram:notice]: Failover monitor: node was previously taken over, nvram may be discarded
[2023-04-29 11:25:41.126] Apr 29 11:25:40 [node2:cf.ic.xferTimedOut:error]: HA interconnect: ofw transfer timed out.
[2023-04-29 11:25:41.178] cf: WARNING CF monitor fast timeout was blocked for 15 secs, unexpected takeover may occur
[2023-04-29 11:25:41.180] Apr 29 11:25:40 [node2:cf.fm.slowTimeoutBlocked:notice]: High Availability slow timeout was blocked for 17 secs.
[2023-04-29 11:25:41.220] Apr 29 11:25:40 [node2:netif.uncorEccError:EMERGENCY]: Unrecoverable ECC error on network interface e1a.
[2023-04-29 11:25:41.222] Apr 29 11:25:40 [node2:cf.fm.fastTimeoutBlocked:error]: WARNING failover monitor fast timeout was blocked for 15 secs
[2023-04-29 11:25:41.251] Apr 29 11:25:40 [node2:cf.fm.hogger:error]: Failover monitor: Process nblade1 ran continuously for 15530 ms.
[2023-04-29 11:25:41.904] Apr 29 11:25:41 [node2:wafl.transition.cp.completed:notice]: Transition CP with reason flush_b4_mounted, 00000000 for replaying=0,0 unmounting=0,0 total=2,1 volumes with a total of total=72 incoming=3 dirty buffers took 23247ms with longest CP phases being CP_P2V_REFCOUNT=16520, CP_P2V_PRE_BLOG=5639, CP_P2V_BM=685 on aggregate node2_aggr00.
[2023-04-29 11:25:42.064] Apr 29 11:25:41 [node2:kern.syslog.msg:notice]: The system was down for 6 seconds
[2023-04-29 11:25:42.076] boot_from_disk:last_booted_OS:9.3P21
[2023-04-29 11:25:43.156] Apr 29 11:25:42 [node2:cf.ic.xferTimedOut:error]: HA interconnect: wafl transfer timed out.
[2023-04-29 11:25:43.170] Apr 29 11:25:42 [node2:cf.fsm.takeoverOfPartnerEnabled:notice]: Failover monitor: takeover of node1 enabled
[2023-04-29 11:25:44.123] fmhaosc_is_odm_platform Read bootarg:haosc-odm-plat value:(null)
[2023-04-29 11:25:44.168] Apr 29 11:25:43 [node2:cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of node1 disabled (unsynchronized log).
[2023-04-29 11:25:44.413] Apr 29 11:25:43 [node2:kern.syslog.msg:notice]: domain xing mode: off, domain xing interrupt: false
[2023-04-29 11:25:44.510] Apr 29 11:25:43 [node2:dfu.firmwareUpToDate:notice]: Firmware is up-to-date on all eligible disks.
[2023-04-29 11:25:44.617] Apr 29 11:25:43 [node2:wafl.transition.cp.completed:notice]: Transition CP with reason none, 00000000 for replaying=0,0 unmounting=0,0 total=2,1 volumes with a total of total=330 incoming=218 dirty buffers took 173ms with longest CP phases being CP_P2V_BM=84, CP_P1_CLEAN=50, CP_P2_FLUSH=23 on aggregate node2_aggr00.
[2023-04-29 11:27:19.314]
[2023-04-29 11:27:20.052] Sat Apr 29 11:27:18 JST 2023
[2023-04-29 11:27:20.077] SP-login: login: PANIC : process on cpu13 hung (nblade1) for 5004 milliseconds!
[2023-04-29 11:27:43.637] version: 9.3P21: Mon Jan 11 12:28:03 EST 2021
[2023-04-29 11:27:43.639] conf : x86_64.optimize
[2023-04-29 11:27:43.690] cpuid = 13
*
*
*
[2023-04-29 11:31:53.212] *******************************
[2023-04-29 11:31:53.223] * *
[2023-04-29 11:31:53.225] * Press Ctrl-C for Boot Menu. *
[2023-04-29 11:31:53.235] * *
[2023-04-29 11:31:53.237] *******************************
[2023-04-29 11:31:53.400] cryptomod_fips: Executing Crypto FIPS Self Tests.
[2023-04-29 11:31:53.419] cryptomod_fips: Crypto FIPS self-test: 'CPU COMPATIBILITY' passed.
[2023-04-29 11:31:53.439] cryptomod_fips: Crypto FIPS self-test: 'AES-128 ECB, AES-256 ECB' passed.
[2023-04-29 11:31:53.458] cryptomod_fips: Crypto FIPS self-test: 'AES-128 CBC, AES-256 CBC' passed.
[2023-04-29 11:31:53.471] cryptomod_fips: Crypto FIPS self-test: 'CTR_DRBG' passed.
[2023-04-29 11:31:53.480] cryptomod_fips: Crypto FIPS self-test: 'SHA1, SHA256, SHA512' passed.
[2023-04-29 11:31:53.509] cryptomod_fips: Crypto FIPS self-test: 'HMAC-SHA1, HMAC-SHA256, HMAC-SHA512' passed.
[2023-04-29 11:31:53.599] cryptomod_fips: Crypto FIPS self-test: 'PBKDF2' passed.
[2023-04-29 11:31:53.611] cryptomod_fips: Crypto FIPS self-test: 'AES-XTS 128, AES-XTS 256' passed.
[2023-04-29 11:31:53.631] cryptomod_fips: Crypto FIPS self-test: 'Self-integrity' passed.
[2023-04-29 11:31:53.956] Sat Apr 29 02:31:54 2023 [nv2flash.restage.progress:NOTICE]: ReStage is not needed because the flash has no data.
[2023-04-29 11:31:54.283] Attempting to use existing varfs on /dev/nvrd1
[2023-04-29 11:32:05.793] ifconfig: interface e5a does not exist
[2023-04-29 11:32:05.813]
[2023-04-29 11:32:05.814] ifconfig: interface e5b does not exist
[2023-04-29 11:32:05.819]
[2023-04-29 11:32:08.874] Apr 29 11:32:09 Power outage protection flash de-staging: 16 cycles
[2023-04-29 11:33:10.172] ***OS2SP configured successfully***Reservation conflict found on this node's disks!
[2023-04-29 11:33:35.062] Local System ID: XXXXXXXXX
[2023-04-29 11:33:35.065] Apr 29 11:33:35 [node2:cf.fmns.skipped.disk:notice]: While releasing the reservations in "Waiting For Giveback" state Failover Monitor Node State(fmns) module skipped the disk 0c.20.2 that is owned by XXXXXXXX and reserved by XXXXXXXXX.
[2023-04-29 11:33:35.116] Press Ctrl-C for Maintenance menu to release disks.
[2023-04-29 11:33:39.139]
[2023-04-29 11:33:39.145] sk_allocate_memory: large allocation, bzero 7810 MB in 988 ms
[2023-04-29 11:33:41.866] cryptomod_fips: Executing Crypto FIPS Self Tests.
[2023-04-29 11:33:41.875] cryptomod_fips: Crypto FIPS self-test: 'CPU COMPATIBILITY' passed.
[2023-04-29 11:33:41.902] cryptomod_fips: Crypto FIPS self-test: 'AES-128 ECB, AES-256 ECB' passed.
[2023-04-29 11:33:41.922] cryptomod_fips: Crypto FIPS self-test: 'AES-128 CBC, AES-256 CBC' passed.
[2023-04-29 11:33:41.936] cryptomod_fips: Crypto FIPS self-test: 'CTR_DRBG' passed.
[2023-04-29 11:33:41.943] cryptomod_fips: Crypto FIPS self-test: 'SHA1, SHA256, SHA512' passed.
[2023-04-29 11:33:41.963] cryptomod_fips: Crypto FIPS self-test: 'HMAC-SHA1, HMAC-SHA256, HMAC-SHA512' passed.
[2023-04-29 11:33:42.058] cryptomod_fips: Crypto FIPS self-test: 'PBKDF2' passed.
[2023-04-29 11:33:42.077] cryptomod_fips: Crypto FIPS self-test: 'AES-XTS 128, AES-XTS 256' passed.
[2023-04-29 11:33:42.096] cryptomod_fips: Crypto FIPS self-test: 'Self-integrity' passed.
[2023-04-29 11:33:42.183] AutoPartAFFDetermination: total_disks: 48 num_internal_disks: 0 num_ssds: 0 num_unknowns: 0 num_mediator_disks: 0 num_not_supported: 0 all_ssd? false
[2023-04-29 11:33:44.178] Disk reservations have been released
[2023-04-29 11:33:55.305] Waiting for giveback...(Press Ctrl-C to abort wait)Continuing boot...
- A second giveback attempt results in another PANIC, and the node is taken over by its partner once again.
[2023-04-29 11:33:55.305] Waiting for giveback...(Press Ctrl-C to abort wait)Continuing boot...
[2023-04-29 11:36:52.262] Apr 29 11:36:52 [node2:cf.fm.discardNvram:notice]: Failover monitor: node was previously taken over, nvram may be discarded
[2023-04-29 11:36:56.351] Apr 29 11:36:56 [node2:cf.ic.xferTimedOut:error]: HA interconnect: ofw transfer timed out.
[2023-04-29 11:37:23.694] cf: WARNING CF monitor fast timeout was blocked for 24 secs, unexpected takeover may occur
[2023-04-29 11:37:23.694] Apr 29 11:37:24 [node2:cf.fm.slowTimeoutBlocked:notice]: High Availability slow timeout was blocked for 26 secs.
[2023-04-29 11:37:23.725] Apr 29 11:37:24 [node2:cf.fm.fastTimeoutBlocked:error]: WARNING failover monitor fast timeout was blocked for 24 secs
[2023-04-29 11:37:23.741] Apr 29 11:37:24 [node2:cf.fm.hogger:error]: Failover monitor: Process nblade1 ran continuously for 24709 ms.
[2023-04-29 11:37:24.460] Apr 29 11:37:24 [node2:wafl.cp.toolong:error]: Aggregate node2_aggr00 experienced a long CP.
[2023-04-29 11:37:24.491] Apr 29 11:37:24 [node2:wafl.transition.cp.completed:notice]: Transition CP with reason flush_b4_mounted, 00000000 for replaying=0,0 unmounting=0,0 total=2,1 volumes with a total of total=71 incoming=3 dirty buffers took 32203ms with longest CP phases being CP_P2V_SNAP=24735, CP_P2V_BM=6485, CP_P2V_VOLINFO=694 on aggregate node2_aggr00.
[2023-04-29 11:37:25.632] Apr 29 11:37:26 [node2:cf.ic.xferTimedOut:error]: HA interconnect: wafl transfer timed out.
[2023-04-29 11:37:25.679] Apr 29 11:37:26 [node2:kern.syslog.msg:notice]: The system was down for 10 seconds
[2023-04-29 11:37:25.694] Apr 29 11:37:26 [node2:netif.uncorEccError:EMERGENCY]: Unrecoverable ECC error on network interface e1a.
[2023-04-29 11:37:26.226] Apr 29 11:37:26 [node2:dfu.firmwareUpToDate:notice]: Firmware is up-to-date on all eligible disks.
[2023-04-29 11:37:26.273] fmhaosc_is_odm_platform Read bootarg:haosc-odm-plat value:(null)
[2023-04-29 11:37:26.304] Apr 29 11:37:26 [node2:cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of node1 disabled (unsynchronized log).
[2023-04-29 11:37:26.319] Apr 29 11:37:26 [node2:kern.syslog.msg:notice]: domain xing mode: off, domain xing interrupt: false
[2023-04-29 11:37:26.366] Apr 29 11:37:26 [node2:extCache.rw.log.open:notice]: WAFL external cache log could not be opened: aggregate node2_aggr00, log ec_tagstore.
[2023-04-29 11:37:26.382] Apr 29 11:37:26 [node2:extCache.rw.canceled:notice]: WAFL external cache reconstruct was canceled.
[2023-04-29 11:38:53.582]
[2023-04-29 11:38:53.589] Sat Apr 29 11:38:54 JST 2023
[2023-04-29 11:38:53.591] SP-login: login: PANIC : process on cpu0 hung (nblade1) for 5007 milliseconds!
[2023-04-29 11:39:33.761] version: 9.3P21: Mon Jan 11 12:28:03 EST 2021
[2023-04-29 11:39:33.770] conf : x86_64.optimize
[2023-04-29 11:39:33.805] cpuid = 0
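The signatures in the console output above (the unrecoverable ECC error on e1a, the nblade1 hogger message, and the PANIC string) can be pulled out of a captured console log with a short script. The following is a minimal sketch only, not a NetApp tool: the names `scan`, `WATCHED`, `EMS_RE`, and `PANIC_RE` are hypothetical, and the list of watched events is an assumption based solely on the messages that appear in this article.

```python
import re

# Console lines that carry an EMS event look like:
# [<console timestamp>] <date> [<node>:<event.name>:<severity>]: <message>
EMS_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*?"
    r"\[(?P<node>[^:\]\s]+):(?P<event>[\w.]+):(?P<sev>\w+)\]:\s*(?P<msg>.*)$"
)

# The panic string is printed without an EMS wrapper.
PANIC_RE = re.compile(r"^\[(?P<ts>[^\]]+)\].*?PANIC\s*:\s*(?P<msg>.*)$")

# Assumption: events of interest, taken from the log excerpts in this article.
WATCHED = ("netif.uncorEccError", "cf.fm.hogger", "cf.ic.xferTimedOut")


def scan(lines):
    """Return (console timestamp, signature, message) for each matching line."""
    hits = []
    for line in lines:
        m = PANIC_RE.match(line)
        if m:
            hits.append((m["ts"], "PANIC", m["msg"]))
            continue
        m = EMS_RE.match(line)
        if m and m["event"] in WATCHED:
            hits.append((m["ts"], m["event"], m["msg"]))
    return hits


# Sample lines copied from the console output in this article.
SAMPLE = [
    "[2023-04-29 11:25:41.220] Apr 29 11:25:40 [node2:netif.uncorEccError:EMERGENCY]: "
    "Unrecoverable ECC error on network interface e1a.",
    "[2023-04-29 11:25:41.251] Apr 29 11:25:40 [node2:cf.fm.hogger:error]: "
    "Failover monitor: Process nblade1 ran continuously for 15530 ms.",
    "[2023-04-29 11:25:42.064] Apr 29 11:25:41 [node2:kern.syslog.msg:notice]: "
    "The system was down for 6 seconds",
    "[2023-04-29 11:27:20.077] SP-login: login: PANIC : process on cpu13 hung "
    "(nblade1) for 5004 milliseconds!",
]

if __name__ == "__main__":
    for ts, signature, msg in scan(SAMPLE):
        print(ts, signature, msg, sep=" | ")
```

To use it on a real capture, pass an open file object (for example `scan(open("console.log"))`, where the file name is a placeholder) instead of `SAMPLE`.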