ONTAP Node fails and won't start with error on boot: “PANIC: Process vifmgr unresponsive for xxx seconds in process nodewatchdog on release 9.x”
Applies to
- ONTAP 9.x
- Data ONTAP 8.x
- Data ONTAP operating in 7-Mode
Issue
- The node panicked because vifmgr did not respond before the nodewatchdog timeout expired, and the node was then unable to recover the mdbs when it rebooted:
Panic String: PANIC : Process vifmgr unresponsive for 629 seconds version: 9.1P12
- The root volume (vol0) filled up because of high snapshot delta and snapshot space utilization, caused by a rolling packet trace that had been running for 2 weeks:
Mon Mar 30 08:28:37 CDT [nodename: rshd_0: kern.cli.cmd:debug]: Command-line input: The command is 'pktt'. The full command line is 'pktt start a0a-10 -d /etc/crash -m 9018 -b 8m -s 2g -r 12'.
- Immediately before the panic, the console log showed that vldb and vifmgr crashed and were unable to restart:
Apr 13 00:49:45 [nodename:spm.vldb.process.exit:EMERGENCY]: Volume Location Database(VLDB) subsystem with ID 34409 exited as a result of signal normal exit (1). The subsystem will attempt to restart.
Apr 13 00:49:47 [nodename:spm.vifmgr.process.exit:EMERGENCY]: Logical Interface Manager(VifMgr) with ID 34415 aborted as a result of signal normal exit (1). The subsystem will attempt to restart.
- When the node reboots, it is unable to recover the mdbs due to lack of space on vol0:
Apr 13 02:54:46 [nodename:callhome.mdb.recovery.unsuccessful:EMERGENCY]: Call home for MDB RECOVERY UNSUCCESSFUL FOR THE notifyd WARNING.
ln: /var/zoneinfo/zoneinfo: No space left on device
root: Unable to ln /mroot/etc/zoneinfo to /var/zoneinfo - error code(1)
/usr/bin/plxcoeff_log: cannot create /mroot/etc/log/plxcoeff/plxcoeff.log.tmp: No space left on device
stat: /mroot/etc/log/plxcoeff/plxcoeff.log.tmp: stat: No such file or directory
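
The symptoms listed above can be checked against a saved console or EMS log with a short script. The following is a minimal sketch, not a NetApp tool: the function names `find_symptoms` and `matches_this_issue` are hypothetical, the patterns are copied from the log excerpts in this article, and the rule that combines them is an assumption that may need adjusting for other releases or log formats.

```python
import re

# Patterns taken from the log excerpts in this article.
# The symptom names on the left are hypothetical labels, not ONTAP terms.
SYMPTOMS = {
    "pktt_started": re.compile(r"The command is 'pktt'"),
    "vldb_exit": re.compile(r"spm\.vldb\.process\.exit"),
    "vifmgr_exit": re.compile(r"spm\.vifmgr\.process\.exit"),
    "mdb_recovery_failed": re.compile(r"callhome\.mdb\.recovery\.unsuccessful"),
    "no_space": re.compile(r"No space left on device"),
    "vifmgr_panic": re.compile(
        r"PANIC\s*: Process vifmgr unresponsive for \d+ seconds"
    ),
}


def find_symptoms(lines):
    """Return {symptom_name: [1-based line numbers]} for every pattern that matches."""
    hits = {}
    for lineno, line in enumerate(lines, start=1):
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                hits.setdefault(name, []).append(lineno)
    return hits


def matches_this_issue(hits):
    """Assumed heuristic: a vifmgr exit or panic plus an out-of-space error."""
    has_vifmgr = "vifmgr_exit" in hits or "vifmgr_panic" in hits
    return has_vifmgr and "no_space" in hits


if __name__ == "__main__":
    # Two lines quoted from this article, used only as sample input.
    sample = [
        "Panic String: PANIC : Process vifmgr unresponsive for 629 seconds version: 9.1P12",
        "ln: /var/zoneinfo/zoneinfo: No space left on device",
    ]
    found = find_symptoms(sample)
    print(found)
    print("Matches this issue:", matches_this_issue(found))
```

To run it against a real log, pass the file's lines instead of `sample`, for example `find_symptoms(open(path, errors="replace"))`, where `path` is wherever the console log was saved. A match only indicates that the log resembles this issue; confirm the vol0 space condition on the node before acting on it.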