Node unresponsive with "Node accessible via HA-IC, but cluster access failed"
Applies to
- ONTAP 9.9.1
- SnapMirror Business Continuity (SM-BC)
Issue
- One node of a 2-node cluster stopped responding and was not serving data
- Cluster commands complete but cannot return correct status for the aggregates and LUNs on the problem node:
::> aggr show
Info: Failed to get the information for aggregate aggr1. Reason: ZSM - failed, status code = 572, extra = RPC: Unable to receive [from mgwd on node "node1" (VSID -1) to kernel at 127.0.0.1], took 0.001s, max 110s [127.0.0.1:000].
::> lun show
This table is currently empty.
Warning: the LUN inventory is not available for the following volumes:
Volume "vol1" in Vserver "svm1". Reason: RPC: Unable to receive [from mgwd on node "node1" (VSID: -1) to kernel at 127.0.0.1].
- Running nodeshell commands against the problem node causes the console to stop responding
- Node panics:
Node node1 is not responding and the following panic was found:
Panic String: Shutdown taking longer than 930 seconds in process nodewatchdog on release 9.9.1 (C)
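To check saved CLI output or console logs for the symptoms listed above, a short script along the following lines may help. It is a minimal sketch, not a NetApp tool: the function name `find_symptoms` and the returned dictionary keys are hypothetical, and the patterns are copied from the messages quoted in this article, so other ONTAP releases may word them differently.

```python
import re

# Patterns taken from the messages quoted in this article (assumption:
# the wording is the same in the log or CLI capture being scanned).
RPC_FAILURE = re.compile(r'RPC: Unable to receive \[from mgwd on node "([^"]+)"')
WATCHDOG_PANIC = re.compile(
    r"Panic String: Shutdown taking longer than (\d+) seconds in process (\S+)"
)


def find_symptoms(text):
    """Return the nodes reporting RPC failures and any shutdown panic found in text."""
    nodes = sorted(set(RPC_FAILURE.findall(text)))
    panic = WATCHDOG_PANIC.search(text)
    return {
        "rpc_failure_nodes": nodes,
        "panic_timeout_seconds": int(panic.group(1)) if panic else None,
        "panic_process": panic.group(2) if panic else None,
    }
```

Passing the captured text of `aggr show`, `lun show`, and the panic message to `find_symptoms` returns the node names that appear in the RPC errors together with the timeout and process named in the panic string, or `None` for the panic fields when no such panic is present.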