Multiple H410 nodes down with Block Service Unhealthy alert and amber lights on after an over-temperature event
Applies to
- NetApp HCI/SolidFire Element OS
- H-Series nodes H410S, H410C
- 2U H-Series chassis
Issue
- After a sudden failure of the data center air conditioning (AC) system, the ambient temperature exceeded operational limits. As a result, multiple H410 nodes in 2U H-Series chassis went offline and became unresponsive.
Alerts reported:
  - nodeOffline - The SolidFire Application cannot communicate with Storage node having node ID X.
  - blockServiceUnhealthy - Block service(s) on more than one node are unhealthy. Data unavailability is possible and rebuild may be blocked.
  - nvramDeviceStatus - NVRAM device warning={capacitor1And2Temperature: 51.65 C, capacitor3And4Temperature: 55.54 C, fanInletAmbientTemperature: 64.54 C}
- The nodes did not power on after the temperature returned to normal.
- The chassis showed only amber lights on both PSUs.
- Attempts to power drain or reseat power supplies were unsuccessful.
- All attempts to restore service by power cable replacement, PSU replacement, or power cycling failed.
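
When reviewing many such alerts, it can help to pull the sensor readings out of the nvramDeviceStatus detail text programmatically. The following is a minimal sketch, assuming the detail string has the same `name: value C` layout as the alert quoted in this article. The function name `parse_nvram_warning` is hypothetical and is not part of Element OS or any NetApp tool.

```python
import re

def parse_nvram_warning(detail: str) -> dict:
    """Return {sensor_name: temperature_in_celsius} from an alert detail string.

    Assumes each reading looks like 'sensorName: 12.34 C', as in the
    nvramDeviceStatus alert quoted in this article.
    """
    pattern = re.compile(r"(\w+):\s*([-+]?\d+(?:\.\d+)?)\s*C\b")
    return {name: float(value) for name, value in pattern.findall(detail)}

# Alert detail copied from the Issue section.
detail = (
    "NVRAM device warning={capacitor1And2Temperature: 51.65 C, "
    "capacitor3And4Temperature: 55.54 C, fanInletAmbientTemperature: 64.54 C}"
)

readings = parse_nvram_warning(detail)
hottest = max(readings, key=readings.get)
print(hottest, readings[hottest])  # prints: fanInletAmbientTemperature 64.54
```

The sketch only extracts and compares the reported values. It does not judge whether a reading is within limits, because the supported temperature thresholds are not stated here.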
