False CPU VRD Temperature sensor reading on AFF-A900 and FAS9500
Applies to
- AFF-A900
- FAS9500
- ONTAP 9
Issue
- The CPU1 or CPU2 VRD Temperature sensor is read incorrectly by ONTAP as 0 degrees C, which triggers a chassis low-temperature alert.
- EMS log:
Wed Oct 19 01:48:30 -0400 [Node_Name: env_mgr: monitor.chassisTemperature.cool:alert]: Chassis temperature is too cool: CPU2 VRD Temp is critical low (0 C).
- BMC system event log:
Record 1527: Sat Mar 04 00:32:01.076611 2023 [IPMI.notice]: 0513 | 02 | EVT: 01500005 | CPU2_VRD_Temp | Assertion Event, "Lower Non-critical going low " | Reading: 0.000 | Threshold: 5.000
Record 1528: Sat Mar 04 00:32:01.078001 2023 [IPMI.notice]: 0514 | 02 | EVT: 01520000 | CPU2_VRD_Temp | Assertion Event, "Lower Critical going low " | Reading: 0.000 | Threshold: 0.000
Record 1963: Mon Apr 24 18:13:18.996830 2023 [IPMI.notice]: 06b9 | 02 | EVT: 01500005 | CPU1_VRD_Temp | Assertion Event, "Lower Non-critical going low " | Reading: 0.000 | Threshold: 5.000
Record 1964: Mon Apr 24 18:13:18.998197 2023 [IPMI.notice]: 06ba | 02 | EVT: 01520000 | CPU1_VRD_Temp | Assertion Event, "Lower Critical going low " | Reading: 0.000 | Threshold: 0.000
Record 1978: Wed Apr 26 11:38:01.061589 2023 [IPMI.notice]: 06c7 | 02 | EVT: 6f02ffff | PSU1_Status | Assertion Event, "Fault"
Record 1979: Wed Apr 26 11:38:01.139064 2023 [IPMI.notice]: 06c8 | 02 | EVT: 6f02ffff | PSU3_Status | Assertion Event, "Fault"
Record 1980: Wed Apr 26 11:38:05.910595 2023 [IPMI.notice]: 06c9 | 02 | EVT: 6f02ffff | Fan3_Status | Assertion Event, "Fault"
- The alert clears seconds later:
- EMS log:
Wed Oct 19 01:48:31 -0400 [Node_Name: env_mgr: monitor.chassisTemperature.ok:notice]: Chassis temperature is ok.
- BMC system event log:
Record 1981: Wed Apr 26 11:38:05.973490 2023 [IPMI.notice]: 06ca | 02 | EVT: ef02ffff | PSU1_Status | Deassertion Event, "Fault"
Record 1982: Wed Apr 26 11:38:06.058606 2023 [IPMI.notice]: 06cb | 02 | EVT: ef02ffff | PSU3_Status | Deassertion Event, "Fault"
Record 1983: Wed Apr 26 11:38:09.932151 2023 [IPMI.notice]: 06cc | 02 | EVT: ef02ffff | Fan3_Status | Deassertion Event, "Fault"
- Other sensors (PSU, FAN, PCM) might also repeatedly assert and deassert "Fault", and the system fan speed might fluctuate, rising above and then falling back below the Upper Non-critical and Upper Critical thresholds.
- BMC system event log:
Record 1938: Fri Apr 21 04:10:21.983012 2023 [IPMI.notice]: 06a0 | 02 | EVT: 81598aff | Fan1_Speed1 | Deassertion Event, "Upper Critical going high" | Reading: 8280.000 | Threshold: 15300.000
Record 1939: Fri Apr 21 04:10:21.987010 2023 [IPMI.notice]: 06a1 | 02 | EVT: 81578ad9 | Fan1_Speed1 | Deassertion Event, "Upper Non-critical going high" | Reading: 8280.000 | Threshold: 13020.000
Record 1940: Fri Apr 21 04:10:21.989607 2023 [IPMI.notice]: 06a2 | 02 | EVT: 81598bff | Fan1_Speed2 | Deassertion Event, "Upper Critical going high" | Reading: 8340.000 | Threshold: 15300.000
Record 1941: Fri Apr 21 04:10:21.991062 2023 [IPMI.notice]: 06a3 | 02 | EVT: 81578bd9 | Fan1_Speed2 | Deassertion Event, "Upper Non-critical going high" | Reading: 8340.000 | Threshold: 13020.000
Record 1942: Fri Apr 21 04:10:22.001742 2023 [IPMI.notice]: 06a4 | 02 | EVT: 81598aff | Fan1_Speed3 | Deassertion Event, "Upper Critical going high" | Reading: 8280.000 | Threshold: 15300.000
Record 1943: Fri Apr 21 04:10:22.003202 2023 [IPMI.notice]: 06a5 | 02 | EVT: 81578ad9 | Fan1_Speed3 | Deassertion Event, "Upper Non-critical going high" | Reading: 8280.000 | Threshold: 13020.000
Record 1944: Fri Apr 21 04:10:22.004990 2023 [IPMI.notice]: 06a6 | 02 | EVT: 81598aff | Fan1_Speed4 | Deassertion Event, "Upper Critical going high" | Reading: 8280.000 | Threshold: 15300.000
Record 1945: Fri Apr 21 04:10:22.006297 2023 [IPMI.notice]: 06a7 | 02 | EVT: 81578ad9 | Fan1_Speed4 | Deassertion Event, "Upper Non-critical going high" | Reading: 8280.000 | Threshold: 13020.000
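When triaging a node for this symptom, it can help to scan the BMC system event log for VRD temperature assertions that read exactly 0 C and for "Fault" assertions that deassert within seconds. The following Python sketch assumes the SEL line format shown in the excerpts above and English (C locale) day and month names in the timestamps. The function names (`parse_sel`, `suspect_vrd_readings`, `fault_durations`) are hypothetical, and treating a 0.000 VRD reading as false is a heuristic drawn from this article, not a NetApp tool or rule.

```python
import re
from datetime import datetime
from typing import List, NamedTuple, Optional, Tuple

# Assumed SEL line layout, based on the excerpts in this article:
# Record N: <Day Mon DD HH:MM:SS.ffffff YYYY> [IPMI.notice]: <id> | <x> | EVT: <code> |
#   <sensor> | Assertion|Deassertion Event, "<desc>" [| Reading: <r> | Threshold: <t>]
SEL_RE = re.compile(
    r"Record (?P<rec>\d+): (?P<ts>\w{3} \w{3} \d{2} [\d:.]+ \d{4}) \[[^\]]+\]: "
    r"\S+ \| \S+ \| EVT: \S+ \| (?P<sensor>\S+) \| "
    r'(?P<kind>Assertion|Deassertion) Event, "(?P<desc>[^"]*)"'
    r"(?: \| Reading: (?P<reading>[-\d.]+) \| Threshold: (?P<threshold>[-\d.]+))?"
)


class SelEvent(NamedTuple):
    record: int
    time: datetime
    sensor: str
    asserted: bool
    desc: str
    reading: Optional[float]
    threshold: Optional[float]


def parse_sel(lines: List[str]) -> List[SelEvent]:
    """Parse SEL text lines; lines that do not match the assumed format are skipped."""
    events = []
    for line in lines:
        m = SEL_RE.match(line.strip())
        if not m:
            continue
        events.append(SelEvent(
            record=int(m["rec"]),
            time=datetime.strptime(m["ts"], "%a %b %d %H:%M:%S.%f %Y"),
            sensor=m["sensor"],
            asserted=m["kind"] == "Assertion",
            desc=m["desc"].strip(),  # the SEL pads some descriptions with a trailing space
            reading=float(m["reading"]) if m["reading"] else None,
            threshold=float(m["threshold"]) if m["threshold"] else None,
        ))
    return events


def suspect_vrd_readings(events: List[SelEvent]) -> List[SelEvent]:
    """Heuristic: an asserted CPU VRD temperature event reading exactly 0 C is suspect."""
    return [e for e in events
            if e.sensor.endswith("_VRD_Temp") and e.asserted and e.reading == 0.0]


def fault_durations(events: List[SelEvent]) -> List[Tuple[str, float]]:
    """Pair each "Fault" assertion with the next deassertion on the same sensor
    and return (sensor, seconds the fault stayed asserted)."""
    open_faults = {}
    result = []
    for e in sorted(events, key=lambda ev: ev.time):
        if e.desc != "Fault":
            continue
        if e.asserted:
            open_faults[e.sensor] = e.time
        elif e.sensor in open_faults:
            start = open_faults.pop(e.sensor)
            result.append((e.sensor, (e.time - start).total_seconds()))
    return result


# Example input copied from the records above.
sample = [
    'Record 1528: Sat Mar 04 00:32:01.078001 2023 [IPMI.notice]: 0514 | 02 | EVT: 01520000 | CPU2_VRD_Temp | Assertion Event, "Lower Critical going low " | Reading: 0.000 | Threshold: 0.000',
    'Record 1978: Wed Apr 26 11:38:01.061589 2023 [IPMI.notice]: 06c7 | 02 | EVT: 6f02ffff | PSU1_Status | Assertion Event, "Fault"',
    'Record 1981: Wed Apr 26 11:38:05.973490 2023 [IPMI.notice]: 06ca | 02 | EVT: ef02ffff | PSU1_Status | Deassertion Event, "Fault"',
]

events = parse_sel(sample)
for e in suspect_vrd_readings(events):
    print(f"Record {e.record}: {e.sensor} read {e.reading} C ({e.desc})")
for sensor, secs in fault_durations(events):
    print(f"{sensor}: Fault cleared after {secs:.3f} s")
```

In this sample, the PSU1_Status fault deasserts about 4.9 seconds after it asserts, which matches the "clears seconds later" pattern described above. Pairing by sensor name keeps the logic simple; a log in which the same sensor asserts twice before deasserting would only report the most recent assertion.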