CFBMC-3251: Many IO cards are reported as degraded and recovered by BMC reboot

Last updated

Apr 2, 2025

Views:: 190

Visibility:: Public

Votes:: 1

Category:: ontap-9

Specialty:: hw

Last Updated:: 4/2/2025, 4:14:54 PM

Issue

Many IO cards are reported as degraded, then recover after a BMC reboot.

Multiple sensors become unreadable at the same time, and each one is logged with the status "is not readable":

[?] Wed Jul 10 19:06:17 +0900 [node-1: env_mgr: monitor.ioCard.degraded:alert]: IO card is degraded: IO1 SAS Inflow Temp is not readable
[?] Wed Jul 10 19:06:20 +0900 [node-1: env_mgr: monitor.ioCard.degraded:alert]: IO card is degraded: IO1 SAS Outflow Temp is not readable
・
・
[?] Wed Jul 10 19:06:33 +0900 [node-1: env_mgr: monitor.ioCard.degraded:alert]: IO card is degraded: IO11 SAS P12V HS is not readable
[?] Wed Jul 10 19:06:33 +0900 [node-1: env_mgr: monitor.ioCard.degraded:alert]: IO card is degraded: IO11 SAS Hot Swap Cur is not readable
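
To see at a glance how many cards and sensors are affected, the "is not readable" events can be grouped per IO card. The following is a minimal sketch, not a NetApp tool: the function name and the regular expression are illustrative, the pattern is derived only from the sample lines above, and it assumes the leading "IOn" token in the message identifies the card.

```python
import re
from collections import defaultdict

# Assumption: pattern is built from the sample EMS lines in this article only.
# The first token after "IO card is degraded:" (e.g. "IO1") is taken as the card.
DEGRADED_RE = re.compile(
    r"\[(?P<node>[^:\]]+): env_mgr: monitor\.ioCard\.degraded:alert\]: "
    r"IO card is degraded: (?P<card>IO\d+) (?P<sensor>.+?) is not readable"
)


def unreadable_sensors_by_card(lines):
    """Group unreadable sensor names by IO card (hypothetical helper)."""
    cards = defaultdict(list)
    for line in lines:
        match = DEGRADED_RE.search(line)
        if match:
            cards[match.group("card")].append(match.group("sensor"))
    return dict(cards)


if __name__ == "__main__":
    sample = [
        "[?] Wed Jul 10 19:06:17 +0900 [node-1: env_mgr: monitor.ioCard.degraded:alert]: "
        "IO card is degraded: IO1 SAS Inflow Temp is not readable",
        "[?] Wed Jul 10 19:06:33 +0900 [node-1: env_mgr: monitor.ioCard.degraded:alert]: "
        "IO card is degraded: IO11 SAS P12V HS is not readable",
    ]
    for card, sensors in unreadable_sensors_by_card(sample).items():
        print(card, len(sensors), sensors)
```

Several cards reporting unreadable sensors within the same few seconds is consistent with the issue described here, rather than with a single failed card.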

An SP reboot is then triggered immediately, after which the message "Chassis temperature is too high" is logged with the status "monitor.globalStatus.critical: EMERGENCY":

[?] Wed Jul 10 19:06:33 +0900 [node-1: env_mgr: sp.reboot.sensor.unreadable:notice]: Rebooting BMC because one or more sensors are unreadable.
[?] Wed Jul 10 19:07:00 +0900 [node-1: monitor: monitor.globalStatus.critical:EMERGENCY]: Chassis temperature is too high..
[?] Wed Jul 10 19:07:48 +0900 [node-1: cf_worker: cf.hwassist.notifyCfgSuccess:debug]: params: {'hwtype': 'BMC'}

However, an ASUP for "hm.alert.critical: alert" is subsequently triggered:

[?] Wed Jul 10 19:18:45 +0900 [node-1: mgwd: callhome.hm.alert.critical:alert]: Call home for Health Monitor process cphm: CriticalFruMultiFaultAlert[033243222222].
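
The event order shown in the excerpts above (IO card degraded, BMC reboot for unreadable sensors, then the Health Monitor call home) can be checked against a saved EMS log. This is a rough sketch under stated assumptions: the helper names and the three-event "signature" are illustrative choices based only on the lines quoted in this article, not an official detection rule.

```python
import re

# Assumption: EMS lines carry a "[node: process: event.name:severity]" header,
# as in the excerpts quoted in this article.
EVENT_RE = re.compile(r"\[[^:\]]+: [^:\]]+: (?P<event>[\w.]+):(?P<severity>\w+)\]")

# Hypothetical signature: the events this article shows, in the order they appear.
EXPECTED_ORDER = [
    "monitor.ioCard.degraded",
    "sp.reboot.sensor.unreadable",
    "callhome.hm.alert.critical",
]


def event_sequence(lines):
    """Return (event, severity) pairs in log order."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            events.append((match.group("event"), match.group("severity")))
    return events


def matches_signature(lines):
    """True if EXPECTED_ORDER occurs as an ordered subsequence of the log events."""
    position = 0
    for name, _severity in event_sequence(lines):
        if position < len(EXPECTED_ORDER) and name == EXPECTED_ORDER[position]:
            position += 1
    return position == len(EXPECTED_ORDER)
```

Other events may appear between the three in the signature, so the check looks for an ordered subsequence rather than adjacent lines.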