Multiple sensor readings report errors due to miscommunication in the midplane
Applies to
- AFF-A220/FAS2720
- HA Pair
Issue
- The system reports multiple temperature and sensor error messages against one node in the HA pair:
Mon Mar 25 13:41:03 0800 [Node-02: env_mgr: monitor.fan.warning:notice]: multiple fans have failed. Replace it to avoid overheating
...
Mon Mar 25 13:41:33 0800 [Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Module A Expander Temp) is not readable.
...
Mon Mar 25 13:42:34 0800 [Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 4 Temp) is not readable.
...
...
Mon Mar 25 13:42:34 0800 [Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Ambient Temp) is not readable.
...
Mon Mar 25 14:05:54 0800 [Node-02: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed
- Reseating the motherboard does not fix the issue.
- After the motherboard is replaced, additional errors occur during the boot sequence in Maintenance mode:
NOTE: It is okay to use 'show/status' sub-commands such as
'disk show or aggr status' in Maintenance mode while the partner is up
Continue with boot? y
y
Apr 03 16:07:43 [Node-02:ses.status.temperatureWarning:ALERT]: DS224-12 (S/N SHFHU2046000032) shelf 0 on channel 0a temperature warning for Temperature sensor 13: not installed or failed. Current temperature: 31 C (87 F). This module is on the rear of the shelf at the top right, on shelf module B.
Apr 03 16:07:43 [Node-02:ses.status.temperatureWarning:ALERT]: DS224-12 (S/N SHFHU2046000032) shelf 0 on channel 0a temperature warning for Temperature sensor 14: not installed or failed. Current temperature: 42 C (107 F). This module is on the rear of the shelf at the top right, on shelf module B.
Apr 03 16:07:43 [Node-02:ses.status.temperatureWarning:ALERT]: DS224-12 (S/N SHFHU2046000032) shelf 0 on channel 0a temperature warning for Temperature sensor 15: not installed or failed. Current temperature: 26 C (78 F). This module is on the rear of the shelf at the top right, on shelf module B.
Apr 03 16:07:43 [Node-02:ses.status.ModuleError:ALERT]: DS224-12 (S/N SHFHU2046000032) shelf 0 on channel 0a SAS expander error for SAS shelf electronics 2: not installed; unknown status or failed. This module is on the rear of the shelf at the top right, on shelf module B.
Apr 03 16:07:43 [Node-02:ses.status.ACPWarn:error]: DS224-12 (S/N SHFHU2046000032) shelf 0 on channel 0a ACP Processor warning for SAS shelf ACP processor 2: ACP not installed. This module is on the rear of the shelf at the top right, on shelf module B.
Apr 03 16:07:49 [Node-02:monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (Multiple fans failed)
Apr 03 16:07:50 [Node-02:monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Status of fans is unknown for 90 seconds. Shutting down now.
- After reverting to the old controller, additional errors are observed:
Apr 03 16:26:35 [Node-02:monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Status of fans is unknown for 90 seconds. Shutting down now.
fm_run0: no local disks found & OFW channel is not up.
Apr 03 16:26:40 [Node-02:cf.fm.noMBDisksOrIc:error]: Could not find the local mailbox disks. Could not determine the firmware state of the partner through the HA interconnect.
WARNING: 0 disks found!
Storage Adapters found:
0 Fibre Channel Storage Adapters found!
2 SAS Adapters found!
0 Parallel SCSI Storage Adapters found!
0 ATA Adapters found!
Select option 4 from the boot menu to choose disks for the root volume and spare core space.
Apr 03 16:26:40 [Node-02:raid.assim.tree.noRootVol:error]: No usable root volume was found!
Target Adapters found:
0 Fibre Channel Target Adapters found!
0 iSCSI Target Adapters found!
WARNING: there do not appear to be any disks attached to the system.
No root volume found.
Rebooting... (press ctrl-c during boot to break reboot loop)
...
Apr 03 16:53:22 [Node-02:callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed
...
Apr 03 16:53:26 [Node-02:cf.disk.ResvTakeOver:notice]: This node will wait for giveback and the disk reservations to be released.
Waiting for reservations to clear
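
The excerpts above show many unrelated sensors (fans, controller temperatures, shelf module B) failing at about the same time, which is the pattern this article attributes to the midplane. When reviewing a longer saved console log, a small tally of event names can make that pattern easier to see. The following is a minimal sketch, not a NetApp tool: the function name `tally_events` and the regular expression are illustrative assumptions based only on the bracketed line format shown in this article.

```python
import re
from collections import Counter

# Assumed header format, taken from the log lines quoted in this article:
#   [Node-02: env_mgr: monitor.temp.unreadable:error]
#   [Node-02:ses.status.ModuleError:ALERT]
# The middle "source" field (e.g. env_mgr) is optional.
HEADER = re.compile(
    r"\[(?P<node>[^:\]]+):\s*"
    r"(?:(?P<source>[^:\]]+):\s*)?"
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]"
)

def tally_events(lines):
    """Count (event name, severity) pairs found in console log lines.

    Hypothetical helper for illustration; lines without a bracketed
    header (boot menu text, "..." elisions) are ignored.
    """
    counts = Counter()
    for line in lines:
        match = HEADER.search(line)
        if match:
            counts[(match.group("event"), match.group("severity"))] += 1
    return counts

# Sample input: lines copied from the excerpts in this article.
sample = [
    "Mon Mar 25 13:41:03 0800 [Node-02: env_mgr: monitor.fan.warning:notice]: "
    "multiple fans have failed. Replace it to avoid overheating",
    "...",
    "Mon Mar 25 13:41:33 0800 [Node-02: env_mgr: monitor.temp.unreadable:error]: "
    "The controller temperature (Module A Expander Temp) is not readable.",
    "Mon Mar 25 13:42:34 0800 [Node-02: env_mgr: monitor.temp.unreadable:error]: "
    "The controller temperature (Midplane 4 Temp) is not readable.",
    "Apr 03 16:07:49 [Node-02:monitor.shutdown.emergency:EMERGENCY]: "
    "Emergency shutdown: Environmental Reason Shutdown (Multiple fans failed)",
]

counts = tally_events(sample)
for (event, severity), n in counts.most_common():
    print(f"{n:3d}  {severity:<10} {event}")
```

A tally that mixes fan, temperature, and shelf module events within a short time window, and that persists across a motherboard swap, matches the symptoms described in this article. The tally alone does not identify the failed component.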