NSM100 sensors report errors and recover automatically
Applies to
- NS224
- NSM100
- The issue persists after upgrading the shelf firmware to the latest version or reseating the NSM100 module
Issue
- NSM100 sensors report errors in EMS-LOG-FILE.GZ.
[?] Fri Jul 26 05:25:54 +0900 [NodeA: dsa_worker5: ses.status.temperatureWarning:alert]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x temperature warning for Temperature sensor 6: not installed or failed. Current temperature: 40 C (104 F). This element is on the unknown location.
[?] Fri Jul 26 05:25:54 +0900 [NodeA: dsa_worker5: ses.status.temperatureWarning:alert]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x temperature warning for Temperature sensor 7: not installed or failed. Current temperature: 75 C (167 F). This element is on the unknown location.
[?] Fri Jul 26 05:25:54 +0900 [NodeA: dsa_worker5: ses.status.temperatureWarning:alert]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x temperature warning for Temperature sensor 8: not installed or failed. Current temperature: 69 C (156 F). This element is on the unknown location.
[?] Fri Jul 26 05:25:54 +0900 [NodeA: dsa_worker5: ses.status.temperatureWarning:alert]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x temperature warning for Temperature sensor 9: not installed or failed. Current temperature: 69 C (156 F). This element is on the unknown location.
[?] Fri Jul 26 05:25:54 +0900 [NodeA: dsa_worker5: ses.status.electronicsWarn:error]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x environmental monitoring warning for SES electronics 1: communication error. ; enclosure services hardware failed This element is on the rear of the shelf at the top, on module A.
[?] Fri Jul 26 05:25:54 +0900 [NodeA: dsa_worker5: ses.status.ModuleWarn:alert]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x PCI switch warning for PCI Switch 1: communication error. This element is on the rear of the shelf at the top, on module A.
[?] Fri Jul 26 05:25:54 +0900 [NodeA: dsa_worker5: ses.status.ACPWarn:error]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x ACP Processor warning for shelf ACP processor 1: communication error. ; Alternate Control Path hardware failed This element is on the rear of the shelf at the top, on module A.
[?] Fri Jul 26 05:25:54 +0900 [NodeA: dsa_worker5: ses.status.battery.error:error]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x battery failure error for Coin Battery 1: not installed or hardware failure. This element is on the rear of the shelf, in top module (A).
[?] Fri Jul 26 05:25:54 +0900 [NodeA: dsa_worker5: ses.status.etherConn.warn:error]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x Ethernet connector warning for port e0a: cannot communicate with connector. This element is on the rear of the shelf at the top, on module A.
[?] Fri Jul 26 05:25:54 +0900 [NodeA: dsa_worker5: ses.status.etherConn.warn:error]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x Ethernet connector warning for port e0b: cannot communicate with connector. This element is on the rear of the shelf at the top, on module A.
[?] Fri Jul 26 05:26:30 +0900 [NodeA: dsa_worker3: ses.status.dimm.error:error]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x DIMM failure for Dimm Element 1: not installed or failed. This element is on the DIMM slot 1 in the top shelf module (A).
[?] Fri Jul 26 05:26:30 +0900 [NodeA: dsa_worker3: ses.status.dimm.error:error]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x DIMM failure for Dimm Element 2: not installed or failed. This element is on the DIMM slot 2 in the top shelf module (A).
[?] Fri Jul 26 05:26:30 +0900 [NodeA: dsa_worker3: ses.status.dimm.error:error]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x DIMM failure for Dimm Element 3: not installed or failed. This element is on the DIMM slot 3 in the top shelf module (A).
[?] Fri Jul 26 05:26:30 +0900 [NodeA: dsa_worker3: ses.status.dimm.error:error]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x DIMM failure for Dimm Element 4: not installed or failed. This element is on the DIMM slot 4 in the top shelf module (A).
- The errors clear a few minutes later in EMS-LOG-FILE.GZ.
[?] Fri Jul 26 05:29:12 +0900 [NodeA: dsa_worker4: ses.status.ModuleInfo:info]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x PCI switch information for PCI Switch 1: normal status.
[?] Fri Jul 26 05:29:12 +0900 [NodeA: dsa_worker4: ses.status.ACPInfo:info]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x ACP Processor information for shelf ACP processor 1: normal status.
[?] Fri Jul 26 05:29:12 +0900 [NodeA: dsa_worker4: ses.status.dimm.info:notice]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x DIMM notification for Dimm Element 1: normal status.
[?] Fri Jul 26 05:29:12 +0900 [NodeA: dsa_worker4: ses.status.dimm.info:notice]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x DIMM notification for Dimm Element 2: normal status.
[?] Fri Jul 26 05:29:12 +0900 [NodeA: dsa_worker4: ses.status.dimm.info:notice]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x DIMM notification for Dimm Element 3: normal status.
[?] Fri Jul 26 05:29:12 +0900 [NodeA: dsa_worker4: ses.status.dimm.info:notice]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x DIMM notification for Dimm Element 4: normal status.
[?] Fri Jul 26 05:29:12 +0900 [NodeA: dsa_worker4: ses.status.battery.info:notice]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x battery information for Coin Battery 1: normal status.
[?] Fri Jul 26 05:29:12 +0900 [NodeA: dsa_worker4: ses.status.etherConn.info:notice]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x Ethernet connector information for port e0a: normal status.
[?] Fri Jul 26 05:29:12 +0900 [NodeA: dsa_worker4: ses.status.etherConn.info:notice]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x Ethernet connector information for port e0b: normal status.
[?] Fri Jul 26 05:29:22 +0900 [NodeA: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x temperature information for Temperature sensor 6: normal status.
[?] Fri Jul 26 05:29:22 +0900 [NodeA: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x temperature information for Temperature sensor 7: normal status.
[?] Fri Jul 26 05:29:22 +0900 [NodeA: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x temperature information for Temperature sensor 8: normal status.
[?] Fri Jul 26 05:29:22 +0900 [NodeA: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x temperature information for Temperature sensor 9: normal status.
[?] Fri Jul 26 05:29:22 +0900 [NodeA: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x temperature information for Temperature sensor 10: normal status.
[?] Fri Jul 26 05:29:22 +0900 [NodeA: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x temperature information for Temperature sensor 11: normal status.
[?] Fri Jul 26 05:29:22 +0900 [NodeA: dsa_worker4: ses.status.bootDv.info:notice]: NS224NSM100 (S/N XXXXXXXXXXXX) shelf 21 on channel 0x boot device notification for Boot device 1: normal status.
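To confirm that a log shows this same report-then-recover pattern, the fault and "normal status" events can be paired per shelf element. The following is a minimal Python sketch, not a NetApp tool. It assumes that the EMS lines follow the layout of the excerpts above and that the log has already been decompressed to text. The function name `find_transient_faults` and the `year` parameter are hypothetical, the latter introduced only because the timestamps carry no year.

```python
import re
from datetime import datetime

# Pattern for the EMS line layout shown in the excerpts above (assumption:
# other event types or ONTAP releases may format their lines differently).
EVENT_RE = re.compile(
    r"\w{3} (?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) [+-]\d{4} "
    r"\[[^\]]*?(?P<event>ses\.status\.[\w.]+):(?P<severity>\w+)\]: "
    r".*?shelf (?P<shelf>\d+) .*? for (?P<element>[^:]+): (?P<status>.+)"
)


def find_transient_faults(lines, year):
    """Return (shelf, element, seconds_to_clear) for each fault that later clears.

    `year` is supplied by the caller (hypothetical parameter) because the
    log timestamps do not include one.
    """
    open_faults = {}  # (shelf, element) -> time of the first unresolved fault
    recovered = []
    for line in lines:
        m = EVENT_RE.search(line)
        if m is None:
            continue  # not a ses.status line in the expected layout
        when = datetime.strptime(f"{m['ts']} {year}", "%b %d %H:%M:%S %Y")
        key = (m["shelf"], m["element"])
        if m["status"].startswith("normal status"):
            started = open_faults.pop(key, None)
            if started is not None:
                elapsed = int((when - started).total_seconds())
                recovered.append((*key, elapsed))
        else:
            # Keep only the first fault time if the same element repeats.
            open_faults.setdefault(key, when)
    return recovered


# Two lines taken from the excerpts above.
sample = [
    "[?] Fri Jul 26 05:25:54 +0900 [NodeA: dsa_worker5: "
    "ses.status.temperatureWarning:alert]: NS224NSM100 (S/N XXXXXXXXXXXX) "
    "shelf 21 on channel 0x temperature warning for Temperature sensor 6: "
    "not installed or failed. Current temperature: 40 C (104 F). "
    "This element is on the unknown location.",
    "[?] Fri Jul 26 05:29:22 +0900 [NodeA: dsa_worker4: "
    "ses.status.temperatureInfo:info]: NS224NSM100 (S/N XXXXXXXXXXXX) "
    "shelf 21 on channel 0x temperature information for "
    "Temperature sensor 6: normal status.",
]

for shelf, element, seconds in find_transient_faults(sample, year=2024):
    print(f"shelf {shelf}: {element} recovered after {seconds} s")
```

Elements that log a fault but no later "normal status" line (SES electronics 1 in the excerpts above) are not returned, and "normal status" lines with no preceding fault in the input are ignored.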