NetApp Knowledge Base

NS224 NSM100 normal shelf module alerts seen during firmware upgrade

Category:
ontap-9
Specialty:
hw

Applies to

  • ONTAP 9
  • ONTAP Upgrade
    • Manual Shelf Firmware
  • AFF and NS224 Disk Shelf
  • NSM100 Disk Shelf Module

Issue

  • An automated ONTAP upgrade (ANDU) is started using System Manager, or a shelf firmware upgrade is run manually
  • The upgrade completes successfully without errors and the cluster is healthy
  • A few minutes later, a health alert is raised

Sat Sep 03 14:57:43 +0100 [cluster1-node2: mgwd: callhome.hm.alert.major:alert]: Call home for Health Monitor process nchm: NoPathToNSMA_Alert[7867034284049604608].

  • Errors relating to disk shelf module A are seen in the event log

Sat Sep 03 14:59:01 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot:notice]: Reboot event reported by module A in shelf: 0x.1.0.99.1, log: Sat Sep  3 13:57:58 2022 (    0+00:00:39.013); 02000233; U?; HAL; hal; 04; +++ Application version 0165 launching +++
Sat Sep 03 14:59:01 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot.unexp:error]: Unexpected reboot event reported by module A in shelf: 0x.1.0.99.1, log: Sat Sep  3 13:58:03 2022 (    0+00:00:44.016); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)

Sat Sep 03 14:59:28 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot:notice]: Reboot event reported by module A in shelf: 0x.0.0.99.0, log: Sat Sep  3 13:57:58 2022 (    0+00:00:39.008); 02000233; U?; HAL; hal; 04; +++ Application version 0165 launching +++
Sat Sep 03 14:59:28 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot.unexp:error]: Unexpected reboot event reported by module A in shelf: 0x.0.0.99.0, log: Sat Sep  3 13:58:02 2022 (    0+00:00:43.510); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)
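
When reviewing a saved copy of the event log, the module reboot events above can be picked out with a short script. The following is a minimal Python sketch, not a NetApp tool. The names parse_ems_line and unexpected_reboots are hypothetical, the line layout is inferred only from the excerpts in this article, and the year is passed in as an assumption because the EMS timestamp does not include one.

```python
import re
from datetime import datetime

# Layout inferred from the log excerpts in this article:
# "<timestamp> [<node>: <process>: <event>:<severity>]: <message>"
EMS_LINE = re.compile(
    r"^(?P<ts>\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:]+): (?P<process>[^:]+): "
    r"(?P<event>[^:\]]+):(?P<severity>\w+)\]: "
    r"(?P<message>.*)$"
)

# Shelf and module as they appear in the reboot messages.
REBOOT = re.compile(r"module (?P<module>[AB]) in shelf: (?P<shelf>[^,]+),")


def parse_ems_line(line, year=2022):
    """Split one EMS log line into its fields; return None if it does not match.

    The year is an assumption supplied by the caller (not present in the line).
    """
    m = EMS_LINE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    stamp = f"{fields.pop('ts')} {year}"
    fields["time"] = datetime.strptime(stamp, "%a %b %d %H:%M:%S %z %Y")
    return fields


def unexpected_reboots(lines, year=2022):
    """Return (shelf, module, time) for each sla.shelf.mod.reboot.unexp event."""
    found = []
    for line in lines:
        rec = parse_ems_line(line, year)
        if rec and rec["event"] == "sla.shelf.mod.reboot.unexp":
            m = REBOOT.search(rec["message"])
            if m:
                found.append((m["shelf"], m["module"], rec["time"]))
    return found
```

Feeding the event log lines to unexpected_reboots gives one tuple per unexpected reboot, which makes it easier to see that module A and module B of each shelf reboot at different times.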

  • About 15 minutes later, errors report mismatched firmware between modules A and B of the same disk shelf, and the system transitions to a single-path HA state

Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: ses.mismatch.fw.version:error]: The disk shelf modules on disk shelf 0x.0 are running two different firmware versions. Disk shelf module A is running 0163, and disk shelf module B is running 0141.
Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: sfu.firmwareDownrev.shelf:error]: Shelf 0x.shelf0 has downrev firmware.
Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: ses.mismatch.fw.version:error]: The disk shelf modules on disk shelf 0x.1 are running two different firmware versions. Disk shelf module A is running 0163, and disk shelf module B is running 0141.
Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: sfu.firmwareDownrev.shelf:error]: Shelf 0x.shelf1 has downrev firmware.
Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: shelf.config.tospha:info]: System has transitioned to single path HA attached storage
Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: shelf.config.spha:info]: System is using single path HA attached storage only.
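
To list which shelves are reporting mismatched module firmware, the ses.mismatch.fw.version messages above can be scanned for the shelf and the two versions. This is a rough Python sketch under the assumption that the message wording matches the excerpts in this article; the function name firmware_mismatches is hypothetical.

```python
import re

# Wording taken from the ses.mismatch.fw.version lines quoted in this article.
MISMATCH = re.compile(
    r"ses\.mismatch\.fw\.version:\w+\]: The disk shelf modules on disk shelf "
    r"(?P<shelf>\S+) are running two different firmware versions\. "
    r"Disk shelf module A is running (?P<a>\w+), "
    r"and disk shelf module B is running (?P<b>\w+)\."
)


def firmware_mismatches(lines):
    """Map each shelf to the (module A, module B) versions it last reported."""
    shelves = {}
    for line in lines:
        m = MISMATCH.search(line)
        if m:
            shelves[m["shelf"]] = (m["a"], m["b"])
    return shelves
```

For the two mismatch lines above, this yields one entry for shelf 0x.0 and one for shelf 0x.1, each pairing version 0163 on module A with 0141 on module B.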

  • About 25 minutes after the module A reboots, similar 'unexpected reboot' errors are seen for disk shelf module B

Sat Sep 03 15:22:02 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot:notice]: Reboot event reported by module B in shelf: 0x.0.3.99.0, log: Sat Sep  3 14:20:58 2022 (    0+00:00:39.244); 02000233; U?; HAL; hal; 04; +++ Application version 0165 launching +++
Sat Sep 03 15:22:02 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot.unexp:error]: Unexpected reboot event reported by module B in shelf: 0x.0.3.99.0, log: Sat Sep  3 14:21:03 2022 (    0+00:00:44.246); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)

Sat Sep 03 15:23:32 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot:notice]: Reboot event reported by module B in shelf: 0x.1.3.99.1, log: Sat Sep  3 14:22:33 2022 (    0+00:00:39.335); 02000233; U?; HAL; hal; 04; +++ Application version 0165 launching +++
Sat Sep 03 15:23:32 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot.unexp:error]: Unexpected reboot event reported by module B in shelf: 0x.1.3.99.1, log: Sat Sep  3 14:22:38 2022 (    0+00:00:43.837); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)

 

  • Around this time, other disk shelf module errors are seen

Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.temperatureWarning:alert]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x temperature warning for Temperature sensor 12: not installed or failed. Current temperature: 25 C (77 F). This element is on the unknown location.
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.electronicsWarn:error]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x environmental monitoring warning for SES electronics 2: communication error. ; enclosure services hardware failed This element is on the rear of the shelf at the bottom, on module B.
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.ModuleWarn:alert]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x PCI switch warning for PCI Switch 2: communication error. This element is on the rear of the shelf at the bottom, on module B.
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.ACPWarn:error]: NS224NSM100 (S/N SHFHU212200xxx) shelf 1 on channel 0x ACP Processor warning for shelf ACP processor 2: communication error. ; Alternate Control Path hardware failed e B.
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.error:error]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM failure for Dimm Element 5: not installed or failed. This element is on the DIMM slot 1 in the bottom shelf module (B).
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.error:error]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM failure for Dimm Element 6: not installed or failed. This element is on the DIMM slot 2 in the bottom shelf module (B).
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.error:error]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM failure for Dimm Element 7: not installed or failed. This element is on the DIMM slot 3 in the bottom shelf module (B).
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.error:error]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM failure for Dimm Element 8: not installed or failed. This element is on the DIMM slot 4 in the bottom shelf module (B).
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.battery.error:error]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x battery failure error for Coin Battery 2: not installed or hardware failure. This element is on the rear of the shelf, in bottom module (B).

  • The disk shelf module errors later clear, after the module reboots

Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.ModuleInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x PCI switch information for PCI Switch 2: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.ACPInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x ACP Processor information for shelf ACP processor 2: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) 
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.battery.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x battery information for Coin Battery 2: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.etherConn.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x Ethernet connector information for port e0a: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.etherConn.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x Ethernet connector information for port e0b: normal status.
Sat Sep 03 15:23:38 +0100 [cluster1-node1: dsa_worker0: ses.status.bootDv.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x boot device notification for Boot device 2: normal status.
Sat Sep 03 15:23:56 +0100 [cluster1-node1: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x temperature information for Temperature sensor 12: normal status.
Sat Sep 03 15:23:56 +0100 [cluster1-node1: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x temperature information for Temperature sensor 13: normal status.

 

  • After both disk shelf modules A and B reboot, the cluster alerts clear and the system returns to a healthy multi-path state

Sat Sep 03 15:26:23 +0100 [cluster1-node1: nchmd: hm.alert.cleared:notice]: Alert Id = NoPathToNSMA_Alert , Alerting Resource = 7867034284049604608 cleared by monitor node-connect
Sat Sep 03 15:26:23 +0100 [cluster1-node1: nchmd: hm.alert.cleared:notice]: Alert Id = NoPathToNSMA_Alert , Alerting Resource = 8299379848277172224 cleared by monitor node-connect
Sat Sep 03 15:33:41 +0100 [cluster1-node1: start_asup_collector_thread: shelf.config.tompha:info]: System has transitioned to multi-path HA attached storage
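
To confirm that each health alert was cleared, and how long it stayed open, the callhome.hm.alert and hm.alert.cleared lines can be paired on alert ID and alerting resource. The sketch below is illustrative Python only; alert_durations is a hypothetical name, the patterns are inferred from the lines quoted in this article, and the year is an assumption because the EMS timestamp omits it.

```python
import re
from datetime import datetime

TS = r"(?P<ts>\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})"

# "... callhome.hm.alert.major:alert]: ... NoPathToNSMA_Alert[<resource>]."
RAISED = re.compile(
    TS + r" \[.*callhome\.hm\.alert\.\w+:\w+\]: .*?(?P<alert>\w+)\[(?P<res>\d+)\]"
)
# "... hm.alert.cleared:notice]: Alert Id = <id> , Alerting Resource = <resource> cleared ..."
CLEARED = re.compile(
    TS + r" \[.*hm\.alert\.cleared:\w+\]: Alert Id = (?P<alert>\w+) , "
    r"Alerting Resource = (?P<res>\d+) cleared"
)


def _when(ts, year):
    # The year is supplied by the caller; it is not part of the EMS timestamp.
    return datetime.strptime(f"{ts} {year}", "%a %b %d %H:%M:%S %z %Y")


def alert_durations(lines, year=2022):
    """Return {(alert_id, resource): timedelta} for alerts both raised and cleared."""
    raised, durations = {}, {}
    for line in lines:
        line = line.strip()
        m = RAISED.match(line)
        if m:
            # Keep the first time the alert was raised for this resource.
            raised.setdefault((m["alert"], m["res"]), _when(m["ts"], year))
            continue
        m = CLEARED.match(line)
        if m:
            key = (m["alert"], m["res"])
            if key in raised:
                durations[key] = _when(m["ts"], year) - raised[key]
    return durations
```

Clear events whose matching raise event is not in the supplied lines are skipped rather than guessed at.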

 


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.