NetApp Knowledge Base

Failed NSM100 leads to Checksum Errors and Disk Redundancy Failures

Category:
disk-shelves
Specialty:
HW

Applies to

  • NS224 shelf with NSM100 modules
  • Disk checksum errors during a RAID scrub

Issue

  • Multiple checksum errors are reported on different disks in the same shelf:

[node_name: raidio_thread: raid_rg_scrub_cksum_err_1:notice]: params: {'disk_rpm': 'N/A', 'vendor': 'NETAPP ', 'firmware_revision': 'NA51', 'shelf': '24', 'disk_info': 'Disk /aggr_name/plex0/rg0/e0c.24.0.14P1 Shelf 24 Bay 14 [NETAPP X4010WBORA1T9NTE NA51] (...)

[node_name: raidio_thread: raid_rg_scrub_cksum_err_1:notice]: params: {'disk_rpm': 'N/A', 'vendor': 'NETAPP ', 'firmware_revision': 'NA51', 'shelf': '24', 'disk_info': 'Disk /aggr_name/plex0/rg0/e0c.24.0.7P1 Shelf 24 Bay 7 [NETAPP X4010WBORA1T9NTE NA51] (...)

[node_name: raidio_thread: raid_rg_scrub_cksum_err_1:notice]: params: {'disk_rpm': 'N/A', 'vendor': 'NETAPP ', 'firmware_revision': 'NA51', 'shelf': '24', 'disk_info': 'Disk /aggr_name/plex0/rg0/e0d.24.3.12P1 Shelf 24 Bay 12 [NETAPP X4010WBORA1T9NTE NA51] (...)

[node_name: raidio_thread: raid_cksum_verify_error_file_1:notice]: params: {'firmware_revision': 'NA51', [...], 'disk_info': 'Disk /aggr_name/plex0/rg0/e0c.24.0.15P1 Shelf 24 Bay 15 [NETAPP X4010WBORA1T9NTE NA51] [...], 'error': 'checksum computation mismatched', 'model': 'X4010WBORA1T9NTE', 'ino_type': ''}

  • The shelf's NSM100 module reports errors related to sensors, connectivity, and hardware components. Examples:

[node_name: scsi_cmdblk_strthr_admin: scsi.cmd.notReadyConditionEMSOnly:debug]: Enclosure services device 0x.24.1.99L0: Device returns not yet ready: CDB 0x1c: Sense Data SCSI:not ready - (0x2 - 0x35 0x2 0x0)(0).
[node_name: scsi_cmdblk_strthr_admin: scsi.cmd.mcc.lunmgr.io.error:debug]: Disk device S/N 22323T800648 - CDB 0x28:0b652ef8:0008 - (scsi error: command aborted) - Sense Data SCSI:no sense - (0x0 - 0x0 0x0 0x0)(DT 594). (HA status 0x15) - (out_status_flags 0x8)
[node_name: scsi_cmdblk_strthr_admin: scsi.cmd.mcc.lunmgr.io.error:debug]: Disk device S/N 22323T800065 - CDB 0x9a:0000000014c1c180:0005:002c - (scsi error: command aborted) - Sense Data SCSI:no sense - (0x0 - 0x0 0x0 0x0)(DT 966). (HA status 0x15) - (out_status_flags 0x8)
[node_name: dsa_worker3: ses.status.temperatureWarning:alert]: NS224NSM100 (S/N SHJHU1234567890) shelf 24 on channel 0x temperature warning for Temperature sensor 12: not installed or failed. Current temperature: 29 C (84 F). This element is on the unknown location.
[node_name: dsa_worker3: ses.status.electronicsWarn:error]: NS224NSM100 (S/N SHJHU1234567890) shelf 24 on channel 0x environmental monitoring warning for SES electronics 2: communication error. ; enclosure services hardware failed This element is on the rear of the shelf at the bottom, on module B.
[node_name: dsa_worker3: ses.status.ModuleWarn:alert]: NS224NSM100 (S/N SHJHU1234567890) shelf 24 on channel 0x PCI switch warning for PCI Switch 2: communication error. This element is on the rear of the shelf at the bottom, on module B.
[node_name: dsa_worker3: ses.status.ACPWarn:error]: NS224NSM100 (S/N SHJHU1234567890) shelf 24 on channel 0x ACP Processor warning for shelf ACP processor 2: communication error. ; Alternate Control Path hardware failed e B.
[node_name: dsa_worker3: ses.status.battery.error:error]: NS224NSM100 (S/N SHJHU1234567890) shelf 24 on channel 0x battery failure error for Coin Battery 2: not installed or hardware failure. This element is on the rear of the shelf, in bottom module (B).
[node_name: dsa_worker3: ses.status.etherConn.warn:error]: NS224NSM100 (S/N SHJHU1234567890) shelf 24 on channel 0x Ethernet connector warning for port e0a: cannot communicate with connector. This element is on the rear of the shelf at the bottom, on module B.
[node_name: dsa_worker3: ses.status.etherConn.warn:error]: NS224NSM100 (S/N SHJHU1234567890) shelf 24 on channel 0x Ethernet connector warning for port e0b: cannot communicate with connector. This element is on the rear of the shelf at the bottom, on module B.
[node_name: dsa_worker3: ses.status.dimm.error:error]: NS224NSM100 (S/N SHJHU1234567890) shelf 24 on channel 0x DIMM failure for Dimm Element 5: not installed or failed. This element is on the DIMM slot 1 in the bottom shelf module (B).
[node_name: dsa_worker3: ses.status.dimm.error:error]: NS224NSM100 (S/N SHJHU1234567890) shelf 24 on channel 0x DIMM failure for Dimm Element 6: not installed or failed. This element is on the DIMM slot 2 in the bottom shelf module (B).
[node_name: dsa_worker3: ses.status.dimm.error:error]: NS224NSM100 (S/N SHJHU1234567890) shelf 24 on channel 0x DIMM failure for Dimm Element 7: not installed or failed. This element is on the DIMM slot 3 in the bottom shelf module (B).

  • The issue persists after updating ONTAP and the NSM100 firmware.

  • The issue persists after reseating the NSM100 module.
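
To check whether a set of collected EMS messages matches the pattern above (checksum errors spread across several bays of one shelf, together with SES errors concentrated on one shelf module), the log lines can be tallied with a short script. The following is a minimal sketch, not a NetApp tool: the function name `summarize` and the regular expressions are assumptions inferred only from the line format of the excerpts in this article, and may need adjusting for other EMS output.

```python
import re
from collections import Counter, defaultdict

# Assumed line shape, inferred from the excerpts in this article:
#   [node: process: event.name:severity]: message
EMS_LINE = re.compile(r"^\[([^:\]]+): ([^:\]]+): ([\w.]+):(\w+)\]: (.*)$")
# Disk location as it appears in the 'disk_info' field of checksum events.
SHELF_BAY = re.compile(r"Shelf (\d+) Bay (\d+)")
# Module letter as it appears in SES messages ("module B" or "module (B)").
MODULE = re.compile(r"module \(?([AB])\)?")


def summarize(lines):
    """Return (bays with checksum errors per shelf, SES event count per module)."""
    cksum_bays = defaultdict(set)
    ses_by_module = Counter()
    for line in lines:
        m = EMS_LINE.match(line.strip())
        if not m:
            continue  # not an EMS line in the assumed format
        event, message = m.group(3), m.group(5)
        if "cksum" in event:
            loc = SHELF_BAY.search(message)
            if loc:
                cksum_bays[int(loc.group(1))].add(int(loc.group(2)))
        elif event.startswith("ses.status."):
            mod = MODULE.search(message)
            # Messages that do not name a module are counted as "unknown".
            ses_by_module[mod.group(1) if mod else "unknown"] += 1
    return dict(cksum_bays), dict(ses_by_module)
```

If the first result lists several bays for a single shelf and the second is dominated by one module letter, the logs are consistent with the symptoms described in this article. That alone does not confirm the root cause.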

