NetApp Knowledge Base

Node down with multiple DISK "scsi.cmd.pastTimeToLive:error"

Visibility:
Public
Category:
fas-systems
Specialty:
hw
Last Updated:
6/6/2025, 9:47:29 AM

Applies to

  • FAS 2820
  • ONTAP 9
  • Internal shelf

Issue

  • The node goes down after multiple disks report scsi.cmd.pastTimeToLive:error errors:

[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: Disk device 0b.00.0: request failed after try #1: cdb 0x8a:000000046cd85e00:00000200.
[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: Disk device 0b.00.0: request failed after try #1: cdb 0x8a:000000047237f760:00000008.
[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: Disk device 0b.00.0: request failed after try #1: cdb 0x8f:000000046c3c7e00:00000400.
...
[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: Disk device 0b.00.8: request failed after try #1: cdb 0x88:000000047237ef90:00000008.

  • The partner node sends an HA Group Notification (CONTROLLER TAKEOVER COMPLETE AUTOMATIC - Communication Error) ALERT.
    • The following EMS event is logged:

[?] Sat Dec 28 08:48:01 +0900 [node02: cf_main: cf.fsm.takeover.mdp:alert]: Failover monitor: takeover attempted after multi-disk failure on partner

  • The shelf IOM port state shows NO SIGNAL:

Timestamp: Sat Jan 4 08:33:20 JST 2025
Shelf name: 0c.shelf0
Channel: 0c
Module: A
Shelf id: 0
Shelf UUID: 50:0a:09:80:08:6f:fb:24
Shelf S/N: SHJSG2418000037
Term switch: N/A
Shelf state: ONLINE
Module state: OK

                        Partial Path  Link    Invalid  Running    Loss   Phy      CRC    Phy
Disk         Port       Timeout       Rate    DWord    Disparity  Dword  Reset    Error  Change
Id           State      Value (ms)    (Gb/s)  Count    Count      Count  Problem  Count  Count
--------------------------------------------------------------------------------------------
[HST0/P0:0]  NO SIGNAL  7             NA      0        0          0      0        0      974
[HST1/P0:1]  NO SIGNAL  7             NA      1299     1298       0      0        0      974
[HST2/P0:2]  NO SIGNAL  7             NA      310      307        0      0        0      974
[HST3/P0:3]  NO SIGNAL  7             NA      85       81         0      0        0      974
[HST4/P1:0]  OK         7             12.0    0        0          0      0        0      3
[HST5/P1:1]  OK         7             12.0    0        0          0      0        0      3
[HST6/P1:2]  OK         7             12.0    0        0          0      0        0      3

  • The node can no longer read multiple drives, and the aggregate fails with a multi-disk error:
    Mon Jun 02 10:17:22 +0700 [node-02: config_thread: raid.vol.failed:notice]: Aggregate aggr1_n2: Failed due to multi-disk error.
    Mon Jun 02 10:17:23 +0700 [node-02: config_thread: cf.multidisk.fatalProblem:error]: Node encountered a multidisk error or other fatal error while waiting to be taken over. aggr aggr1_n2: raid volfsm, fatal multi-disk error..  Raid type - raid_dp Group name plex0/rg0 state DOUBLEDEGRADED. 1 disk failed in the group. Disk 0a.00.2P1 Shelf 0 Bay 2 [NETAPP   X336_TTCRE04TA07 NA04] S/N [Y3F0A2XXXXXX] UID [6000039C:E82AC314:500A0981:00000001:00000000:00000000:00000000:00000000:00000000:00000000] error: disk failed..

     
  • The node goes down because of the multi-disk failure:
    Mon Jun 02 10:17:23 +0700 [node-02: cf_main: cf.fsm.takeover.mdp:alert]: Failover monitor: takeover attempted after multi-disk failure on partner
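
When reviewing a log with many scsi.cmd.pastTimeToLive:error lines like the ones quoted above, it can help to count the failures per disk and see whether they cluster on one adapter and shelf. The following is a minimal sketch, not a NetApp tool. It assumes that the cdb field has the layout opcode:LBA:transfer-length, and that disk names follow the adapter.shelf.bay form (for example, 0b.00.8). The helper names decode_cdb and summarize are hypothetical. The opcode names are the standard 16-byte SCSI block-command opcodes.

```python
import re
from collections import Counter

# Matches the scsi.cmd.pastTimeToLive:error EMS lines shown in the Issue section.
PAST_TTL = re.compile(
    r"scsi\.cmd\.pastTimeToLive:error\]: Disk device (?P<disk>[\w.]+): "
    r"request failed after try #\d+: cdb (?P<cdb>0x[0-9a-f]+:[0-9a-f]+:[0-9a-f]+)"
)

# Standard 16-byte SCSI block-command opcodes seen in the sample log.
OPCODES = {0x88: "READ(16)", 0x8A: "WRITE(16)", 0x8F: "VERIFY(16)"}


def decode_cdb(cdb):
    """Split an 'opcode:lba:length' string (assumed layout) into readable fields."""
    op, lba, length = cdb.split(":")
    code = int(op, 16)  # int() accepts the "0x" prefix when base is 16
    return OPCODES.get(code, hex(code)), int(lba, 16), int(length, 16)


def summarize(lines):
    """Count pastTimeToLive failures per disk and per adapter.shelf prefix."""
    per_disk = Counter()
    per_shelf = Counter()
    for line in lines:
        match = PAST_TTL.search(line)
        if match is None:
            continue  # not a pastTimeToLive line
        disk = match.group("disk")
        per_disk[disk] += 1
        # "0b.00.8" -> "0b.00" (assumes adapter.shelf.bay naming)
        per_shelf[disk.rsplit(".", 1)[0]] += 1
    return per_disk, per_shelf
```

On the four complete lines quoted in the Issue section, every failure falls under the single prefix 0b.00. The first cdb decodes as a WRITE(16) of 0x200 (512) blocks, under the assumed field layout.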
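
The shelf IOM port-state table in the Issue section can also be screened programmatically. The sketch below is a minimal illustration, not a NetApp utility. It assumes that each row starts with a bracketed port ID and ends with the eight numeric columns in header order, so that whatever lies between them is the port state. It treats a state other than OK, or any nonzero Invalid DWord or Running Disparity count, as worth a closer look. That threshold is an assumption made for illustration. The names parse_rows, suspect_ports, and PortRow are hypothetical.

```python
from dataclasses import dataclass

# Column order after "Disk Id" and "Port State", taken from the table header.
FIELDS = ("timeout_ms", "rate_gbps", "invalid_dword", "running_disparity",
          "loss_dword", "phy_reset_problem", "crc_error", "phy_change")


@dataclass
class PortRow:
    port: str
    state: str
    counters: dict


def parse_rows(lines):
    """Parse '[HSTn/Px:y] STATE <8 values>' rows; header and other lines are skipped."""
    rows = []
    for line in lines:
        tokens = line.split()
        if len(tokens) < 10 or not tokens[0].startswith("["):
            continue
        state = " ".join(tokens[1:-8])  # may be two words, e.g. "NO SIGNAL"
        counters = dict(zip(FIELDS, tokens[-8:]))
        rows.append(PortRow(tokens[0].strip("[]"), state, counters))
    return rows


def suspect_ports(rows):
    """Return (port, state, dword_errors) for ports not OK or with DWord/disparity errors."""
    flagged = []
    for row in rows:
        dword_errors = int(row.counters["invalid_dword"]) + int(row.counters["running_disparity"])
        if row.state != "OK" or dword_errors > 0:
            flagged.append((row.port, row.state, dword_errors))
    return flagged
```

Applied to the table above, this flags the four HST0 to HST3 ports (all NO SIGNAL) and leaves HST4 to HST6 alone. Because the parser splits on whitespace, it does not depend on exact column alignment.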

     

 


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.