Node down with multiple disk "scsi.cmd.pastTimeToLive:error" errors
Applies to
- FAS 2820
- ONTAP 9
- Internal shelf
Issue
- Node down with multiple disk scsi.cmd.pastTimeToLive:error errors.
[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: Disk device 0b.00.0: request failed after try #1: cdb 0x8a:000000046cd85e00:00000200.
[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: Disk device 0b.00.0: request failed after try #1: cdb 0x8a:000000047237f760:00000008.
[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: Disk device 0b.00.0: request failed after try #1: cdb 0x8f:000000046c3c7e00:00000400.
...
[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: Disk device 0b.00.8: request failed after try #1: cdb 0x88:000000047237ef90:00000008.
- On the partner node:
HA Group Notification (CONTROLLER TAKEOVER COMPLETE AUTOMATIC - Communication Error) ALERT
- The following EMS log is detected:
[?] Sat Dec 28 08:48:01 +0900 [node02: cf_main: cf.fsm.takeover.mdp:alert]: Failover monitor: takeover attempted after multi-disk failure on partner
- Shelf IOM port state shows NO SIGNAL:
Timestamp: Sat Jan 4 08:33:20 JST 2025
Shelf name: 0c.shelf0
Channel: 0c
Module: A
Shelf id: 0
Shelf UUID: 50:0a:09:80:08:6f:fb:24
Shelf S/N: SHJSG2418000037
Term switch: N/A
Shelf state: ONLINE
Module state: OK
                              Partial Path  Link    Invalid  Running    Loss   Phy      CRC    Phy
             Disk  Port       Timeout       Rate    DWord    Disparity  Dword  Reset    Error  Change
             Id    State      Value (ms)    (Gb/s)  Count    Count      Count  Problem  Count  Count
-----------------------------------------------------------------------------------------------------
[HST0/P0:0]        NO SIGNAL  7             NA      0        0          0      0        0      974
[HST1/P0:1]        NO SIGNAL  7             NA      1299     1298       0      0        0      974
[HST2/P0:2]        NO SIGNAL  7             NA      310      307        0      0        0      974
[HST3/P0:3]        NO SIGNAL  7             NA      85       81         0      0        0      974
[HST4/P1:0]        OK         7             12.0    0        0          0      0        0      3
[HST5/P1:1]        OK         7             12.0    0        0          0      0        0      3
[HST6/P1:2]        OK         7             12.0    0        0          0      0        0      3
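
When many such lines have to be checked, the symptoms above can be summarized with a small script. The following is only a minimal sketch, not a NetApp tool: the function and field names are hypothetical, the regular expressions are inferred solely from the excerpts in this article, and the mapping of the six trailing numbers to the counter columns (Invalid DWord, Running Disparity, Loss Dword, Phy Reset Problem, CRC Error, Phy Change) is an assumption based on the header order. It expects host-port rows with an empty Disk Id column, as shown above.

```python
import re
from collections import Counter

# Assumption: EMS line layout as quoted in this article.
EMS_RE = re.compile(
    r"scsi\.cmd\.pastTimeToLive:error\]: Disk device (\S+): request failed"
)

# Assumption: a phy row looks like "[HST1/P0:1] NO SIGNAL 7 NA 1299 1298 0 0 0 974"
# (label, port state, timeout, link rate, then six counters; no Disk Id value).
PHY_RE = re.compile(
    r"^\[(?P<phy>[^\]]+)\]\s+(?P<state>.+?)\s+(?P<timeout>\d+)\s+"
    r"(?P<rate>\S+)\s+(?P<counts>(?:\d+\s+){5}\d+)\s*$"
)

# Hypothetical field names, in the assumed header order.
COUNT_FIELDS = (
    "invalid_dword", "running_disparity", "loss_dword",
    "phy_reset_problem", "crc_error", "phy_change",
)


def count_ttl_errors(lines):
    """Count pastTimeToLive errors per disk device name."""
    counts = Counter()
    for line in lines:
        match = EMS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def parse_phy_rows(lines):
    """Parse phy rows into dicts; header and separator lines are skipped."""
    rows = []
    for line in lines:
        match = PHY_RE.match(line.strip())
        if not match:
            continue
        row = {
            "phy": match.group("phy"),
            "state": match.group("state"),
            "link_rate": match.group("rate"),
        }
        values = map(int, match.group("counts").split())
        row.update(zip(COUNT_FIELDS, values))
        rows.append(row)
    return rows


def ports_without_signal(rows):
    """Return the phy labels whose port state is anything other than OK."""
    return [row["phy"] for row in rows if row["state"] != "OK"]
```

Applied to the phy-state output above, `ports_without_signal(parse_phy_rows(lines))` would list the four HST0 to HST3 phys on P0, and `count_ttl_errors` gives a per-disk tally of the timed-out requests from the EMS log.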