CONTAP-252261: Drive failure causing long IO delays


Category: ontap-9

Specialty: core

Issue

When a drive returns a non-retriable IO error, ONTAP may incorrectly keep retrying the IO instead of failing the drive promptly.
Because ONTAP then takes a long time to fail the drive, IO is delayed, which may affect clients such as ESX.

[Node1: scsi_cmdblk_strthr_admin: scsi.cmd.notReadyConditionEMSOnly:debug]: Disk device 0v.i1.1L34: Device returns not yet ready: CDB 0x2a:37e2f0f0:0001: Sense Data SCSI:not ready -  (0x2 - 0x4 0x0 0x82)(32559).
[Node1:raid.label.io.writeError:notice]: Label write on Disk /aggr1/plex1/rg1/0v.i1.1L34 ... failed with storage error disk operation timed out
wafl_exempt02: wafl.cp.toolong:error]: Aggregate aggr1 experienced a long CP.
kernel: Nblade.nfsLongRunningOp:debug]: Detected a long running network process operation...
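
The messages above can be checked for in a saved copy of the event log. The following is a minimal sketch, not a NetApp tool: it assumes the log has already been exported to plain text, and the function names (matching_events, signature_present) are hypothetical. The event names are taken from the example messages in this article.

```python
import re

# Event names copied from the example messages in this article.
EVENTS = (
    "scsi.cmd.notReadyConditionEMSOnly",
    "raid.label.io.writeError",
    "wafl.cp.toolong",
    "Nblade.nfsLongRunningOp",
)

# One pattern that matches any of the event names literally.
EVENT_RE = re.compile("|".join(re.escape(name) for name in EVENTS))


def matching_events(lines):
    """Return (event_name, line) pairs for lines that mention a listed event."""
    hits = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            hits.append((match.group(0), line.rstrip("\n")))
    return hits


def signature_present(lines):
    """Return True only if every listed event appears at least once."""
    found = {name for name, _ in matching_events(lines)}
    return all(name in found for name in EVENTS)
```

For example, passing the lines of an exported log file (an assumed path such as ems_export.txt) to signature_present() reports whether all four events occur in it. A match suggests the log resembles the pattern described here; it does not by itself confirm this issue.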