IO Error reported on Solaris host due to datastore becoming "Write Protected"
Applies to
- ONTAP 9
- Solaris
Issue
- IO error reported on the Solaris host.
- In the dmesg.out log, the following events were reported on the host end:
Jun 21 17:19:30 T5ROOT1 mac: [ID 736570 kern.notice] NOTICE: ldoms-vsw0.vport12 unregistered
Jun 21 17:19:30 T5ROOT1 mac: [ID 469746 kern.notice] NOTICE: ldoms-vsw0.vport12 registered
Jun 25 11:43:13 T5ROOT1 scsi: [ID 583741 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Jun 25 11:43:13 T5ROOT1 /scsi_vhci/ssd@g600a0980383057506724535a6832522d (ssd1066): Command Timeout on path fp7/ssd@w2002d039ea487302,95: b397bbc4c4703801◄▬▬▬▬
Jun 25 11:43:32 T5ROOT1 scsi: [ID 583741 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Jun 25 11:43:32 T5ROOT1 /scsi_vhci/ssd@g600a0980383057506724535a6832524b (ssd1074): Command Timeout on path fp7/ssd@w2002d039ea487302,99: b3dd046fb2503c01 ◄▬▬▬▬
Jun 25 11:44:28 T5ROOT1 scsi: [ID 583741 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
- In the output above, the last field of the SCSI log entry (for example, b397bbc4c4703801) is the Error Numeric Association (ENA) of a particular IO transaction.
- This ENA can be used to trace the IO transaction in FMA (Fault Management Architecture) on the host end, as illustrated below.
- The writes are retried and succeed after a few attempts; while the IOs are being retried, the host generates the IO error.
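As an illustration only, the first timed-out command from dmesg could be traced the same way, reusing the fmdump invocation shown in the output below with the first ENA substituted in:
Oraclet $ TZ=Asia/Kolkata fmdump -VA -E b397bbc4c4703801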
- Tracing the ENA with fmdump on the host end reports the following:
Oraclet $ TZ=Asia/Kolkata fmdump -VA -E b3dd046fb2503cxx | egrep 'Jun|device-path|asses|key =|asc'
Jun 25 2024 11:43:32.514791264 ireport.io.scsi.cmd.disk.tran
device-path = /pci@380/pci@1/pci@0/pci@7/xxx@0/fp@0,0/ssd@w2002d039ea4873xx,99
driver-assessment = retry
Jun 25 2024 11:43:32.515045492 ereport.io.scsi.cmd.disk.dev.rqs.derr
device-path = /pci@340/pci@1/pci@0/pci@5/xxx@0/fp@0,0/ssd@w2003d039ea4873xx,99
driver-assessment = fault
key = 0x7◄▬▬▬▬ DATA PROTECT. !!
asc = 0x27◄▬▬▬▬ asc/ascq "27h/01h DZT RO BK HARDWARE WRITE PROTECTED"
ascq = 0x1
devid = id1,ssd@n600a0980383057506724535a6832xxxx
driver-assessment = fault◄▬▬▬▬
filter-ratio = 1
op-code = 0xa
cdb = 0xa 0x2 0x74 0xc1 0x8 0x0
pkt-reason = CMD_CMPLT
pkt-phci-reason = CMD_CMPLT
pkt-state = 0x3f
pkt-stats = 0x0
pkt-flags = 0x2420c000
pkt-time = 60
pkt-hrt-dev = 150307
pkt-hrt-hba = 0
pkt-tag = 0
stat-code = 0x2
key = 0x7◄▬▬▬▬ DATA PROTECT. !!
asc = 0x27◄▬▬▬▬ asc/ascq "27h/01h DZT RO BK HARDWARE WRITE PROTECTED" https://www.t10.org/lists/asc-num.htm#ASC_27
ascq = 0x1
sense-data = 0x70 0x0 0x7 0x0 0x0 0x0 0x0 0xe 0x0 0x0 0x0 0x0 0x27 0x1 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
skaarssa = 0x2072701003f00xx
info = ssd_sense_key_fail_command◄▬▬▬▬
__ttl = 0x1
__tod = 0x667a5ff9 0x3698df7c
__hrt = 2633577238186035
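The sense key, ASC, and ASCQ highlighted above can be read directly from the fixed-format sense-data bytes in the same event (byte 2 low nibble = sense key, byte 12 = ASC, byte 13 = ASCQ). A minimal sketch, assuming a bash/ksh shell and the leading sense-data bytes reported in this ereport:
#!/bin/bash
# Decode the fixed-format (0x70) sense data reported above.
set -- 0x70 0x0 0x7 0x0 0x0 0x0 0x0 0xe 0x0 0x0 0x0 0x0 0x27 0x1
printf 'sense key = 0x%x (DATA PROTECT)\n' $(( $3 & 0x0f ))                              # byte 2, low nibble
printf 'asc/ascq  = %02xh/%02xh (HARDWARE WRITE PROTECTED)\n' $(( ${13} )) $(( ${14} ))  # bytes 12 and 13
For context, op-code 0xa in the cdb above is a SCSI WRITE(6), consistent with a write being rejected on a write-protected LUN.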
- The volume on the storage end shows its status as RW.
- No erroneous events were reported in EMS during the issue window.
- The affected volume is part of a SnapMirror destination (a hypothetical ONTAP CLI check is sketched after the log below).
- These SnapMirror relationships were resynced just before the issue was seen:
Tue Jun 25 11:42:49 IST 2024 ResyncTransfer[Jun 25 11:42:16]:050c5b35-2c9d-11ef-9083-d039ea4873xx Operation-Uuid=dcfe3471-32b9-11ef-9083-d039ea4873xx Group=none Operation-Cookie=0 action=End source=SOURCE_SVM:DISK1 destination=DEST-SVM:DISK1_PSDB status=Success bytes_transferred=6503716443 network_compression_ratio=1.0:1 transfer_desc=Logical Transfer with Storage Efficiency - Optimized Directory Mode
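A hypothetical way to confirm the destination volume type and the SnapMirror relationship state from the ONTAP CLI, using the SVM and volume names from the logs in this article:
::> volume show -vserver DEST-SVM -volume DISK1_PSDB -fields type
::> snapmirror show -destination-path DEST-SVM:DISK1_PSDB -fields state,status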
- Later, the relationships were broken off to make the destination volume RW (a hypothetical CLI equivalent follows the log below):
Tue Jun 25 11:46:55 IST 2024 BreakVolume[Jun 25 11:46:52]:050c5b35-2c9d-11ef-9083-d039ea4873xx Operation-Uuid=837089b8-32ba-11ef-9083-d039ea4873xx Group=none Operation-Cookie=0 action=End source=SOURCE_SVM:DISK1 destination=DEST-SVM:DISK1_PSDB status=Success
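For reference, the break operation logged above corresponds to an ONTAP CLI command along these lines (a hypothetical illustration; the destination path is taken from the log):
::> snapmirror break -destination-path DEST-SVM:DISK1_PSDB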