IO Error reported on Solaris host due to datastore becoming "Write Protected"
Applies to
- ONTAP 9
- Solaris
Issue
- IO error reported on the Solaris host.
- In the dmesg.out log, the following events were reported on the host end:
Jun 21 17:19:30 T5ROOT1 mac: [ID 736570 kern.notice] NOTICE: ldoms-vsw0.vport12 unregistered
Jun 21 17:19:30 T5ROOT1 mac: [ID 469746 kern.notice] NOTICE: ldoms-vsw0.vport12 registered
Jun 25 11:43:13 T5ROOT1 scsi: [ID 583741 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Jun 25 11:43:13 T5ROOT1 /scsi_vhci/ssd@g600a0980383057506724535a6832522d (ssd1066): Command Timeout on path fp7/ssd@w2002d039ea487302,95: b397bbc4c4703801◄▬▬▬▬
Jun 25 11:43:32 T5ROOT1 scsi: [ID 583741 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Jun 25 11:43:32 T5ROOT1 /scsi_vhci/ssd@g600a0980383057506724535a6832524b (ssd1074): Command Timeout on path fp7/ssd@w2002d039ea487302,99: b3dd046fb2503c01 ◄▬▬▬▬
Jun 25 11:44:28 T5ROOT1 scsi: [ID 583741 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
- In the output above, the last field of the SCSI log entry (for example, b397bbc4c4703801) is the Error Numeric Association (ENA) of a particular IO transaction.
- This ENA can be used to trace the IO transaction in FMA (Fault Management Architecture) on the host end, as illustrated below.
- The writes are retried and succeed after a few attempts; while the IOs are being retried, the host generates the IO error.
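As an illustration only, the first timed-out command from dmesg could be traced the same way, reusing the fmdump invocation shown in the output below with the first ENA substituted in:
Oraclet $ TZ=Asia/Kolkata fmdump -VA -E b397bbc4c4703801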
- Tracing the ENA with fmdump on the host end reports the following:
Oraclet $ TZ=Asia/Kolkata fmdump -VA -E b3dd046fb2503cxx | egrep 'Jun|device-path|asses|key =|asc'
Jun 25 2024 11:43:32.514791264 ireport.io.scsi.cmd.disk.tran
device-path = /pci@380/pci@1/pci@0/pci@7/xxx@0/fp@0,0/ssd@w2002d039ea4873xx,99
driver-assessment = retry
Jun 25 2024 11:43:32.515045492 ereport.io.scsi.cmd.disk.dev.rqs.derr
device-path = /pci@340/pci@1/pci@0/pci@5/xxx@0/fp@0,0/ssd@w2003d039ea4873xx,99
driver-assessment = fault
key = 0x7◄▬▬▬▬ DATA PROTECT. !!
asc = 0x27◄▬▬▬▬ asc/ascq "27h/01h DZT RO BK HARDWARE WRITE PROTECTED"
ascq = 0x1
devid = id1,ssd@n600a0980383057506724535a6832xxxx
driver-assessment = fault◄▬▬▬▬
filter-ratio = 1
op-code = 0xa
cdb = 0xa 0x2 0x74 0xc1 0x8 0x0
pkt-reason = CMD_CMPLT
pkt-phci-reason = CMD_CMPLT
pkt-state = 0x3f
pkt-stats = 0x0
pkt-flags = 0x2420c000
pkt-time = 60
pkt-hrt-dev = 150307
pkt-hrt-hba = 0
pkt-tag = 0
stat-code = 0x2
key = 0x7◄▬▬▬▬ DATA PROTECT. !!
asc = 0x27◄▬▬▬▬ asc/ascq "27h/01h DZT RO BK HARDWARE WRITE PROTECTED" https://www.t10.org/lists/asc-num.htm#ASC_27
ascq = 0x1
sense-data = 0x70 0x0 0x7 0x0 0x0 0x0 0x0 0xe 0x0 0x0 0x0 0x0 0x27 0x1 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
skaarssa = 0x2072701003f00xx
info = ssd_sense_key_fail_command◄▬▬▬▬
__ttl = 0x1
__tod = 0x667a5ff9 0x3698df7c
__hrt = 2633577238186035
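The sense key, ASC, and ASCQ highlighted above can be read directly from the fixed-format sense-data bytes in the same event (byte 2 low nibble = sense key, byte 12 = ASC, byte 13 = ASCQ). A minimal sketch, assuming a bash/ksh shell and the leading sense-data bytes reported in this ereport:
#!/bin/bash
# Decode the fixed-format (0x70) sense data reported above.
set -- 0x70 0x0 0x7 0x0 0x0 0x0 0x0 0xe 0x0 0x0 0x0 0x0 0x27 0x1
printf 'sense key = 0x%x (DATA PROTECT)\n' $(( $3 & 0x0f ))                              # byte 2, low nibble
printf 'asc/ascq  = %02xh/%02xh (HARDWARE WRITE PROTECTED)\n' $(( ${13} )) $(( ${14} ))  # bytes 12 and 13
For context, op-code 0xa in the cdb above is a SCSI WRITE(6), consistent with a write being rejected on a write-protected LUN.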
- The volume on the storage end shows its status as RW.
- No erroneous events were reported in EMS during the issue window.
- The affected volume is part of a SnapMirror destination (a hypothetical ONTAP CLI check is sketched after the log below).
- These SnapMirror relationships were resynced just before the issue was seen:
Tue Jun 25 11:42:49 IST 2024 ResyncTransfer[Jun 25 11:42:16]:050c5b35-2c9d-11ef-9083-d039ea4873xx Operation-Uuid=dcfe3471-32b9-11ef-9083-d039ea4873xx Group=none Operation-Cookie=0 action=End source=SOURCE_SVM:DISK1 destination=DEST-SVM:DISK1_PSDB status=Success bytes_transferred=6503716443 network_compression_ratio=1.0:1 transfer_desc=Logical Transfer with Storage Efficiency - Optimized Directory Mode
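A hypothetical way to confirm the destination volume type and the SnapMirror relationship state from the ONTAP CLI, using the SVM and volume names from the logs in this article:
::> volume show -vserver DEST-SVM -volume DISK1_PSDB -fields type
::> snapmirror show -destination-path DEST-SVM:DISK1_PSDB -fields state,status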
- Later, the relationships were broken off to make the destination volume RW (a hypothetical CLI equivalent follows the log below):
Tue Jun 25 11:46:55 IST 2024 BreakVolume[Jun 25 11:46:52]:050c5b35-2c9d-11ef-9083-d039ea4873xx Operation-Uuid=837089b8-32ba-11ef-9083-d039ea4873xx Group=none Operation-Cookie=0 action=End source=SOURCE_SVM:DISK1 destination=DEST-SVM:DISK1_PSDB status=Success
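For reference, the break operation logged above corresponds to an ONTAP CLI command along these lines (a hypothetical illustration; the destination path is taken from the log):
::> snapmirror break -destination-path DEST-SVM:DISK1_PSDB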