Disk operation error observed on host end due to network glitch on the path from switch to target
Applies to
- ONTAP 9
- Brocade Switch
- AIX Host
Issue
Disk operation error and Path Failed errors are reported in the errpt log on the AIX host.
errpt output:
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
DCB47997 0912191024 T H hdisk33 DISK OPERATION ERROR
F31FFAC3 0912191024 I H hdisk33 PATH HAS RECOVERED
DE3B8540 0912190924 P H hdisk33 PATH HAS FAILED
F31FFAC3 0912190924 I H hdisk38 PATH HAS RECOVERED
DCB47997 0912190824 T H hdisk38 DISK OPERATION ERROR
DE3B8540 0912190824 P H hdisk38 PATH HAS FAILED
DCB47997 0912190824 T H hdisk31 DISK OPERATION ERROR
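To line the host events up with storage-side logs, it helps to turn the errpt timestamps into real dates. The following is a minimal Python sketch, not a supported tool. It assumes the TIMESTAMP column uses the MMDDhhmmYY layout and that a two-digit year means 20YY; the function name `parse_errpt` is made up for this illustration.

```python
from datetime import datetime

# errpt summary lines copied from the output above (header included on purpose).
ERRPT_OUTPUT = """\
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
DCB47997 0912191024 T H hdisk33 DISK OPERATION ERROR
F31FFAC3 0912191024 I H hdisk33 PATH HAS RECOVERED
DE3B8540 0912190924 P H hdisk33 PATH HAS FAILED
F31FFAC3 0912190924 I H hdisk38 PATH HAS RECOVERED
DCB47997 0912190824 T H hdisk38 DISK OPERATION ERROR
DE3B8540 0912190824 P H hdisk38 PATH HAS FAILED
DCB47997 0912190824 T H hdisk31 DISK OPERATION ERROR
"""


def parse_errpt(text):
    """Return a list of (timestamp, resource, description) tuples."""
    events = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        # Skip the header row and anything that is not a 10-digit timestamp.
        if len(fields) != 6 or len(fields[1]) != 10 or not fields[1].isdigit():
            continue
        stamp = fields[1]
        when = datetime(
            2000 + int(stamp[8:10]),  # assumption: two-digit year means 20YY
            int(stamp[0:2]),          # month
            int(stamp[2:4]),          # day
            int(stamp[4:6]),          # hour
            int(stamp[6:8]),          # minute
        )
        events.append((when, fields[4], fields[5]))
    return events


events = parse_errpt(ERRPT_OUTPUT)
window_start = min(e[0] for e in events)
window_end = max(e[0] for e in events)
print("Host error window:", window_start, "to", window_end)
for when, disk, desc in sorted(events):
    print(when.strftime("%Y-%m-%d %H:%M"), disk, desc)
```

The resulting window can then be compared with the storage log timestamps, provided the host and storage clocks are in sync and the time zones are taken into account.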
- The issue recovered on its own, without any action taken on any of the devices.
- On the storage end, CRC and ITW errors are reported, and their timestamps correlate with the disk operation error timestamps on the host end.
  - Invalid Transmission Word (ITW) errors are reported when a frame is dropped.
  - Cyclic Redundancy Check (CRC) errors are reported when data or a frame is corrupted.

- Additionally, WQE errors with extended status 1d are reported in EMS during the time of the issue. Ext status 1d indicates out-of-order frame delivery here.
Log snippet:
Thu Sep 12 18:59:35 +0530 [NetApp-02: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 66F240, VPI: 275, OX_ID: 1ECE, Status 0x3 Ext_Status 0x2
Thu Sep 12 19:01:06 +0530 [NetApp-02: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 66F8C0, VPI: 275, OX_ID: 9AD, Status 0x3 Ext_Status 0x1d
Thu Sep 12 19:02:06 +0530 [NetApp-02: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 66F240, VPI: 275, OX_ID: 2E76, Status 0x3 Ext_Status 0x1d
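When many such EMS lines are present, counting them by S_ID and Ext_Status shows which initiator addresses are affected. The sketch below is illustrative only: the regular expression is written against the three lines shown above, and the names `WQE_RE` and `parse_wqe_failures` are assumptions for this example, not part of any NetApp tool. The extracted S_ID values can be compared with the Address column of the switchshow output.

```python
import re
from collections import Counter

# EMS lines copied from the log snippet above.
EMS_LINES = """\
Thu Sep 12 18:59:35 +0530 [NetApp-02: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 66F240, VPI: 275, OX_ID: 1ECE, Status 0x3 Ext_Status 0x2
Thu Sep 12 19:01:06 +0530 [NetApp-02: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 66F8C0, VPI: 275, OX_ID: 9AD, Status 0x3 Ext_Status 0x1d
Thu Sep 12 19:02:06 +0530 [NetApp-02: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 66F240, VPI: 275, OX_ID: 2E76, Status 0x3 Ext_Status 0x1d
"""

# Pattern derived from the sample lines only; other ONTAP releases may differ.
WQE_RE = re.compile(
    r"Adapter:(?P<adapter>\w+) IO WQE failure.*?"
    r"S_ID: (?P<s_id>[0-9A-Fa-f]+), VPI: (?P<vpi>\d+), "
    r"OX_ID: (?P<ox_id>[0-9A-Fa-f]+), "
    r"Status (?P<status>0x[0-9A-Fa-f]+) "
    r"Ext_Status (?P<ext_status>0x[0-9A-Fa-f]+)"
)


def parse_wqe_failures(text):
    """Return one dict per WQE failure line found in the text."""
    return [m.groupdict() for m in WQE_RE.finditer(text)]


failures = parse_wqe_failures(EMS_LINES)
by_sid = Counter(f["s_id"] for f in failures)
by_ext = Counter(f["ext_status"] for f in failures)

for s_id, count in by_sid.items():
    print(f"S_ID {s_id}: {count} WQE failure(s)")
for ext, count in by_ext.items():
    print(f"Ext_Status {ext}: {count} occurrence(s)")
```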
- On the switch end, no physical layer issues or errors are reported on the storage-connected and host-connected switch ports.
- SFP stats are in the optimal range as per the sfpshow output:
=============
Slot 12/Port 18:
=============
RX Power: -0.6 dBm (880.40uW)
TX Power: 0.4 dBm (1087.60 uW)
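The two units in the sfpshow output describe the same measurement: dBm is the power in decibels relative to 1 mW. As a quick sanity check, the microwatt values above can be converted to dBm. This is only a worked conversion; it does not encode any vendor threshold for what counts as optimal, and the function name is an assumption for this example.

```python
import math


def microwatts_to_dbm(microwatts):
    """Convert optical power from microwatts to dBm (dB relative to 1 mW)."""
    return 10 * math.log10(microwatts / 1000.0)


rx_dbm = round(microwatts_to_dbm(880.40), 1)
tx_dbm = round(microwatts_to_dbm(1087.60), 1)
print("RX Power:", rx_dbm, "dBm")
print("TX Power:", tx_dbm, "dBm")
```

Rounded to one decimal place, the results match the dBm figures printed by the switch.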
- No port flaps are observed in the fabriclog output.
- The ISL ports were verified, and no errors were found in the errdump.
- No MAPS alerts were triggered for the affected ports.
- All the errors reported in porterrshow are historic, since the port stats have not been cleared in the past 6 months.
fabos/bin/switchshow :
Index Slot Port Address Media Speed State Proto
============================================================
242 12 18 66f240 id N32 Online FC F-Port 10:00:00:10:9b:9e:xx:xx
368 12 32 66f8c0 id N32 Online FC F-Port 10:00:00:10:9b:9e:xx:xx
/fabos/cliexec/porterrshow :
      frames         enc  crc  crc    too   too   bad  enc  disc  link  loss  loss  frjt  fbsy  c3timeout   pcs  uncor
      tx      rx     in   err  g_eof  shrt  long  eof  out  c3    fail  sync  sig               tx    rx    err  err
242:  341.0g  71.2g  0    0    0      0     0     0    0    89    0     0     0     0     0     0     0     0    0
368:  341.0g  71.2g  0    0    0      0     0     0    0    81    0     0     0     0     0     0     0     0    0
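The porterrshow rows are easier to check once each value is tied to its column name. The sketch below maps the 19 values of each row to the two-line header shown above. The key names in `COLUMNS` and the choice of which counters to treat as frame-integrity counters (`INTEGRITY`) are assumptions made for this illustration.

```python
# Column names follow the two-line porterrshow header, left to right.
COLUMNS = [
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof",
    "too_shrt", "too_long", "bad_eof", "enc_out", "disc_c3",
    "link_fail", "loss_sync", "loss_sig", "frjt", "fbsy",
    "c3timeout_tx", "c3timeout_rx", "pcs_err", "uncor_err",
]

# Assumption: these counters are the ones relevant to ITW/CRC corruption.
INTEGRITY = ("enc_in", "crc_err", "crc_g_eof", "enc_out")

# Data rows copied from the porterrshow output above.
PORTERRSHOW = """\
242: 341.0g 71.2g 0 0 0 0 0 0 0 89 0 0 0 0 0 0 0 0 0
368: 341.0g 71.2g 0 0 0 0 0 0 0 81 0 0 0 0 0 0 0 0 0
"""


def parse_porterrshow(text):
    """Return {port_index: {column: value}}; values stay as strings
    because counters can carry suffixes such as 'g'."""
    ports = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        values = rest.split()
        if not sep or not label.strip().isdigit() or len(values) != len(COLUMNS):
            continue  # skip headers and malformed rows
        ports[int(label.strip())] = dict(zip(COLUMNS, values))
    return ports


ports = parse_porterrshow(PORTERRSHOW)
clean_ports = [
    index for index, counters in ports.items()
    if all(counters[name] == "0" for name in INTEGRITY)
]
for index, counters in ports.items():
    print(index, "disc_c3 =", counters["disc_c3"],
          "| integrity counters clean:", index in clean_ports)
```

Because the counters are cumulative since the last clear, a single snapshot like this only shows totals; comparing two snapshots taken at different times would be needed to see whether a counter is still incrementing.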
- No ITW or CRC errors are reported on the host-connected or target-connected switch ports, which indicates that the frames were corrupted after they left the switch, just before they reached the target.
- The target (NetApp) received the frames in order, but one or more frames in a sequence were corrupted (CRC) and were dropped by the lower layers of the NetApp HBA. The FCP/SCSI layers then detected the missing frames, which is reported as WQE errors on NetApp.
- Because the frames were corrupted and the host did not receive a response or acknowledgement for them from the target, the host started to report disk operation error.
- As a temporary workaround, you can disable the port so that no frames are sent over that path, in case the host has not failed over to another path on its own.
