NetApp Knowledge Base

Disk operation error observed on host end due to network glitch on the path from switch to target

Category:
ontap-9
Specialty:
san

Applies to

  • ONTAP 9
  • Brocade Switch
  • AIX Host

Issue

  • Disk operation errors and path failure errors are reported in the errpt log on the AIX host:

errpt output:
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
DCB47997   0912191024 T H hdisk33        DISK OPERATION ERROR
F31FFAC3   0912191024 I H hdisk33        PATH HAS RECOVERED
DE3B8540   0912190924 P H hdisk33        PATH HAS FAILED
F31FFAC3   0912190924 I H hdisk38        PATH HAS RECOVERED
DCB47997   0912190824 T H hdisk38        DISK OPERATION ERROR
DE3B8540   0912190824 P H hdisk38        PATH HAS FAILED
DCB47997   0912190824 T H hdisk31        DISK OPERATION ERROR
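
To line these host events up against the storage logs, it can help to turn the errpt summary lines into real timestamps. The following is a minimal sketch, assuming the summary TIMESTAMP column uses the MMDDhhmmYY layout; the function name `parse_errpt_line` and the sample variable are illustrative and not part of any AIX tool.

```python
from datetime import datetime

# A few lines copied from the errpt summary above.
ERRPT_SAMPLE = """\
DCB47997   0912191024 T H hdisk33        DISK OPERATION ERROR
F31FFAC3   0912191024 I H hdisk33        PATH HAS RECOVERED
DE3B8540   0912190924 P H hdisk33        PATH HAS FAILED
DCB47997   0912190824 T H hdisk38        DISK OPERATION ERROR
"""


def parse_errpt_line(line):
    """Split one errpt summary line into named fields (hypothetical helper)."""
    ident, stamp, etype, eclass, resource, desc = line.split(None, 5)
    return {
        "identifier": ident,
        # Assumption: summary timestamps are MMDDhhmmYY.
        "time": datetime.strptime(stamp, "%m%d%H%M%y"),
        "type": etype,
        "class": eclass,
        "resource": resource,
        "description": desc.strip(),
    }


events = [parse_errpt_line(l) for l in ERRPT_SAMPLE.splitlines() if l.strip()]

# Print the events in chronological order, oldest first.
for event in sorted(events, key=lambda e: e["time"]):
    print(event["time"].isoformat(" "), event["resource"], event["description"])
```

Sorting by the parsed time makes the sequence of failure and recovery per hdisk easier to read than the newest-first order that errpt prints.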

 

  • The issue recovered automatically, with no action taken on any of the devices.
  • On the storage end, CRC and ITW errors are reported at times that correlate with the disk operation error timestamps on the host end.
    • Invalid Transmission Word (ITW) errors are reported when a received transmission word fails the link-level encoding checks, which can cause the affected frame to be dropped.
    • Cyclic Redundancy Check (CRC) errors are reported when the contents of a received frame are corrupted.

[Image: ITW&CRC.png]

  • Additionally, WQE errors with extended status 0x1d are reported in EMS during the issue window.

Log snippet:

Thu Sep 12 18:59:35 +0530 [NetApp-02: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 66F240, VPI: 275, OX_ID: 1ECE, Status 0x3 Ext_Status 0x2
Thu Sep 12 19:01:06 +0530 [NetApp-02: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 66F8C0, VPI: 275, OX_ID: 9AD, Status 0x3 Ext_Status 0x1d
Thu Sep 12 19:02:06 +0530 [NetApp-02: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 66F240, VPI: 275, OX_ID: 2E76, Status 0x3 Ext_Status 0x1d
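
The S_ID values in these entries match the Address column of the switchshow output in this article, so the failures can be grouped per switch port. Below is a rough sketch of that grouping. The regular expression is written only against the three lines shown here, and `parse_wqe` and `PORT_BY_ADDRESS` are hypothetical names introduced for illustration.

```python
import re
from collections import Counter

# The three EMS lines from the snippet above, copied verbatim.
EMS_SAMPLE = """\
Thu Sep 12 18:59:35 +0530 [NetApp-02: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 66F240, VPI: 275, OX_ID: 1ECE, Status 0x3 Ext_Status 0x2
Thu Sep 12 19:01:06 +0530 [NetApp-02: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 66F8C0, VPI: 275, OX_ID: 9AD, Status 0x3 Ext_Status 0x1d
Thu Sep 12 19:02:06 +0530 [NetApp-02: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 66F240, VPI: 275, OX_ID: 2E76, Status 0x3 Ext_Status 0x1d
"""

# Pattern derived from the sample lines only; other ONTAP releases may differ.
WQE_RE = re.compile(
    r"Adapter:(?P<adapter>\w+) IO WQE failure, .*?"
    r"S_ID: (?P<s_id>[0-9A-Fa-f]+), .*?"
    r"OX_ID: (?P<ox_id>[0-9A-Fa-f]+), "
    r"Status (?P<status>0x[0-9A-Fa-f]+) "
    r"Ext_Status (?P<ext_status>0x[0-9A-Fa-f]+)"
)

# Address -> port index, taken from the switchshow output in this article.
PORT_BY_ADDRESS = {"66f240": 242, "66f8c0": 368}


def parse_wqe(line):
    """Return the WQE failure fields as a dict, or None if the line differs."""
    match = WQE_RE.search(line)
    return match.groupdict() if match else None


counts = Counter()
for line in EMS_SAMPLE.splitlines():
    fields = parse_wqe(line)
    if fields:
        counts[(fields["s_id"].lower(), fields["ext_status"])] += 1

for (s_id, ext_status), n in sorted(counts.items()):
    port = PORT_BY_ADDRESS.get(s_id, "unknown")
    print(f"S_ID {s_id} (switch port index {port}): Ext_Status {ext_status} x{n}")
```

Grouping by S_ID and extended status shows whether the failures cluster on one initiator path or are spread across several.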


 

  • On the switch end, no physical layer issues or errors are reported on the switch ports connected to the storage and the host.
  • SFP readings are within the optimal range according to the sfpshow output:

=============
Slot 12/Port 18:
=============

RX Power:    -0.6    dBm (880.40uW)
TX Power:    0.4     dBm (1087.60 uW)
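
The dBm and uW figures in the sfpshow output are two views of the same reading, related by P(dBm) = 10 * log10(P(mW)). As a quick sanity check, this small sketch converts the microwatt values above back to dBm. The helper names are illustrative, and the acceptable power range depends on the SFP model, so no thresholds are assumed here.

```python
import math


def uw_to_dbm(microwatts):
    """Convert optical power in microwatts to dBm (relative to 1 mW)."""
    return 10 * math.log10(microwatts / 1000.0)


def dbm_to_uw(dbm):
    """Convert optical power in dBm to microwatts."""
    return 10 ** (dbm / 10) * 1000.0


# Values taken from the sfpshow output above.
for label, microwatts in (("RX", 880.40), ("TX", 1087.60)):
    print(f"{label} Power: {uw_to_dbm(microwatts):.1f} dBm ({microwatts:.2f} uW)")
```

The recomputed values round to -0.6 dBm and 0.4 dBm, which agrees with the figures the switch reported.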

 

  • No port flaps are observed in the fabriclog output.
  • The ISL ports were verified, and no errors were found in the errdump output.
  • No MAPS alerts were triggered for the affected ports.
  • All errors reported in the porterrshow output are historic, because the port statistics have not been cleared in the past 6 months.

fabos/bin/switchshow :
Index Slot Port Address Media  Speed        State    Proto
============================================================
 242   12   18   66f240   id    N32     Online      FC  F-Port  10:00:00:10:9b:9e:xx:xx
 368   12   32   66f8c0   id    N32     Online      FC  F-Port  10:00:00:10:9b:9e:xx:xx

/fabos/cliexec/porterrshow :
          frames         enc     crc     crc     too     too     bad     enc    disc    link    loss    loss    frjt    fbsy     c3timeout       pcs   uncor
          tx      rx      in     err   g_eof    shrt    long     eof     out      c3    fail    sync     sig                      tx      rx     err     err
242:  341.0g   71.2g       0       0       0       0       0       0       0      89       0       0       0       0       0       0       0       0       0
368:  341.0g   71.2g       0       0       0       0       0       0       0      81       0       0       0       0       0       0       0       0       0
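
Because the counters have not been cleared for months, a single porterrshow capture cannot show whether an error is new. One way to separate historic from fresh errors is to compare two captures taken some time apart. The sketch below assumes the column order shown in the header above. The second capture (`AFTER_ROW`) is invented purely for illustration, and the k/m/g suffix handling is an assumption about how large counters are abbreviated.

```python
# Column names in the order of the porterrshow header above.
COLUMNS = [
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof", "too_shrt",
    "too_long", "bad_eof", "enc_out", "disc_c3", "link_fail", "loss_sync",
    "loss_sig", "frjt", "fbsy", "c3timeout_tx", "c3timeout_rx", "pcs_err",
    "uncor_err",
]

# Assumption: k, m and g abbreviate thousands, millions and billions.
SUFFIX = {"k": 1e3, "m": 1e6, "g": 1e9}


def to_number(token):
    """Convert a counter such as '89' or '341.0g' to a float."""
    multiplier = SUFFIX.get(token[-1].lower())
    return float(token[:-1]) * multiplier if multiplier else float(token)


def parse_row(line):
    """Parse one porterrshow data row into (port_index, {column: value})."""
    port, rest = line.split(":", 1)
    return int(port), dict(zip(COLUMNS, (to_number(t) for t in rest.split())))


def new_errors(before, after):
    """Return the error counters (frame counts excluded) that increased."""
    return {
        name: after[name] - before[name]
        for name in COLUMNS[2:]
        if after[name] > before[name]
    }


# Row for port index 242 as captured in this article.
BEFORE_ROW = "242:  341.0g   71.2g  0 0 0 0 0 0 0 89 0 0 0 0 0 0 0 0 0"
# Hypothetical later capture, made up to demonstrate the comparison.
AFTER_ROW = "242:  341.2g   71.3g  0 0 0 0 0 0 0 92 0 0 0 0 0 0 0 0 0"

_, before = parse_row(BEFORE_ROW)
_, after = parse_row(AFTER_ROW)
print(new_errors(before, after))
```

With the invented second capture, only disc_c3 shows an increase; an empty result would mean that every non-zero counter predates the first capture.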

 

 

  • No ITW or CRC errors are reported on the host-connected or target-connected switch ports, which indicates that the frames were corrupted after leaving the switch, just before reaching the target.
  • The target (NetApp) received the frames in order, but one or more frames in a sequence were corrupted (CRC errors) and dropped by the lower layers of the NetApp HBA. The FCP/SCSI layers then detected the missing frames, which appears as WQE errors on the NetApp system.
  • Because the frames were corrupted, the host did not receive a response or acknowledgement for them from the target, so the host reported disk operation errors.

 

  • As a temporary workaround, you can disable the port so that no frames pass through that path, in case the host has not failed over to another path on its own.

