"IO operation error" reported on host with FC LUN
Applies to
- ONTAP
- FC
- Brocade Fabric OS v7.4.2d
- Windows
- AIX
Issue
IO operation Error 153 is reported on the Windows host and Disk Operation Error is reported on the AIX host. The path and LUN become inaccessible for a couple of minutes and then recover automatically.
- The Windows host generates a disk pause error along with observed IO lag, and the host cannot read from or write to the disk.
- The AIX host reports path flapping, and the disk is inaccessible on the path where the disk operation error is reported.
- Both hosts use the same FC port to access storage.
- Storage reports IO WQE error with extended status 0x2 or 0x1d on the same FC port on the storage end:
Tue Nov 30 20:17:47 +07 [NetApp: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 79Dxx0, VPI: 275, OX_ID: B63, Status 0x3 Ext_Status 0x1d
Mon Sep 26 13:26:52 +07 [NetApp: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 79Dxy0, VPI: 275, OX_ID: 8AD, Status 0x3 Ext_Status 0x1d
Tue Jan 03 16:36:22 +07 [NetApp: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 79Dxx0, VPI: 275, OX_ID: DA3, Status 0x3 Ext_Status 0x2
Fri Jan 13 12:05:37 +07 [NetApp: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 79Dxy0, VPI: 275, OX_ID: 459, Status 0x3 Ext_Status 0x1d
- Apart from the IO WQE errors, no other error events are seen on the storage side.
- On the Brocade switch side, a long-distance E-port is manually disabled, and too many zones are associated with the port where the IO WQE errors were seen on the storage side.
switchshow :
switchName: XXXXY
switchType: 62.0
switchState: Online
switchMode: Native
switchRole: Principal
switchDomain: 102
Index Slot Port Address Media Speed State Proto
============================================================
150 2 22 661400 id 8G No_Light FC LS Disabled (Persistent)
- There are signs of port flapping, and ELP and EFP rejects are seen in the fabriclog output on the Brocade switch:
00:19:55.386609 *ELP Snd ACC:rev=2,flow=1,flen=80,sz=164 .. F0,P1 F0,P2 166 0x8937
00:19:55.386610 op_mode=0x5580 F0,P1 F0,P2 166 0x8937
00:19:55.386803 BF ACC Rcv F0,P3 F0,P3 318 0x6ad3
00:19:55.386852 SCN Domain 102 invalid F0,NA F0,NA NA NA
00:19:55.386877 ELP RJT Rcv - ct prfrm,in prgs,vu=0 F0,P2 F0,P2 166 0x6ad4
00:19:55.391604 SCN Port Offline;g=0x1c F0,P2 F0,P0 150 NA
00:19:55.391611 *Removing all nodes from port F0,P0 F0,P0 150 NA
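The storage-side IO WQE failure lines quoted earlier can be tallied per adapter, source ID, and extended status to confirm that the errors are confined to one FC port. The following is a minimal Python sketch, not a NetApp tool: the function name tally_wqe_failures and the regular expression are illustrative assumptions fitted to the four log lines in this article, and other ONTAP releases may format the message differently.

```python
import re
from collections import Counter

# Assumption: the pattern is fitted to the fcp.io.status lines quoted in
# this article; it is not an official message specification.
WQE_RE = re.compile(
    r"Adapter:(?P<adapter>\w+) IO WQE failure.*?"
    r"S_ID: (?P<s_id>\w+).*?"
    r"Status (?P<status>0x[0-9a-fA-F]+) Ext_Status (?P<ext>0x[0-9a-fA-F]+)"
)


def tally_wqe_failures(lines):
    """Count IO WQE failures per (adapter, S_ID, Ext_Status)."""
    counts = Counter()
    for line in lines:
        match = WQE_RE.search(line)
        if match:
            key = (match["adapter"], match["s_id"], int(match["ext"], 16))
            counts[key] += 1
    return counts


# The four log lines from the Issue section, copied verbatim.
sample = [
    "Tue Nov 30 20:17:47 +07 [NetApp: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 79Dxx0, VPI: 275, OX_ID: B63, Status 0x3 Ext_Status 0x1d",
    "Mon Sep 26 13:26:52 +07 [NetApp: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 79Dxy0, VPI: 275, OX_ID: 8AD, Status 0x3 Ext_Status 0x1d",
    "Tue Jan 03 16:36:22 +07 [NetApp: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 79Dxx0, VPI: 275, OX_ID: DA3, Status 0x3 Ext_Status 0x2",
    "Fri Jan 13 12:05:37 +07 [NetApp: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:2b IO WQE failure, Handle 0x1, Type 8, S_ID: 79Dxy0, VPI: 275, OX_ID: 459, Status 0x3 Ext_Status 0x1d",
]

counts = tally_wqe_failures(sample)
for (adapter, s_id, ext), n in sorted(counts.items()):
    print(f"adapter={adapter} S_ID={s_id} Ext_Status={ext:#x} count={n}")
```

If every key shares the same adapter while the S_ID values differ, as they do for the sample lines, the errors affect several initiators through a single target port, which matches the observation that both hosts use the same FC port.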
Consider the following before moving ahead
- If the FOS version is old, consider an upgrade. Upgrading FOS helps fix many known bugs and issues in the fabric.
- When too many zones connect many devices, any problem on an inter-switch link will most likely result in the out-of-order frame issue observed here.
- Clean up unnecessary zones and keep only the zones that are required.
- NetApp recommends 1:1 zoning. Ensure that the AIX host and the Windows host are not zoned to the same target port.
- Both status 0x02 and 0x1d indicate out-of-order FC frame sequences, which generally points to an issue in the fabric.
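The 1:1 zoning recommendation above can be checked offline against an exported zone list. The following Python sketch is only an illustration: the zone names and WWPNs are hypothetical placeholders rather than values from this environment, and the helper initiators_per_target is not part of any Brocade or NetApp tool.

```python
from collections import defaultdict


def initiators_per_target(zones, targets):
    """Map each target WWPN to the set of initiator WWPNs zoned with it."""
    seen = defaultdict(set)
    for members in zones.values():
        members = set(members)
        for target in members & targets:
            # Every non-target member of the zone is treated as an initiator.
            seen[target] |= members - targets
    return seen


# Hypothetical zone data: zone name -> member WWPNs (placeholders only).
zones = {
    "z_win_2b": ["10:00:00:00:00:00:00:01", "20:00:00:00:00:00:00:2b"],
    "z_aix_2b": ["10:00:00:00:00:00:00:02", "20:00:00:00:00:00:00:2b"],
}
targets = {"20:00:00:00:00:00:00:2b"}  # hypothetical storage target port

# Target ports that are zoned with more than one initiator.
shared = {
    target: sorted(initiators)
    for target, initiators in initiators_per_target(zones, targets).items()
    if len(initiators) > 1
}
print(shared)
```

Any target port that appears in the result is shared by more than one initiator and is a candidate for rezoning so that each host reaches storage through its own target port.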