Adapter timeouts causing LUN disconnects
Applies to
- ONTAP 9
- Brocade switch
- Fibre Channel Protocol (FCP)
- Windows Host
- ESXi Host
- QLogic adapters on storage
- Fabric Performance Impact Notifications (FPIN)
Issue
- The LIF registration status with the fabric name server is not "NS Registration Done"; it may instead show a timeout or a failure:
net int show -vserver * -data-protocol fcp -fields status-oper,status-extended
- Hosts lose access to LUNs after a reboot
- Hosts are configured with four paths to storage, yet LUNs are visible through only one path
- Zoning and configuration adhere to NetApp's recommendations, with the receive (Rx) and transmit (Tx) rates for the ports on both the switch and storage ends being within the optimal range
- Although the FC ports appear online on the NetApp end, no data transfer is occurring through these ports:
cluster::*> statistics port fcp show
cluster : 4/12/2024 11:14:02
                                 NVMf  NVMf NVMf  NVMf     NVMf      NVMf NVMf      NVMf  NVMf      NVMf
         *Read Write Other Total Read Write  CAW Other   Remote    Remote  CAW    Remote Total    Remote
Port       Ops   Ops   Ops   Ops  Ops   Ops  Ops   Ops Read Ops Write Ops  Ops Other Ops   Ops Total Ops
-------- ----- ----- ----- ----- ---- ----- ---- ----- -------- --------- ---- --------- ----- ---------
port.1b     45   160    30   236    0     0    0     0        0         0    0         0     0         0
port.1a     19   676    26   721    0     0    0     0        0         0    0         0     0         0
port.1b     14    43    47   105    0     0    0     0        0         0    0         0     0         0
port.1a     14   149    19   183    0     0    0     0        0         0    0         0     0         0
port.10b     0     0     0     0    0     0    0     0        0         0    0         0     0         0
port.10b     0     0     0     0    0     0    0     0        0         0    0         0     0         0
port.10a     0     0     0     0    0     0    0     0        0         0    0         0     0         0
port.10a     0     0     0     0    0     0    0     0        0         0    0         0     0         0
- On ports where there are no I/O operations, LUNs are not visible on the host end through those ports
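The idle-port check above can be automated once the output has been saved to a text file. The following is a minimal Python sketch, not a NetApp tool. It assumes that port rows start with "port." as in the capture above, that the fourth counter is Total Ops, and that a long port name may wrap onto its own line ahead of its counters. The function name idle_ports and the SAMPLE text are illustrative only.

```python
def idle_ports(stats_text):
    """Return port names whose Total Ops counter (4th numeric column) is 0.

    Assumes rows look like the capture in this article: a name starting
    with "port." followed by counters, possibly wrapped onto the next line.
    """
    idle = []
    pending_port = None  # port name seen alone on a wrapped line
    for raw in stats_text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        if tokens[0].startswith("port."):
            if len(tokens) == 1:
                # Name wrapped onto its own line; counters follow next.
                pending_port = tokens[0]
                continue
            port, counters = tokens[0], tokens[1:]
            pending_port = None
        elif pending_port and all(t.isdigit() for t in tokens):
            port, counters = pending_port, tokens
            pending_port = None
        else:
            # Header, separator, or timestamp line.
            pending_port = None
            continue
        if len(counters) >= 4 and counters[3].isdigit() and int(counters[3]) == 0:
            idle.append(port)
    return idle


# Shortened excerpt of the capture shown above (unaligned, as pasted).
SAMPLE = """\
cluster : 4/12/2024 11:14:02
port.1b 45 160 30 236 0 0 0 0 0 0 0 0 0 0
port.1a 19 676 26 721 0 0 0 0 0 0 0 0 0 0
port.10b
0 0 0 0 0 0 0 0 0 0 0 0 0 0
port.10a
0 0 0 0 0 0 0 0 0 0 0 0 0 0
"""

if __name__ == "__main__":
    for name in idle_ports(SAMPLE):
        print(name, "reports zero total ops")
```

A port flagged this way is only a candidate: an online port with no mapped hosts would also show zero operations, so the result should be read together with the other symptoms in this list.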
- Newly created FC LIFs do not reach an operational status of up
- Hosts cannot connect to LUNs until a takeover (TO) and giveback (GB) is performed on ONTAP
- LUNs are disconnected from hosts after an ONTAP upgrade
- Rebooting the host does not resolve the issue
- The initiator reports a not logged in state:
A22xxxG1::*> igroup show COKHCH1xx10 -v
Vserver Name: sxx0
Igroup Name: COKHCxxL10
Protocol: mixed
OS Type: vmware
Portset Binding Igroup: -
Initiators: 50:0x:0x:00:0x:cx:7e:2x
50:0x:0x:00:0x:cx:7e:2x
Child Igroups: -
Igroup UUID: c5ec904e-18xx-11ed-bbxx-d039ea903bxx
ALUA: true
Initiators: 50:0x:0x:00:0x:cx:7e:2x (not logged in)
50:0x:0x:00:0x:cx:7e:2x (logged in)
Vserver UUID: 2ef579xx-18b5-11xx-bbxx-d039ea903bxx
...
Igroup Comment:
- Several adapters on ONTAP report timeouts, causing disconnections on multiple hosts:
cluster01::> network fcp adapter show -node node1 -adapter Xa
Error: show failed: Timeout while getting fabric information
cluster01::> network fcp adapter show -node node01 -adapter Xb
Error: show failed: Timeout while getting fabric information
- Timeout messages are observed in MGWD.log. Example:
[kern_mgwd:info:2548] 0x83771bf00: 0: ERR: SAN::FCP::ADAPTER_KERNEL: src/tables/san/fcp_adapter_internal.cc:get_imp:95 returning: 418/24 - Timeout while getting fabric information
[kern_mgwd:info:2548] 0x83771bf00: 0: ERR: SAN::FCP::ADAPTER: src/tables/san/fcp_adapter.cc:get_imp:719 returning: 418/24 - Timeout while getting fabric information
[kern_mgwd:info:2548] 0x83771bf00: 0: ERR: NET::VIF::SAN: src/tables/san/net_vif_san.cc:populateFcpPortmap:991 Failed getting the FCP port on node netapp01 for lif lif01: Timeout while getting fabric information
- Bringing the port down and back up from ONTAP resolves the issue temporarily, but it returns within an hour or two
- Bringing the port down and back up on the switch side does not resolve the issue
- Host may log error messages similar to the following:
May 16 15:41:28 Host_name: qla2xxx [0000:b1:00.0]-5037:11: Async-login failed: handle=d pid=011703 wwpn=XX:XX:XX:XX:XX:XX:XX:XX comp_status=31 iop0=18 iop1=92900
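When collected logs are large, the two log signatures quoted in this article (the ONTAP "Timeout while getting fabric information" message and the host-side "Async-login failed" message) can be counted with a short script. The Python sketch below is an illustration under assumptions: it matches only the literal strings shown in the examples above, and the names SIGNATURES, scan, and affected_lifs are hypothetical helpers, not part of any NetApp or QLogic tooling.

```python
import re

# Literal substrings taken from the log examples in this article.
SIGNATURES = {
    "ontap_fabric_timeout": re.compile(r"Timeout while getting fabric information"),
    "host_async_login_failed": re.compile(r"Async-login failed"),
}

# Matches the populateFcpPortmap message format shown above.
LIF_PATTERN = re.compile(
    r"Failed getting the FCP port on node (\S+) for lif ([^\s:]+):"
)


def scan(lines):
    """Count how many lines contain each known signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def affected_lifs(lines):
    """Return (node, lif) pairs named in populateFcpPortmap failures."""
    pairs = []
    for line in lines:
        match = LIF_PATTERN.search(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


# The example log lines from this article, used as sample input.
SAMPLE_LINES = [
    "[kern_mgwd:info:2548] 0x83771bf00: 0: ERR: SAN::FCP::ADAPTER_KERNEL: src/tables/san/fcp_adapter_internal.cc:get_imp:95 returning: 418/24 - Timeout while getting fabric information",
    "[kern_mgwd:info:2548] 0x83771bf00: 0: ERR: SAN::FCP::ADAPTER: src/tables/san/fcp_adapter.cc:get_imp:719 returning: 418/24 - Timeout while getting fabric information",
    "[kern_mgwd:info:2548] 0x83771bf00: 0: ERR: NET::VIF::SAN: src/tables/san/net_vif_san.cc:populateFcpPortmap:991 Failed getting the FCP port on node netapp01 for lif lif01: Timeout while getting fabric information",
    "May 16 15:41:28 Host_name: qla2xxx [0000:b1:00.0]-5037:11: Async-login failed: handle=d pid=011703 wwpn=XX:XX:XX:XX:XX:XX:XX:XX comp_status=31 iop0=18 iop1=92900",
]

if __name__ == "__main__":
    print(scan(SAMPLE_LINES))
    print(affected_lifs(SAMPLE_LINES))
```

To use it on real files, pass an open file object (for example, the MGWD.log or the host's system log) to scan in place of SAMPLE_LINES. A non-zero count only confirms that the signature is present; it does not by itself identify the cause.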