Adapter timeouts causing LUN disconnects


Visibility: Public

Category: ontap-9

Specialty: san

Applies to

ONTAP 9
Brocade switch
Fibre Channel Protocol (FCP)
Windows Host
ESXi Host
QLogic adapters on storage
Fabric Performance Impact Notifications (FPIN)

Issue

LIF registration with the fabric name server is not "NS Registration Done"; the status may show a timeout or failure instead. To check:

net int show -vserver * -data-protocol fcp -fields status-oper,status-extended

Hosts lose access to LUNs after a reboot
Hosts are configured with four paths to storage, yet LUNs are visible through only one path
Zoning and configuration follow NetApp recommendations, and the receive (Rx) and transmit (Tx) rates of the ports on both the switch and storage sides are within the optimal range
Although the FC ports appear online on the NetApp side, no data is passing through some of them:

cluster::*> statistics port fcp show

cluster : 4/12/2024 11:14:02

                                 NVMf  NVMf NVMf  NVMf     NVMf      NVMf NVMf      NVMf  NVMf      NVMf
         *Read Write Other Total Read Write  CAW Other   Remote    Remote  CAW    Remote Total    Remote
Port       Ops   Ops   Ops   Ops  Ops   Ops  Ops   Ops Read Ops Write Ops  Ops Other Ops   Ops Total Ops
-------- ----- ----- ----- ----- ---- ----- ---- ----- -------- --------- ---- --------- ----- ---------
port.1b     45   160    30   236    0     0    0     0        0         0    0         0     0         0
port.1a     19   676    26   721    0     0    0     0        0         0    0         0     0         0
port.1b     14    43    47   105    0     0    0     0        0         0    0         0     0         0
port.1a     14   149    19   183    0     0    0     0        0         0    0         0     0         0
port.10b     0     0     0     0    0     0    0     0        0         0    0         0     0         0
port.10b     0     0     0     0    0     0    0     0        0         0    0         0     0         0
port.10a     0     0     0     0    0     0    0     0        0         0    0         0     0         0
port.10a     0     0     0     0    0     0    0     0        0         0    0         0     0         0
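In a longer capture, ports that carry no I/O can be picked out programmatically. The following is a minimal Python sketch, assuming the data rows of `statistics port fcp show` have been copied as plain text with the port name followed by its 14 counters. `idle_ports` and `SAMPLE` are hypothetical names, and the sample rows are the ones from the output above. Duplicate port names are kept because the same name appears more than once in that output.

```python
"""Flag FCP ports that report no I/O in `statistics port fcp show` text.

Hypothetical helper (not a NetApp tool). Assumes each data row starts with
a name like "port.1a" followed by integer counters.
"""

SAMPLE = """\
port.1b     45   160    30   236    0     0    0     0        0         0    0         0     0         0
port.1a     19   676    26   721    0     0    0     0        0         0    0         0     0         0
port.1b     14    43    47   105    0     0    0     0        0         0    0         0     0         0
port.1a     14   149    19   183    0     0    0     0        0         0    0         0     0         0
port.10b     0     0     0     0    0     0    0     0        0         0    0         0     0         0
port.10b     0     0     0     0    0     0    0     0        0         0    0         0     0         0
port.10a     0     0     0     0    0     0    0     0        0         0    0         0     0         0
port.10a     0     0     0     0    0     0    0     0        0         0    0         0     0         0
"""


def idle_ports(text: str) -> list[str]:
    """Return port names, in output order, whose counters are all zero."""
    idle = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, header rows and the dashed separator row.
        if not fields or not fields[0].startswith("port."):
            continue
        counters = [int(value) for value in fields[1:]]
        if all(count == 0 for count in counters):
            idle.append(fields[0])
    return idle


if __name__ == "__main__":
    print(idle_ports(SAMPLE))
```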

LUNs are not visible from the host through the ports that show no I/O operations
Newly created FC LIFs do not come up (their operational status is not up)
The host cannot connect to the LUN until a takeover (TO) and giveback (GB) is performed in ONTAP
LUNs disconnect from hosts after an ONTAP upgrade
Rebooting the host does not resolve the issue
The initiator reports a "not logged in" state:

A22xxxG1::*> igroup show COKHCH1xx10 -v

          Vserver Name: sxx0
           Igroup Name: COKHCxxL10
              Protocol: mixed
               OS Type: vmware
Portset Binding Igroup: -
            Initiators: 50:0x:0x:00:0x:cx:7e:2x 50:0x:0x:00:0x:cx:7e:2x
         Child Igroups: -
           Igroup UUID: c5ec904e-18xx-11ed-bbxx-d039ea903bxx
                  ALUA: true
            Initiators: 50:0x:0x:00:0x:cx:7e:2x (not logged in)
                        50:0x:0x:00:0x:cx:7e:2x (logged in)
          Vserver UUID: 2ef579xx-18b5-11xx-bbxx-d039ea903bxx
...
        Igroup Comment:
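When an igroup holds many initiators, the ones in the "not logged in" state can be extracted from the saved `igroup show -v` text. This is a minimal Python sketch, assuming the output is available as plain text; `initiators_not_logged_in` is a hypothetical helper, and the sample WWPNs are the masked values shown above, which is why any non-space token is accepted as an initiator name.

```python
import re

# Matches "<initiator> (logged in)" or "<initiator> (not logged in)".
# WWPNs in this article are partially masked with "x", so any non-space
# token is accepted as the initiator name (an assumption for the sketch).
LOGIN_STATE = re.compile(r"(\S+)\s+\((not logged in|logged in)\)")


def initiators_not_logged_in(igroup_output: str) -> list[str]:
    """Return initiators that `igroup show -v` reports as not logged in."""
    return [
        initiator
        for initiator, state in LOGIN_STATE.findall(igroup_output)
        if state == "not logged in"
    ]


SAMPLE = """\
            Initiators: 50:0x:0x:00:0x:cx:7e:2x (not logged in)
                        50:0x:0x:00:0x:cx:7e:2x (logged in)
"""

if __name__ == "__main__":
    print(initiators_not_logged_in(SAMPLE))
```

The first `Initiators:` line, which lists WWPNs without a login state, is ignored because it contains no parenthesized state.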

Several FC adapters in ONTAP report timeouts, causing multiple hosts to disconnect:

Example:

cluster01::> network fcp adapter show -node node1 -adapter Xa

Error: show failed: Timeout while getting fabric information

cluster01::> network fcp adapter show -node node01 -adapter Xb

Error: show failed: Timeout while getting fabric information

Timeout messages similar to the following are observed in MGWD.log:

Example:

[kern_mgwd:info:2548] 0x83771bf00: 0: ERR: SAN::FCP::ADAPTER_KERNEL: src/tables/san/fcp_adapter_internal.cc:get_imp:95 returning: 418/24 - Timeout while getting fabric information
[kern_mgwd:info:2548] 0x83771bf00: 0: ERR: SAN::FCP::ADAPTER: src/tables/san/fcp_adapter.cc:get_imp:719 returning: 418/24 - Timeout while getting fabric information
[kern_mgwd:info:2548] 0x83771bf00: 0: ERR: NET::VIF::SAN: src/tables/san/net_vif_san.cc:populateFcpPortmap:991 Failed getting the FCP port on node netapp01 for lif lif01: Timeout while getting fabric information
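To see which components and LIFs hit the fabric-information timeout across a large log, the matching messages can be tallied. This is a minimal Python sketch that assumes one message per line in the format of the example above; `summarize_timeouts` is a hypothetical helper, not a NetApp tool, and the regular expressions are modeled only on those sample lines.

```python
import re
from collections import Counter

TIMEOUT = "Timeout while getting fabric information"
# "ERR: <component>: " as in the sample lines, e.g. SAN::FCP::ADAPTER.
COMPONENT = re.compile(r"ERR: (\S+): ")
# "... on node <node> for lif <lif>: ..." from the NET::VIF::SAN message.
NODE_LIF = re.compile(r"on node (\S+) for lif (\S+):")


def summarize_timeouts(log_text: str) -> tuple[Counter, set[tuple[str, str]]]:
    """Count timeout errors per component and collect (node, LIF) pairs."""
    per_component: Counter = Counter()
    affected: set[tuple[str, str]] = set()
    for line in log_text.splitlines():
        if TIMEOUT not in line:
            continue
        if m := COMPONENT.search(line):
            per_component[m.group(1)] += 1
        if m := NODE_LIF.search(line):
            affected.add((m.group(1), m.group(2)))
    return per_component, affected


SAMPLE = """\
[kern_mgwd:info:2548] 0x83771bf00: 0: ERR: SAN::FCP::ADAPTER_KERNEL: src/tables/san/fcp_adapter_internal.cc:get_imp:95 returning: 418/24 - Timeout while getting fabric information
[kern_mgwd:info:2548] 0x83771bf00: 0: ERR: SAN::FCP::ADAPTER: src/tables/san/fcp_adapter.cc:get_imp:719 returning: 418/24 - Timeout while getting fabric information
[kern_mgwd:info:2548] 0x83771bf00: 0: ERR: NET::VIF::SAN: src/tables/san/net_vif_san.cc:populateFcpPortmap:991 Failed getting the FCP port on node netapp01 for lif lif01: Timeout while getting fabric information
"""

if __name__ == "__main__":
    counts, lifs = summarize_timeouts(SAMPLE)
    print(dict(counts))
    print(sorted(lifs))
```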

Bringing the port down and back up from ONTAP temporarily resolves the issue, but it returns within an hour or two
Bringing the port down and back up on the switch side does not resolve the issue
The host may log error messages similar to the following:

May 16 15:41:28 Host_name: qla2xxx [0000:b1:00.0]-5037:11: Async-login failed: handle=d pid=011703 wwpn=XX:XX:XX:XX:XX:XX:XX:XX comp_status=31 iop0=18 iop1=92900
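To correlate host-side login failures with specific ports, the fields of these messages can be extracted from the host log. This is a minimal Python sketch, assuming messages follow the format of the sample line above (other driver versions may format them differently); `failed_logins` and `failures_per_wwpn` are hypothetical helpers, and only the fields visible in that line are parsed.

```python
import re
from collections import Counter

# Pattern modeled on the sample qla2xxx "Async-login failed" message above.
ASYNC_LOGIN_FAILED = re.compile(
    r"qla2xxx \[(?P<pci>[^\]]+)\].*?Async-login failed: "
    r"handle=(?P<handle>\S+) pid=(?P<pid>\S+) wwpn=(?P<wwpn>\S+) "
    r"comp_status=(?P<comp_status>\S+)"
)


def failed_logins(syslog_text: str) -> list[dict[str, str]]:
    """Return one dict of parsed fields per 'Async-login failed' message."""
    return [m.groupdict() for m in ASYNC_LOGIN_FAILED.finditer(syslog_text)]


def failures_per_wwpn(syslog_text: str) -> Counter:
    """Count failed logins per WWPN reported in the messages."""
    return Counter(entry["wwpn"] for entry in failed_logins(syslog_text))


SAMPLE = (
    "May 16 15:41:28 Host_name: qla2xxx [0000:b1:00.0]-5037:11: "
    "Async-login failed: handle=d pid=011703 wwpn=XX:XX:XX:XX:XX:XX:XX:XX "
    "comp_status=31 iop0=18 iop1=92900\n"
)

if __name__ == "__main__":
    for entry in failed_logins(SAMPLE):
        print(entry["wwpn"], "comp_status =", entry["comp_status"])
```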