AFF A400 reporting vifmgr.rpc.nblade.timeouts:error due to fatal error on Slot 3 cluster ports
Applies to
- AFF A400
- ONTAP 9.7 - ONTAP 9.7P6
Issue
- Data services may be impacted, causing client and application mounts to fail
- node1 reports numerous network-related errors in the EMS and VIFMgr logs:
Sun Jun 14 23:59:05 -0700 [node1: vifmgr: vifmgr.rpc.nblade.timeouts:error]: The Logical Interface Manager (VIFMgr) is not receiving responses from the nblade.
- After the first occurrence, these errors recur daily
- The system eventually hangs, leading to an NFS outage
- Because the message refers to VIFMgr, we reviewed the VIFMgr logs and found RPC calls to the nblade timing out shortly before the first error appears in EMS:
00000013.00ffb00c 020caea1 Tue Jun 30 2020 08:20:38 -07:00 [kern_vifmgr:info:6191] [0x813410700] [NbladeWriter::nitroPcpRpcCall] clnt_call idemp RPC timeout (elapsed time: 30s)
00000013.00ffb00d 020caea1 Tue Jun 30 2020 08:20:38 -07:00 [kern_vifmgr:info:6191] [0x813410700] [NbladeWriter::reportHungNblade] Nblade has not responded to nitro RPCs for 1326210 seconds
00000013.00ffb0cd 020caffd Tue Jun 30 2020 08:21:08 -07:00 [kern_vifmgr:info:6191] [0x813410700] [NbladeWriter::nitroPcpRpcCall] clnt_call idemp RPC timeout (elapsed time: 60s)
00000013.00ffb0ce 020caffd Tue Jun 30 2020 08:21:08 -07:00 [kern_vifmgr:info:6191] [0x813410700] [NbladeWriter::nitroPcpRpcCall] long-running operation: procNum=35; time=60024 ms
00000013.00ffb0d0 020caffd Tue Jun 30 2020 08:21:08 -07:00 [kern_vifmgr:info:6191] [0x80bf0dc00] [NbladeWriter::reportHungNblade] Nblade has not responded to nitro RPCs for 1326240 seconds
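
The "has not responded to nitro RPCs for N seconds" counter in the excerpt above can be used to estimate when the nblade stopped responding. The following is a minimal sketch, not a NetApp tool: it assumes the counter is the continuous elapsed time since the last successful response, and it assumes the EMS event (which prints no year) is from 2020, matching the VIFMgr log. The names `hang_start`, `LOG_TIME`, `UNRESPONSIVE_SECONDS` and `EMS_TIME` are hypothetical.

```python
from datetime import datetime, timedelta

# Values copied from the first reportHungNblade entry (local time, -07:00).
LOG_TIME = datetime(2020, 6, 30, 8, 20, 38)
UNRESPONSIVE_SECONDS = 1326210

# First EMS event; the year 2020 is an assumption taken from the VIFMgr log.
EMS_TIME = datetime(2020, 6, 14, 23, 59, 5)


def hang_start(log_time, unresponsive_seconds):
    """Estimate when the nblade stopped responding.

    Assumes the counter measures elapsed time since the last
    successful nblade response.
    """
    return log_time - timedelta(seconds=unresponsive_seconds)


start = hang_start(LOG_TIME, UNRESPONSIVE_SECONDS)
gap = (EMS_TIME - start).total_seconds()

print("Estimated hang start:", start)          # 2020-06-14 23:57:08
print("Seconds before first EMS error:", gap)  # 117.0
```

Under these assumptions the nblade stopped responding at about 23:57:08 on June 14, roughly two minutes before the first vifmgr.rpc.nblade.timeouts event at 23:59:05. The second reportHungNblade entry (08:21:08, 1326240 seconds) gives the same start time.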