SAN host does not fail over I/O to alternate paths when one or more paths are removed and an RSCN is not received
Applies to
SAN environments using an FC or FCoE fabric in which registered state change notifications (RSCNs) are disabled or are not sent because of a known or new defect.
Known Defects:
Cisco Bug CSCuw60947: Affects Cisco Nexus 5500, 5600, or 6000 switches running NX-OS 7.1(3)N1(1) to 7.1(3)N1(4).
Note: The severity and exact symptoms of RSCN-related issues vary depending on the host operating system and the host multipath configuration. This KB is based on observations of a Windows 2012R2 host in combination with Cisco defect CSCuw60947.
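As a quick triage aid, the affected release range listed for CSCuw60947 can be checked programmatically. The following is a minimal Python sketch, not a NetApp or Cisco tool. The function names are hypothetical, and it assumes the release string has already been collected from the switch (for example, from `show version` output) in the exact form `7.1(3)N1(2)`. It compares only the release string and does not check the switch model. Strings in any other form are rejected rather than guessed at.

```python
import re

# Affected range from the Known Defects entry: 7.1(3)N1(1) to 7.1(3)N1(4).
AFFECTED_MIN = "7.1(3)N1(1)"
AFFECTED_MAX = "7.1(3)N1(4)"

# Assumed release format: major.minor(maintenance)N<train>(<rebuild>),
# with purely numeric fields, e.g. 7.1(3)N1(2).
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)\)N(\d+)\((\d+)\)$")


def parse_nxos_version(version):
    """Return the numeric fields of a release string as a tuple of ints."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError("unrecognized NX-OS release string: %r" % version)
    return tuple(int(part) for part in match.groups())


def is_in_affected_range(version):
    """True if the release falls inside the range listed for CSCuw60947."""
    low = parse_nxos_version(AFFECTED_MIN)
    high = parse_nxos_version(AFFECTED_MAX)
    return low <= parse_nxos_version(version) <= high


if __name__ == "__main__":
    for release in ("7.1(3)N1(2)", "7.1(3)N1(5)", "7.0(8)N1(1)"):
        print(release, is_in_affected_range(release))
```

A result of `True` only means the release string falls inside the listed range. Whether a given fabric is actually affected still depends on the switch model and on whether RSCNs are being delivered to the host.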
Issue
In Windows, if one or more paths to the storage target are removed, I/O hangs indefinitely until the paths are restored. If the host is left in this state long enough, NTFS eventually times out, often leaving the host in a hung state. The issue can be triggered by the removal of either an Active/Optimized or an Active/Unoptimized path, and it occurs regardless of whether any Active/Optimized or Active/Unoptimized paths remain.
Path removal events known to trigger the issue include a NetApp node reboot (for example, during an upgrade) and manually taking the target LIF or physical port offline. Additionally, if missing or new paths are added after the host operating system has booted, host-side rescans may not detect them automatically. Generally, resetting the host port or rebooting the host allows the host to detect the added paths.