Server retries IO due to faulty SAN switch ISL
Applies to
- ONTAP 9
- Windows Server
- Brocade FC switches
- Inter-switch link (ISL) between storage and server
Issue
- Windows reports that I/O operations are being retried:
The IO operation at logical block address 0x620e0080 for Disk 10 (PDO name: \Device\MPIODisk8) was retried.
The IO operation at logical block address 0xecf5b00 for Disk 11 (PDO name: \Device\MPIODisk9) was retried.
- The affected server is connected to the storage through an ISL between two fabrics
- The ISL port is flapping, and frame timeouts are being logged:
[C5-1014], 609, CHASSIS | PORT 0/47, WARNING, BrocadeG720, Link Reset on Port S0,P47(16) vc_no=3 crd(s)lost=16 auto trigger.
[AN-1014], 623, FID 128, INFO, SR2-BR-ST2, Frame timeout detected, tx port 47 rx port 15, sid 20f00, did 90501, timestamp 2024-08-27 08:25:42 .
- ONTAP reports hung commands from the same source FCID (SID 0x20f00) that appears in the frame timeout message:
STIO Adapter:0h, found hung cmd:0xfffff8092a021000(state=5, flags=0x0, ctio_sent=1/1,RecvExAddr=0x11f4f0, OX_ID=0x18b, RX_ID=0xffff,SID=0x20f00, Cmd[2A], req_q_free:721)
STIO Adapter:0h, found hung cmd:0xfffff8092a049ac0(state=5, flags=0x0, ctio_sent=1/1,RecvExAddr=0x128670, OX_ID=0x1d7, RX_ID=0xffff,SID=0x20f00, Cmd[8A], req_q_free:4089)
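The key observation above is that the source FCID in the Brocade frame timeout (sid 20f00) matches the SID in the ONTAP hung-command messages (SID=0x20f00). The following is a minimal Python sketch of that correlation, assuming the switch and ONTAP messages have been copied out as plain text in exactly the formats shown. The regular expressions, the function names `decode_fcid`, `timeout_sids` and `correlate`, and the opcode table are hypothetical illustrations written against these sample lines only; they are not a NetApp or Brocade tool. The script also splits each 24-bit FCID into its domain, area and port bytes and names the SCSI opcodes seen in the hung commands (0x2A is WRITE(10), 0x8A is WRITE(16)).

```python
import re
from typing import Dict, List, Set, Tuple

# Sample messages copied verbatim from the Issue section.
BROCADE_LINES = [
    "[C5-1014], 609, CHASSIS | PORT 0/47, WARNING, BrocadeG720, Link Reset on Port "
    "S0,P47(16) vc_no=3 crd(s)lost=16 auto trigger.",
    "[AN-1014], 623, FID 128, INFO, SR2-BR-ST2, Frame timeout detected, tx port 47 "
    "rx port 15, sid 20f00, did 90501, timestamp 2024-08-27 08:25:42 .",
]
ONTAP_LINES = [
    "STIO Adapter:0h, found hung cmd:0xfffff8092a021000(state=5, flags=0x0, "
    "ctio_sent=1/1,RecvExAddr=0x11f4f0, OX_ID=0x18b, RX_ID=0xffff,SID=0x20f00, "
    "Cmd[2A], req_q_free:721)",
    "STIO Adapter:0h, found hung cmd:0xfffff8092a049ac0(state=5, flags=0x0, "
    "ctio_sent=1/1,RecvExAddr=0x128670, OX_ID=0x1d7, RX_ID=0xffff,SID=0x20f00, "
    "Cmd[8A], req_q_free:4089)",
]

# SCSI operation codes that appear in the hung-command messages above.
SCSI_OPCODES = {0x2A: "WRITE(10)", 0x8A: "WRITE(16)"}

# Hypothetical patterns, matched only against the sample formats above.
FRAME_TIMEOUT_RE = re.compile(
    r"Frame timeout detected.*?sid ([0-9a-f]+), did ([0-9a-f]+)", re.I)
HUNG_CMD_RE = re.compile(
    r"found hung cmd.*?OX_ID=(0x[0-9a-f]+).*?SID=(0x[0-9a-f]+), Cmd\[([0-9a-f]+)\]", re.I)


def decode_fcid(fcid: int) -> Tuple[int, int, int]:
    """Split a 24-bit Fibre Channel address into (domain, area, port) bytes."""
    return (fcid >> 16) & 0xFF, (fcid >> 8) & 0xFF, fcid & 0xFF


def timeout_sids(lines: List[str]) -> Set[int]:
    """Collect the source FCIDs reported in switch frame-timeout messages."""
    sids = set()
    for line in lines:
        m = FRAME_TIMEOUT_RE.search(line)
        if m:
            sids.add(int(m.group(1), 16))
    return sids


def correlate(brocade: List[str], ontap: List[str]) -> List[Dict[str, object]]:
    """Return ONTAP hung commands whose SID also appears in a switch frame timeout."""
    sids = timeout_sids(brocade)
    matches = []
    for line in ontap:
        m = HUNG_CMD_RE.search(line)
        if not m:
            continue
        sid = int(m.group(2), 16)
        if sid in sids:
            opcode = int(m.group(3), 16)
            matches.append({
                "ox_id": m.group(1),
                "sid": sid,
                "fcid": decode_fcid(sid),
                "command": SCSI_OPCODES.get(opcode, f"opcode 0x{opcode:02X}"),
            })
    return matches


if __name__ == "__main__":
    for hit in correlate(BROCADE_LINES, ONTAP_LINES):
        domain, area, port = hit["fcid"]
        print(f"OX_ID {hit['ox_id']} {hit['command']} from SID 0x{hit['sid']:06x} "
              f"(domain {domain}, area {area}, port {port})")
```

Run against the sample lines, the sketch reports both hung commands (OX_ID 0x18b, a WRITE(10), and OX_ID 0x1d7, a WRITE(16)) as coming from SID 0x020f00, that is domain 2, area 15, port 0, the same source that the switch logged in the frame timeout.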