VMware path failures and FCP command failures due to a faulty switch-end SFP
Applies to
- VMware
- FCP
- ONTAP
- Brocade switch
Issue
- VMware storage paths are flapping, generating the following alert:
Alarm 'Cannot connect to storage' on xxx.xx.x.xx triggered by event 49104432 'Path redundancy to storage device naa.600a09803831460000000000000000 degraded. Path vmhba1:XX:XX:Xxx is down. Affected datastores: XX_NETAPP_SANxx.'
- The storage adapter port that the VMware host logs in to records the following link break errors in EMS:
[?] Sat Mar 23 02:16:10 +0000 [CLUSTER1-01: fct_tpd_work_thread_0: scsitarget.slifct.linkBreak:error]: Link break detected on Fibre Channel target HBA 2a with event status 1 , topology type 1, status1 0x0, status2 0x0.
[?] Sat Mar 23 02:16:11 +0000 [CLUSTER1-01: fct_tpd_work_thread_0: scsitarget.hwpfct.linkUp:notice]: Link up on Fibre Channel target adapter 2a.
[?] Sat Mar 23 02:16:13 +0000 [CLUSTER1-01: fct_tpd_work_thread_0: scsitarget.fct.portLogin:notice]: Login at target FC port: '2a' by initiator port: '10:00:00:62:xx:xx:xx:xx address 0x00000. The target virtual port is: 'NetApp FC Target Port (LPe00000) SVM:n1_fc_2a'.
- ONTAP System Manager reports "LIF operationally down" errors for the LIFs whose home port is the affected adapter port.
- errdump -a shows 'Frame timeout' events on the switch port connected to the storage adapter port:
2024/03/20-02:40:17, [AN-1014], 2665374, FID 128, INFO, SBI-LIFE-G620-R4U27-SANSW1, Frame timeout detected, tx port 16 rx port 19, sid 11315, did 11001, timestamp 2024-03-20 02:40:17 .
- porterrshow shows high disc c3, link fail, and loss sync counts on the same switch port:
porterrshow:
        frames        enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy     c3timeout    pcs  uncor
         tx     rx     in    err  g_eof   shrt   long    eof    out     c3   fail   sync    sig                   tx     rx    err    err
16:    3.1g   2.1g      0    151     17      7      0    135      0   1.8m   2.4m   4.8m    644      0      0   1.8m      0  28.7k  32.3k
- RX and TX power levels of the affected switch port are within the optimal range:
=============
Port 16:
=============
RX Power: -2.1 dBm (613.4 uW)
TX Power: -0.4 dBm (914.7 uW)
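
The dBm and uW figures in the power readings above are two expressions of the same measurement, related by dBm = 10 * log10(P / 1 mW). The following is a minimal Python sketch for cross-checking such readings. The function names uw_to_dbm and dbm_to_uw and the readings dictionary are illustrative assumptions, not part of any ONTAP or Brocade tooling, and the sketch does not define what counts as an "optimal" range.

```python
import math


def uw_to_dbm(power_uw: float) -> float:
    """Convert optical power in microwatts to dBm (decibels relative to 1 mW)."""
    return 10 * math.log10(power_uw / 1000.0)


def dbm_to_uw(power_dbm: float) -> float:
    """Convert optical power in dBm back to microwatts."""
    return 1000.0 * 10 ** (power_dbm / 10)


# Microwatt values copied from the port 16 readings in this article.
readings = {"RX": 613.4, "TX": 914.7}

for name, power_uw in readings.items():
    # One decimal place matches the precision of the switch output.
    print(f"{name} Power: {uw_to_dbm(power_uw):.1f} dBm ({power_uw} uW)")
```

Rounded to one decimal place, the converted values match the -2.1 dBm and -0.4 dBm shown for port 16, so the two units in the output are consistent with each other.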
