Poor performance on hosts due to faulty switch SFP
Applies to
- ONTAP 9
- SAN
- FC
Issue
- Poor performance is observed on the hosts or applications to which the NetApp LUN is mapped.
- Only a single host or hypervisor is impacted.
- Applications report I/O device errors, which eventually lead to job failures.
- Very high latency peaks are observed on VMware.
- The following events are logged on the VMware (ESXi) host:
2023-06-13T21:04:01.379Z cpu0:2097809)NMP: nmp_ThrottleLogForDevice:3867: Cmd 0x2a (0x45b9547879c8, 5247624) to dev "naa.600a09803830464b522b4cxxxxxxxxxx" on path "vmhba1:C0:xx:L2x" Failed:
2023-06-13T21:04:01.379Z cpu0:2097809)ScsiDeviceIO: 4124:Cmd(0x45b9547879c8) 0x2a, CmdSN 0x800e0009 from world 5247624 to dev "naa.600a09803830464b522b4cxxxxxxxxxx" failed H:0x8 D:0x0 P:0x0
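When several hosts or paths are involved, decoding the ScsiDeviceIO line above field by field can help confirm where the failure sits. The sketch below is a hypothetical helper (`decode_scsi_failure` is not a VMware tool). It assumes the exact log format shown above, decodes only a few common SCSI opcodes, and maps host status values using a subset of VMware's documented host codes, which should be verified against VMware's own documentation. For the sample line, opcode 0x2a is a WRITE(10) and H:0x8 is a host-side reset, while device and plugin status are both 0x0. This is consistent with a transport-level problem rather than an error returned by the LUN.

```python
import re

# Hypothetical parser for ESXi "ScsiDeviceIO ... failed H:.. D:.. P:.." lines.
# Assumes the log layout shown in this article.
SCSI_FAILURE = re.compile(
    r'Cmd\(0x[0-9a-f]+\) (?P<opcode>0x[0-9a-f]+).*?'
    r'to dev "(?P<device>[^"]+)" failed '
    r'H:(?P<host>0x[0-9a-f]+) D:(?P<dev>0x[0-9a-f]+) P:(?P<plugin>0x[0-9a-f]+)',
    re.IGNORECASE,
)

# A few SCSI command opcodes (SBC).
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)", 0x88: "READ(16)", 0x8A: "WRITE(16)"}

# Subset of VMware host status codes (assumption: verify against VMware docs).
HOST_STATUS = {
    0x0: "OK",
    0x1: "NO_CONNECT",
    0x2: "BUS_BUSY",
    0x3: "TIMEOUT",
    0x5: "ABORT",
    0x7: "ERROR",
    0x8: "RESET",
}


def decode_scsi_failure(line):
    """Return a dict describing the failed command, or None if the line does not match."""
    m = SCSI_FAILURE.search(line)
    if m is None:
        return None
    opcode = int(m["opcode"], 16)
    host = int(m["host"], 16)
    return {
        "command": OPCODES.get(opcode, hex(opcode)),
        "device": m["device"],
        "host_status": HOST_STATUS.get(host, hex(host)),
        "device_status": int(m["dev"], 16),
        "plugin_status": int(m["plugin"], 16),
    }
```

Lines that do not follow this layout, such as the NMP throttle message, simply return `None`, so the helper can be run over a whole vmkernel log without pre-filtering.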
- When performance archives are analyzed on the NetApp side, latency for the affected LUN is low and node CPU utilization is also low.
- From a performance perspective, the storage side appears healthy.
- Moving the volume to another node resolves the problem.
- EMS logs may report IO WQE failure or AEN 0x8048 (RECV_ERROR) errors for the impacted port. In some cases, no errors are logged at all.
Tue Jun 13 18:08:59 +0200 [sdea-nas-p04c: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:10a IO WQE failure, Handle 0x4, Type 8, S_ID: CB0C00, VPI: 3, OX_ID: 5A6, Status 0x3 Ext_Status 0x1d
Tue Jun 13 18:09:31 +0200 [sdea-nas-p04c: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:10a IO WQE failure, Handle 0x4, Type 8, S_ID: B2D02, VPI: 3, OX_ID: 707, Status 0x3 Ext_Status 0x1d
Tue Jun 13 18:08:59 +0200 [sdea-nas-p04c: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:10a AEN 0x8048 (RECV_ERROR) MboxStatus1 0x1000 MboxStatus2 0xc1
Tue Jun 13 18:08:59 +0200 [sdea-nas-p04c: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:10a AEN 0x8048 (RECV_ERROR) MboxStatus1 0x1008 MboxStatus2 0x44
Tue Jun 13 18:08:59 +0200 [sdea-nas-p04c: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:10a AEN 0x8048 (RECV_ERROR) MboxStatus1 0x1003 MboxStatus2 0x44
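Because each EMS message above names the adapter it was logged on, tallying the messages per port can narrow down which path to inspect. This is a hypothetical sketch (`count_stio_errors` is not a NetApp tool) that assumes EMS messages in the format shown above. A faulty port may also log no errors at all, so an empty result does not clear a port.

```python
import re
from collections import Counter

# Hypothetical pattern for the STIO messages shown in this article.
STIO_EVENT = re.compile(
    r"STIO Adapter:(?P<adapter>\S+) "
    r"(?P<event>IO WQE failure|AEN 0x8048 \(RECV_ERROR\))"
)


def count_stio_errors(lines):
    """Count IO WQE failures and AEN 0x8048 events per (adapter, event) pair."""
    counts = Counter()
    for line in lines:
        m = STIO_EVENT.search(line)
        if m:
            counts[(m["adapter"], m["event"])] += 1
    return counts


def report(counts):
    """Print the tallies, busiest adapter/event pair first."""
    for (adapter, event), n in counts.most_common():
        print(f"{adapter}: {event} x{n}")
```

Run against the five lines above, this attributes two IO WQE failures and three AEN 0x8048 events to adapter 10a.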
- Port 10a is part of an active path between the host and the affected LUN.
- Low Rx power is reported on this port on the storage side:
Adapter 10a
Received Optical Power 286.3 (uWatts)
SFP Transmitted Optical Power 835.8 (uWatts)
- Low Rx power indicates a problem either with the cable or with a component upstream of the port, such as the switch SFP.
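The adapter reports optical power in microwatts, while switch-side readings and SFP datasheet thresholds are often expressed in dBm, so converting makes the two sides easier to compare. The conversion is dBm = 10 · log10(P / 1 mW). The sketch below is illustrative; `below_threshold` is a hypothetical helper, and the threshold value must come from the SFP vendor's specification or the thresholds the optic itself reports, not from this example.

```python
import math


def uw_to_dbm(power_uw: float) -> float:
    """Convert optical power in microwatts to dBm (decibels relative to 1 mW)."""
    if power_uw <= 0:
        raise ValueError("optical power must be positive")
    return 10 * math.log10(power_uw / 1000.0)


def below_threshold(power_uw: float, threshold_dbm: float) -> bool:
    """Hypothetical check: True if the reading is below a caller-supplied dBm threshold."""
    return uw_to_dbm(power_uw) < threshold_dbm


# Readings reported for adapter 10a.
rx_dbm = uw_to_dbm(286.3)
tx_dbm = uw_to_dbm(835.8)
print(f"Rx: {rx_dbm:.2f} dBm, Tx: {tx_dbm:.2f} dBm")
```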
- Low Tx power is reported on the connected switch port: