Any version of ONTAP
A production outage can occur in a SAN environment backed by NetApp storage. Hosts may report high latency or may be unable to connect to storage at all.
- Adapter resets may be displayed for ports on the filer.
- The repeated STIO TPD cmd alloc failed messages indicate a queue full condition in the filer.
- The initiator(s) appear to be overloading this port. Eventually the adapter is reset: a terminate CTIO is sent to the firmware (probably for an ABTS), and when no response is received for some time, it is flagged as a hung command.
- Hitting the queue full condition means the filer resources have been exhausted.
- The filer cannot process IO in a timely manner until it can free up resources in the SCSI queue.
The following is reported in the EMS logs:
Thu Aug 04 02:40:16 EDT [Filer1: fct_tpd_thread_1: fcp.io.status:debug]: STIO Adapter:0g LUN:13 DROPPED Cmd[2A] from SID:240942 VP:2 on OXID:000766 RxEx 0xffffffff (cmd allocation exceeded) at 59908347756
Fri Aug 31 22:56:12 EDT [Filer1: fct_tpd_thread_3: fcp.io.status:debug]: STIO TPD cmd alloc threshold reached handle:3 taskflags:0 Active commands:1945 threshold:1945
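When many of these messages are logged, it can help to see which adapter and which initiator (SID) account for the dropped commands. The following is a minimal sketch, not a NetApp tool: the function name `summarize_ems` and both regular expressions are illustrative assumptions modeled only on the two sample lines above, and other ONTAP releases may format these messages differently.

```python
import re
from collections import Counter

# Patterns modeled on the two sample EMS lines above (assumption: other
# releases may word these messages differently).
DROPPED_RE = re.compile(
    r"STIO Adapter:(?P<adapter>\w+) LUN:(?P<lun>\d+)\s*DROPPED "
    r"Cmd\[(?P<opcode>[0-9A-Fa-f]+)\] from SID:(?P<sid>[0-9A-Fa-f]+)"
)
THRESHOLD_RE = re.compile(
    r"STIO TPD cmd alloc threshold reached.*?"
    r"Active commands:(?P<active>\d+) threshold:(?P<threshold>\d+)"
)


def summarize_ems(lines):
    """Count dropped commands per (adapter, SID) and collect threshold hits.

    Returns a Counter keyed by (adapter, sid) and a list of
    (active_commands, threshold) tuples, in the order they were seen.
    """
    drops = Counter()
    threshold_hits = []
    for line in lines:
        match = DROPPED_RE.search(line)
        if match:
            drops[(match["adapter"], match["sid"])] += 1
            continue
        match = THRESHOLD_RE.search(line)
        if match:
            threshold_hits.append(
                (int(match["active"]), int(match["threshold"]))
            )
    return drops, threshold_hits
```

Calling `summarize_ems(open(path))` on a saved EMS log (where `path` is a hypothetical file name) and then `drops.most_common(5)` lists the adapter and initiator pairs with the most dropped commands, which is a reasonable starting point for finding the host that is overloading the port.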
- Hosts may report different messages depending on the OS.
- For VMware:
2016-08-04T07:54:36.573Z cpu30:32859)NMP: nmp_ThrottleLogForDevice:2349: Cmd 0x1a(0x413680446300, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x0 D:0x28 P:0x0 Valid sense data: 0x0 0x0 0x0. Act:NONE
2016-08-04T07:58:21.779Z cpu9:33712)NMP: nmp_ThrottleLogForDevice:2349: Cmd 0x88(0x412e80451080, 36817) to dev "naa.600a09803830345a785d46414548504d" on path "vmhba1:C0:T1:L34" Failed: H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL
- For Windows:
8/31/2018,22:55:7,454,Warning,WindowsHost1,ontapdsm,61141,"IO error: SCSI Queue FULL reported on LUN 0 on Path Id 03000002. The IO will be retried."
- For RHEL:
Aug 21 10:55:31 hostname kernel: qla2xxx [0000:0e:00.0]-3820:1: QUEUE FULL detected.
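Because each OS words the condition differently, a small classifier can help when scanning mixed host logs. The sketch below is an illustration only: the function name `classify_host_line` and the patterns are assumptions based on the sample messages above. In the VMware NMP format, the `D:` field carries the SCSI device status, where 0x28 is TASK SET FULL and 0x8 is BUSY.

```python
import re

# SCSI device status codes of interest (values from the SCSI standard).
SCSI_STATUS = {0x08: "BUSY", 0x28: "TASK SET FULL"}

# VMware NMP lines report host (H), device (D) and plugin (P) status.
# Whitespace between the fields is optional so that lines with a lost
# space (for example "H:0x0D:0x28") still parse.
VMWARE_RE = re.compile(
    r"H:(0x[0-9a-fA-F]+)\s*D:(0x[0-9a-fA-F]+)\s*P:(0x[0-9a-fA-F]+)"
)
# Windows (ontapdsm) and RHEL (qla2xxx) samples both contain "queue full".
QUEUE_FULL_RE = re.compile(r"queue\s+full", re.IGNORECASE)


def classify_host_line(line):
    """Return a label if the line suggests queue pressure, else None."""
    match = VMWARE_RE.search(line)
    if match:
        device_status = int(match.group(2), 16)
        # Unknown device status codes fall through to None.
        return SCSI_STATUS.get(device_status)
    if QUEUE_FULL_RE.search(line):
        return "QUEUE FULL"
    return None
```

Counting the labels returned for each host log gives a rough picture of which hosts are seeing the condition most often. The sketch deliberately ignores the host and plugin status fields, which would also matter in a fuller triage.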