Performance Impact due to FCP Queue Depth Threshold Reached
Applies to
- ONTAP 9
- Fibre Channel (FCP)
Issue
- A production outage may occur, presenting as extremely high latency that resembles a performance issue
- The adapter resets shortly after the queue threshold is reached
- The repeated STIO TPD cmd alloc failed messages indicate a queue full condition
- High latency is observed on the network subsystem in the output of qos statistics volume latency show, for example during client-based backup jobs
Example EMS log:
fcp.io.status:debug]: STIO Adapter:0g LUN:13DROPPED Cmd[2A] from SID:240942 VP:2 on OXID:000766 RxEx 0xffffffff (cmd allocation exceeded)at 59908347756
Tue Jul 16 11:30:56 +0700 [Node01: fct_tpd_work_thread_0: fcp.io.status:debug]: STIO Adapter:0e AEN 0x8048 (RECV_ERROR) MboxStatus1 0x1200 MboxStatus2 0x44
fct_tpd_thread_3: fcp.io.status:debug]: STIO TPD cmd alloc threshold reached handle:3 taskflags:0 Active commands:1945 threshold:1945
fct_tpd_work_thread_0: scsitarget.fct.port.thresh:notice]: FC target port 2d has 1946 outstanding commands, which exceeds the threshold 1945 for this port.
fcp.io.status:debug]: STIO Adapter:3a, found hung cmd:0xfffff81e7a302e40(state=7, flags=0x0, ctio_sent=1/2,RecvExAddr=0x1c4, OX_ID=0x462f, RX_ID=0xffff,SID=0x5c0f14, Cmd[2A], req_q_free:0)
fct_tpd_thread_4: scsitarget.fcp.dump:debug]: FCP target SRAM dump generated for adapter 3a, fct_tpd_check_hung_commands: Command termination hung. cmd:0xfffff81e7a3534f8 (state=0xa, flags=0x2,ctio_sent=2/2, RecvExAddr=0xa7b, OX_ID=0x16b, RX_ID=0xffff, SID=0xc1402)
fct_tpd_thread_4: scsitarget.hwpfct.dump.saved:notice]: A dump for adapter 3a was stored in /etc/log/fctsli_3a_1234/fct_fw_3a.dmp.gz.
- Higher I/O wait times, path errors, and timeouts may be seen on hosts
- For VMware:
NMP: nmp_ThrottleLogForDevice:2349: Cmd 0x1a(0x413680446300, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x0 D:0x28 P:0x0 Valid sense data: 0x0 0x0 0x0. Act:NONE
NMP: nmp_ThrottleLogForDevice:2349: Cmd 0x88(0x412e80451080, 36817) to dev "naa.600a123" on path "vmhba1:C0:T1:L34" Failed: H:0x0 D:0x8 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL
- For Windows:
454,Warning,WindowsHost1,ontapdsm,61141,"IO error: SCSI Queue FULL reported on LUN 0 on Path Id 03000002. The IO will be retried."
- For RHEL:
Aug 21 10:55:31 hostname kernel: qla2xxx [0000:0e:00.0]-3820:1: QUEUE FULL detected.
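
When reviewing a large EMS log export, it can help to pull out only the threshold events and compare the outstanding command count against the port threshold. The following is a minimal Python sketch, not a NetApp tool. It assumes the EMS messages have been saved as plain text, one event per line, in the same wording as the examples above. The function name find_queue_threshold_events and the dictionary keys are hypothetical and chosen only for illustration.

```python
import re

# Matches the scsitarget.fct.port.thresh message shown in the EMS example.
PORT_THRESH = re.compile(
    r"FC target port (?P<port>\S+) has (?P<outstanding>\d+) outstanding commands, "
    r"which exceeds the threshold (?P<threshold>\d+)"
)

# Matches the STIO "cmd alloc threshold reached" debug message shown in the EMS example.
ALLOC_THRESH = re.compile(
    r"STIO TPD cmd alloc threshold reached .*?"
    r"Active commands:(?P<active>\d+) threshold:(?P<threshold>\d+)"
)


def find_queue_threshold_events(lines):
    """Return one dict per log line that reports a queue threshold event.

    Hypothetical helper: the message wording is assumed to match the
    examples in this article; other ONTAP releases may log differently.
    """
    events = []
    for line in lines:
        match = PORT_THRESH.search(line)
        if match:
            events.append({
                "kind": "port_thresh",
                "port": match.group("port"),
                "outstanding": int(match.group("outstanding")),
                "threshold": int(match.group("threshold")),
            })
            continue
        match = ALLOC_THRESH.search(line)
        if match:
            events.append({
                "kind": "alloc_thresh",
                "active": int(match.group("active")),
                "threshold": int(match.group("threshold")),
            })
    return events


# Usage with two of the example messages quoted in this article.
sample = [
    "fct_tpd_thread_3: fcp.io.status:debug]: STIO TPD cmd alloc threshold reached "
    "handle:3 taskflags:0 Active commands:1945 threshold:1945",
    "fct_tpd_work_thread_0: scsitarget.fct.port.thresh:notice]: FC target port 2d has "
    "1946 outstanding commands, which exceeds the threshold 1945 for this port.",
]
for event in find_queue_threshold_events(sample):
    print(event)
```

Counting these events per port over time gives a rough view of which target ports reach the threshold most often. It does not replace the host-side checks listed above.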