High bus overruns and scsi.cmd.checkCondition errors due to incorrectly applied RCF
Applies to
- ONTAP 9
- N9K-9336C
- Storage RCF 1.11
Issue
- The issue is seen on both nodes and on both ports
- scsi.cmd.checkCondition errors and scsi.cmd.retrySuccess messages are seen:
[?] Thu Aug 15 09:10:11 -0400 [node2: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device e5b.02.1.14L0: Check Condition: CDB 0x28:15e289a3:001d: Sense Data SCSI:aborted command - (0xb - 0x90 0x2 0xfc)(8964).
[?] Thu Aug 15 09:10:11 -0400 [node2: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device e5b.02.1.14L0: Check Condition: CDB 0x28:15e58a3b:0019: Sense Data SCSI:aborted command - (0xb - 0x90 0x2 0xfc)(8964).
[?] Thu Aug 15 09:10:11 -0400 [node2: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device e5b.02.1.14L0: Check Condition: CDB 0x28:15e58a54:0027: Sense Data SCSI:aborted command - (0xb - 0x90 0x2 0xfc)(8964).
[?] Thu Aug 15 09:10:11 -0400 [node2: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device e5b.02.1.14L0: Check Condition: CDB 0x28:15e82ccc:0018: Sense Data SCSI:aborted command - (0xb - 0x90 0x2 0xfc)(8964).
[?] Thu Aug 15 09:10:12 -0400 [node2: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device e5a.02.0.17L0: request successful after retry #1/#0: cdb 0x28:15ea6d40:001b (10000).
[?] Thu Aug 15 09:10:12 -0400 [node2: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device e5a.02.0.17L0: request successful after retry #1/#0: cdb 0x28:15ea6d5b:0025 (9995).
[?] Thu Aug 15 09:10:12 -0400 [node2: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device e5a.02.0.16L0: request successful after retry #1/#0: cdb 0x28:15e8f0cc:0040 (10001).
[?] Thu Aug 15 09:10:12 -0400 [node2: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device e5a.02.0.16L0: request successful after retry #1/#0: cdb 0x28:15e83d7a:0015 (9985).
[?] Thu Aug 15 09:10:12 -0400 [node2: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device e5a.02.0.16L0: request successful after retry #1/#0: cdb 0x28:15e83d8f:002b (9985).
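To gauge how widespread these errors are, the EMS messages can be tallied per event and per port. The following is a minimal Python sketch, assuming the EMS line format shown above and assuming that the part of a disk name before the first dot (for example `e5b` in `e5b.02.1.14L0`) is the adapter port. `count_scsi_events` and `SAMPLE_LOG` are hypothetical names, not part of ONTAP.

```python
import re
from collections import Counter

# Hypothetical sample: two lines copied from the EMS excerpt above.
SAMPLE_LOG = """\
[?] Thu Aug 15 09:10:11 -0400 [node2: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device e5b.02.1.14L0: Check Condition: CDB 0x28:15e289a3:001d: Sense Data SCSI:aborted command - (0xb - 0x90 0x2 0xfc)(8964).
[?] Thu Aug 15 09:10:12 -0400 [node2: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device e5a.02.0.17L0: request successful after retry #1/#0: cdb 0x28:15ea6d40:001b (10000).
"""

# Captures node, event name and disk name from
# "[node: thread: event:severity]: Disk device <disk>:"
EVENT_RE = re.compile(
    r"\[(?P<node>\w+): \w+: (?P<event>scsi\.cmd\.\w+):\w+\]: Disk device (?P<disk>[^:\s]+):"
)


def count_scsi_events(log_text):
    """Count scsi.cmd.* EMS events keyed by (node, event, port)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            # Assumption: the adapter port is the part of the disk name before the first dot.
            port = match["disk"].split(".", 1)[0]
            counts[(match["node"], match["event"], port)] += 1
    return counts


if __name__ == "__main__":
    for key, n in sorted(count_scsi_events(SAMPLE_LOG).items()):
        print(key, n)
```

Run against a full EMS log, non-zero counts on both e5a and e5b would match the observation that the issue affects both ports.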
- NVMe timeouts and latency are seen:
[?] Sat Aug 10 01:38:40 -0400 [node2: intr: nvmeof.timeout:notice]: Timeout on subnqn nqn.2014-08.org.nvmexpress.discovery, controller ID 722, qpair ID 0, sequence number 3306.
[?] Sat Aug 10 01:38:43 -0400 [node2: intr: nvmeof.timeout:notice]: Timeout on subnqn nqn.2014-08.org.nvmexpress:144d:144d:S612NE0WC13303:X4020S173A15TNQF, controller ID 101, qpair ID 0, sequence
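The NVMe-oF timeouts can be grouped by subsystem to separate the discovery controller from the drive subsystems. A minimal sketch, assuming the `nvmeof.timeout` message format shown above; `timeouts_by_subsystem` is a hypothetical helper. The pattern stops after the controller ID, so a truncated line like the second one above still parses. `nqn.2014-08.org.nvmexpress.discovery` is the well-known discovery NQN defined by the NVMe specification.

```python
import re
from collections import defaultdict

# Well-known NQN of an NVMe discovery controller (defined by the NVMe specification).
DISCOVERY_NQN = "nqn.2014-08.org.nvmexpress.discovery"

# Hypothetical sample: the two lines shown above (the second is truncated in the source).
SAMPLE_LOG = """\
[?] Sat Aug 10 01:38:40 -0400 [node2: intr: nvmeof.timeout:notice]: Timeout on subnqn nqn.2014-08.org.nvmexpress.discovery, controller ID 722, qpair ID 0, sequence number 3306.
[?] Sat Aug 10 01:38:43 -0400 [node2: intr: nvmeof.timeout:notice]: Timeout on subnqn nqn.2014-08.org.nvmexpress:144d:144d:S612NE0WC13303:X4020S173A15TNQF, controller ID 101, qpair ID 0, sequence
"""

# Only the NQN and controller ID are captured, so lines truncated after them still match.
TIMEOUT_RE = re.compile(
    r"nvmeof\.timeout:\w+\]: Timeout on subnqn (?P<nqn>[^,\s]+), controller ID (?P<ctrl>\d+)"
)


def timeouts_by_subsystem(log_text):
    """Map each subsystem NQN to the set of controller IDs that reported a timeout."""
    result = defaultdict(set)
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            result[match["nqn"]].add(int(match["ctrl"]))
    return dict(result)


if __name__ == "__main__":
    for nqn, controllers in timeouts_by_subsystem(SAMPLE_LOG).items():
        kind = "discovery" if nqn == DISCOVERY_NQN else "NVM subsystem"
        print(f"{kind}: {nqn} controllers={sorted(controllers)}")
```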
- Priority flow control (PFC) is off on the switch:
`show interface priority-flow-control`
slot 1
=======
============================================================
Port          Mode  Oper(VL bmap)  RxPPP       TxPPP
============================================================
Ethernet1/1   Off   Off            1131113579  0
Ethernet1/2   Off   Off            1321393611  0
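The switch output can be checked for ports where PFC is operationally off. A minimal Python sketch, assuming the NX-OS column layout shown above (port, mode, operational state, RxPPP, TxPPP); `parse_pfc` and `pfc_off_with_rx_pause` are hypothetical helpers. RxPPP and TxPPP are read from the last two columns so that an extra VL-bitmap token after the operational state, if present, does not shift them. In the sample, both ports report PFC as off while showing large RxPPP counts, which suggests priority pause frames are arriving but PFC is not operational on those switch ports.

```python
from typing import List, NamedTuple

# Hypothetical sample: header and data rows from the output above.
SAMPLE_OUTPUT = """\
Port Mode Oper(VL bmap) RxPPP TxPPP
============================================================
Ethernet1/1 Off Off 1131113579 0
Ethernet1/2 Off Off 1321393611 0
"""


class PfcRow(NamedTuple):
    port: str
    mode: str
    oper: str
    rx_ppp: int
    tx_ppp: int


def parse_pfc(output: str) -> List[PfcRow]:
    """Parse data rows of 'show interface priority-flow-control' output."""
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        # Data rows start with an interface name; the header and '=====' rules are skipped.
        if len(tokens) >= 5 and tokens[0].startswith("Ethernet"):
            # Assumed layout: RxPPP and TxPPP are the last two columns.
            rows.append(PfcRow(tokens[0], tokens[1], tokens[2], int(tokens[-2]), int(tokens[-1])))
    return rows


def pfc_off_with_rx_pause(rows: List[PfcRow]) -> List[str]:
    """Ports where PFC is operationally off yet PFC pause frames are being received."""
    return [r.port for r in rows if r.oper.lower() == "off" and r.rx_ppp > 0]


if __name__ == "__main__":
    print(pfc_off_with_rx_pause(parse_pfc(SAMPLE_OUTPUT)))
```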
- Receive discards due to bus overruns are seen on both nodes and on both ports:
-- interface e5a (55 days, 0 hours, 38 minutes, 34 seconds) --
RECEIVE
Total frames: 67748k | Frames/second: 14 | Total bytes: 32288m
Bytes/second: 6791 | Total errors: 0 | Errors/minute: 0
Total discards: 345m | Discards/minute: 4354 | Multi/broadcast: 32095k
Non-primary u/c: 0 | Errored frames: 0 | Unsupported Op: 0
CRC errors: 0 | Runt frames: 0 | Fragment: 0
Long frames: 0 | Jabber: 0 | Length errors: 0
Alignment errors: 0 | No buffer: 0 | Pause: 0
Jumbo: 4300k | Error symbol: 0 | Bus overruns: 345m
Queue drops: 0 | LRO segments: 0 | LRO bytes: 0
LRO6 segments: 34471k | LRO6 bytes: 19252m | Bad UDP cksum: 0
Bad UDP6 cksum: 0 | Bad TCP cksum: 0 | Bad TCP6 cksum: 0
Mcast v6 solicit:28595k | Lagg errors: 0 | Lacp errors: 0
Lacp PDU errors: 0
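As a rough consistency check, the e5a discard total can be divided by the interface uptime. The sketch assumes that ifstat's `k` and `m` suffixes mean 10^3 and 10^6; `parse_count`, `uptime_minutes`, and `average_per_minute` are hypothetical helpers. For e5a it gives about 4354 discards per minute (345,000,000 / about 79,238.6 minutes), close to the Discards/minute value shown, which is consistent with discards accumulating steadily over the whole uptime rather than in a single burst.

```python
import re

# Assumption: ifstat abbreviates counters with decimal suffixes (k = 10**3, m = 10**6).
SUFFIX = {"": 1, "k": 10**3, "m": 10**6}


def parse_count(text):
    """Convert an ifstat counter such as '345m' or '67748k' to an integer."""
    match = re.fullmatch(r"(\d+)([km]?)", text.strip())
    if match is None:
        raise ValueError(f"not an ifstat counter: {text!r}")
    return int(match.group(1)) * SUFFIX[match.group(2)]


def uptime_minutes(header):
    """Extract '(D days, H hours, M minutes, S seconds)' from an ifstat header, in minutes."""
    match = re.search(r"\((\d+) days?, (\d+) hours?, (\d+) minutes?, (\d+) seconds?\)", header)
    if match is None:
        raise ValueError(f"no uptime found in: {header!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return days * 1440 + hours * 60 + minutes + seconds / 60


def average_per_minute(total, header):
    """Average rate of a counter over the interface uptime."""
    return parse_count(total) / uptime_minutes(header)


if __name__ == "__main__":
    header = "interface e5a (55 days, 0 hours, 38 minutes, 34 seconds)"
    print(round(average_per_minute("345m", header)))
```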
-- interface e5b (55 days, 0 hours, 38 minutes, 31 seconds) --
RECEIVE
Total frames: 43104k | Frames/second: 9 | Total bytes: 17564m
Bytes/second: 3694 | Total errors: 0 | Errors/minute: 0
Total discards: 341m | Discards/minute: 4310 | Multi/broadcast: 32095k
Non-primary u/c: 0 | Errored frames: 0 | Unsupported Op: 0
CRC errors: 0 | Runt frames: 0 | Fragment: 0
Long frames: 0 | Jabber: 0 | Length errors: 0
Alignment errors: 0 | No buffer: 0 | Pause: 0
Jumbo: 967k | Error symbol: 0 | Bus overruns: 341m
Queue drops: 0 | LRO segments: 0 | LRO bytes: 0
LRO6 segments: 9912k | LRO6 bytes: 4639m | Bad UDP cksum: 0
Bad UDP6 cksum: 0 | Bad TCP cksum: 0 | Bad TCP6 cksum: 0
Mcast v6 solicit:28594k | Lagg errors: 0 | Lacp errors: 0
Lacp PDU errors: 0
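On both interfaces the Total discards and Bus overruns counters are equal (345m on e5a, 341m on e5b), while CRC, runt, length, and the other receive error counters are zero, so the discards appear to be entirely overrun-driven. The following minimal sketch parses ifstat RECEIVE counters and flags interfaces where bus overruns make up most of the discards. The parsing rules, the decimal `k`/`m` suffixes, the 0.95 threshold, and the helper names are illustrative assumptions, not part of ONTAP.

```python
import re

# Assumption: decimal suffixes, k = 10**3 and m = 10**6.
MULTIPLIER = {"k": 10**3, "m": 10**6}

# Hypothetical sample: a few RECEIVE lines per interface from the ifstat output above.
SAMPLE = """\
-- interface e5a (55 days, 0 hours, 38 minutes, 34 seconds) --
RECEIVE
Total discards: 345m | Discards/minute: 4354 | Multi/broadcast: 32095k
CRC errors: 0 | Runt frames: 0 | Fragment: 0
Jumbo: 4300k | Error symbol: 0 | Bus overruns: 345m
-- interface e5b (55 days, 0 hours, 38 minutes, 31 seconds) --
RECEIVE
Total discards: 341m | Discards/minute: 4310 | Multi/broadcast: 32095k
CRC errors: 0 | Runt frames: 0 | Fragment: 0
Jumbo: 967k | Error symbol: 0 | Bus overruns: 341m
"""

# An interface header line starts a new section.
HEADER_RE = re.compile(r"interface\s+(\S+)\s+\(")


def to_int(value):
    """Convert '345m', '4300k' or '0' to an integer."""
    value = value.strip()
    if value and value[-1] in MULTIPLIER:
        return int(value[:-1]) * MULTIPLIER[value[-1]]
    return int(value)


def parse_ifstat(text):
    """Return {interface: {counter name: value}} from ifstat-style output."""
    stats, current = {}, None
    for line in text.splitlines():
        header = HEADER_RE.search(line)
        if header:
            current = stats.setdefault(header.group(1), {})
            continue
        if current is None:
            continue
        # Counters are 'Name: value' pairs separated by '|'.
        for field in line.split("|"):
            name, sep, value = field.partition(":")
            if sep and value.strip():
                try:
                    current[name.strip()] = to_int(value)
                except ValueError:
                    pass  # ignore fields that are not plain counters
    return stats


def overrun_dominated(stats, threshold=0.95):
    """Interfaces whose bus overruns account for at least `threshold` of total discards."""
    flagged = []
    for ifname, counters in stats.items():
        discards = counters.get("Total discards", 0)
        overruns = counters.get("Bus overruns", 0)
        if discards and overruns / discards >= threshold:
            flagged.append(ifname)
    return flagged


if __name__ == "__main__":
    print(overrun_dominated(parse_ifstat(SAMPLE)))
```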