NetApp Knowledge Base

ESXi hosts intermittently lose access to iSCSI datastores

Category: fabric-interconnect-and-management-switches
Specialty: san

Applies to

  • NetApp FAS/AFF storage
  • VMware ESXi hosts
  • iSCSI
  • Cisco IP switches connecting ESXi hosts and NetApp storage

Issue

  • ESXi hosts intermittently lose access to iSCSI datastores.
  • The host reports errors similar to the following:

Apr  7 06:10:42 va1plxld001 iscsid: iscsid: Kernel reported iSCSI connection 12:0 error (1022 - ISCSI_ERR_NOP_TIMEDOUT: A NOP has timed out) state (3)
Apr  7 06:10:42 va1plxld001 iscsid: iscsid: connection14:0 is operational after recovery (3 attempts)
Apr  7 06:10:44 va1plxld001 kernel: connection11:0: ping timeout of 5 secs expired, recv timeout 5, last rx 6465840173, last ping 6465845184, now 6465850192
Apr  7 06:10:44 va1plxld001 kernel: connection11:0: detected conn error (1022)
Apr  7 06:10:44 va1plxld001 kernel: connection10:0: ping timeout of 5 secs expired, recv timeout 5, last rx 6465840173, last ping 6465845184, now 6465850192
Apr  7 06:10:44 va1plxld001 kernel: connection10:0: detected conn error (1022)
Apr  7 06:10:44 va1plxld001 iscsid: iscsid: Kernel reported iSCSI connection 11:0 error (1022 - ISCSI_ERR_NOP_TIMEDOUT: A NOP has timed out) state (3)
Apr  7 06:10:44 va1plxld001 iscsid: iscsid: Kernel reported iSCSI connection 10:0 error (1022 - ISCSI_ERR_NOP_TIMEDOUT: A NOP has timed out) state (3)
Apr  7 06:10:45 va1plxld001 iscsid: iscsid: connection23:0 is operational after recovery (3 attempts)
Apr  7 06:10:45 va1plxld001 iscsid: iscsid: connection24:0 is operational after recovery (3 attempts)
Apr  7 06:10:45 va1plxld001 iscsid: iscsid: connection13:0 is operational after recovery (4 attempts)
Apr  7 06:10:49 va1plxld001 iscsid: iscsid: connection9:0 is operational after recovery (6 attempts)
Apr  7 06:10:50 va1plxld001 iscsid: iscsid: connection19:0 is operational after recovery (6 attempts)
Apr  7 06:10:50 va1plxld001 iscsid: iscsid: connection20:0 is operational after recovery (6 attempts)

  • The following errors are observed in the ESXi logs:

2024-01-11T08:59:02.137Z cpu56:2098325)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject:963: vmhba64:CH:0 T:0 CN:0: iSCSI pdu rejected: itt 0x12c6, opcode TMF Request, reason Immediate Command Reject
2024-01-11T08:59:02.137Z cpu56:2098325)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject:965: Sess [ISID: 00023d000001 TARGET: iqn.1992-08.com.netapp:sn.9345f8e16a7e11exxxfe00a098dbab70:vs.3 TPGT: 402 TSIH: 0]
2024-01-11T08:59:02.137Z cpu56:2098325)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject:966: Conn [CID: 0 L: 10.164.60.64:17687 R: 10.164.56.69:3260]
2024-01-11T08:59:02.137Z cpu56:2098325)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject:991: vmhba64:CH:0 T:0 CN:0: Rejected TMF Task not found: itt 0x12c6
2024-01-11T08:59:02.137Z cpu56:2098325)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject:992: Sess [ISID: 00023d000001 TARGET: iqn.1992-08.com.netapp:sn.9345f8e16a7e11exxxfe00a098dbab70:vs.3 TPGT:
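To see how often each kind of PDU is rejected, the "iSCSI pdu rejected" lines can be grouped by adapter, opcode, and reason. This is a rough sketch only. The regular expression is an assumption inferred from the single message format shown above, and `count_rejects` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed format, taken from the vmkernel sample line in this article.
REJECT = re.compile(
    r"(vmhba\d+):CH:\d+ T:\d+ CN:\d+: iSCSI pdu rejected: "
    r"itt (0x[0-9a-fA-F]+), opcode (.+?), reason (.+)$"
)


def count_rejects(lines):
    """Count rejected PDUs per (adapter, opcode, reason)."""
    counts = Counter()
    for line in lines:
        match = REJECT.search(line.rstrip())
        if match:
            adapter, _itt, opcode, reason = match.groups()
            counts[(adapter, opcode, reason)] += 1
    return counts


# Two lines copied from the excerpt above; only the first is a reject header.
sample = [
    "2024-01-11T08:59:02.137Z cpu56:2098325)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject:963: vmhba64:CH:0 T:0 CN:0: iSCSI pdu rejected: itt 0x12c6, opcode TMF Request, reason Immediate Command Reject",
    "2024-01-11T08:59:02.137Z cpu56:2098325)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject:966: Conn [CID: 0 L: 10.164.60.64:17687 R: 10.164.56.69:3260]",
]
rejects = count_rejects(sample)
```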

  • Performance issues are observed on VMs hosted on NetApp LUNs; some of these VMs run web servers such as Apache and Tomcat.
  • A high number of CRC errors is observed on the ONTAP Ethernet ports:

-- interface  e0c  (421 days, 4 hours, 17 minutes, 52 seconds) --RECEIVE
 Total frames:      220g | Frames/second:    6059  | Total bytes:       296t
 Bytes/second:     8149k | Total errors:    38204k | Errors/minute:      63
 Total discards:      1  | Discards/minute:     0  | Multi/broadcast:   618m
 Non-primary u/c:     0  | CRC errors:      38204k | Runt frames:         0
 Long frames:         0  | Length errors:    4116  | Alignment errors:    0
 No buffer:           1  | Pause:               0  | Jumbo:               0
 Noproto:             0  | Bus overruns:        0  | LRO segments:      181g
 LRO bytes:         259t | LRO6 segments:       0  | LRO6 bytes:          0
 Bad UDP cksum:       0  | Bad UDP6 cksum:      0  | Bad TCP cksum:    4800
 Bad TCP6 cksum:      0  | Mcast v6 solicit:    0  | Lagg errors:         0
 Lacp errors:         0  | Lacp PDU errors:     0
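To put the CRC count in proportion, the counters above can be parsed and the CRC errors divided by the total frames received. The sketch below makes two assumptions: the k/m/g/t suffixes are decimal multipliers, and `parse_counters` is a hypothetical helper written for this layout only. Because the output rounds large values (for example `220g`), the result is an approximation averaged over the whole counter lifetime.

```python
# Assumption: suffixes in the counter output are decimal multipliers.
SUFFIX = {"k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}


def parse_counters(text):
    """Parse 'Name: value | Name: value' rows into a dict of integers."""
    counters = {}
    for line in text.splitlines():
        for cell in line.split("|"):
            name, sep, value = cell.partition(":")
            value = value.strip()
            if not sep or not value:
                continue  # header line or empty cell
            if value[-1] in SUFFIX:
                digits, multiplier = value[:-1], SUFFIX[value[-1]]
            else:
                digits, multiplier = value, 1
            if digits.isdigit():
                counters[name.strip()] = int(digits) * multiplier
    return counters


# Two rows copied from the RECEIVE statistics above.
sample = """\
 Total frames:      220g | Frames/second:    6059  | Total bytes:       296t
 Non-primary u/c:     0  | CRC errors:      38204k | Runt frames:         0
"""
counters = parse_counters(sample)
crc_ratio = counters["CRC errors"] / counters["Total frames"]
print(f"CRC errors per received frame: {crc_ratio:.6%}")
```

The Errors/minute field gives a more current picture than the lifetime ratio, so it is worth sampling both more than once.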

  • No error events are reported in the EMS logs other than iSCSI login events:

Mon Apr 7 10:05:04 EDT [netapp-01: iswt_admin_thread: iscsi.notice:notice]: ISCSI: New session from initiator iqn.1994-05.com.redhat:902acxxxc6f4 at IP addr 10.xx.22x.35
Mon Apr 7 10:05:16 EDT [netapp-01: iswt_admin_thread: iscsi.notice:notice]: ISCSI: New session from initiator iqn.1994-05.com.redhat:902acxxxc6f4 at IP addr 10.xx.22x.36

  • For fault isolation, provided multipathing is configured, take the Ethernet port that reports CRC errors offline. If the CRC errors then begin incrementing on the partner node's Ethernet port, the problem is upstream of ONTAP.
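The isolation check above reduces to comparing the partner port's CRC counter before and after the suspect port is taken offline. The sketch below is illustrative only: `CrcSample`, `classify`, the port name, and the counter values are all hypothetical, and readings would have to be collected manually from the port statistics. Only the "upstream" conclusion comes from this article; any other outcome is reported as inconclusive rather than guessed.

```python
from dataclasses import dataclass


@dataclass
class CrcSample:
    port: str        # partner node port name (hypothetical value below)
    crc_before: int  # CRC counter read just before offlining the suspect port
    crc_after: int   # same counter read after an observation window


def classify(partner: CrcSample) -> str:
    """Apply the article's rule: errors following the traffic point upstream."""
    if partner.crc_after > partner.crc_before:
        return "upstream"
    return "inconclusive"


# Hypothetical readings, for illustration only.
verdict = classify(CrcSample(port="e0c", crc_before=12, crc_after=950))
```

Use an observation window long enough for iSCSI traffic to fail over to the partner path before taking the second reading.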

 
