NetApp Knowledge Base

"DISK REDUNDANCY FAILED" due to transceiver issue

Visibility: Internal
Category: metrocluster
Specialty: metrocluster

Applies to

  • MetroCluster IP
  • Cisco Backend switch

Issue

  1. Error messages:

Tue Sep 03 04:51:31 +0200 [ClusterA-02: wafl_exempt09: mirror.stream.qp.error:debug]: params: {'mirror': 'DR PARTNER', 'qp_name': 'WAFL', 'error': 'NVMM_ERR_MIRROR_POLL_TIMEOUT'}
Tue Sep 03 04:51:31 +0200 [ClusterA-02: wafl_exempt09: nvmm.mirror.aborting:debug]: mirror of sysid 2, partner_type DR PARTNER and mirror state NVMM_MIRROR_ONLINE is aborted because of reason NVMM_ERR_MIRROR_POLL_TIMEOUT.
Tue Sep 03 04:51:31 +0200 [ClusterA-02: nvmm_error: mirror.stream.qp.error:debug]: params: {'mirror': 'DR PARTNER', 'qp_name': 'WAFL', 'error': 'NVMM_ERR_MIRROR_COMPLETION'}
Tue Sep 03 04:51:31 +0200 [ClusterA-02: nvmm_error: ems.engine.suppressed:debug]: Event 'rdma.rlib.event.error' suppressed 11 times in last 263 seconds.
Tue Sep 03 04:51:31 +0200 [ClusterA-02: nvmm_error: rdma.rlib.event.error:debug]: QP wafl event error: client disconnect.
Tue Sep 03 04:51:31 +0200 [ClusterA-02: nvmm_error: nvmm.mirror.offlined:debug]: params: {'mirror': 'DR_PARTNER'}
Tue Sep 03 04:51:31 +0200 [ClusterA-02: DR_heartbeat_thread: cf.ic.xferTimedOut:error]: HA interconnect: MCC_DRSOM transfer timed out.

These errors are followed by successful reconnections, such as:

Tue Sep 03 04:51:32 +0200 [ClusterA-02: iw_cm_wq: rdma.rlib.connected:debug]: wafl:DR:A QP is now connected.
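When reviewing many messages of this kind, it can help to split each line into its fields and count events by name. The following is a minimal sketch, assuming the line layout seen in the excerpts above (timestamp, then `[node: process: event.name:severity]:`, then the message text). The names `EMS_LINE`, `parse_ems_line`, and `count_events` are hypothetical helpers for illustration, not part of any NetApp tool.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the excerpts in this article:
#   <weekday> <month> <day> <hh:mm:ss> <utc offset> [<node>: <process>: <event>:<severity>]: <message>
EMS_LINE = re.compile(
    r"^(?P<timestamp>\w{3} \w{3} \d{1,2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:\]]+): (?P<process>[^:\]]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]: (?P<message>.*)$"
)


def parse_ems_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = EMS_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_events(lines):
    """Count how often each event name occurs; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        record = parse_ems_line(line)
        if record is not None:
            counts[record["event"]] += 1
    return counts
```

A run of `nvmm.mirror.offlined` or `cf.ic.xferTimedOut` events interleaved with `rdma.rlib.connected` events would then show up as paired counts, which matches the error-then-retry pattern described here.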

  2. A high number of error messages, mixed with successful retries, affecting many different disks (all of them remote disks):

Tue Sep 03 04:51:34 +0200 [ClusterA-02: doneq0: scsi.mcc.adt.ioTransportError:error]: mcc_adt[2] - Transport error during execution of command: HA status 0x13: CAM transport status 0x1b: cdb 0x28:356b73b3:000d.
Tue Sep 03 04:51:34 +0200 [ClusterA-02: doneq0: scsi.mcc.adt.ioTransportError:error]: mcc_adt[2] - Transport error during execution of command: HA status 0x13: CAM transport status 0x1b : cdb 0x28:356b6555:000d....
Tue Sep 03 04:51:34 +0200 [ClusterA-02: scsi_cmdblk_strthr_admin: scsi.cmd.abortedByHost:error]: Disk device 0m.i1.2L17: Command aborted by host adapter: HA status 0x13: cdb 0x28:356b73b3:000d.
Tue Sep 03 04:51:34 +0200 [ClusterA-02: scsi_cmdblk_strthr_admin: scsi.cmd.abortedByHost:error]: Disk device 0m.i1.2L17: Command aborted by host adapter: HA status 0x13: cdb 0x28:356b6555:000d.
Tue Sep 03 04:51:34 +0200 [ClusterA-02: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 0v.i1.0L17: request successful after retry #1/#0: cdb 0x28:356b73b3:000d (1967)
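To see whether the aborts are concentrated on one disk or spread across many, the per-disk messages can be tallied. The sketch below is illustrative only: it assumes the `Disk device <name>:` wording shown in the `scsi.cmd.abortedByHost` and `scsi.cmd.retrySuccess` messages above, and `DISK_EVENT` and `tally_by_disk` are hypothetical names.

```python
import re
from collections import defaultdict

# Matches only the two per-disk events quoted in this article (assumed wording).
DISK_EVENT = re.compile(
    r"(?P<event>scsi\.cmd\.(?:abortedByHost|retrySuccess)):\w+\]: "
    r"Disk device (?P<disk>[^:\s]+):"
)


def tally_by_disk(lines):
    """Return {disk_name: {"aborted": n, "retry_ok": m}} for the given log lines."""
    tally = defaultdict(lambda: {"aborted": 0, "retry_ok": 0})
    for line in lines:
        match = DISK_EVENT.search(line)
        if match is None:
            continue  # not one of the two per-disk events
        key = "aborted" if match["event"].endswith("abortedByHost") else "retry_ok"
        tally[match["disk"]][key] += 1
    return {disk: dict(counts) for disk, counts in tally.items()}
```

The tally is keyed on the device name exactly as logged, so the same disk reported under two different names is counted as two separate entries.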

