Skip to main content
NetApp Knowledge Base

SM-S goes OOS intermittently due to QR failure

Views:
207
Visibility:
Public
Votes:
0
Category:
snapmirror
Specialty:
dp
Last Updated:

Applies to

  • ONTAP 9
  • SnapMirror Synchronous (SM-S)
  • out-of-sync (OOS)
  • Quick Reconcile (QR)

Issue

The Synchronous SnapMirror is in "out-of-sync" status due to "Transfer failed." on EMS:

[HostName: sm_logger_main: sms.status.out.of.sync:error]: Source volume "SVM:volsource" and destination volume "SVMDest:volsource_dest" with relationship UUID "88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0" is in "out-of-sync" status due to the following reason: "Transfer failed.".

The SnapMirror error logs show:

Wed Jun 30 13:12:32 CST 2021 REPL-CTRL:88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0 Trace_Buf (1 of 5 log entries): repl_granular::GranularSyncWriter::_init():91 : Initializing GranularSyncWriter for SSM with PROTO_SYNC_REPL
Wed Jun 30 13:12:32 CST 2021 REPL-CTRL:88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0 Trace_Buf (2 of 5 log entries): repl_granular::GranularSyncWriter::WriterInSyncOperationCallback():741 : Notifying d-control about DRL resync completion. Bytes Transferred: 303467560
Wed Jun 30 13:12:32 CST 2021 REPL-CTRL:88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0 Trace_Buf (3 of 5 log entries): repl_receiver::Node::abort():638 : Replication (ReceiverNode) aborted with transferComplete=0,Status=0 
Wed Jun 30 13:12:32 CST 2021 REPL-CTRL:88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0 Trace_Buf (4 of 5 log entries): repl_receiver::LogicalWriter::~LogicalWriter():262 : Unable to determine Datawarehouse file size since data store is not valid

The SKTRACE logs show:

2021-06-30T05:12:32Z 527236433979694      [9:0] REPL_0:  repl_granular::QR::processQrList(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xfffff7009af1a250]  | [0] QrList Size is: 16
2021-06-30T05:12:32Z 527236434079666      [9:0] REPL_0:  repl_granular::IFTT::_checkAndRetryOps(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xfffff7009af19e90]  | [0] RetryList Size is: 0
2021-06-30T05:12:32Z 527236452563216      [15:0] REPL_0:  repl_granular::SyncEngine::sendWaflReadDataToSecondary():  |  Sent Replops to secondary QR:: Anon seqNo : 1300598Hole seqNo : 0 PushInode seqNo : 0 previous/current seqNo : 1300019
2021-06-30T05:12:32Z 527236453695668      [14:0] REPL_0:  granular::Splitter::_postDecrementInFlightOpCount(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xfffff7009af18000]  | [27088]  | [refcount:39]  msg: 0xfffff81e42ac09b0 cookie: 0xfffff816b8ec2040 seqNo: 1300019 state: 96383 primaryStatus: 277 repl result: [status:0] oosOnFailure: 1 cookie type: 1 is_lun_metadata_op_cookie: 0 ctx-type: 0
2021-06-30T05:12:32Z 527236469009971      [14:0] REPL_0:  repl_granular::SyncEngine::sendWaflReadDataToSecondary():  |  Sent Replops to secondary QR:: Anon seqNo : 0Hole seqNo : 1300612 PushInode seqNo : 0 previous/current seqNo : 1299596
2021-06-30T05:12:32Z 527236469674050      [15:0] REPL_0:  repl_granular::SyncEngine::sendWaflReadDataToSecondary():  |  Sent Replops to secondary QR:: Anon seqNo : 1300613Hole seqNo : 0 PushInode seqNo : 0 previous/current seqNo : 1299451
2021-06-30T05:12:32Z 527236472000866      [12:0] REPL_0:  granular::Splitter::_postDecrementInFlightOpCount(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xfffff7009af18000]  | [7320]  | [refcount:37]  msg: 0xfffff81e33d940b0 cookie: 0xfffff816ba4f4040 seqNo: 1299596 state: 178303 primaryStatus: 277 repl result: [status:0] oosOnFailure: 1 cookie type: 1 is_lun_metadata_op_cookie: 0 ctx-type: 0
2021-06-30T05:12:32Z 527236472163950      [2:0] REPL_0:  repl_granular::ReadBuilder::convertIovecData():  |  Failed to compute range count.  Result: 704
2021-06-30T05:12:32Z 527236472167341      [2:0] REPL_0:  repl_granular::ReadBuilder::_convertInternal():  |  Failed to convert Iovec data
2021-06-30T05:12:32Z 527236472171132      [2:0] REPL_0:  repl_granular::ReadBuilder::_convertInternal():  |  Error building replop from read msg. Result: 704 Volume fsid: 1094
2021-06-30T05:12:32Z 527236472179451      [2:0] REPL_0:  repl_sender::GranularSyncScanner::getTransferBufferForWaflRead(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xfffff70086c40000] Failed to build a replop for readMsg
2021-06-30T05:12:32Z 527236472199482      [2:0] REPL_0:  repl_sender::GranularSyncScanner::_scanCallback(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xffffffff93997240] result: [status: 704 failure_msg: 5898884 failure_msg_detail: 0] Callback with error. buffer 0xfffff70091ffc000
2021-06-30T05:12:32Z 527236472221679      [2:0] REPL_0:  repl_sender::Node::scannerError(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xffffffff939a7ff8] result: [status: 704 failure_msg: 5898884 failure_msg_detail: 0] Scanner error
2021-06-30T05:12:32Z 527236472239427      [2:0] REPL_0:  repl_sender::Node::abort(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xffffffff939a7ff8] result: [status: 704 failure_msg: 5898884 failure_msg_detail: 0] Replication (SenderNode) aborted with reason: [status:704 major (M):005a0284 minor (m):00000000 M0: M1: m0: m1: ]
2021-06-30T05:12:32Z 527236472255883      [2:0] REPL_0:  repl_sender::GranularSyncScanner::abort(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xfffff70086c40000] abort: _aborted: 1 _dstDrainComplete: 0 _exovolScannersStarted: 0 _pendingExovolWorkCount: 0 _scrapperPopulatePending: 1 _buffersPopulating: 100 _vacantListSize: 99
2021-06-30T05:12:32Z 527236472267068      [2:0] REPL_0:  repl_granular::SyncCache::transitionCgToOos_Async(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xffffffff937cea18] Async OOS for CG: 995a47ee-4f36-11eb-b15d-d039ea27a65a
2021-06-30T05:12:32Z 527236472277770      [11:0] REPL_0:  repl_sender::DRLScanner::abort(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xffffffff937a3c30] Aborting DRL scanne

Sign in to view the entire content of this KB article.

New to NetApp?

Learn more about our award-winning Support

NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.