Skip to main content
NetApp Knowledge Base

SM-S goes OOS intermittently due to QR failure

Views:
133
Visibility:
Public
Votes:
0
Category:
snapmirror
Specialty:
dp
Last Updated:

Applies to

  • ONTAP 9
  • SnapMirror Synchronous (SM-S)
  • out-of-sync (OOS)
  • Quick Reconcile (QR)

Issue

The Synchronous SnapMirror is in "out-of-sync" status due to "Transfer failed." on EMS:

[HostName: sm_logger_main: sms.status.out.of.sync:error]: Source volume "SVM:volsource" and destination volume "SVMDest:volsource_dest" with relationship UUID "88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0" is in "out-of-sync" status due to the following reason: "Transfer failed.".

The SnapMirror error logs show:

Wed Jun 30 13:12:32 CST 2021 REPL-CTRL:88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0 Trace_Buf (1 of 5 log entries): repl_granular::GranularSyncWriter::_init():91 : Initializing GranularSyncWriter for SSM with PROTO_SYNC_REPL
Wed Jun 30 13:12:32 CST 2021 REPL-CTRL:88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0 Trace_Buf (2 of 5 log entries): repl_granular::GranularSyncWriter::WriterInSyncOperationCallback():741 : Notifying d-control about DRL resync completion. Bytes Transferred: 303467560
Wed Jun 30 13:12:32 CST 2021 REPL-CTRL:88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0 Trace_Buf (3 of 5 log entries): repl_receiver::Node::abort():638 : Replication (ReceiverNode) aborted with transferComplete=0,Status=0 
Wed Jun 30 13:12:32 CST 2021 REPL-CTRL:88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0 Trace_Buf (4 of 5 log entries): repl_receiver::LogicalWriter::~LogicalWriter():262 : Unable to determine Datawarehouse file size since data store is not valid

The SKTRACE logs show:

2021-06-30T05:12:32Z 527236433979694      [9:0] REPL_0:  repl_granular::QR::processQrList(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xfffff7009af1a250]  | [0] QrList Size is: 16
2021-06-30T05:12:32Z 527236434079666      [9:0] REPL_0:  repl_granular::IFTT::_checkAndRetryOps(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xfffff7009af19e90]  | [0] RetryList Size is: 0
2021-06-30T05:12:32Z 527236452563216      [15:0] REPL_0:  repl_granular::SyncEngine::sendWaflReadDataToSecondary():  |  Sent Replops to secondary QR:: Anon seqNo : 1300598Hole seqNo : 0 PushInode seqNo : 0 previous/current seqNo : 1300019
2021-06-30T05:12:32Z 527236453695668      [14:0] REPL_0:  granular::Splitter::_postDecrementInFlightOpCount(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xfffff7009af18000]  | [27088]  | [refcount:39]  msg: 0xfffff81e42ac09b0 cookie: 0xfffff816b8ec2040 seqNo: 1300019 state: 96383 primaryStatus: 277 repl result: [status:0] oosOnFailure: 1 cookie type: 1 is_lun_metadata_op_cookie: 0 ctx-type: 0
2021-06-30T05:12:32Z 527236469009971      [14:0] REPL_0:  repl_granular::SyncEngine::sendWaflReadDataToSecondary():  |  Sent Replops to secondary QR:: Anon seqNo : 0Hole seqNo : 1300612 PushInode seqNo : 0 previous/current seqNo : 1299596
2021-06-30T05:12:32Z 527236469674050      [15:0] REPL_0:  repl_granular::SyncEngine::sendWaflReadDataToSecondary():  |  Sent Replops to secondary QR:: Anon seqNo : 1300613Hole seqNo : 0 PushInode seqNo : 0 previous/current seqNo : 1299451
2021-06-30T05:12:32Z 527236472000866      [12:0] REPL_0:  granular::Splitter::_postDecrementInFlightOpCount(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xfffff7009af18000]  | [7320]  | [refcount:37]  msg: 0xfffff81e33d940b0 cookie: 0xfffff816ba4f4040 seqNo: 1299596 state: 178303 primaryStatus: 277 repl result: [status:0] oosOnFailure: 1 cookie type: 1 is_lun_metadata_op_cookie: 0 ctx-type: 0
2021-06-30T05:12:32Z 527236472163950      [2:0] REPL_0:  repl_granular::ReadBuilder::convertIovecData():  |  Failed to compute range count.  Result: 704
2021-06-30T05:12:32Z 527236472167341      [2:0] REPL_0:  repl_granular::ReadBuilder::_convertInternal():  |  Failed to convert Iovec data
2021-06-30T05:12:32Z 527236472171132      [2:0] REPL_0:  repl_granular::ReadBuilder::_convertInternal():  |  Error building replop from read msg. Result: 704 Volume fsid: 1094
2021-06-30T05:12:32Z 527236472179451      [2:0] REPL_0:  repl_sender::GranularSyncScanner::getTransferBufferForWaflRead(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xfffff70086c40000] Failed to build a replop for readMsg
2021-06-30T05:12:32Z 527236472199482      [2:0] REPL_0:  repl_sender::GranularSyncScanner::_scanCallback(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xffffffff93997240] result: [status: 704 failure_msg: 5898884 failure_msg_detail: 0] Callback with error. buffer 0xfffff70091ffc000
2021-06-30T05:12:32Z 527236472221679      [2:0] REPL_0:  repl_sender::Node::scannerError(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xffffffff939a7ff8] result: [status: 704 failure_msg: 5898884 failure_msg_detail: 0] Scanner error
2021-06-30T05:12:32Z 527236472239427      [2:0] REPL_0:  repl_sender::Node::abort(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xffffffff939a7ff8] result: [status: 704 failure_msg: 5898884 failure_msg_detail: 0] Replication (SenderNode) aborted with reason: [status:704 major (M):005a0284 minor (m):00000000 M0: M1: m0: m1: ]
2021-06-30T05:12:32Z 527236472255883      [2:0] REPL_0:  repl_sender::GranularSyncScanner::abort(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xfffff70086c40000] abort: _aborted: 1 _dstDrainComplete: 0 _exovolScannersStarted: 0 _pendingExovolWorkCount: 0 _scrapperPopulatePending: 1 _buffersPopulating: 100 _vacantListSize: 99
2021-06-30T05:12:32Z 527236472267068      [2:0] REPL_0:  repl_granular::SyncCache::transitionCgToOos_Async(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xffffffff937cea18] Async OOS for CG: 995a47ee-4f36-11eb-b15d-d039ea27a65a
2021-06-30T05:12:32Z 527236472277770      [11:0] REPL_0:  repl_sender::DRLScanner::abort(): | [88d8bbf7-51ac-11eb-8ae7-d039ea27a9f0] | [0xffffffff937a3c30] Aborting DRL scanne

Sign in to view the entire content of this KB article.

New to NetApp?

Learn more about our award-winning Support

Scan to view the article on your device