CONTAP-54204: Due to intermittent checksum mismatches, SnapMirror transfers involving TSSE volumes might fail in ONTAP 9.8
Issue
- This bug applies only when the SnapMirror source is an All-Flash FAS (AFF) cluster running ONTAP 9.8
- SnapMirror transfer fails with an error after transferring some data: 'Transfer failed. (Checksum mismatch(Replication engine error))'
- The network shows none of the issues described in the KB article: SnapMirror transfers fail with checksum mismatch or unmarshal errors
- On the destination cluster, the transfer fails after 8 retries (the default value for SnapMirror policies). Each retry transfers some data to the destination before failing
- SnapMirror audit logs (/etc/log/snapmirror_audit) indicate the following:
Wed Oct 13 14:52:22 UTC 2021 Initialize[Oct 13 14:51:06]:Operation-Uuid=<operation_uuid> Group=none Operation-Cookie=0 action=Defer source=<source_path> destination=<destination_path> status=Failure message=Transfer failed.(Checksum mismatch(Replication engine error))
- On the source cluster, EMS logs (/etc/log/ems or the event log show command) indicate:
repl_Handle_low: repl.engine.error:debug]: params: {'replFailureMsgDetail': '5898509', 'lineNumber': '671', 'replFailureMsg': '5898522', 'functionName': 'virtual void repl_stream::DataSource::DataContext::spinnpResponse(repl_spinnp::Request &, repl_spinnp::Response &)', 'replStatus': '14'}
- Also on the source, SKTrace logs (/etc/log/mlog/sktrace.log) indicate:
REPL_0: repl_spinnp::Session::_handleNextCsmResponse(): | [ddd94d44-1f77-11ec-a7ea-00a098d92444] | [0xffffffff948466f0] result: [status: 91 failure_msg: 5898741 failure_msg_detail: 4063303] received response with error from CSM. csm error: 71 request major op: 7 request minor op: 24from b8e42de7-38fd-11e9-aa37-00a098d92444,f7ea4ced-38fb-11e9-aa37-00a098d92444 at b4775206-cb19-11eb-b63f-d039ea499ab4
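
When many relationships are involved, it may help to scan a copy of the SnapMirror audit log for this signature rather than reading it by eye. The following is a minimal sketch, not a NetApp tool. It assumes the audit log has been copied off the node as plain text and that every relevant line follows the key=value layout of the single sample line shown above. The function name find_checksum_failures and the regular expression are illustrative assumptions, and other ONTAP releases or operation types may format these lines differently.

```python
import re

# Pattern inferred from the one sample audit-log line in this article
# (assumption: other lines use the same key=value order).
AUDIT_RE = re.compile(
    r"action=(?P<action>\S+)\s+"
    r"source=(?P<source>\S+)\s+"
    r"destination=(?P<destination>\S+)\s+"
    r"status=(?P<status>\S+)\s+"
    r"message=(?P<message>.*)$"
)

# Error text quoted in the Issue section above.
CHECKSUM_MARKER = "Checksum mismatch(Replication engine error)"


def find_checksum_failures(lines):
    """Return (source, destination, message) for each failed transfer
    whose message contains the checksum-mismatch text."""
    hits = []
    for line in lines:
        match = AUDIT_RE.search(line)
        if match is None:
            continue  # not a line in the assumed layout
        if match["status"] == "Failure" and CHECKSUM_MARKER in match["message"]:
            hits.append(
                (match["source"], match["destination"], match["message"].strip())
            )
    return hits


# The sample line from this article, with its placeholders left as-is.
SAMPLE = (
    "Wed Oct 13 14:52:22 UTC 2021 Initialize[Oct 13 14:51:06]:"
    "Operation-Uuid=<operation_uuid> Group=none Operation-Cookie=0 "
    "action=Defer source=<source_path> destination=<destination_path> "
    "status=Failure message=Transfer failed."
    "(Checksum mismatch(Replication engine error))"
)

if __name__ == "__main__":
    for source, destination, message in find_checksum_failures([SAMPLE]):
        print(f"{source} -> {destination}: {message}")
```

A match in the audit log only shows that the error text is present. The EMS and SKTrace entries above, together with the source cluster version and platform, are still needed to confirm that this bug is the cause.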