Citrix pool master timeout on heartbeat volume during slice migration
Applies to
- Citrix 8.2 High Availability (HA)
- NetApp Element software
- Multiple storage nodes unavailable:
  - H-series chassis failover
  - Simultaneous node failures
  - Environmental issue
Issue
- Increasing latency is observed on the heartbeat volume (as seen from the hosts)
- The latency starts when slice migration begins (sliceServiceUnhealthy)
- The latency eventually exceeds the thresholds, and the Citrix pool master triggers a reboot of all hosts
- Growing queue depth on the heartbeat volume (observed in NetApp SolidFire Active IQ)
- Error in sf-slice.error on the primary node for the heartbeat volume:
slice82[57705]: [UNCERR-3] [SCSIDevice] 22949 TSSocket-81 iscsitarget/SCSITask.cpp:664:Abort|srcid=3794 reqid=553648144 EIO-WARNING: cMaxSCSITaskLatencyAbortEventLimitMSec=20000 SCSITask: tag=1845493776 sessionID=347892354770 volumeID=160 RawLUN=0x0 generation=0xf0cdd9e896c498e8 mTaskState=Enabled mCommandType=Write mVolumeBlockSize=512 mExpectedTransferLength=65536 mSCSITransferLength=65536 mActualTransferLength=65536 mScsiStatus=0xff mAborted=1 mAbortCompleted=0 mAbortReason=InitiatorSentAbortTask mTaskMgmtSessionID=347892354770 mTaskMgmtTag=553648144 timerMS=60598 totalTimerMS=60598 mEndTaskMSec=0 mQosCallbackCount=0 mQosDelayMSec=0 expectedCallbacks=0 mVolumeRequestCount=0 mVolumeWaitUSec=0 queued=0 mQueuedCount=0 mQueuedUSec=0 mDataSequenceNumber=0 mNumOutstandingR2T=0 mR2TSentCount=0 mDataWaitUSec=0 mOutstandingVolumeWriteBytes=0 err=xSCSITaskAborted startOffsetLBA=29640248624 totalTransferBlocks=128 nextDataOutRequestBlockOffset=0 transferCompletedBlocks=0 immediateBytesLeftToWrite=0
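When many such entries must be reviewed, the key=value fields of the sf-slice.error entry above can be extracted with a short script. The following is a minimal sketch, not a NetApp tool: the helper names (parse_fields, summarize_abort) are hypothetical, the sample line is abridged from the entry above, and it is assumed from the field names alone that timerMS is the elapsed task time in milliseconds and that cMaxSCSITaskLatencyAbortEventLimitMSec is the limit it is compared against.

```python
import re

# Abridged copy of the sf-slice.error entry above; field values are verbatim,
# but most fields are omitted here for brevity.
SAMPLE = (
    "slice82[57705]: [UNCERR-3] [SCSIDevice] 22949 TSSocket-81 "
    "iscsitarget/SCSITask.cpp:664:Abort|srcid=3794 reqid=553648144 EIO-WARNING: "
    "cMaxSCSITaskLatencyAbortEventLimitMSec=20000 SCSITask: tag=1845493776 "
    "volumeID=160 mCommandType=Write mAborted=1 "
    "mAbortReason=InitiatorSentAbortTask timerMS=60598 totalTimerMS=60598"
)

# Matches every name=value pair separated by whitespace.
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_fields(line):
    """Return all name=value pairs in a log line as a dict of strings."""
    return dict(FIELD_RE.findall(line))


def summarize_abort(line):
    """Pick out the fields most relevant to an aborted SCSI task.

    Assumption: timerMS and the *LimitMSec field are both in milliseconds.
    """
    fields = parse_fields(line)
    return {
        "volume_id": int(fields["volumeID"]),
        "command": fields["mCommandType"],
        "abort_reason": fields["mAbortReason"],
        "timer_ms": int(fields["timerMS"]),
        "limit_ms": int(fields["cMaxSCSITaskLatencyAbortEventLimitMSec"]),
    }


if __name__ == "__main__":
    print(summarize_abort(SAMPLE))
```

Under these assumptions, the sample shows a write to volume 160 that was aborted by the initiator after roughly 60.6 seconds, against a limit of 20 seconds.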
- Error in sf-slice.info on multiple (or all) storage nodes:
slice82[57705]: [EXPERR-4] [Util] 39631 VolReadWrite-82 ss/SliceFileManager.cpp:652:WriteMetadata|[#### WILL SUPPRESS ####] srcid=3705 reqid=402653232 IO latency exceeded threshold: sdtLSFSWrite sdtLSFSWrite.ElapsedTimeUSFast()=15578 cIOLatencyThresholdLSFSWriteUS.GetUs()=14400
slice82[57705]: [EXPERR-4] [Util] 39627 VolReadWrite-82 ss/SliceFileManager.cpp:652:WriteMetadata|srcid=3856 reqid=16777232 IO latency exceeded threshold: sdtLSFSWrite sdtLSFSWrite.ElapsedTimeUSFast()=18307 cIOLatencyThresholdLSFSWriteUS.GetUs()=14400
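To gauge how far each write exceeded the threshold in the sf-slice.info entries above, the elapsed time and threshold can be pulled from each line and compared. This is a minimal sketch under stated assumptions: the function name latency_overshoot is hypothetical, both values are taken to be microseconds (inferred from the "US" in the field names), and the sample strings are the tails of the two entries above.

```python
import re

# Captures the elapsed time and the threshold from an
# "IO latency exceeded threshold" entry.
LATENCY_RE = re.compile(
    r"ElapsedTimeUSFast\(\)=(\d+)\s+"
    r"cIOLatencyThresholdLSFSWriteUS\.GetUs\(\)=(\d+)"
)

# Tails of the two sf-slice.info entries above, copied verbatim.
SAMPLES = [
    "IO latency exceeded threshold: sdtLSFSWrite "
    "sdtLSFSWrite.ElapsedTimeUSFast()=15578 "
    "cIOLatencyThresholdLSFSWriteUS.GetUs()=14400",
    "IO latency exceeded threshold: sdtLSFSWrite "
    "sdtLSFSWrite.ElapsedTimeUSFast()=18307 "
    "cIOLatencyThresholdLSFSWriteUS.GetUs()=14400",
]


def latency_overshoot(line):
    """Return (elapsed, threshold, elapsed - threshold), or None if the
    line is not a latency-threshold entry.

    Assumption: both values are in microseconds.
    """
    match = LATENCY_RE.search(line)
    if match is None:
        return None
    elapsed = int(match.group(1))
    threshold = int(match.group(2))
    return elapsed, threshold, elapsed - threshold


if __name__ == "__main__":
    for sample in SAMPLES:
        print(latency_overshoot(sample))
```

Running the same function over the sf-slice.info files from every storage node would show whether the overshoot is confined to one node or spread across the cluster, which is the pattern this article describes.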