ONTAP LUN VMware datastore heartbeat region corrupted
- Category:
- aff-series
- Specialty:
- san
- Last Updated:
- 9/11/2023, 10:47:57 AM
Applies to
- ONTAP
- VMware ESXi
Issue
- Periodic datastore disconnections.
- Datastore corruption may occur.
- Generally affects datastores that are shared and used by all ESXi hosts in a vSphere datacenter, for example a disk or OVF image repository.
- No errors are detected in the ONTAP event logs.
- Some ESXi hosts use Atomic Test and Set (ATS), while other ESXi hosts have ATS disabled and use only SCSI reservations.
Example:
/var/run/log/vobd Abandoned event (esx.problem.vmfs.heartbeat.corruptondisk) after 6 failures.
/var/run/log/hostd info hostd[2184843] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 159405 : At least one corrupt on-disk heartbeat region was detected on volume abc_(NetApp). Other regions of the volume might be damaged too.
/var/run/log/hostd.0:--> eventTypeId = "esx.problem.vmfs.heartbeat.corruptondisk"
Failed to send event (esx.problem.vmfs.heartbeat.corruptondisk); 2 failures so far.
/var/run/log/vobd [vmfsCorrelator] 16326186567549us: [esx.problem.vmfs.heartbeat.corruptondisk] abc_NetApp
/var/run/log/vobd An event (esx.problem.vmfs.heartbeat.corruptondisk) could not be sent immediately to hostd; queueing for retry.
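
To check quickly whether a host has logged this condition, the log excerpts above can be searched for the event ID. The following is a minimal Python sketch, not a NetApp or VMware tool. It assumes the log lines have been collected into text (for example from a support bundle), and the function name `summarize_heartbeat_corruption` and the volume-name pattern are illustrative assumptions based only on the message format shown in the example.

```python
import re

# Event ID exactly as it appears in the example log lines above.
EVENT_ID = "esx.problem.vmfs.heartbeat.corruptondisk"

# Assumption: the hostd message keeps the wording shown in the example,
# so the volume name sits between "detected on volume " and ". Other regions".
VOLUME_RE = re.compile(r"detected on volume (.+?)\. Other regions")


def summarize_heartbeat_corruption(lines):
    """Return (matching_line_count, sorted_volume_names) for an iterable of log lines."""
    hits = 0
    volumes = set()
    for line in lines:
        match = VOLUME_RE.search(line)
        if match:
            volumes.add(match.group(1))
        # Count a line if it carries the event ID or the hostd message text.
        if match or EVENT_ID in line:
            hits += 1
    return hits, sorted(volumes)


if __name__ == "__main__":
    sample = [
        "Abandoned event (esx.problem.vmfs.heartbeat.corruptondisk) after 6 failures.",
        "Event 159405 : At least one corrupt on-disk heartbeat region was detected "
        "on volume abc_(NetApp). Other regions of the volume might be damaged too.",
        "some unrelated log line",
    ]
    count, names = summarize_heartbeat_corruption(sample)
    print(count, names)
```

The function accepts any iterable of strings, so an open file object for a copied log works as well as a list. A non-zero count only shows that the event was logged on that host; it does not identify the cause.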