Why do SQL verifications in SnapCenter sporadically cause an event stating that the cluster resource failed?
Applies to
- SnapCenter plug-in for SQL (SCSQL)
- Verification on one of the production cluster instances
Answer
A DBCC run during SnapCenter verification causes high I/O and disk rescans. As a result, production AG databases on the same host can sporadically appear to lose a resource.
While the resource recovers after a while, it is recommended to use a separate verification instance for SnapCenter verifications.
Additional Information
The event message in the System event log can appear under multiple Event IDs and states:
Cluster resource 'AG_01_Test' of type 'SQL Server Availability Group' in clustered role 'AG_01_Test' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
The cluster messages show a failover beginning and then recovering, with the database returning to the PRIMARY role:
DESCRIPTION: The availability group database "highsystem_TEST" is changing roles from "PRIMARY" to "RESOLVING" because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.
DESCRIPTION: The availability group database "highsystem_TEST" is changing roles from "RESOLVING" to "PRIMARY" because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.
The sporadic nature of these events points to general processing performance limitations.
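To tell this transient pattern apart from a real failover when reviewing exported log text, the two role-change messages quoted above can be matched in pairs. The following is a minimal sketch, not a SnapCenter or SQL Server tool: the function name `find_role_flaps` and the regular expression are hypothetical, and it assumes the log lines use the same wording as the sample messages in this article.

```python
import re

# Assumed message wording, taken from the sample DESCRIPTION lines in this article.
ROLE_CHANGE = re.compile(
    r'availability group database "(?P<db>[^"]+)" is changing roles '
    r'from "(?P<old>[^"]+)" to "(?P<new>[^"]+)"'
)


def find_role_flaps(lines):
    """Return database names that went PRIMARY -> RESOLVING -> PRIMARY.

    Hypothetical helper: a database is reported once for each time it leaves
    PRIMARY for RESOLVING and then comes straight back to PRIMARY.
    """
    pending = set()  # databases last seen moving PRIMARY -> RESOLVING
    flapped = []
    for line in lines:
        match = ROLE_CHANGE.search(line)
        if not match:
            continue  # not a role-change message
        db = match.group("db")
        transition = (match.group("old"), match.group("new"))
        if transition == ("PRIMARY", "RESOLVING"):
            pending.add(db)
        elif transition == ("RESOLVING", "PRIMARY") and db in pending:
            pending.discard(db)
            flapped.append(db)
        else:
            # Any other transition (for example to SECONDARY) is not a flap.
            pending.discard(db)
    return flapped
```

A database that appears in the result returned to the primary role on its own, which matches the behavior described in this article. A database that moves from RESOLVING to any other role is not reported and should be investigated as a genuine failover.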