How to know when the cluster gives up on a data drive and starts reallocating its data
Applies to
Element Software
Answer
The blockServiceUnhealthy cluster alert is the one to watch to see when data is being reallocated to other drives (Bin Sync). The alert occurs when the cluster has given up on a data drive, and it resolves when Bin Sync completes.
Additional Information
Do not confuse blockServiceUnhealthy with the following cluster alerts:
- driveFailed: This occurs when the Cluster Master node fails to get drive information from the master service on a node. Besides a real drive hardware failure, it can therefore also occur when the Cluster Master simply cannot get status from the master service.
- blocksDegraded: This occurs when the Cluster Master fails to check the health of a block service. It does not mean that the cluster has given up on the data on the drives.
- nodeOffline: This occurs when the Cluster Master node fails to communicate with the master service running on a node. The node may still be serving data when this alert is seen.
- unresponsiveService: This occurs when the Cluster Master fails to communicate with a service.
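As a minimal sketch of the distinction above, the following Python snippet sorts cluster fault codes by what they imply about data reallocation. The record layout (dictionaries with "code" and "resolved" keys) and the function names are assumptions made for illustration only; adapt them to however you retrieve cluster faults from your cluster.

```python
# Fault code that signals the cluster gave up on a data drive (Bin Sync running).
REALLOCATION_CODE = "blockServiceUnhealthy"

# Fault codes that do not, by themselves, mean data is being reallocated.
NOT_REALLOCATION_CODES = {
    "driveFailed",
    "blocksDegraded",
    "nodeOffline",
    "unresponsiveService",
}


def classify(code):
    """Map a fault code to a short label (labels are illustrative, not official)."""
    if code == REALLOCATION_CODE:
        return "reallocation"
    if code in NOT_REALLOCATION_CODES:
        return "not-reallocation"
    return "other"


def reallocation_in_progress(faults):
    """Return True if any unresolved fault carries the blockServiceUnhealthy code.

    `faults` is a hypothetical list of dicts with "code" and "resolved" keys.
    """
    return any(
        classify(f.get("code")) == "reallocation" and not f.get("resolved", False)
        for f in faults
    )
```

With this sketch, a fault list that contains only nodeOffline or blocksDegraded entries is reported as not reallocating, which matches the guidance above.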
Note:
- The Cluster Fault Monitor service runs on the Cluster Master node. It periodically checks the health of each component on every storage node. When it finds a bad status or fails to communicate, it generates cluster alerts according to the error. Volume failover (Slice Sync) and Bin Sync are not decided by the Cluster Fault Monitor.
- Bin Sync starts when the block service has not recovered within 5.5 minutes.
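The 5.5-minute window can be turned into a rough estimate of when Bin Sync would begin. This is only a sketch: it assumes you know the time at which the block service stopped responding, and the function name is hypothetical. How the cluster itself measures the start of the window is not covered here.

```python
from datetime import datetime, timedelta

# Window from the note above: 5.5 minutes = 5 minutes 30 seconds.
BIN_SYNC_GRACE = timedelta(minutes=5, seconds=30)


def expected_bin_sync_start(block_service_down_at):
    """Estimate when Bin Sync starts if the block service does not recover.

    `block_service_down_at` is an assumed input: the datetime at which the
    block service became unavailable.
    """
    return block_service_down_at + BIN_SYNC_GRACE


print(expected_bin_sync_start(datetime(2024, 1, 1, 12, 0, 0)))
# prints 2024-01-01 12:05:30
```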
