volumeOffline due to unresponsive slice services
Applies to
Element OS 12.5
Issue
- The cluster is a 4-node H610S SolidFire cluster with a large volume count
- Thousands of API calls run against the cluster per day
- The cluster occasionally reports a volumeOffline error, with unresponsiveService warnings raised before and after the volumeOffline error.
1 2023-03-04T17:33:00.206Z Warning service 1 cluster_name 1 Yes 2023-03-04T17:41:47.810Z unresponsiveService A metadata service is not responding.
2 2023-03-04T17:37:22.127Z Error service 1 cluster_name 1 Yes 2023-03-04T17:39:40.148Z sliceServiceUnhealthy A metadata service is unhealthy and SolidFire is attempting to migrate data away from it.
3 2023-03-04T17:38:55.966Z Error cluster 0 0 Yes 2023-03-04T17:44:40.299Z volumesOffline The following volumes are offline. [xxxxx, xxxxx, xxxxx]
4 2023-03-04T17:38:57.584Z Warning service 2 cluster_name 25 Yes 2023-03-04T17:44:40.300Z unresponsiveService A metadata service is not responding.
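
To confirm that the service alerts bracket the volumesOffline error, the alert rows above can be checked for overlapping time windows. The following is a minimal Python sketch, not a NetApp tool. It assumes each row carries a raised timestamp, a severity, a resolved timestamp, and then the fault code, which is inferred from the pasted rows rather than from a documented export format. The names `ROW`, `parse_alerts`, and `overlapping` are hypothetical.

```python
import re
from datetime import datetime

# Alert rows copied from the listing above, one alert per line.
ALERTS = """\
1 2023-03-04T17:33:00.206Z Warning service 1 cluster_name 1 Yes 2023-03-04T17:41:47.810Z unresponsiveService A metadata service is not responding.
2 2023-03-04T17:37:22.127Z Error service 1 cluster_name 1 Yes 2023-03-04T17:39:40.148Z sliceServiceUnhealthy A metadata service is unhealthy and SolidFire is attempting to migrate data away from it.
3 2023-03-04T17:38:55.966Z Error cluster 0 0 Yes 2023-03-04T17:44:40.299Z volumesOffline The following volumes are offline. [xxxxx, xxxxx, xxxxx]
4 2023-03-04T17:38:57.584Z Warning service 2 cluster_name 25 Yes 2023-03-04T17:44:40.300Z unresponsiveService A metadata service is not responding.
"""

# Assumed row layout: raised time, severity, other fields, resolved time, fault code.
ROW = re.compile(
    r"(?P<raised>\d{4}-\d\d-\d\dT[\d:.]+Z)\s+(?P<severity>Warning|Error)\b.*?"
    r"(?P<resolved>\d{4}-\d\d-\d\dT[\d:.]+Z)\s+(?P<code>\w+)"
)


def parse_time(text):
    """Parse a timestamp such as 2023-03-04T17:33:00.206Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")


def parse_alerts(text):
    """Return one dict per alert row that matches the assumed layout."""
    alerts = []
    for line in text.splitlines():
        match = ROW.search(line)
        if match:
            alerts.append({
                "code": match.group("code"),
                "severity": match.group("severity"),
                "raised": parse_time(match.group("raised")),
                "resolved": parse_time(match.group("resolved")),
            })
    return alerts


def overlapping(alerts, code="volumesOffline"):
    """Return alerts with a different code whose window overlaps a `code` alert."""
    targets = [a for a in alerts if a["code"] == code]
    return [
        a for a in alerts
        if a["code"] != code and any(
            a["raised"] <= t["resolved"] and t["raised"] <= a["resolved"]
            for t in targets
        )
    ]


if __name__ == "__main__":
    for alert in overlapping(parse_alerts(ALERTS)):
        print(alert["severity"], alert["code"], alert["raised"], alert["resolved"])
```

For the four rows shown, both unresponsiveService warnings and the sliceServiceUnhealthy error overlap the volumesOffline window, which matches the sequence described under Issue.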