What is the service impact if an entire StorageGRID site fails?
Applies to
StorageGRID 11.x
Answer
The service impact of an entire StorageGRID site failure (a planned site shutdown, an unplanned power outage, and so on) in a multi-site grid depends on factors that are specific to your situation. For example:
- The type and number of nodes at the site. For example:
- If client requests are through Gateway Nodes and the site has the only Gateway Node in the grid, data access is interrupted.
- If the site has an Admin Node, audit logs might become backlogged on Storage Nodes at other sites. If the /var/local/ directory used by a Storage Node becomes 85% full, the node starts refusing S3 and Swift client requests with 503 Service Unavailable.
- If the site has the primary Admin Node, maintenance actions such as change-ip are not allowed, because maintenance tasks can only be completed on the primary Admin Node.
- If a Storage Node has not been able to communicate with other Storage Nodes for more than 15 days, the Storage Node cannot rejoin the grid. If more than one Storage Node has failed (or is offline), contact technical support.
- Active ILM policy. The number, type, and location of object copies in the grid are controlled by the active ILM policy. The specifics of the ILM policy can affect client requests and data availability. For example:
- If the site contains the only copy of an object, the object is unavailable.
- When the Strict option is selected in an ILM rule, StorageGRID uses synchronous placement on ingest and immediately makes all object copies specified in the rule's placement instructions. Ingest fails if a required storage location is at the failed site.
- Bucket (or container) consistency. The consistency control affects how the metadata that StorageGRID uses to track objects is distributed between nodes, and therefore the availability of objects for client requests. For example:
- With the default read-after-write consistency, if your application uses HEAD requests on objects that do not exist, you might receive a high number of 500 Internal Server Error responses when a site fails.
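
The 85% threshold on /var/local/ mentioned above can be watched with a small script. The following Python sketch is illustrative only and is not a StorageGRID tool: the function names percent_used and check_path are hypothetical, the path and threshold are taken from this article, and whether running custom scripts on a grid node is appropriate in your environment is an assumption you should confirm with technical support.

```python
import shutil

# Threshold taken from the article: a Storage Node starts refusing
# S3 and Swift requests when /var/local/ becomes 85% full.
THRESHOLD_PERCENT = 85.0


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total space."""
    if total <= 0:
        raise ValueError("total must be a positive number of bytes")
    return 100.0 * used / total


def check_path(path: str = "/var/local/", threshold: float = THRESHOLD_PERCENT):
    """Return (percent used, True if at or above the threshold) for path.

    shutil.disk_usage reports on the filesystem that contains the path.
    The value can differ slightly from the df "Use%" column, which
    accounts for blocks reserved for root.
    """
    usage = shutil.disk_usage(path)
    pct = percent_used(usage.total, usage.used)
    return pct, pct >= threshold


if __name__ == "__main__":
    try:
        pct, at_risk = check_path()
    except OSError as exc:
        # The path does not exist or is not readable on this host.
        print(f"Could not check /var/local/: {exc}")
    else:
        state = "AT OR ABOVE" if at_risk else "below"
        print(f"/var/local/ is {pct:.1f}% full ({state} {THRESHOLD_PERCENT}%)")
```

Running a check like this on Storage Nodes at the surviving sites while an Admin Node site is down would give early warning of an audit log backlog, before client requests start failing with 503 Service Unavailable.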