How to correct time-drift across StorageGRID nodes using NTP
Applies to
NetApp StorageGRID
Issue
StorageGRID consists of a distributed set of services, most of which run on separate hardware. Because of this distributed design, the clocks of the underlying servers must stay closely synchronized: if the clocks on the servers deviate too much, StorageGRID severs communication with the outlying node(s). This can leave the NTP service in an Error state and cause the node to be reported as Blue in the NMS.
Timing is so important to StorageGRID functionality that the NTP service is a dependency for all other grid services: servermanager will not bring any other service online if NTP does not start successfully. Maintenance procedures make use of this dependency: creating a 0-byte file at /etc/sv/ntp/DoNotStart prevents automatic startup of the StorageGRID software.
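As a minimal sketch of the marker-file mechanism described above, the following Python helpers create an empty file and check for its presence. Only the path /etc/sv/ntp/DoNotStart comes from this article; the function names are hypothetical, and the sketch assumes that the mere existence of the file is what matters.

```python
from pathlib import Path

# Path taken from this article; everything else here is illustrative.
DO_NOT_START = Path("/etc/sv/ntp/DoNotStart")


def create_marker(marker: Path = DO_NOT_START) -> None:
    """Create the marker as an empty (0-byte) file if it does not exist yet."""
    marker.touch(exist_ok=True)


def startup_blocked(marker: Path = DO_NOT_START) -> bool:
    """Return True if the marker file is present."""
    return marker.is_file()
```

On a real node this would be run with sufficient privileges, and only as part of a documented maintenance procedure. Passing a different path to either helper makes it possible to try the logic safely in a scratch directory.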
A best practice when installing StorageGRID is to configure NTP hierarchically. The Control Nodes are configured for time synchronization with external time sources and are also configured as synchronizing peers, so they sync with each other as well as with the external sources. All other nodes in the grid use the Control Nodes as their time sources.
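The hierarchy described above can be sketched as a small planning function that maps each node to its time sources and peers. This is an illustration of the layout only, not StorageGRID's configuration format; the function name, the dictionary keys, and the host names in the usage note are all hypothetical.

```python
from typing import Dict, List


def plan_time_sources(
    control_nodes: List[str],
    external_sources: List[str],
    other_nodes: List[str],
) -> Dict[str, Dict[str, List[str]]]:
    """Build a per-node plan of NTP servers and peers for a hierarchical layout."""
    plan: Dict[str, Dict[str, List[str]]] = {}
    for node in control_nodes:
        # Control Nodes sync with the external sources and peer with each other.
        peers = [n for n in control_nodes if n != node]
        plan[node] = {"servers": list(external_sources), "peers": peers}
    for node in other_nodes:
        # All other nodes use the Control Nodes as their time sources.
        plan[node] = {"servers": list(control_nodes), "peers": []}
    return plan
```

For example, calling `plan_time_sources(["cn1", "cn2"], ["ntp.example.com"], ["sn1"])` gives `cn1` the external source as its server and `cn2` as its peer, while `sn1` lists both Control Nodes as servers and has no peers.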
If enough Control Nodes deviate from the external time sources, the entire grid might begin to drift away from the external time sources.
Dramatic time drift away from the configured sources is referred to as 'fly-wheeling' and can affect either a single node or the whole grid.
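To make the distinction between single-node and grid-wide drift concrete, here is a minimal sketch that classifies a set of measured clock offsets. It assumes the offsets (in seconds, relative to the external reference) have already been collected by some other means, and the threshold is a caller-supplied assumption: this article does not state how much deviation StorageGRID tolerates. The function name and return values are hypothetical.

```python
from typing import Dict, List, Tuple


def classify_drift(
    offsets_s: Dict[str, float], max_offset_s: float
) -> Tuple[str, List[str]]:
    """Classify drift as 'none', 'nodes' (some nodes) or 'grid' (every node).

    offsets_s maps node name to clock offset in seconds from the reference.
    max_offset_s is an assumed tolerance, not a documented StorageGRID value.
    """
    drifting = sorted(
        node for node, offset in offsets_s.items() if abs(offset) > max_offset_s
    )
    if not drifting:
        scope = "none"
    elif len(drifting) == len(offsets_s):
        scope = "grid"
    else:
        scope = "nodes"
    return scope, drifting
```

A result of "grid" suggests that the Control Nodes themselves have moved away from the external sources, whereas "nodes" points to individual outliers.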