StorageGRID services state changed to unknown due to out of memory

Applies to

StorageGRID
DDS service (Distributed Data Store)
LDR service (Local Distribution Router)
SSM service (Server Status Monitor)

Issue

The state of StorageGRID services such as DDS, LDR, and SSM on a Storage Node changes to unknown, then recovers after a few minutes.
servermanager.log indicates that the Cassandra service ended and was restarted:

2021-01-23 12:34:38 +0000 | cassandra | cassandra ended

2021-01-23 12:34:54 +0000 | cassandra | starting cassandra
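
To measure how long Cassandra was down, the "ended" and "starting" entries can be paired programmatically. The following Python snippet is a minimal sketch. It assumes that every relevant line uses the "<timestamp> | <service> | <message>" layout and the exact "cassandra ended" / "starting cassandra" messages shown in the sample. The function name cassandra_restarts is hypothetical and is not part of any StorageGRID tooling.

```python
from datetime import datetime

# Timestamp layout as it appears in the sample entries, e.g. "2021-01-23 12:34:38 +0000".
TS_FORMAT = "%Y-%m-%d %H:%M:%S %z"

def cassandra_restarts(lines):
    """Yield (ended_at, restarted_at, gap_seconds) for each Cassandra stop/start pair.

    Hypothetical helper: assumes lines of the form "<timestamp> | <service> | <message>".
    """
    ended_at = None
    for line in lines:
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3 or parts[1] != "cassandra":
            continue  # skip blank lines and entries for other services
        ts = datetime.strptime(parts[0], TS_FORMAT)
        if parts[2] == "cassandra ended":
            ended_at = ts
        elif parts[2] == "starting cassandra" and ended_at is not None:
            yield ended_at, ts, (ts - ended_at).total_seconds()
            ended_at = None

# Usage with the sample entries above.
sample = [
    "2021-01-23 12:34:38 +0000 | cassandra | cassandra ended",
    "",
    "2021-01-23 12:34:54 +0000 | cassandra | starting cassandra",
]
for ended, started, gap in cassandra_restarts(sample):
    print(f"ended {ended}, restarted {started}, gap {gap:.0f}s")
```

Pairing each stop with the next start, rather than counting messages, keeps a restart that spans a log rotation from producing a misleading gap.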

The base OS messages log shows that the Java process running the Cassandra service for StorageGRID was killed because the system ran out of memory, as reported by oom_reaper:

Jan 23 12:34:22 localhost kernel: [123456.123456] oom_reaper: reaped process 1234 (java), now anon-rss:10347420kB, file-rss:27560kB, shmem-rss:144kB
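
The oom_reaper entry identifies the process by PID and name and reports its memory counters in kB. The sketch below is an illustrative Python example. It assumes that the message layout matches the sample line above exactly. It extracts those fields and converts the counters to GiB. The helper parse_oom_reaper is hypothetical.

```python
import re

# Pattern for the kernel oom_reaper message; the field layout is taken from the sample line.
OOM_REAPER_RE = re.compile(
    r"oom_reaper: reaped process (?P<pid>\d+) \((?P<comm>[^)]+)\), "
    r"now anon-rss:(?P<anon>\d+)kB, file-rss:(?P<file>\d+)kB, "
    r"shmem-rss:(?P<shmem>\d+)kB"
)

KIB_PER_GIB = 1024 * 1024  # the kernel's "kB" counters are units of 1024 bytes

def parse_oom_reaper(line):
    """Return a dict describing an oom_reaper entry, or None if the line is not one."""
    m = OOM_REAPER_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "process": m["comm"],
        "anon_rss_gib": int(m["anon"]) / KIB_PER_GIB,
        "file_rss_gib": int(m["file"]) / KIB_PER_GIB,
        "shmem_rss_gib": int(m["shmem"]) / KIB_PER_GIB,
    }

# Usage with the sample line above.
line = ("Jan 23 12:34:22 localhost kernel: [123456.123456] oom_reaper: reaped process "
        "1234 (java), now anon-rss:10347420kB, file-rss:27560kB, shmem-rss:144kB")
info = parse_oom_reaper(line)
print(info["pid"], info["process"], round(info["anon_rss_gib"], 2))
```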

The StorageGRID node reboots due to out-of-memory errors, which are recorded in daemon.log:

Mar 22 13:39:37 localhost wdogd[1691]: OOMM: successfully forked OOM canary process
Mar 22 13:39:38 localhost wdogd[1691]: OOMM: /usr/bin/storagegrid-oom-recover considering initializing swap file at Tue Mar 22 13:39:38 UTC 2022
Mar 22 13:39:41 localhost wdogd[1691]: OOMM: Setting up swapspace version 1, size = 1024 MiB (1073737728 bytes)
Mar 22 13:39:41 localhost wdogd[1691]: OOMM: no label, UUID=
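
When daemon.log is large, the wdogd OOMM entries can be isolated from the surrounding messages. The Python sketch below assumes the syslog layout shown in the sample: month, day, time, host, then "wdogd[pid]: OOMM:". The function name oomm_events is hypothetical. These syslog timestamps carry no year, so correlating them with other logs requires knowing the year separately.

```python
import re

# Matches wdogd OOMM entries in the layout of the daemon.log sample:
# "<Mon> <day> <HH:MM:SS> <host> wdogd[<pid>]: OOMM: <message>"
OOMM_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"wdogd\[(?P<pid>\d+)\]: OOMM: (?P<msg>.*)$"
)

def oomm_events(lines):
    """Yield (timestamp, message) for each wdogd OOMM entry, skipping all other lines."""
    for line in lines:
        m = OOMM_RE.match(line.rstrip("\n"))
        if m:
            yield m["ts"], m["msg"]

# Usage with the sample entries above.
sample = [
    "Mar 22 13:39:37 localhost wdogd[1691]: OOMM: successfully forked OOM canary process",
    "Mar 22 13:39:38 localhost wdogd[1691]: OOMM: /usr/bin/storagegrid-oom-recover "
    "considering initializing swap file at Tue Mar 22 13:39:38 UTC 2022",
    "Mar 22 13:39:41 localhost wdogd[1691]: OOMM: Setting up swapspace version 1, "
    "size = 1024 MiB (1073737728 bytes)",
    "Mar 22 13:39:41 localhost wdogd[1691]: OOMM: no label, UUID=",
    # Hypothetical non-OOMM line, included only to show that it is skipped.
    "Mar 22 13:39:42 localhost sshd[2001]: unrelated message",
]
for ts, msg in oomm_events(sample):
    print(ts, msg)
```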