DDS service instability on StorageGRID nodes
Applies to
StorageGRID versions earlier than 11.8.0.14
Issue
- The Distributed Data Store (DDS) service alternates between up and down on all nodes after an upgrade.
- Major alerts report that disk I/O is very slow and that nodes cannot be communicated with.
- Assertions in bycast.log similar to the following:
Feb 24 19:21:23 nodename ADE: |21681320 1565824260 CTCU ^RDY 2025-02-24T19:21:23.374271| CRITICAL 4249 CTCU: Assertion Failed#012[/build/src/libs/cassandra/CSTAR_DataModel_Objects.cc:4249 std::unique_ptr<Object> cassandra::ObjectRepository::buildDeleteMarkerObject118(const cassandra::Bucket &, const std::string &, const CXD_UUID &, bool, bool) const] ASSERT FAILED (bucket.name()) is false. []
Feb 24 19:21:23 nodename ADE: |21681320 1565824260 CTCU ^RDY 2025-02-24T19:21:23.588249| ERROR 0483 CTCU: 0# sgutil::stacktrace::current[abi:cxx11]() in /usr/local/dds/dds
Feb 24 19:21:23 nodename ADE: |21681320 1565824260 CTCU ^RDY 2025-02-24T19:21:23.588267| ERROR 0483 CTCU: 1# CXD_Debug_PrintStackCrawl() in /usr/local/dds/dds
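To check whether a node's bycast.log contains this signature, the entries can be scanned with a short script. The following Python sketch is one possible way to do that. The function name find_assertions and the regular expression are illustrative assumptions derived only from the excerpt above; they are not part of any StorageGRID tooling, and other releases may format assertion lines differently.

```python
import re

# Assumed pattern, based only on the "Assertion Failed" line in the excerpt above:
#   CRITICAL <code> <module>: Assertion Failed ... [<source file>:<line> ...]
#   ASSERT FAILED (<expression>) is false
ASSERT_RE = re.compile(
    r"CRITICAL\s+\d+\s+(?P<module>\w+): Assertion Failed.*?"
    r"\[(?P<source>[^\s\]]+):(?P<line>\d+)\s.*"
    r"ASSERT FAILED \((?P<expr>.*?)\) is false"
)


def find_assertions(lines):
    """Return one dict per log line that matches the assertion pattern.

    `lines` is any iterable of strings, for example an open log file.
    Stack-trace lines and other entries are skipped.
    """
    hits = []
    for text in lines:
        match = ASSERT_RE.search(text)
        if match:
            hits.append(match.groupdict())
    return hits
```

For example, passing an open file handle for a copy of bycast.log to find_assertions returns a list of dictionaries with the keys module, source, line, and expr. An entry whose source ends in CSTAR_DataModel_Objects.cc and whose expr is bucket.name() corresponds to the assertion shown above.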