StorageGRID Appliance compute controller loses access to E-Series storage controller due to a reboot caused by a memory leak
Applies to
- StorageGRID Appliances (SGA) models: SG56XX and SG57XX
- This issue can impact non-SGA E-Series systems running in a simplex configuration.
- This issue should not impact the SG6000 series, because its E-Series storage controller shelf uses two controllers running in a duplex configuration, which allows storage controller failover.
- Additionally, interoperability qualifications require the E-Series storage controller shelf to be running 08.40/11.40 or above.
- E-Series SANtricity software release 08.30/11.30 or older.
Issue
- On the SG56XX/SG57XX series, the compute controller can temporarily lose access to the storage controller.
- The resulting behavior can vary depending on the severity of the interruption.
- In most instances, the impact was minimal and limited to alarms only, as the node self-recovered.
- In one occurrence, the impact on the compute controller was more critical and required rebuilding it from the surviving SGRID storage node cluster.
- Below are the related PANIC strings found in the excLogShow:
(ProcessEvents): PANIC: Caught bad allocation in processEventsMethod for 0x0 null
(bdbmSync): PANIC: Unhandled C++ exception triggered terminate().
(symTask0): ASSERT: Assertion failed: response, file sas2PhyErrorMgr.cc, line 1732
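To check a collected log for these strings, a small script can help. The following is a minimal sketch, assuming the excLogShow output has already been saved as plain text; the function name `find_signatures` and the file name in the usage note are hypothetical and not part of any NetApp tool. The signatures are copied from the strings listed above, without the task name prefix or line number, since those may differ between occurrences.

```python
# Signatures taken from the strings listed in this article.
# Task names and source line numbers are omitted on purpose,
# on the assumption that they can vary between occurrences.
SIGNATURES = [
    "PANIC: Caught bad allocation in processEventsMethod",
    "PANIC: Unhandled C++ exception triggered terminate()",
    "ASSERT: Assertion failed: response, file sas2PhyErrorMgr.cc",
]


def find_signatures(log_text):
    """Return (line_number, line) pairs for lines containing a known signature."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(signature in line for signature in SIGNATURES):
            hits.append((number, line.strip()))
    return hits
```

For example, with the log saved to a hypothetical file named `excLogShow.txt`, calling `find_signatures(open("excLogShow.txt", errors="replace").read())` returns the matching lines with their line numbers. An empty list only means that none of these three strings were found in that file.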