All StorageGRID nodes are displayed as blue/unknown after starting a decommission task
Applies to
- StorageGRID 11.6.0.10 or earlier
- StorageGRID 11.7.0.3 or earlier
- A grid task (such as decommission) has been started
Issue
- SUPPORT > Grid topology displays all nodes as blue/unknown, or does not display any nodes
/var/local/log/nms.log indicates an invalid Atom label error, which /usr/local/lib/site_ruby/bycast/storage-grid/atom-container.rb detects while processing a bundle:
MI: |2023-07-18T13:46:59.082| NOTICE [DataConnectionManager] BundleProtocol.java:288: Processed bundle GTSB version 1 namespace BNDL instance 0
NMS: |2023-07-18T13:46:59.096| ERROR invalid Atom label "S>oK" (ArgumentError)
NMS: |2023-07-18T13:46:59.096| ERROR /usr/local/lib/site_ruby/bycast/storage-grid/atom-container.rb:44:in `label='
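To check whether a node's nms.log contains this signature, the error lines can be filtered with a regular expression. The following is a minimal Python sketch, assuming the line layout shown in the excerpt above; find_invalid_atom_labels is a hypothetical helper, not a StorageGRID tool.

```python
import re
from typing import Iterable, List, Tuple

# Assumed layout, taken from the excerpt above:
# NMS: |<timestamp>| ERROR invalid Atom label "<label>" (ArgumentError)
INVALID_LABEL = re.compile(
    r'^NMS: \|(?P<ts>[^|]+)\| ERROR invalid Atom label "(?P<label>.*)" \(ArgumentError\)$'
)


def find_invalid_atom_labels(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, label) for each invalid Atom label error (hypothetical helper)."""
    hits: List[Tuple[str, str]] = []
    for line in lines:
        match = INVALID_LABEL.match(line.rstrip("\n"))
        if match:
            hits.append((match.group("ts"), match.group("label")))
    return hits
```

For example, pass an open file object for /var/local/log/nms.log to find_invalid_atom_labels; each hit gives the time of the failure and the rejected label (in the excerpt above, "S>oK").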
/var/local/log/nms.log indicates that a Java MI thread lost its connection:
MI: |2023-07-21T13:10:25.725| ERROR [DATA_STREAM_25] AddNodeProtocol.java:226: Connection lost.
MI: |2023-07-21T13:10:25.761| NOTICE [CONTROL_STREAM] ControlConnection.java:191: Restarting control connection...
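A "Connection lost." error on a data stream is followed here by a control-connection restart. The sketch below pairs the two, assuming the MI line layout shown above; connection_losses is a hypothetical helper, and treating the next restart line as belonging to the most recent loss is a simplifying assumption.

```python
import re
from typing import Iterable, List, Tuple

# Assumed layouts, taken from the excerpt above.
LOST = re.compile(
    r'^MI: \|(?P<ts>[^|]+)\| ERROR \[(?P<stream>[^\]]+)\] \S+: Connection lost\.$'
)
RESTART = re.compile(
    r'^MI: \|[^|]+\| NOTICE \[CONTROL_STREAM\] \S+: Restarting control connection'
)


def connection_losses(lines: Iterable[str]) -> List[Tuple[str, str, bool]]:
    """Return (timestamp, stream, restarted) for each lost MI data connection.

    restarted is True when a control-connection restart is logged after the
    loss and before the next loss. Hypothetical helper, not a StorageGRID tool.
    """
    events: List[List] = []
    for raw in lines:
        line = raw.rstrip("\n")
        lost = LOST.match(line)
        if lost:
            events.append([lost.group("ts"), lost.group("stream"), False])
        elif events and RESTART.match(line):
            # Attribute the restart to the most recent loss.
            events[-1][2] = True
    return [(ts, stream, restarted) for ts, stream, restarted in events]
```

Run against the two lines above, this reports one loss on DATA_STREAM_25 that was followed by a restart.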
- The mgmt-api service fails to restart
/var/local/log/bycast-err.log shows the mgmt-api error:
NMS: |2023-07-25T05:39:37.383| ERROR Exception in thread created by /usr/local/lib/site_ruby/mgmt-api/alertmanager/rules/prometheus-alert-rules-updater.rb:25:in `new'
NMS: |2023-07-25T05:39:37.383| ERROR Directory not empty @ dir_s_rmdir - /var/local/mgmt-api/prometheus-rules (Errno::ENOTEMPTY)
NMS: |2023-07-25T05:39:37.383| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:1337:in `rmdir'
NMS: |2023-07-25T05:39:37.383| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:1337:in `block in remove_dir1'
NMS: |2023-07-25T05:39:37.383| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:1348:in `platform_support'
NMS: |2023-07-25T05:39:37.383| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:1336:in `remove_dir1'
NMS: |2023-07-25T05:39:37.383| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:1329:in `remove'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:691:in `block in remove_entry'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:1386:in `ensure in postorder_traverse'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:1386:in `postorder_traverse'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:689:in `remove_entry'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:717:in `remove_dir'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/local/lib/site_ruby/mgmt-api/alertmanager/rules/prometheus-alert-rules-updater.rb:123:in `stage_rules!'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/local/lib/site_ruby/mgmt-api/alertmanager/rules/prometheus-alert-rules-updater.rb:28:in `block (2 levels) in update_alert_rules!'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/local/lib/site_ruby/mgmt-api/alertmanager/rules/prometheus-alert-rules-updater.rb:26:in `synchronize'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/local/lib/site_ruby/mgmt-api/alertmanager/rules/prometheus-alert-rules-updater.rb:26:in `block in update_alert_rules!'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/local/lib/site_ruby/mgmt-api/tools/api-thread.rb:21:in `block in initialize'
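In this trace, rmdir raises Errno::ENOTEMPTY on /var/local/mgmt-api/prometheus-rules inside FileUtils.remove_dir, which is called from stage_rules! in prometheus-alert-rules-updater.rb. The sketch below extracts, for each such error, the directory and the first frame outside the Ruby standard library, assuming the bycast-err.log layout shown above; enotempty_errors is a hypothetical helper.

```python
import re
from typing import Dict, Iterable, List, Optional

# Assumed layouts, taken from the bycast-err.log excerpt above.
ENOTEMPTY = re.compile(
    r'^NMS: \|(?P<ts>[^|]+)\| ERROR .* - (?P<path>\S+) \(Errno::ENOTEMPTY\)$'
)
# A frame under site_ruby, i.e. StorageGRID code rather than the Ruby
# standard library under /usr/lib/ruby.
SITE_RUBY_FRAME = re.compile(
    r'^NMS: \|[^|]+\| ERROR (?P<frame>/usr/local/lib/site_ruby/.+)$'
)


def enotempty_errors(lines: Iterable[str]) -> List[Dict[str, Optional[str]]]:
    """Return one dict per ENOTEMPTY error: timestamp, directory, calling frame.

    Hypothetical helper for reading bycast-err.log; not a StorageGRID tool.
    """
    results: List[Dict[str, Optional[str]]] = []
    pending: Optional[Dict[str, Optional[str]]] = None
    for raw in lines:
        line = raw.rstrip("\n")
        err = ENOTEMPTY.match(line)
        if err:
            pending = {"ts": err.group("ts"), "path": err.group("path"), "frame": None}
            results.append(pending)
            continue
        frame = SITE_RUBY_FRAME.match(line)
        if pending is not None and frame:
            # Record only the first site_ruby frame after the error.
            pending["frame"] = frame.group("frame")
            pending = None
    return results
```

For the trace above, the reported frame is prometheus-alert-rules-updater.rb:123 in stage_rules!.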
/var/local/log/nms.log indicates java.net.ConnectException: Connection refused (Connection refused):
MI: |2023-07-25T05:51:36.885| NOTICE [DATA_STREAM_36] NMSClustersUtils.java:255: Failed to call /localhost/alert-notification-sender-update
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at java.net.Socket.connect(Socket.java:556)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1223)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1337)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1312)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1521)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1495)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at com.bycast.config.NMSClustersUtils.notifyMgmtApiOfAlertSenderChange(NMSClustersUtils.java:235)
at com.bycast.config.NMSClustersUtils.setSendingClusterId(NMSClustersUtils.java:211)
at com.bycast.clusters.ClustersUtils.getSendingClusterId(ClustersUtils.java:224)
at com.bycast.clusters.ClustersUtils.getEmailNotificationSendingClusterId(ClustersUtils.java:165)
at com.bycast.transactions.protocols.AttributeNotifyProtocol.saveAttributeData(AttributeNotifyProtocol.java:184)
at com.bycast.transactions.protocols.AttributeNotifyProtocol.processAttrNotify(AttributeNotifyProtocol.java:150)
at com.bycast.transactions.protocols.AddNodeProtocol.startProcessing(AddNodeProtocol.java:192)
at com.bycast.transactions.connectionagent.DataConnection.dataProcessing(DataConnection.java:140)
at com.bycast.transactions.connectionagent.DataConnection.run(DataConnection.java:55)
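Most frames in this trace belong to the JDK HTTP client. Keeping only the com.bycast frames shows the call path: while AddNodeProtocol processes attribute notifications, NMSClustersUtils.notifyMgmtApiOfAlertSenderChange tries to call mgmt-api and the connection is refused, which is consistent with the mgmt-api service failing to restart. The following is a minimal sketch, assuming the stack-trace layout shown above; summarize_trace and its package_prefix parameter are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Exception header, e.g. "java.net.ConnectException: Connection refused (...)".
EXCEPTION = re.compile(r'^(?P<cls>[\w.$]+(?:Exception|Error))(?:: (?P<msg>.*))?$')
# Stack frame, e.g. "at com.bycast.Foo.bar(Foo.java:12)"; leading whitespace optional.
FRAME = re.compile(r'^\s*at (?P<method>[\w.$<>]+)\((?P<loc>[^)]*)\)$')


def summarize_trace(
    lines: Iterable[str], package_prefix: str = "com.bycast."
) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Return (exception class, message, application frames, innermost first).

    Only frames whose method starts with package_prefix are kept.
    Hypothetical helper; not a StorageGRID or JDK tool.
    """
    cls: Optional[str] = None
    msg: Optional[str] = None
    frames: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if cls is None:
            # Skip log lines until the first exception header appears.
            header = EXCEPTION.match(line.strip())
            if header:
                cls, msg = header.group("cls"), header.group("msg")
            continue
        frame = FRAME.match(line)
        if frame and frame.group("method").startswith(package_prefix):
            frames.append(f"{frame.group('method')} ({frame.group('loc')})")
    return cls, msg, frames
```

Run against the excerpt above, the kept frames run from NMSClustersUtils.notifyMgmtApiOfAlertSenderChange down to DataConnection.run.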