StorageGRID bare-metal host service stopped working
Applies to
- NetApp StorageGRID 11.4
- Ubuntu 18.04.5
- Docker version 20.10.2
Issue
The StorageGRID Webscale services stopped working. After the Admin Node was rebooted, Docker could no longer see the node's container.
The StorageGRID Webscale host service is unresponsive, and the following alert is raised: "Unable to communicate with node (1 alert)". This alert means that one or more services are unresponsive or that the node cannot be reached.
root@Nodename:~# systemctl status storagegrid
● storagegrid.service - StorageGRID host service
   Loaded: loaded (/usr/lib/systemd/system/storagegrid.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2021-05-03 13:55:41 UTC; 6min ago
 Main PID: 22780 (code=exited, status=1/FAILURE)

May 03 13:55:41 lxchost4 systemd[1]: storagegrid.service: Service hold-off time over, scheduling restart.
May 03 13:55:41 lxchost4 systemd[1]: storagegrid.service: Scheduled restart job, restart counter is at 5.
May 03 13:55:41 lxchost4 systemd[1]: Stopped StorageGRID host service.
May 03 13:55:41 lxchost4 systemd[1]: storagegrid.service: Start request repeated too quickly.
May 03 13:55:41 lxchost4 systemd[1]: storagegrid.service: Failed with result 'exit-code'.
May 03 13:55:41 lxchost4 systemd[1]: Failed to start StorageGRID host service.

root@Nodename:~# storagegrid node status
WARNING: Unable to connect to StorageGRID service: [Errno 111] Connection refused
Name              Config-State   Run-State
sg-management01   Configured     Stopped

root@Nodename:~# systemctl status storagegrid
● storagegrid.service - StorageGRID host service
   Loaded: loaded (/usr/lib/systemd/system/storagegrid.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Fri 2021-05-07 13:08:01 UTC; 1 weeks 5 days ago
  Process: 22854 ExecStart=/usr/sbin/storagegrid-daemon (code=exited, status=0/SUCCESS)
 Main PID: 22870 (code=exited, status=1/FAILURE)
Attempts exceeded to restore access:
May 07 13:08:01 lxchost4 systemd[1]: storagegrid.service: Service hold-off time over, scheduling restart.
May 07 13:08:01 lxchost4 systemd[1]: storagegrid.service: Scheduled restart job, restart counter is at 8.
May 07 13:08:01 lxchost4 systemd[1]: Stopped StorageGRID host service.
May 07 13:08:01 lxchost4 systemd[1]: storagegrid.service: Start request repeated too quickly.
May 07 13:08:01 lxchost4 systemd[1]: storagegrid.service: Failed with result 'exit-code'.
May 07 13:08:01 lxchost4 systemd[1]: Failed to start StorageGRID host service.
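The log lines above show systemd restarting storagegrid.service repeatedly until it reports "Start request repeated too quickly", which is the message systemd prints when a unit exceeds its start rate limit and systemd stops retrying. The following is a minimal sketch, not a NetApp tool, that summarizes those two symptoms from saved `systemctl status` or `journalctl` text. The function name `summarize_storagegrid_log` and its `key=value` output format are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of StorageGRID): summarize systemd restart-loop
# symptoms from captured `systemctl status storagegrid` or journal text.
summarize_storagegrid_log() {
  local log="$1"
  local counter
  # Highest "restart counter is at N" value found in the text (empty if none).
  counter=$(grep -o 'restart counter is at [0-9]*' <<<"$log" \
    | awk '{print $NF}' | sort -n | tail -n 1)
  echo "restart_counter=${counter:-0}"
  # systemd prints this line when the unit hit its start rate limit.
  if grep -q 'Start request repeated too quickly' <<<"$log"; then
    echo "start_limit_hit=yes"
  else
    echo "start_limit_hit=no"
  fi
}
```

For example, `summarize_storagegrid_log "$(journalctl -u storagegrid --no-pager)"` would report the highest restart counter in the journal and whether the start limit was reached. The check only reads text, so it is safe to run on an affected node.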