StorageGRID node in Administratively down status during SANtricity OS upgrade
Applies to
- NetApp StorageGRID
- SG6060 appliance
Issue
- During a SANtricity OS upgrade, a single storage node entered an Unknown/Administratively down status.
- /var/local/log/gdu-server.log on the primary Admin Node records a repeated reboot loop with a connectivity timeout:
[2026-02-20T06:17:54.404200 #842477] INFO -- gdu-server: Checking connectivity of <NODE_NAME>
[2026-02-20T06:18:04.404992 #842477] INFO -- gdu-server: Connectivity check for <NODE_NAME> timed out. Retrying one more time ...
[2026-02-20T06:18:14.405667 #842477] ERROR -- gdu-server: execution expired (Timeout::Error)
[2026-02-20T06:18:14.405761 #842477] ERROR -- gdu-server: /usr/lib/ruby/3.1.0/socket.rb:64:in `connect'
[2026-02-20T06:18:14.405778 #842477] ERROR -- gdu-server: /usr/lib/ruby/3.1.0/socket.rb:64:in `connect_internal'
[2026-02-20T06:18:14.405787 #842477] ERROR -- gdu-server: /usr/lib/ruby/3.1.0/socket.rb:137:in `connect'
[2026-02-20T06:18:14.405801 #842477] ERROR -- gdu-server: Unable to connect to <NODE_NAME>: execution expired
[2026-02-20T06:18:14.405976 #842477] INFO -- gdu-server: Node <NODE_NAME> has not fully come back up after a reboot. Attempt 1: key: "offline_error", options: {:hostname=>"<NODE_NAME>", :reason=>"Connectivity check timeout"}
[2026-02-20T08:45:51.741537 #842477] INFO -- gdu-server: Node <NODE_NAME> has not fully come back up after a reboot. Attempt 180: key: "offline_error", options: {:hostname=>"<NODE_NAME>", :reason=>"Connectivity check timeout"}
[2026-02-20T08:45:51.741563 #842477] ERROR -- gdu-server: Node <NODE_NAME> has failed to come completely back online after 181 attempts: key: "offline_error", options: {:hostname=>"<NODE_NAME>", :reason=>"Connectivity check timeout"}
[2026-02-20T08:45:51.741631 #842477] ERROR -- gdu-server: key: "managed_reboot.not_online_error", options: {:hostname=>"<NODE_NAME>"} (Gdu::LocalizedError)
[2026-02-20T08:45:51.741781 #842477] INFO -- gdu-server: Skipping update on <NODE_NAME>
[2026-02-20T08:45:51.771845 #842477] ERROR -- gdu-server: Failed to apply SANtricity OS to <NODE_NAME>: Fatal installation failure - Node reboot failed. See gdu-server.log for details.
- /base-os-logs/run/mount-tmp/pge-actv-root/var/log/kern.log indicates the node booted into Pre-Grid Environment (PGE) mode with SATA link down errors:
2026-02-20T13:17:13.092961+00:00 StorageGRID-PGE kernel: [0.000000] Linux version 6.1.0-25-amd64 (debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3+ntap0 (2024-09-12)
2026-02-20T13:17:13.093029+00:00 StorageGRID-PGE kernel: [0.000000] Command line: BOOT_IMAGE=/vmlinuz root=LABEL=pge-actv-root ro console=uart,io,0x9000,6400n81 intel_idle.max_cstate=2 intel_iommu=off netapp_sga_set_console_by_cpu=yes fsck.repair=yes memmap=64K$4K
2026-02-20T13:17:13.099063+00:00 StorageGRID-PGE kernel:[10.180972] ERST: Error Record Serialization Table (ERST) support is initialized.
2026-02-20T13:17:13.100138+00:00 StorageGRID-PGE kernel:[13.269070] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
2026-02-20T13:17:13.100139+00:00 StorageGRID-PGE kernel:[13.269097] ata5: SATA link down (SStatus 0 SControl 300)
2026-02-20T13:17:13.100140+00:00 StorageGRID-PGE kernel:[13.269124] ata3: SATA link down (SStatus 0 SControl 300)
2026-02-20T13:17:13.100141+00:00 StorageGRID-PGE kernel:[13.269155] ata1: SATA link down (SStatus 0 SControl 300)
2026-02-20T13:17:13.100142+00:00 StorageGRID-PGE kernel:[13.269183] ata4: SATA link down (SStatus 0 SControl 300)
2026-02-20T13:17:13.100143+00:00 StorageGRID-PGE kernel:[13.269210] ata6: SATA link down (SStatus 0 SControl 300)
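When reviewing a saved copy of the PGE kern.log, the per-port SATA link state can be summarized rather than read line by line. A hypothetical Ruby helper (the regex mirrors the message format in the excerpt above; the helper name is an assumption):

```ruby
# Collect the last reported link state ("up" or "down") for each ATA port
# from kernel log lines of the form "ataN: SATA link up/down ...".
SATA_LINK_RE = /(ata\d+): SATA link (up|down)/

def sata_link_states(lines)
  lines.each_with_object({}) do |line, states|
    if (m = SATA_LINK_RE.match(line))
      states[m[1]] = m[2]
    end
  end
end

# Example against lines shaped like the kern.log excerpt:
sample = [
  "kernel:[13.269070] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
  "kernel:[13.269097] ata5: SATA link down (SStatus 0 SControl 300)",
]
down = sata_link_states(sample).select { |_, s| s == "down" }.keys
puts "Links down: #{down.sort.join(', ')}"
```

In the excerpt above, only ata2 reports a link up; ata1 and ata3 through ata6 all report link down while the node is in PGE.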
- /base-os-logs/run/mount-tmp/pge-actv-root/var/log/ records that the node eventually boots out of PGE successfully:
2026-02-20T13:22:31.978220+00:00 StorageGRID-PGE kernel:[341.745991] sd 6:0:0:254: Power-on or device reset occurred
2026-02-20T13:22:33.426229+00:00 StorageGRID-PGE kernel:[343.191565] sd 15:0:0:238: [sdc] Spinning up disk...
- The BMC event log may also record repeated Watchdog timeout events during the boot loop:
06:30:40 [Warning] [Watchdog] [Watchdog 2] Hard Reset(Timer use at expiration: OS Load) - Asserted
13:13:28 [Warning] [Watchdog] [Watchdog 2] Hard Reset(Timer use at expiration: OS Load) - Asserted
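To gauge how many times the boot loop fired, the watchdog assertions in an exported BMC event log can be counted. A hypothetical sketch against the event format shown above (the helper name is an assumption):

```ruby
# Count "Watchdog 2" hard-reset assertions (timer use: OS Load) in an
# exported BMC event log. Each assertion corresponds to one failed boot.
WATCHDOG_RE = /\[Watchdog 2\] Hard Reset.*Asserted/

def watchdog_reset_count(lines)
  lines.count { |l| l.match?(WATCHDOG_RE) }
end
```

A count that keeps climbing between the two timestamps above would confirm the node was cycling through OS-load watchdog resets for the duration of the upgrade window.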
