NetApp Knowledge Base

StorageGRID node in Administratively down status during SANtricity OS upgrade

Applies to

  • NetApp StorageGRID
  • SG6060 appliance 

Issue

  • During a SANtricity OS upgrade, a single Storage Node goes into Unknown or Administratively Down status.
  • /var/local/log/gdu-server.log on the Primary Admin Node records a reboot loop in which every connectivity check times out:

[2026-02-20T06:17:54.404200 #842477] INFO -- gdu-server: Checking connectivity of <NODE_NAME>
[2026-02-20T06:18:04.404992 #842477] INFO -- gdu-server: Connectivity check for <NODE_NAME> timed out. Retrying one more time ...
[2026-02-20T06:18:14.405667 #842477] ERROR -- gdu-server: execution expired (Timeout::Error)
[2026-02-20T06:18:14.405761 #842477] ERROR -- gdu-server: /usr/lib/ruby/3.1.0/socket.rb:64:in `connect'
[2026-02-20T06:18:14.405778 #842477] ERROR -- gdu-server: /usr/lib/ruby/3.1.0/socket.rb:64:in `connect_internal'
[2026-02-20T06:18:14.405787 #842477] ERROR -- gdu-server: /usr/lib/ruby/3.1.0/socket.rb:137:in `connect'
[2026-02-20T06:18:14.405801 #842477] ERROR -- gdu-server: Unable to connect to <NODE_NAME>: execution expired
[2026-02-20T06:18:14.405976 #842477] INFO -- gdu-server: Node <NODE_NAME> has not fully come back up after a reboot. Attempt 1: key: "offline_error", options: {:hostname=>"<NODE_NAME>", :reason=>"Connectivity check timeout"} 

[2026-02-20T08:45:51.741537 #842477] INFO -- gdu-server: Node <NODE_NAME> has not fully come back up after a reboot. Attempt 180: key: "offline_error", options: {:hostname=>"<NODE_NAME>", :reason=>"Connectivity check timeout"} 
[2026-02-20T08:45:51.741563 #842477] ERROR -- gdu-server: Node <NODE_NAME> has failed to come completely back online after 181 attempts: key: "offline_error", options: {:hostname=>"<NODE_NAME>", :reason=>"Connectivity check timeout"} 
[2026-02-20T08:45:51.741631 #842477] ERROR -- gdu-server: key: "managed_reboot.not_online_error", options: {:hostname=>"<NODE_NAME>"}  (Gdu::LocalizedError)
[2026-02-20T08:45:51.741781 #842477] INFO -- gdu-server: Skipping update on <NODE_NAME>
[2026-02-20T08:45:51.771845 #842477] ERROR -- gdu-server: Failed to apply SANtricity OS to <NODE_NAME>: Fatal installation failure - Node reboot failed. See gdu-server.log for details.
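To confirm the same pattern in a large gdu-server.log, the retry entries can be tallied per node. The following is a minimal Python sketch, not a NetApp tool: the function name summarize_reboot_attempts and both regular expressions are assumptions modeled only on the excerpt above, and the wording of these messages may differ between StorageGRID releases.

```python
import re
from collections import defaultdict

# Patterns modeled on the gdu-server.log excerpt above (assumption:
# other releases may word these messages differently).
ATTEMPT_RE = re.compile(
    r"Node (?P<node>\S+) has not fully come back up after a reboot\. "
    r"Attempt (?P<n>\d+):"
)
FAILED_RE = re.compile(r"Failed to apply SANtricity OS to (?P<node>\S+):")


def summarize_reboot_attempts(lines):
    """Return {node: {"max_attempt": int, "failed": bool}} for the given log lines."""
    summary = defaultdict(lambda: {"max_attempt": 0, "failed": False})
    for line in lines:
        match = ATTEMPT_RE.search(line)
        if match:
            entry = summary[match.group("node")]
            # Keep the highest retry number seen for this node.
            entry["max_attempt"] = max(entry["max_attempt"], int(match.group("n")))
            continue
        match = FAILED_RE.search(line)
        if match:
            # The upgrade gave up on this node.
            summary[match.group("node")]["failed"] = True
    return dict(summary)
```

Passing the open log file (for example, the file object returned by open("/var/local/log/gdu-server.log")) as lines returns one entry per node. A high max_attempt together with failed set to True corresponds to the "Node reboot failed" outcome shown above.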

  • /base-os-logs/run/mount-tmp/pge-actv-root/var/log/kern.log shows that the node booted into Pre-Grid Environment (PGE) mode and logged SATA link down messages:

2026-02-20T13:17:13.092961+00:00 StorageGRID-PGE kernel: [0.000000] Linux version 6.1.0-25-amd64 (debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3+ntap0 (2024-09-12)
2026-02-20T13:17:13.093029+00:00 StorageGRID-PGE kernel: [0.000000] Command line: BOOT_IMAGE=/vmlinuz root=LABEL=pge-actv-root ro console=uart,io,0x9000,6400n81 intel_idle.max_cstate=2 intel_iommu=off netapp_sga_set_console_by_cpu=yes fsck.repair=yes memmap=64K$4K

2026-02-20T13:17:13.099063+00:00 StorageGRID-PGE kernel:[10.180972] ERST: Error Record Serialization Table (ERST) support is initialized.
2026-02-20T13:17:13.100138+00:00 StorageGRID-PGE kernel:[13.269070] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
2026-02-20T13:17:13.100139+00:00 StorageGRID-PGE kernel:[13.269097] ata5: SATA link down (SStatus 0 SControl 300)
2026-02-20T13:17:13.100140+00:00 StorageGRID-PGE kernel:[13.269124] ata3: SATA link down (SStatus 0 SControl 300)
2026-02-20T13:17:13.100141+00:00 StorageGRID-PGE kernel:[13.269155] ata1: SATA link down (SStatus 0 SControl 300)
2026-02-20T13:17:13.100142+00:00 StorageGRID-PGE kernel:[13.269183] ata4: SATA link down (SStatus 0 SControl 300)
2026-02-20T13:17:13.100143+00:00 StorageGRID-PGE kernel:[13.269210] ata6: SATA link down (SStatus 0 SControl 300)
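The per-port link state can be pulled out of kern.log with a short script. This is a hedged Python sketch: the function names sata_link_states and ports_down are hypothetical, and the pattern is modeled only on the kernel messages shown above. A "SATA link down" message alone does not prove a fault, because an unpopulated port reports the same message, so treat the result as a starting point for comparison with a healthy node.

```python
import re

# Matches kernel messages such as "ata5: SATA link down (SStatus 0 SControl 300)".
SATA_LINK_RE = re.compile(r"(?P<port>ata\d+): SATA link (?P<state>up|down)\b")


def sata_link_states(lines):
    """Return {port: "up" | "down"} using the last state logged for each port."""
    states = {}
    for line in lines:
        match = SATA_LINK_RE.search(line)
        if match:
            states[match.group("port")] = match.group("state")
    return states


def ports_down(lines):
    """Return the ports whose last logged state is "down", in numeric order."""
    down = [port for port, state in sata_link_states(lines).items() if state == "down"]
    # Sort by the numeric suffix so that ata10 follows ata9 rather than ata1.
    return sorted(down, key=lambda port: int(port[3:]))
```

Run against the excerpt above, ports_down reports ata1, ata3, ata4, ata5, and ata6 as down, with ata2 as the only port that negotiated a link.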

  • /base-os-logs/run/mount-tmp/pge-actv-root/var/log/ records that the node eventually boots out of PGE successfully:

2026-02-20T13:22:31.978220+00:00 StorageGRID-PGE kernel:[341.745991] sd 6:0:0:254: Power-on or device reset occurred
2026-02-20T13:22:33.426229+00:00 StorageGRID-PGE kernel:[343.191565] sd 15:0:0:238: [sdc] Spinning up disk...

  • The BMC event log may also record repeated Watchdog timeout events during the boot loop:

06:30:40 [Warning] [Watchdog] [Watchdog 2] Hard Reset(Timer use at expiration: OS Load) - Asserted
13:13:28 [Warning] [Watchdog] [Watchdog 2] Hard Reset(Timer use at expiration: OS Load) - Asserted
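When the BMC event log is exported as text, the watchdog entries can be extracted for comparison with the gdu-server.log timestamps. The following Python sketch is an illustration only: the function name watchdog_events is hypothetical, and the line format is assumed to match the two entries above exactly. Other BMC firmware versions may format events differently.

```python
import re

# Modeled on lines such as:
# 06:30:40 [Warning] [Watchdog] [Watchdog 2] Hard Reset(Timer use at expiration: OS Load) - Asserted
WATCHDOG_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) \[(?P<severity>[^\]]+)\] \[Watchdog\] "
    r"\[(?P<sensor>[^\]]+)\] (?P<action>[^(]+?)\s*"
    r"\(Timer use at expiration: (?P<timer_use>[^)]+)\) - Asserted"
)


def watchdog_events(lines):
    """Return one dict per asserted watchdog event found in the given lines."""
    events = []
    for line in lines:
        match = WATCHDOG_RE.search(line.strip())
        if match:
            events.append(match.groupdict())
    return events
```

Each returned dictionary carries the time, severity, sensor, action, and timer_use fields. Repeated entries with timer_use set to "OS Load" mean the watchdog expired while the operating system was still loading, which matches the boot loop described in this article.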
