"sensorReadingFailed" and "ensembleDegraded" alerts are detected
Applies to
- NetApp SolidFire
- NetApp Element software prior to 12.3.x
Issue
- The following alerts appear in NetApp SolidFire Active IQ and in the Element cluster web GUI:
sensorReadingFailed
IPMI diagnostics are currently unresponsive. Please contact support if this problem persists.
ensembleDegraded
Ensemble degraded: 1/5 database servers not connectable: {3:x.x.x.x}
- EXT4-fs errors appear on the remote console of a storage node:
[3367598.061077] EXT4-fs error (device sda2): ext4_journal_check_start:61:Detected aborted journal
[3367598.061078] EXT4-fs error (sda2): Remounting filesystem read-only xxxxxxxx
[3367598.125694] EXT4-fs error (sda3): in ext4_writepages:2878: IO failure
- The event log shows:
networkEvent Failed to install SSL certificate 3 { "message": "Failed to remove path=[/sf/etc/ssl/active.crt] errorCode=system:30 errorCode.message()=Read-only file system", "name": "xCheckFailure" }
platformHardwareEvent Updating BMC cold reset date 6 3 { "bmcResetDurationMinutes": 0, "bmcResetDate": "2021-05-11T23:16:41" }
unexpectedException Unexpected Exception - xCreateRepositorySourceFileFailed Failed to open and truncate /sf/apt/sources.list.new.tmp callback=[ {4:RepositorySources::packageManagerCallbackTag}] wtype=[SessionConnected] - Contact SolidFire Support. 6 3 ""
- Generating a support bundle fails with the following error:
Error creating support bundle: SolidFireApiError server=[xx.xx.xx.xx]
method=[CreateSupportBundle], params=[{'bundleName': 'supportbundle', 'extraArgs': ' --binary'}] - error name=[xAPIMethodFailed],
message=[Create Support Bundle failed processing:
command=timeout -s KILL 1500s /sf/scripts/sfsupportbundle --quiet ?--binary
/tmp/solidfire-dtemp.ScjN9i/supportbundle
cmdResult={ rc=1 stdout="" stderr="mount: /: cannot remount /dev/sda2 read-write, is write-protected.
mv: cannot move '/var/log/sf-supportbundle.info.3' to '/var/log/sf-supportbundle.info.4': Read-only file system
- The Enable SSH API call completes successfully, but the OS cannot start the SSH service
- The node's BMC web page is accessible and the node is reachable on the 1G/10G networks
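Most of the symptoms above share one signature: messages showing that the node's root filesystem has been remounted read-only. As a minimal sketch, assuming the console output, event log, or support bundle error has been saved as plain text, the script below flags lines that carry that signature. The function names and regular expressions are hypothetical and were derived only from the sample messages in this article; they are not part of any NetApp tool, and real logs may be worded differently.

```python
import re

# Patterns derived from the sample messages in this article (an assumption:
# other Element or kernel versions may phrase these messages differently).
READ_ONLY_PATTERNS = [
    re.compile(r"EXT4-fs error \((?:device )?(?P<dev>\w+)\)"),
    re.compile(r"Read-only file system"),
    re.compile(r"cannot remount (?:/dev/)?(?P<dev>\w+) read-write"),
]


def find_read_only_indicators(text):
    """Return (line_number, line) pairs that match any read-only symptom pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in READ_ONLY_PATTERNS):
            hits.append((number, line.strip()))
    return hits


def affected_devices(text):
    """Return the sorted device names mentioned in the matching lines."""
    devices = set()
    for line in text.splitlines():
        for pattern in READ_ONLY_PATTERNS:
            match = pattern.search(line)
            if match is None:
                continue
            # Only some patterns capture a device name.
            device = match.groupdict().get("dev")
            if device:
                devices.add(device)
    return sorted(devices)


if __name__ == "__main__":
    import sys

    # Usage: python scan_read_only.py saved_console_output.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        content = handle.read()
    for number, line in find_read_only_indicators(content):
        print(f"{number}: {line}")
    print("Devices mentioned:", ", ".join(affected_devices(content)) or "none")
```

A match only indicates that the saved text contains these messages; it does not identify the underlying cause.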