"sensorReadingFailed" and "ensembleDegraded" alerts due to filesystem issue
Applies to
- NetApp HCI Storage node
- NetApp SolidFire Storage node
Issue
- The following alerts appear in NetApp SolidFire Active IQ and the Element cluster web GUI:
sensorReadingFailed
IPMI diagnostics are currently unresponsive. Please contact support if this problem persists.
ensembleDegraded
Ensemble degraded: 1/5 database servers not connectable: {3:x.x.x.x}
- EXT4-fs errors are shown on the remote console of the storage node:
[3367598.061077] EXT4-fs error (device sda2): ext4_journal_check_start:61:Detected aborted journal
[3367598.061078] EXT4-fs error (sda2): Remounting filesystem read-only xxxxxxxx
[3367598.125694] EXT4-fs error (sda3): in ext4_writepages:2878: IO failure
- The event log shows:
networkEvent Failed to install SSL certificate 3 { "message": "Failed to remove path=[/sf/etc/ssl/active.crt] errorCode=system:30 errorCode.message()=Read-only file system", "name": "xCheckFailure" }
platformHardwareEvent Updating BMC cold reset date 6 3 { "bmcResetDurationMinutes": 0, "bmcResetDate": "2021-05-11T23:16:41" }
unexpectedException Unexpected Exception - xCreateRepositorySourceFileFailed Failed to open and truncate /sf/apt/sources.list.new.tmp callback=[ {4:RepositorySources::packageManagerCallbackTag}] wtype=[SessionConnected] - Contact SolidFire Support. 6 3 ""
- The node's BMC web page can be accessed normally and the 1G/10G networks are reachable
- When the node is rebooted by following the solution in the KB, the following error is displayed during boot:
Version 2.17.1249. Copyright (C) 2017 American Megatrends, Inc.
NetApp H500S BIOS Date:07/10/2017 Rev:NA2.1
CPU : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
Speed : 2.10 GHz
The IMC is operating with DDR4 2133 MHz
Port 0 : Micron_5100_XXXXXXXXXX
S.M.A.R.T Status Bad, Backup and Replace.
Press F1 to Resume...
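The symptoms listed above can be checked against saved console or event-log text with a short script. The following is a minimal sketch only, not a NetApp tool: the function name summarize_console, the regular expression, and the summary keys are hypothetical, and the patterns are inferred solely from the excerpts in this article, so they may need adjusting for other kernel or BIOS versions.

```python
import re

# Matches kernel lines such as:
#   [3367598.061077] EXT4-fs error (device sda2): <message>
# The "device " prefix is optional because the excerpts show both forms.
EXT4_ERROR = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+EXT4-fs error "
    r"\((?:device )?(?P<dev>\w+)\):\s*(?P<msg>.*)$"
)


def summarize_console(lines):
    """Group EXT4-fs errors by device and flag read-only / SMART symptoms.

    Hypothetical helper: the returned keys are illustrative, not an
    Element API.
    """
    summary = {"ext4_errors": {}, "read_only": False, "smart_bad": False}
    for raw in lines:
        line = raw.strip()
        match = EXT4_ERROR.match(line)
        if match:
            summary["ext4_errors"].setdefault(match.group("dev"), []).append(
                match.group("msg")
            )
            if "Remounting filesystem read-only" in match.group("msg"):
                summary["read_only"] = True
        elif "Read-only file system" in line:
            # Event-log form of the same condition (errorCode=system:30).
            summary["read_only"] = True
        elif "S.M.A.R.T Status Bad" in line:
            # BIOS POST message reported for the boot drive.
            summary["smart_bad"] = True
    return summary


# Sample input copied from the excerpts in this article.
SAMPLE = [
    "[3367598.061077] EXT4-fs error (device sda2): ext4_journal_check_start:61:Detected aborted journal",
    "[3367598.061078] EXT4-fs error (sda2): Remounting filesystem read-only xxxxxxxx",
    "[3367598.125694] EXT4-fs error (sda3): in ext4_writepages:2878: IO failure",
    "S.M.A.R.T Status Bad, Backup and Replace.",
]

if __name__ == "__main__":
    print(summarize_console(SAMPLE))
```

If both read_only and smart_bad are set for the same node, the combination matches the pattern described in this article: filesystem errors on the boot device together with a failing S.M.A.R.T status reported at POST.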