CONTAP-567444: Repetitive ucores cause disaggregated mroot ENOSPC, leading to an unbootable node
Issue
On disaggregated storage platforms, a boot media that runs out of space first causes a node watchdog panic (for example: PANIC: Process vldb unresponsive for 196 seconds in process nodewatchdog). The node then fails to boot, in a loop, because no space is left on the boot media:
CRITICAL. This node is not healthy because the boot media is low on space
(<10MB). The node can still serve data, but it cannot participate in cluster
operations until this situation is rectified. Free space using the systemshell
or contact technical support for assistance.
Oct 20 10:45:32 [MyCluster-01:callhome.mdb.recovery.unsuccessful:EMERGENCY]: Call home for MDB RECOVERY UNSUCCESSFUL FOR THE vldb WARNING
Oct 20 10:45:32 [MyCluster-01:ucore.panicString:error]: 'vldb: assertion (0 == "Failed to create the lock file directory") at src/corrupt_rdb.cc:833 failed, raising SIGABRT(6) at RIP 0x80838ae8a (pid 66018, uid 0, timestamp 1760949933)'
Data ONTAP failed to initialize swap space in /mroot/etc/swapfile due to error code: 1
Please pick another volume/aggregate as root or contact technical support.
The amount of space available in the root volume is as follows:
Filesystem   1024-blocks  Used      Avail  Capacity  Mounted on
pool0/mroot  15728744     15728744  0      100%      /mroot
We need 40496MB of free space on root volume to initialize the swap space.
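
The messages quoted above can be matched against a saved console or EMS log to check whether a node shows the same symptoms. The following is a minimal sketch, not a NetApp tool: the names SIGNATURES, MROOT_ROW and match_symptoms are hypothetical, the search strings are copied from the output above, and the "1024-blocks" columns are assumed to be in KiB.

```python
import re

# Substrings copied from the messages quoted in this article.
SIGNATURES = {
    "boot_media_low": "boot media is low on space",
    "vldb_lock_dir": "Failed to create the lock file directory",
    "swap_init_failed": "failed to initialize swap space in /mroot/etc/swapfile",
}

# df-style row for /mroot: filesystem, total, used, avail, capacity%, mount point.
MROOT_ROW = re.compile(
    r"^(\S+)[ \t]+(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]+(\d+)%[ \t]+/mroot[ \t]*$",
    re.MULTILINE,
)


def match_symptoms(log_text):
    """Return (sorted list of matched signature names, /mroot Avail in KiB or None)."""
    found = sorted(name for name, needle in SIGNATURES.items() if needle in log_text)
    row = MROOT_ROW.search(log_text)
    avail_kib = int(row.group(4)) if row else None
    return found, avail_kib


if __name__ == "__main__":
    sample = (
        "Data ONTAP failed to initialize swap space in /mroot/etc/swapfile "
        "due to error code: 1\n"
        "Filesystem 1024-blocks Used Avail Capacity Mounted on\n"
        "pool0/mroot 15728744 15728744 0 100% /mroot\n"
    )
    print(match_symptoms(sample))
```

An Avail value of 0 together with one or more matched signatures is consistent with the condition described in this article. A log that matches none of the signatures does not rule the condition out, because the sketch only looks for the exact strings shown here.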
