Snapshot copies are not taken per schedule for any volumes on a cluster node
Applies to
- ONTAP 9 (all versions)
- ONTAP 9.1 (all versions prior to 9.1P9)
Issue
This issue can manifest in three ways:
1. Automatic volume snapshot copy creation fails for all volumes on the affected node, including volumes created after the issue starts. No snapshot policies or schedules produce snapshot copies. Manual snapshot creation works without issue.
2. Volume snapshot copies are not deleted. Eventually, the volume reaches the limit of 255 snapshot copies, and no more snapshot copies can be taken, either manually or automatically.
In this scenario, manual snapshot creation fails with the following error:
cluster1::> volume snapshot create -vserver vserver_name -volume volume_name -snapshot snap_name
Error: command failed: Failed to create snapshot "snap_name" of volume "volume_name" on Vserver "vserver_name". Reason: Cannot
exceed maximum number of Snapshot copies.
To see automatic snapshot creation failures in the event log, run the following command:
cluster1::> event log show -node cluster1-01 -message-name wafl.snap.create*
Time                Node             Severity      Event
------------------- ---------------- ------------- ---------------------------
6/7/2018 21:05:00   cluster1-01      NOTICE        wafl.snap.create.skip.reason: vol18ee7c9be-036f-11e7-aa76-005056920d02: skipping creation of hourly.2018-06-07_2105 Snapshot copy (Cannot exceed maximum number of Snapshot copies.).
3. The system panics with a message similar to the following:
Sun Feb 04 22:00:07 GMT [cluster1-01: snapd_cmode: sk.panic:alert]: Panic String: process on cpu2 hung (snapd_cmode) for 5003 milliseconds! in SK process snapd_cmode on release 9.1 (C)
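When reviewing saved log output, the messages shown for the second and third symptoms can be matched with a small script. The following Python sketch is illustrative only: the function name classify_line and the symptom labels are hypothetical, and the patterns are derived solely from the sample messages in this article, so real log lines may need adjusted expressions.

```python
import re

# Patterns are based only on the sample messages in this article;
# actual log output may be laid out differently.
SKIP_RE = re.compile(
    r"wafl\.snap\.create\.skip\.reason: (?P<volume>\S+): "
    r"skipping creation of (?P<snapshot>\S+) Snapshot copy "
    r"\((?P<reason>.*?)\)\.?\s*$"
)
PANIC_RE = re.compile(
    r"sk\.panic:alert\]: Panic String: process on cpu\d+ hung \(snapd_cmode\)"
)


def classify_line(line):
    """Return (symptom, details) for a matching log line, or None otherwise.

    The symptom labels are hypothetical names used for illustration.
    """
    match = SKIP_RE.search(line)
    if match:
        # Scheduled snapshot copy was skipped; capture volume, name, and reason.
        return ("snapshot_skipped", match.groupdict())
    if PANIC_RE.search(line):
        # The snapd_cmode process hung and the node panicked.
        return ("snapd_cmode_panic", {})
    return None
```

Passing each line of a saved event log or console log to classify_line gives a quick indication of which symptom, if any, the line corresponds to. The first symptom (scheduled snapshot copies silently not being created) produces no message in the samples above, so this sketch cannot detect it.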