Aggregate experienced a long CP with "wafl.cp.toolong:error" event
Applies to
- ONTAP 9
- All FAS systems with HDD disks
Issue
- wafl.cp.toolong error messages are logged, and the aggregate experiences latency issues:
Tue Oct 17 09:34:07 +0000 [node01: wafl_exempt01: wafl.cp.toolong:error]: Aggregate aggr1 experienced a long CP.
Tue Oct 17 09:34:47 +0000 [node01: wafl_exempt11: wafl.cp.toolong:error]: Aggregate aggr1 experienced a long CP.
Tue Oct 17 09:35:37 +0000 [node01: wafl_exempt05: wafl.cp.toolong:error]: Aggregate aggr1 experienced a long CP.
- The EMS log may report disk-related errors for the affected aggregate:
Sun Dec 17 17:33:59 +0000 [Cluster01-01: disk_latency_monitor: shm.threshold.ioLatency:debug]: Disk 1b.53.47 has exceeded the expected IO latency in the current window with average latency of 50 msecs and average utilization of 77 percent. Highest average IO latency: 1b.53.47: 50 msecs; next highest IO latency: 1b.53.9: 10 msecs.
Disk 1b.53.47 Shelf 53 Drawer 4 Slot 11 Bay 47 [NETAPP X375_SCMNE04TA07 NA00] S/N [1234abcd] UID [ABCD1234:EFGH5678:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000]
- The statit command reports one or more disks with higher utilization than the other disks in the same RAID group:
1b.03.51 88 29.36 27.66 4.44 33792 1.47 61.25 1232 0.23 25.25 849 0.00 .... . 0.00
9b.10.52 25 31.66 30.00 3.93 2994 1.46 61.85 120 0.20 36.43 308 0.00 .... . 0.00
1a.05.55 1 1.85 0.00 .... . 1.52 60.87 61 0.33 38.13 244 0.00 .... . 0.00
9a.02.54 81 29.30 27.74 4.13 29802 1.33 60.53 1069 0.23 22.25 1689 0.00 .... . 0.00
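The utilization comparison described above can be approximated offline from saved statit output. The following is a minimal sketch, not a NetApp tool. It assumes each disk row starts with the disk name followed by the utilization percentage (as the sample rows suggest), and that all rows passed in belong to the same RAID group. The function names and the 20-point margin are hypothetical choices for illustration.

```python
import re
from statistics import median

# Assumed disk-name shape, modeled on the sample rows (e.g. "1b.03.51").
DISK_NAME = re.compile(r"^\d+[a-z]\.\d+\.\d+$")

def parse_statit_rows(text):
    """Extract (disk, utilization) pairs from statit disk rows.

    Assumption: column 1 is the disk name and column 2 is the
    utilization percentage; the remaining columns are ignored.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and DISK_NAME.match(fields[0]) and fields[1].isdigit():
            rows.append((fields[0], int(fields[1])))
    return rows

def flag_hot_disks(rows, margin=20):
    """Return disks whose utilization exceeds the group median by more than `margin` points.

    `margin` is an arbitrary illustrative threshold, not a NetApp-defined value.
    """
    if not rows:
        return []
    baseline = median(ut for _, ut in rows)
    return [(disk, ut) for disk, ut in rows if ut > baseline + margin]

sample = """\
1b.03.51 88 29.36 27.66 4.44 33792 1.47 61.25 1232 0.23 25.25 849 0.00 .... . 0.00
9b.10.52 25 31.66 30.00 3.93 2994 1.46 61.85 120 0.20 36.43 308 0.00 .... . 0.00
1a.05.55 1 1.85 0.00 .... . 1.52 60.87 61 0.33 38.13 244 0.00 .... . 0.00
9a.02.54 81 29.30 27.74 4.13 29802 1.33 60.53 1069 0.23 22.25 1689 0.00 .... . 0.00
"""

for disk, ut in flag_hot_disks(parse_statit_rows(sample)):
    print(f"{disk}: {ut}% utilization")
```

With the four sample rows, the median utilization is 53, so the sketch flags 1b.03.51 (88) and 9a.02.54 (81). A median baseline is used here because a single very busy or very idle disk skews it less than a mean would.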