Intermittent latency spikes and latency on remote disk
Applies to
- ONTAP 9
- AFF A400
- MetroCluster
- Disk X4013S17337T6NTE
Issue
- High latency is observed intermittently, along with service outages on ESXi datastores
- EMS logs show the following messages at the time of the issue:
[XXXXXX-XX:wafl_exempt18: wafl.cp.toolong:error]: Aggregate XXXX_XXXX_aggr1 experienced a long CP
[XXXXXX-XX:disk_latency_monitor: shm.ssd.threshold.ioLatency:notice]: SSD 0v.i1.2L20 has exceeded the expected block latency in the current timeframe with an average latency of 4670 us and an average utilization of 11 percent. The next highest SSD latency: 110 us. Disk 0v.i1.2L20 Shelf 10 Bay 22 [NETAPP X4013S17337T6NTE NA53] S/N [S60RNA0R900618] UID [36305230:52900618:00253841:00000002:00000000:00000000:00000000:00000000:00000000:00000000
- Back-to-back CPs are observed, with most of the time spent in the P2 flush phase
- The latency originates from one of the disks on the remote plex
- Frequent failures are seen on disks of model X4013S17337T6NTE on the same shelf
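When many shm.ssd.threshold.ioLatency events need to be reviewed, it can help to extract the reported numbers and compare the slow SSD against the next highest one. The following is a minimal Python sketch, not a NetApp tool. The function name parse_ssd_latency and the regular expression are assumptions derived only from the sample message above, so the pattern may need adjusting for other ONTAP releases or message variants.

```python
import re

# Pattern for the body of a shm.ssd.threshold.ioLatency message.
# Assumption: the wording matches the sample in this article. Whitespace is
# matched loosely because copied log text often loses or gains spaces.
PATTERN = re.compile(
    r"SSD\s+(?P<disk>\S+)\s+has\s*exceeded.*?"
    r"average\s*latency\s+of\s+(?P<avg_us>\d+)\s*us.*?"
    r"average\s+utilization\s+of\s+(?P<util_pct>\d+)\s*percent.*?"
    r"next\s+highest\s*SSD\s+latency:\s*(?P<next_us>\d+)\s*us",
    re.DOTALL,
)


def parse_ssd_latency(message):
    """Return the latency figures from one EMS message, or None if no match."""
    m = PATTERN.search(message)
    if m is None:
        return None
    avg_us = int(m.group("avg_us"))
    next_us = int(m.group("next_us"))
    return {
        "disk": m.group("disk"),
        "avg_us": avg_us,
        "util_pct": int(m.group("util_pct")),
        "next_us": next_us,
        # How many times slower this SSD is than the next highest one.
        "ratio": avg_us / next_us if next_us else None,
    }


if __name__ == "__main__":
    sample = (
        "SSD 0v.i1.2L20 has exceeded the expected block latency in the "
        "current timeframe with an average latency of 4670 us and an "
        "average utilization of 11 percent. The next highest SSD latency: "
        "110 us."
    )
    print(parse_ssd_latency(sample))
```

For the sample message, the slow SSD reports 4670 us against 110 us for the next highest SSD, at only 11 percent utilization. A large gap at low utilization points to a single slow disk and not to general load on the aggregate.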