Loss of connectivity after disk failure

Category: disk-drives

Specialty: Perf

Applies to

Hardware disk failure
Long Consistency Point (CP) errors reported
Data outage

Issue

A data outage lasting several seconds is noticed on the client side. Examples:
- NFS exports disconnected
- CIFS shares not accessible
- Missing VMs
A hardware disk failure is reported. Example:

[node_name: config_thread: raid.config.filesystem.disk.not.responding:notice]: File system Disk /aggr_name/plex0/rg0/0a.0.1 Shelf 0 Bay 1 [...] is not responding.

[node_name: monitor: monitor.globalStatus.nonCritical:error]: Disk on adapter FPF1939S03T:9, shelf 1, bay 5, not responding.

ONTAP reports a long CP event error for the data and/or root aggregate. Example:

[node_name: wafl_exempt13: wafl.cp.toolong:error]: Aggregate aggr0 experienced a long CP.

[node_name: wafl_exempt16: wafl.cp.toolong:error]: Aggregate aggr_name experienced a long CP.
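When checking a saved log for these events, a small script can pull out the event name, severity, and message. The following is a minimal sketch, not an official tool: the function name `parse_ems_events` is hypothetical, and the regular expression is inferred only from the sample messages shown in this article, so it may need adjusting for other log layouts.

```python
import re

# Header layout assumed from the samples in this article:
#   [node: source: event.name:severity]: message
EMS_HEADER = re.compile(
    r"\[(?P<node>[^\[\]:]+):\s*(?P<source>[^\[\]:]+):\s*"
    r"(?P<event>[A-Za-z0-9_.]+):(?P<severity>[a-z]+)\]:"
)


def parse_ems_events(text):
    """Return one dict per event header found in text.

    The message is everything between one header and the next,
    so several events on the same line are still separated.
    """
    headers = list(EMS_HEADER.finditer(text))
    events = []
    for i, match in enumerate(headers):
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        event = match.groupdict()
        event["message"] = text[match.end():end].strip()
        events.append(event)
    return events


# Sample text copied from the examples in this article.
SAMPLE_EMS = (
    "[node_name: config_thread: raid.config.filesystem.disk.not.responding:notice]: "
    "File system Disk /aggr_name/plex0/rg0/0a.0.1 Shelf 0 Bay 1 [...] is not responding. "
    "[node_name: wafl_exempt13: wafl.cp.toolong:error]: "
    "Aggregate aggr0 experienced a long CP."
)

if __name__ == "__main__":
    for ev in parse_ems_events(SAMPLE_EMS):
        print(ev["severity"], ev["event"], "-", ev["message"])
```

Filtering the returned list on `event == "wafl.cp.toolong"` and on the disk "not responding" event gives a quick view of whether both symptoms appear in the same log.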

An excessively long Consistency Point (CP) phase 2, during which data is flushed to disks, is reported in the sktraces AutoSupport section. Example:

2024-1-1T00:01:01Z 12345678912345678 [5:0] CRUISE_6: CP toolong: aggr0[5678901] CP_P2_FLUSH 498765ms

2024-1-1T01:01:05Z 23456789123456789 [2:0] CRUISE_6: CP toolong: aggr_name[5789012] CP_P2_FLUSH 512345ms
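To compare the reported flush durations across aggregates, the sktrace entries can be reduced to aggregate name, phase, and duration. This is a rough sketch under stated assumptions: the function name `parse_cp_toolong` is hypothetical, and the pattern is derived only from the two sample entries above rather than from any documented sktrace format.

```python
import re

# Pattern assumed from the sample entries:
#   ... CP toolong: <aggregate>[<number>] <PHASE> <duration>ms
CP_TOOLONG = re.compile(
    r"CP toolong:\s*(?P<aggregate>[^\s\[]+)\[(?P<number>\d+)\]\s+"
    r"(?P<phase>\w+)\s+(?P<duration_ms>\d+)ms"
)


def parse_cp_toolong(text):
    """Return a list of (aggregate, phase, duration_ms) tuples found in text."""
    return [
        (m.group("aggregate"), m.group("phase"), int(m.group("duration_ms")))
        for m in CP_TOOLONG.finditer(text)
    ]


# Sample entries copied from this article.
SAMPLE_SKTRACE = (
    "2024-1-1T00:01:01Z 12345678912345678 [5:0] CRUISE_6: CP toolong: "
    "aggr0[5678901] CP_P2_FLUSH 498765ms\n"
    "2024-1-1T01:01:05Z 23456789123456789 [2:0] CRUISE_6: CP toolong: "
    "aggr_name[5789012] CP_P2_FLUSH 512345ms\n"
)

if __name__ == "__main__":
    for aggregate, phase, duration_ms in parse_cp_toolong(SAMPLE_SKTRACE):
        # Convert milliseconds to seconds for easier reading.
        print(f"{aggregate}: {phase} took {duration_ms / 1000:.1f} s")
```

Because the pattern is matched with `finditer`, it also works when several entries are concatenated on one line.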