NetApp Knowledge Base

Why does an aggregate go offline after multiple disk failures?

Category:
fas-systems
Specialty:
hw

Applies to

  • ONTAP 9
  • AFF
  • FAS

Answer

The aggregate goes offline when the number of concurrently failed disks in a single RAID group exceeds the RAID type's fault tolerance:
  • 2 or more in RAID4
  • 3 or more in RAID-DP
  • 4 or more in RAID-TEC
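
The thresholds above can be expressed as a small lookup. The following is a minimal Python sketch, not NetApp tooling: the function name `raid_group_state`, the RAID type keys, and the returned labels are illustrative assumptions and are not the exact status strings that ONTAP prints.

```python
# Disk failures each RAID type can absorb within ONE RAID group
# (one less than the failure counts listed above).
TOLERATED_FAILURES = {
    "raid4": 1,     # single parity
    "raid_dp": 2,   # double parity
    "raid_tec": 3,  # triple parity
}


def raid_group_state(raid_type: str, failed_disks: int) -> str:
    """Classify a RAID group by its number of concurrently failed disks.

    The returned labels are hypothetical names for this sketch only.
    """
    if failed_disks < 0:
        raise ValueError("failed_disks must not be negative")
    tolerated = TOLERATED_FAILURES[raid_type]
    if failed_disks == 0:
        return "normal"
    if failed_disks <= tolerated:
        return "degraded"  # still serving data, reduced or no protection left
    return "failed"        # tolerance exceeded, the aggregate goes offline
```

For example, `raid_group_state("raid_dp", 2)` returns `"degraded"`, while `raid_group_state("raid_dp", 3)` returns `"failed"`.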

Example for a RAID-DP aggregate:

Cluster::> run -node Node-01 sysconfig -r

  • Before:
RAID group Aggr1/plex0/rg1 (normal, block checksums)

RAID Disk    Device      HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------    ------      ------------- ---- ---- ---- ----- --------------    --------------
dparity     0a.12.16    0a    12  16  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
parity      0a.12.17    0a    12  17  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
...
data        0a.11.6     0a    11  6   SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data        0a.11.7     0a    11  7   SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data        0a.11.8     0a    11  8   SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data        0a.11.9     0a    11  9   SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data        0a.11.10    0a    11  10  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data        0a.11.19    0a    11  19  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data        0a.11.12    0a    11  12  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data        0a.11.13    0a    11  13  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data        0a.11.14    0a    11  14  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data        0a.11.15    0a    11  15  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
...
  • Two disks fail:
RAID group Aggr1/plex0/rg1 (double degraded, block checksums)

RAID Disk    Device      HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------    ------      ------------- ---- ---- ---- ----- --------------    --------------
dparity     0a.12.16    0a    12  16  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
parity      0a.12.17    0a    12  17  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
...
data        0a.11.6     0a    11  6   SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data    FAILED          N/A                        857000/ -
data        0a.11.8     0a    11  8   SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data        0a.11.9     0a    11  9   SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data        0a.11.10    0a    11  10  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data        0a.11.19    0a    11  19  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data        0a.11.12    0a    11  12  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data        0a.11.13    0a    11  13  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data    FAILED          N/A                        857000/ -
data        0a.11.15    0a    11  15  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
...
 
  • More than two disk failures:

Aggregate Aggr1 (failed, raid_dp, partial) (block checksums)
  Plex /Aggr1/plex0 (offline, failed, inactive)
    RAID group /Aggr1/plex0/rg1 (normal, block checksums)

RAID Disk    Device      HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------    ------      ------------- ---- ---- ---- ----- --------------    --------------
dparity     0a.12.16    0a    12  16  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
parity      0a.12.17    0a    12  17  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
...
data        0a.11.6     0a    11  6   SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data    FAILED          N/A                        857000/ -
data        0a.11.11    0a    11  11  SA:B   0   SAS 15000 857000/1755136000 858483/1758174768 (reconstruct stalled)
data        0a.11.9     0a    11  9   SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data        0a.11.10    0a    11  10  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data    FAILED          N/A                        857000/ -
data        0a.11.12    0a    11  12  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data        0a.11.13    0a    11  13  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data    FAILED          N/A                        857000/ -
data        0a.11.15    0a    11  15  SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
...
Raid group is missing 3 disks.
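
To check saved `sysconfig -r` output for this condition, the FAILED rows can be counted per RAID group. The following is a minimal Python sketch that assumes the output layout shown in the examples above (a `RAID group <name> (...)` header followed by one row per disk). The function name `count_failed_disks` and the `sample` text are hypothetical, and the parser has not been validated against other ONTAP releases.

```python
import re

# Header such as "RAID group /Aggr1/plex0/rg1 (normal, block checksums)".
# The leading slash is optional because both forms appear in the output above.
GROUP_RE = re.compile(r"^\s*RAID group\s+/?(\S+)\s+\(")
# A disk row whose device column reads FAILED.
FAILED_RE = re.compile(r"^\s*(?:data|parity|dparity|tparity)\s+FAILED\b")


def count_failed_disks(sysconfig_output: str) -> dict:
    """Return {raid_group_name: number_of_FAILED_rows}."""
    failed = {}
    current = None
    for line in sysconfig_output.splitlines():
        header = GROUP_RE.match(line)
        if header:
            current = header.group(1)
            failed.setdefault(current, 0)
        elif current is not None and FAILED_RE.match(line):
            failed[current] += 1
    return failed


# Shortened, hypothetical excerpt in the same layout as the example above.
sample = """\
    RAID group /Aggr1/plex0/rg1 (normal, block checksums)
data        0a.11.6     0a    11  6   SA:B   0   SAS 10000 857000/1755136000 858483/1758174768
data    FAILED          N/A                        857000/ -
data    FAILED          N/A                        857000/ -
data    FAILED          N/A                        857000/ -
Raid group is missing 3 disks.
"""

failed_by_group = count_failed_disks(sample)
```

Here `failed_by_group` maps `Aggr1/plex0/rg1` to 3, which is above the two failures that RAID-DP tolerates.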

NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.