NetApp Knowledgebase

Shared Storage pool is unhealthy

Category:
ontap-9
Specialty:
core

Applies to

ONTAP

Answer

The following message is displayed when a storage pool becomes unhealthy:

Dec 14 04:06:21 [cluster01-01:raid.sp.unhealthy:notice]: Storage pool sp1 is unhealthy. Reason: One of the aggregates belonging to the storage pool is not in normal state.
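
For log monitoring, the message above can be split into its parts. The following is a minimal sketch in Python; the regular expression is modelled only on the sample line shown here, so other ONTAP releases may format the message differently, and the names EMS_PATTERN and parse_unhealthy_event are hypothetical.

```python
import re

# Pattern derived from the single sample line in this article (an assumption,
# not an official message specification).
EMS_PATTERN = re.compile(
    r"\[(?P<node>[^:\]]+):raid\.sp\.unhealthy:(?P<severity>\w+)\]:\s*"
    r"Storage pool (?P<pool>\S+) is unhealthy\. Reason: (?P<reason>.+)$"
)


def parse_unhealthy_event(line):
    """Return node, severity, pool and reason as a dict, or None if no match."""
    match = EMS_PATTERN.search(line.strip())
    return match.groupdict() if match else None


sample = (
    "Dec 14 04:06:21 [cluster01-01:raid.sp.unhealthy:notice]: "
    "Storage pool sp1 is unhealthy. Reason: One of the aggregates "
    "belonging to the storage pool is not in normal state."
)
event = parse_unhealthy_event(sample)
print(event["pool"], "-", event["reason"])
```

The reason text is the part worth acting on, because it differs between the two cases described in this article.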

When this message appears, check whether any SSD in the storage pool has failed:

cluster01::> storage pool show -storage-pool sp1 -instance

                        Storage Pool Name: sp1
                     UUID of Storage Pool: 84afe3e1-a215-11e5-ac48-00a09854bc10
           Nodes Sharing the Storage Pool: cluster01-01, cluster01-02
          Number of Disks in Storage Pool: 22
                     Allocation Unit Size: 1023GB
      Allocation Unit Data Size for RAID4: 976.6GB
    Allocation Unit Data Size for RAID-DP: 930.1GB
   Allocation Unit Data Size for RAID-TEC: 883.6GB
                             Storage Type: SSD
                 Storage Pool Usable Size: 2.00TB
                  Storage Pool Total Size: 4.00TB
                         Is Pool Healthy?: false
                State of the Storage Pool: degraded
  Reason for Storage Pool Being Unhealthy: One of the aggregates belonging to the storage pool is not in normal state.
Job ID of the Currently Running Operation: -
               Is Allocation Unit Broken?: false

cluster01::>
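
If the -instance output is captured as text, the health fields can be read programmatically. The sketch below assumes the "Label: value" layout shown above and uses an abbreviated copy of that output; the function names are hypothetical, and only the value "false" for "Is Pool Healthy?" is taken from this article.

```python
def parse_instance_output(text):
    """Split 'Label: value' lines from -instance output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ": " so that values containing colons stay intact.
        label, sep, value = line.strip().partition(": ")
        if sep:
            fields[label] = value.strip()
    return fields


def pool_is_unhealthy(fields):
    """True when the output reports 'Is Pool Healthy?: false'."""
    return fields.get("Is Pool Healthy?") == "false"


# Abbreviated copy of the output shown in this article.
sample = """
                        Storage Pool Name: sp1
                         Is Pool Healthy?: false
                State of the Storage Pool: degraded
  Reason for Storage Pool Being Unhealthy: One of the aggregates belonging to the storage pool is not in normal state.
"""
fields = parse_instance_output(sample)
if pool_is_unhealthy(fields):
    print(fields["Storage Pool Name"], fields["Reason for Storage Pool Being Unhealthy"])
```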


cluster01::storage pool*> run local aggr status -r aggr1
Aggregate aggr1 (online, raid_dp, reconstruct, hybrid) (block checksums)
  Plex /aggr1/plex0 (online, normal, active, pool0)
    RAID group /aggr1/plex0/rg0 (normal, block checksums)

      RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------          ------------- ---- ---- ---- ----- --------------    --------------
      dparity   2b.64           2b    4   0   FC:B   0  FCAL 15000 272000/557056000  280104/573653840
      parity    2a.50           2a    3   2   FC:A   0  FCAL 15000 272000/557056000  280104/573653840
      data      2a.34           2a    2   2   FC:A   0  FCAL 15000 272000/557056000  274845/562884296
      data      2a.18           2a    1   2   FC:A   0  FCAL 15000 272000/557056000  274845/562884296
      data      2a.65           2a    4   1   FC:A   0  FCAL 15000 272000/557056000  280104/573653840

    RAID group /aggr1/plex0/rg1 (reconstruction 74% completed, block checksums)

      RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------          ------------- ---- ---- ---- ----- --------------    --------------
      dparity   0a.30.4P1       0a    30  4   SA:A   0   SSD   N/A 47619/97525248    47627/97541632
      parity    0a.30.11P1      0a    30  11  SA:A   0   SSD   N/A 47619/97525248    47627/97541632
      data      0b.10.22P1      0b    10  22  SA:B   0   SSD   N/A 47619/97525248    47627/97541632 (reconstruction 74% completed)
      data      0a.30.5P1       0a    30  5   SA:A   0   SSD   N/A 47619/97525248    47627/97541632
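
To track reconstruction progress from captured aggr status -r output, the disk rows carrying the "(reconstruction N% completed)" marker can be picked out. This is a rough sketch that assumes the column layout shown above; the function name is hypothetical.

```python
import re

# Matches the per-disk marker only; the RAID group header uses
# "(reconstruction 74% completed, block checksums)" and is skipped.
RECON_PATTERN = re.compile(r"\(reconstruction (\d+)% completed\)")
DISK_ROLES = ("data", "parity", "dparity")


def reconstructing_disks(status_text):
    """Return (device, percent) for each disk row flagged as reconstructing."""
    results = []
    for line in status_text.splitlines():
        columns = line.split()
        match = RECON_PATTERN.search(line)
        if match and columns and columns[0] in DISK_ROLES:
            results.append((columns[1], int(match.group(1))))
    return results


# Rows copied from the rg1 RAID group shown in this article.
sample = """
    RAID group /aggr1/plex0/rg1 (reconstruction 74% completed, block checksums)
      dparity   0a.30.4P1       0a    30  4   SA:A   0   SSD   N/A 47619/97525248    47627/97541632
      data      0b.10.22P1      0b    10  22  SA:B   0   SSD   N/A 47619/97525248    47627/97541632 (reconstruction 74% completed)
      data      0a.30.5P1       0a    30  5   SA:A   0   SSD   N/A 47619/97525248    47627/97541632
"""
print(reconstructing_disks(sample))
```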



The unhealthy state should change to normal when the reconstruction completes.


However, if the failed disk is put back in the system while it is still partitioned, the storage pool remains unhealthy and reports the following state:

cluster01::storage pool*> show -storage-pool sp1 -instance

                        Storage Pool Name: sp1
                     UUID of Storage Pool: 84afe3e1-a215-11e5-ac48-00a09854bc10
           Nodes Sharing the Storage Pool: cluster01-01, cluster01-02
          Number of Disks in Storage Pool: 22
                     Allocation Unit Size: 1023GB
      Allocation Unit Data Size for RAID4: 976.6GB
    Allocation Unit Data Size for RAID-DP: 930.1GB
   Allocation Unit Data Size for RAID-TEC: 883.6GB
                             Storage Type: SSD
                 Storage Pool Usable Size: 2.00TB
                  Storage Pool Total Size: 4.00TB
                         Is Pool Healthy?: false
                State of the Storage Pool: degraded
  Reason for Storage Pool Being Unhealthy: Storage pool has more number of disks than expected.
Job ID of the Currently Running Operation: -
               Is Allocation Unit Broken?: false

cluster01::storage pool*>


The following message is displayed:

cluster01::storage pool*> Dec 14 04:39:44 [cluster01-01:raid.sp.unhealthy:notice]: Storage pool sp1 is unhealthy. Reason: Storage pool has more number of disks than expected.

The storage pool will remain in this state until the previously failed and replaced SSD is either physically removed from the system or unpartitioned.
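
The two reasons covered in this article call for different responses, which can be captured in a small lookup. This is only an illustrative sketch: the reason strings are copied from the output above, the action text paraphrases this article, and the names ACTIONS and suggested_action are hypothetical.

```python
# Reason strings are copied verbatim from the command output in this article.
ACTIONS = {
    "One of the aggregates belonging to the storage pool is not in normal state.":
        "Check for a failed SSD and wait for the reconstruction to complete.",
    "Storage pool has more number of disks than expected.":
        "Physically remove or unpartition the previously failed and replaced SSD.",
}


def suggested_action(reason):
    """Return the response described in this article, or None for other reasons."""
    return ACTIONS.get(reason.strip())


print(suggested_action("Storage pool has more number of disks than expected."))
```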

Additional Information

Perform the following steps to manually unpartition the SSD that was replaced in the storage pool.
Note: A disk can be removed from the storage pool only if it is not in use by any aggregate.


Exercise extra caution or contact NetApp Technical Support before performing the following steps.

  1. Run the storage pool show-disks -storage-pool <sp name> command to find the existing drives in the storage pool.
  2. Select the drive that was replaced by reconstruction and needs to be removed from the storage pool.
  3. Run the storage disk show -disk <disk name> -fields diskpathnames,owner command to determine the owner node of the shared drive and the local name of the drive on the owner node. The diskpathnames field gives names in the hostname:localname format.
  4. Run the storage disk partition show -container-disk <disk name> -fields owner-node-name command to list all partition names on this shared drive and their owner nodes.
  5. For partitions that have a different owner than the disk, change ownership by running the storage disk partition removeowner -partition <partition name> and storage disk partition assign -partition <partition name> -owner <disk owner> commands.
    After this step, the owner for the disk and all its partitions should be the same.
  6. Drop to the node shell of the owner node.
  7. Run the disk unpartition <disk name> command.
    Once the disk is unpartitioned, it will come back as a spare disk.
    Once the disk becomes a spare disk, then the storage pool should report a healthy state.
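
The steps above can be laid out as an ordered command list for review before anything is run. The sketch below only builds strings from the commands named in the steps; it does not connect to a cluster. The disk, partition, and node values in the example are hypothetical, and using the local disk name for the final node shell command is an interpretation of steps 3 and 7.

```python
def split_disk_path(path):
    """Split a diskpathnames entry in hostname:localname format (step 3)."""
    host, _, local_name = path.partition(":")
    return host, local_name


def unpartition_commands(pool, disk, disk_owner, local_name, foreign_partitions):
    """Return the command strings for steps 1 to 7, in order.

    foreign_partitions lists partitions whose owner differs from disk_owner.
    """
    commands = [
        f"storage pool show-disks -storage-pool {pool}",
        f"storage disk show -disk {disk} -fields diskpathnames,owner",
        f"storage disk partition show -container-disk {disk} -fields owner-node-name",
    ]
    for partition in foreign_partitions:
        commands.append(f"storage disk partition removeowner -partition {partition}")
        commands.append(
            f"storage disk partition assign -partition {partition} -owner {disk_owner}"
        )
    # Step 7 is entered in the node shell of the owner node (step 6).
    commands.append(f"disk unpartition {local_name}")
    return commands


# Hypothetical values for illustration only.
owner, local = split_disk_path("cluster01-01:0b.10.22")
for command in unpartition_commands("sp1", "1.10.22", owner, local, ["1.10.22.P2"]):
    print(command)
```

Printing the list first makes it easier to check each command against the output of steps 1 to 4 before running it.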