
Shared Storage pool is unhealthy


Applies to

ONTAP

Answer

The following message is displayed when a storage pool becomes unhealthy:

Dec 14 04:06:21 [cluster01-01:raid.sp.unhealthy:notice]: Storage pool sp1 is unhealthy. Reason: One of the aggregates belonging to the storage pool is not in normal state.
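
If the console message was missed, past occurrences of this event can also be reviewed from the cluster event log. The following is a minimal sketch that filters on the message name shown above:

cluster01::> event log show -message-name raid.sp.unhealthy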

When this error occurs, check whether there are any SSD failures in the storage pool:

cluster01::> storage pool show -storage-pool sp1 -instance

                        Storage Pool Name: sp1
                     UUID of Storage Pool: 84afe3e1-a215-11e5-ac48-00a09854bc10
           Nodes Sharing the Storage Pool: cluster01-01, cluster01-02
          Number of Disks in Storage Pool: 22
                     Allocation Unit Size: 1023GB
      Allocation Unit Data Size for RAID4: 976.6GB
    Allocation Unit Data Size for RAID-DP: 930.1GB
   Allocation Unit Data Size for RAID-TEC: 883.6GB
                             Storage Type: SSD
                 Storage Pool Usable Size: 2.00TB
                  Storage Pool Total Size: 4.00TB
                         Is Pool Healthy?: false
                State of the Storage Pool: degraded
  Reason for Storage Pool Being Unhealthy: One of the aggregates belonging to the storage pool is not in normal state.
Job ID of the Currently Running Operation: -
               Is Allocation Unit Broken?: false

cluster01::>

Next, check the RAID status of the affected aggregate to see whether a reconstruction is in progress:

cluster01::storage pool*> run local aggr status -r aggr1
Aggregate aggr1 (online, raid_dp, reconstruct, hybrid) (block checksums)
  Plex /aggr1/plex0 (online, normal, active, pool0)
    RAID group /aggr1/plex0/rg0 (normal, block checksums)

      RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------          ------------- ---- ---- ---- ----- --------------    --------------
      dparity   2b.64           2b    4   0   FC:B   0  FCAL 15000 272000/557056000  280104/573653840
      parity    2a.50           2a    3   2   FC:A   0  FCAL 15000 272000/557056000  280104/573653840
      data      2a.34           2a    2   2   FC:A   0  FCAL 15000 272000/557056000  274845/562884296
      data      2a.18           2a    1   2   FC:A   0  FCAL 15000 272000/557056000  274845/562884296
      data      2a.65           2a    4   1   FC:A   0  FCAL 15000 272000/557056000  280104/573653840

    RAID group /aggr1/plex0/rg1 (reconstruction 74% completed, block checksums)

      RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------          ------------- ---- ---- ---- ----- --------------    --------------
      dparity   0a.30.4P1       0a    30  4   SA:A   0   SSD   N/A 47619/97525248    47627/97541632
      parity    0a.30.11P1      0a    30  11  SA:A   0   SSD   N/A 47619/97525248    47627/97541632
      data      0b.10.22P1      0b    10  22  SA:B   0   SSD   N/A 47619/97525248    47627/97541632 (reconstruction 74% completed)
      data      0a.30.5P1       0a    30  5   SA:A   0   SSD   N/A 47619/97525248    47627/97541632



The storage pool should return to a healthy, normal state once the reconstruction completes.
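
As a check, the storage pool show command used earlier can be re-run after the rebuild finishes. The excerpt below is illustrative rather than captured output and shows only the fields that change:

cluster01::> storage pool show -storage-pool sp1 -instance

                        Storage Pool Name: sp1
                         Is Pool Healthy?: true
                State of the Storage Pool: normal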


However, if the failed disk is returned to the system as a partitioned SSD, the storage pool remains unhealthy and reports the following state:

cluster01::storage pool*> show -storage-pool sp1 -instance

                        Storage Pool Name: sp1
                     UUID of Storage Pool: 84afe3e1-a215-11e5-ac48-00a09854bc10
           Nodes Sharing the Storage Pool: cluster01-01, cluster01-02
          Number of Disks in Storage Pool: 22
                     Allocation Unit Size: 1023GB
      Allocation Unit Data Size for RAID4: 976.6GB
    Allocation Unit Data Size for RAID-DP: 930.1GB
   Allocation Unit Data Size for RAID-TEC: 883.6GB
                             Storage Type: SSD
                 Storage Pool Usable Size: 2.00TB
                  Storage Pool Total Size: 4.00TB
                         Is Pool Healthy?: false
                State of the Storage Pool: degraded
  Reason for Storage Pool Being Unhealthy: Storage pool has more number of disks than expected.
Job ID of the Currently Running Operation: -
               Is Allocation Unit Broken?: false

cluster01::storage pool*>


The following message is displayed:

cluster01::storage pool*> Dec 14 04:39:44 [cluster01-01:raid.sp.unhealthy:notice]: Storage pool sp1 is unhealthy. Reason: Storage pool has more number of disks than expected.

The storage pool will remain in this state until the previously failed and replaced SSD is either physically removed from the system or unpartitioned.

Additional Information

Perform the following steps to manually unpartition the SSD drive that was replaced by reconstruction in the storage pool; an illustrative command sequence follows the list.
Note: A disk can be removed from the storage pool only if it is not in use by any aggregate.


Exercise extra caution, or contact NetApp Technical Support, before performing the following steps.

  1. Run the storage pool show-disks -storage-pool <sp name> command to find the existing drives in the storage pool.
  2. Select the drive that was replaced by reconstruction and needs to be removed from the storage pool.
  3. Run the storage disk show -disk <disk name> -fields diskpathnames,owner command to determine the owner node of the shared drive and the local name of the drive on the owner node. The diskpathnames field reports names in the hostname:localname format.
  4. Run the storage disk partition show -container-disk <disk name> -fields owner-node-name command to list all partition names on this shared drive and their owner nodes.
  5. For partitions whose owner differs from the disk owner, change ownership by running the storage disk partition removeowner -partition <partition name> and storage disk partition assign -partition <partition name> -owner <disk owner> commands.
    After this step, the disk and all of its partitions should have the same owner.
  6. Drop to the node shell of the owner node.
  7. Run the disk unpartition <disk name> command, using the local disk name found in step 3.
    Once the disk is unpartitioned, it comes back as a spare disk, and the storage pool should then report a healthy state.
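
For reference, the steps above can be strung together as in the sketch below; it is illustrative, not captured from a real system. The storage pool name (sp1) and node name (cluster01-01) are reused from the example output earlier in this article, with cluster01-01 standing in for the owner node identified in step 3; <disk name>, <partition name>, and <local disk name> are the values found in steps 1 through 4. The asterisk in the prompt indicates the advanced privilege level, matching the prompts shown earlier.

cluster01::> set -privilege advanced

cluster01::*> storage pool show-disks -storage-pool sp1                                  (step 1)

cluster01::*> storage disk show -disk <disk name> -fields diskpathnames,owner            (step 3)

cluster01::*> storage disk partition show -container-disk <disk name> -fields owner-node-name     (step 4)

cluster01::*> storage disk partition removeowner -partition <partition name>             (step 5)
cluster01::*> storage disk partition assign -partition <partition name> -owner cluster01-01

cluster01::*> system node run -node cluster01-01                                         (step 6)
cluster01-01> disk unpartition <local disk name>                                         (step 7)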

 

NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.