NetApp Knowledge Base

Dual carrier disks stuck in endless SDC loop after upgrade to 9.8

Category:
fas-systems
Specialty:
hw

Applies to

  • DS4486 storage shelf
  • ONTAP 9.8

Issue

  • After an upgrade to ONTAP 9.8, multiple disks on a DS4486 shelf report shm_setup_for_failure with no apparent cause:
Wed Feb 10 09:40:25 -0800 [nodeb: api_dpool_17: scsi.debug:debug]: shm_setup_for_failure disk 3a.20.7L1 (S/N ZC1xxxxx) error 40000000h
Wed Feb 10 09:40:25 -0800 [nodeb: api_dpool_18: scsi.debug:debug]: shm_setup_for_failure disk 3a.20.17L1 (S/N K7Hxxxxx) error 40000000h
Wed Feb 10 09:40:26 -0800 [nodeb: api_dpool_20: scsi.debug:debug]: shm_setup_for_failure disk 3a.21.20L2 (S/N ZC1xxxxx) error 40000000h
 
  • Following this, one disk in the carrier is evacuated or failed out, while the other disk in the same carrier enters a sick disk copy (SDC) loop that never progresses past 0% and eventually cancels itself:
 
RAID Disk Device     HA SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)     Phys (MB/blks)
--------- ---------- -- ----- --- ---- ---- ----- ---- ------------------ ------------------
dparity   0d.23.2L2  0d 23    2   SA:A 0    MSATA 7200 3748319/7676558720 3815447/7814037168
parity    3b.13.22L1 3b 13    22  SA:A 0    MSATA 7200 3748319/7676558720 3815447/7814037168
data      3a.20.19L1 3a 20    19  SA:B 0    MSATA 7200 3748319/7676558720 3815447/7814037168
data      3a.21.20L1 3a 21    20  SA:B 0    MSATA 7200 3748319/7676558720 3815447/7814037168 (evacuating, copy in progress)
-> copy   3a.21.16L2 3a 21    16  SA:B 0    MSATA 7200 3748319/7676558720 3815447/7814037168 (copy 0% completed)
 
  • The following messages are seen in the EMS log:
  raid_lm: raidlm.carrier.evac.start
  config_thread: raid.rg.diskcopy.start
  config_thread: raid.rg.diskcopy.progress
  raid_lm: raidlm.carrier.evac.abort
  config_thread: raid.rg.diskcopy.aborted
  • Example:

[raid.rg.diskcopy.start:notice]: /nodea_aggr_vol0/plex0/rg0: starting disk copy from 3a.21.20L1 (S/N [PCJHxxx]) to 3a.21.16L2 (S/N [ZC1Dxxxx]). Reason: Disk replace was started..
[raid.rg.diskcopy.progress:debug]: Disk copy progress from  (S/N PCJHxxxx) to  (S/N ZC1Dxxxx) is on disk block 0 and is 0%% complete after 0:00:00 (HH:MM:SS).
[raid_rg_diskcopy_aborted_1:notice]: params: {'target': '3a.21.16L2', 'duration': '3:13.16', 'source': '3a.21.20L1', 'reason': 'Source disk failed.', 'rg': '/nodea_aggr_vol0/plex0/rg0', 'owner': '', 'aggregate_uuid': 'f0f3c156-b7f6-4344-adce-249752a6fcf4', 'blockNum': '2156224'}
[raid.rg.diskcopy.start:notice]: /nodea_aggr_02/plex0/rg1: starting disk copy from 3a.21.20L1 (S/N [K4G5xxxx]) to 3a.21.16L2 (S/N [ZC1Dxxxx]). Reason: Disk replace was started..
[raid.rg.diskcopy.progress:debug]: Disk copy progress from 2.21.20.1 (S/N K4G5xxxx) to 2.21.16.2 (S/N ZC1Dxxxx) is on disk block 0 and is 0%% complete after 0:00:00 (HH:MM:SS).

[raid_rg_diskcopy_aborted_1:notice]: params: {'target': '3a.21.16L2', 'duration': '2:53.00', 'source': '3a.21.20L1', 'reason': 'Source disk failed.', 'rg': '/nodea_aggr_02/plex0/rg1', 'owner': '', 'aggregate_uuid': 'cd3fa773-6ba5-48f6-9872-8c4a7ed5ff6f', 'blockNum': '793024'}
[raid.rg.diskcopy.start:notice]: /nodea_aggr_vol0/plex0/rg0: starting disk copy from 3a.21.20L1 (S/N [PCJHxxxx]) to 3a.21.16L2 (S/N [ZC1Dxxxx]). Reason: Disk replace was started..
[raid.rg.diskcopy.progress:debug]: Disk copy progress from 2.21.20.1 (S/N PCJHxxxx) to 2.21.16.2 (S/N ZC1Dxxxx) is on disk block 0 and is 0%% complete after 0:00:00 (HH:MM:SS).
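
The start/abort pattern in the example above can be checked for mechanically in a saved copy of the EMS log. The following is a minimal sketch, not a NetApp tool: it assumes the log has been exported as plain text in exactly the format shown in the example, and the names find_copy_loops and min_restarts are hypothetical. It counts how often a disk copy was started and aborted for each source/target pair and reports pairs that restarted repeatedly.

```python
import ast
import re
from collections import Counter

# Patterns are based only on the message formats shown in the example above.
START_RE = re.compile(
    r"raid\.rg\.diskcopy\.start:\w+\]: (?P<rg>\S+): starting disk copy "
    r"from (?P<source>\S+) \(S/N \[[^\]]*\]\) to (?P<target>\S+) "
)
ABORT_RE = re.compile(
    r"raid_rg_diskcopy_aborted_1:\w+\]: params: (?P<params>\{.*\})"
)


def find_copy_loops(lines, min_restarts=2):
    """Return {(source, target): (starts, aborts)} for disk copies that
    were started at least min_restarts times and aborted at least once.

    min_restarts is an illustrative threshold, not a documented value.
    """
    starts = Counter()
    aborts = Counter()
    for line in lines:
        match = START_RE.search(line)
        if match:
            starts[(match["source"], match["target"])] += 1
            continue
        match = ABORT_RE.search(line)
        if match:
            # The params payload in the example is a Python-style dict literal.
            params = ast.literal_eval(match["params"])
            aborts[(params["source"], params["target"])] += 1
    return {
        pair: (starts[pair], aborts[pair])
        for pair in starts
        if starts[pair] >= min_restarts and aborts[pair] >= 1
    }


# Start and abort lines copied from the example above.
SAMPLE_LOG = """\
[raid.rg.diskcopy.start:notice]: /nodea_aggr_vol0/plex0/rg0: starting disk copy from 3a.21.20L1 (S/N [PCJHxxx]) to 3a.21.16L2 (S/N [ZC1Dxxxx]). Reason: Disk replace was started..
[raid_rg_diskcopy_aborted_1:notice]: params: {'target': '3a.21.16L2', 'duration': '3:13.16', 'source': '3a.21.20L1', 'reason': 'Source disk failed.', 'rg': '/nodea_aggr_vol0/plex0/rg0', 'owner': '', 'aggregate_uuid': 'f0f3c156-b7f6-4344-adce-249752a6fcf4', 'blockNum': '2156224'}
[raid.rg.diskcopy.start:notice]: /nodea_aggr_02/plex0/rg1: starting disk copy from 3a.21.20L1 (S/N [K4G5xxxx]) to 3a.21.16L2 (S/N [ZC1Dxxxx]). Reason: Disk replace was started..
[raid_rg_diskcopy_aborted_1:notice]: params: {'target': '3a.21.16L2', 'duration': '2:53.00', 'source': '3a.21.20L1', 'reason': 'Source disk failed.', 'rg': '/nodea_aggr_02/plex0/rg1', 'owner': '', 'aggregate_uuid': 'cd3fa773-6ba5-48f6-9872-8c4a7ed5ff6f', 'blockNum': '793024'}
[raid.rg.diskcopy.start:notice]: /nodea_aggr_vol0/plex0/rg0: starting disk copy from 3a.21.20L1 (S/N [PCJHxxxx]) to 3a.21.16L2 (S/N [ZC1Dxxxx]). Reason: Disk replace was started..
"""

if __name__ == "__main__":
    for pair, counts in find_copy_loops(SAMPLE_LOG.splitlines()).items():
        print(pair, counts)
```

A pair reported with several starts and at least one abort matches the symptom described here. Whether a given count indicates this specific issue is a judgment call, and the threshold is only illustrative.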

 
