Continuous raid.disk.replace.job.failed messages in systems with internal storage

Applies to

  • FAS2820
  • FAS2720
  • FAS2750
  • Fully populated internal shelf (12 disks)
  • Fully populated external shelf DS212C (12 disks)
  • Data aggregates using shared disks from the internal and the external shelf
  • No spare drives available in the internal shelf

Issue

  • Both nodes report raid.disk.replace.job.failed messages when trying to use the same internal spare disk:
[node_name-01: mgwd: raid.disk.replace.job.start:debug]: Starting disk replacement of disk 2.2.7 with disk 2.1.10.
[node_name-01: mgwd: raid.disk.replace.job.failed:debug]: Failed to replace disk 2.2.7 with disk 2.1.10. Reason: Reserve: Unable to reserve disks on target node node_name-02.
[node_name-01: mgwd: mgmtgwd.jobmgr.jobcomplete.failure:debug]: Job "Disk Replace:2.2.7" [id 55] (Disk Replace) completed unsuccessfully: Reserve: Unable to reserve disks on target node node_name-02 (1).
[node_name-01: mgwd: raid.disk.replace.job.start:debug]: Starting disk replacement of disk 2.2.7 with disk 2.1.10.

 
[node_name-02: config_thread: raid.rg.spares.low:debug]: /data_aggr-2/plex0/rg0
[node_name-02: config_thread: callhome.spares.low:debug]: Call home for SPARES_LOW: /data_aggr-2/plex0/rg0
[node_name-02: mgwd: raid.disk.replace.job.start:notice]: Starting disk replacement of disk 2.2.8 with disk 2.1.10.
[node_name-02: mgwd: raid.disk.replace.job.failed:error]: Failed to replace disk 2.2.8 with disk 2.1.10. Reason: Reserve: Unable to reserve disks on target node node_name-02.
[node_name-02: mgwd: mgmtgwd.jobmgr.jobcomplete.failure:info]: Job "Disk Replace:2.2.8" [id 54] (Disk Replace) completed unsuccessfully: Reserve: Unable to reserve disks on target node node_name-02 (1).
[node_name-02: mgwd: raid.disk.replace.job.start:notice]: Starting disk replacement of disk 2.2.8 with disk 2.1.10.
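
To confirm that the replacement job is looping, the EMS log on each node can be filtered for these events, for example:

::> event log show -node * -message-name raid.disk.replace.job.failed
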
  • RAID-LM logs on both nodes continuously report:

...: ksmfContainerDiskReplace run -disk 2.2.7 -allow-mixing true -reason layout_optimization -replacement 2.1.10: succeeded
...: 2.2.7 needs to be copied to an internal disk.
...: 2.2.1 needs to be copied to an internal disk.
...: 2.2.5 needs to be copied to an internal disk.
...: 2.2.9 needs to be copied to an internal disk.
...: 2.2.3 needs to be copied to an internal disk.
...: ksmfContainerDiskReplace run  -disk 2.2.7 -allow-mixing true -reason layout_optimization -replacement 2.1.10: succeeded

...: ksmfContainerDiskReplace run -disk 2.2.8 -allow-mixing true -reason layout_optimization -replacement 2.1.10: succeeded
...: 2.2.8 needs to be copied to an internal disk.
...: 2.2.2 needs to be copied to an internal disk.
...: 2.2.0 needs to be copied to an internal disk.
...: 2.2.6 needs to be copied to an internal disk.
...: 2.2.4 needs to be copied to an internal disk.
...: 2.2.10 needs to be copied to an internal disk.
...: ksmfContainerDiskReplace run -disk 2.2.8 -allow-mixing true -reason layout_optimization -replacement 2.1.10: succeeded
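
Both sequences above target the same replacement disk, 2.1.10. The spare disks visible to each node can be verified with, for example:

::> storage aggregate show-spare-disks -original-owner node_name-01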

  • Both nodes try to take ownership of the same drive:

[node_name-01: sanown_io: diskown_changingOwner_1:debug]: params: {'diskname': '0c.01.10P1', 'serialno': '0AB12C3DEP001', 'oldownername': 'node_name-01', 'oldownerid': '111111111', 'oldhomeownerid': '111111111', 'olddrhomeownerid': '222222222', 'newownername': 'node_name-02', 'newownerid': '538302348', 'newhomeownerid': '538302348', 'newdrhomeownerid': '222222222', 'thread': 'svc_queue_thread', 'APIname': 'zapi_disk_sanown_assign'}
[node_name-01: sanown_io: diskown_changingOwner_1:debug]: params: {'diskname': '0c.01.10P1', 'serialno': '0AB12C3DEP001', 'oldownername': 'node_name-01', 'oldownerid': '111111111', 'oldhomeownerid': '111111111', 'olddrhomeownerid': '222222222', 'newownername': 'node_name-02', 'newownerid': '538302348', 'newhomeownerid': '538302348', 'newdrhomeownerid': '222222222', 'thread': 'svc_queue_thread', 'APIname': 'zapi_disk_sanown_assign'}

[node_name-02: sanown_io: diskown_changingOwner_1:notice]: params: {'newdrhomeownerid': '222222222', 'diskname': '0c.01.10P1', 'APIname': 'zapi_disk_sanown_assign', 'thread': 'svc_queue_thread', 'serialno': '0AB12C3DEP001', 'oldhomeownerid': '538302348', 'newownername': 'node_name-01', 'newownerid': '111111111', 'oldownerid': '538302348', 'oldownername': 'node_name-02', 'newhomeownerid': '111111111', 'olddrhomeownerid': '222222222'}
[node_name-02: sanown_io: diskown_changingOwner_1:notice]: params: {'newdrhomeownerid': '222222222', 'diskname': '0c.01.10P1', 'APIname': 'zapi_disk_sanown_assign', 'thread': 'svc_queue_thread', 'serialno': '0AB12C3DEP001', 'oldhomeownerid': '538302348', 'newownername': 'node_name-01', 'newownerid': '111111111', 'oldownerid': '538302348', 'oldownername': 'node_name-02', 'newhomeownerid': '111111111', 'olddrhomeownerid': '222222222'}
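
The repeated diskown_changingOwner_1 events show the P1 partition of the same disk (0c.01.10P1) being reassigned back and forth between the nodes. As a general check, the automatic disk assignment settings on both nodes can be reviewed with, for example:

::> storage disk option show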

  • sysconfig -r shows different ownership for the affected disk partition than the output of storage disk show -partition-ownership:
    • sysconfig -r shows:

      node_name-01

      Pool0 spare disks

      RAID Disk    Device      HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      ---------    ------      ------------- ---- ---- ---- ----- --------------    --------------
      Spare disks for block checksum
      ...
      spare       0b.01.10P2    0b    1   11  SA:A   0  FSAS  7200 63849/130764288   63857/130780672 (fast zeroed)
      spare       0b.01.10P1    0b    1   11  SA:A   0  FSAS  7200 3743930/7667569664 3743938/7667586048 (fast zeroed)

    • Output of storage disk show -partition-ownership:

::> storage disk show -partition-ownership -disk 2.1.10
Disk     Partition Home              Owner             Home ID     Owner ID
-------- --------- ----------------- ----------------- ----------- -----------
2.1.10   Container node_name-01      node_name-01      5xxxxxxx1   5xxxxxxx1
         Root      node_name-01      node_name-01      5xxxxxxx1   5xxxxxxx1
         Data      node_name-02      node_name-02      5xxxxxxx2   5xxxxxxx2
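
The nodeshell view can be collected from the clustershell for comparison (node name is a placeholder), for example:

::> system node run -node node_name-01 -command sysconfig -r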
