Continuous raid.disk.replace.job.failed messages in systems with internal storage
Applies to
- FAS2820
- FAS2720
- FAS2750
- Fully populated internal shelf (12 disks)
- Fully populated external shelf DS212C (12 disks)
- Data aggregates using shared disks from the internal and the external shelf
- No spare drives available in the internal shelf
Issue
- Both nodes report raid.disk.replace.job.failed messages when trying to use the same internal spare disk:
[node_name-01: mgwd: raid.disk.replace.job.start:debug]: Starting disk replacement of disk 2.2.7 with disk 2.1.10.
[node_name-01: mgwd: raid.disk.replace.job.failed:debug]: Failed to replace disk 2.2.7 with disk 2.1.10. Reason: Reserve: Unable to reserve disks on target node node_name-02.
[node_name-01: mgwd: mgmtgwd.jobmgr.jobcomplete.failure:debug]: Job "Disk Replace:2.2.7" [id 55] (Disk Replace) completed unsuccessfully: Reserve: Unable to reserve disks on target node node_name-02 (1).
[node_name-01: mgwd: raid.disk.replace.job.start:debug]: Starting disk replacement of disk 2.2.7 with disk 2.1.10.
[node_name-02: config_thread: raid.rg.spares.low:debug]: /data_aggr-2/plex0/rg0
[node_name-02: config_thread: callhome.spares.low:debug]: Call home for SPARES_LOW: /data_aggr-2/plex0/rg0
[node_name-02: mgwd: raid.disk.replace.job.start:notice]: Starting disk replacement of disk 2.2.8 with disk 2.1.10.
[node_name-02: mgwd: raid.disk.replace.job.failed:error]: Failed to replace disk 2.2.8 with disk 2.1.10. Reason: Reserve: Unable to reserve disks on target node node_name-02.
[node_name-02: mgwd: mgmtgwd.jobmgr.jobcomplete.failure:info]: Job "Disk Replace:2.2.8" [id 54] (Disk Replace) completed unsuccessfully: Reserve: Unable to reserve disks on target node node_name-02 (1).
[node_name-02: mgwd: raid.disk.replace.job.start:notice]: Starting disk replacement of disk 2.2.8 with disk 2.1.10.
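The recurring job failures can also be confirmed from the cluster shell. The commands below are a suggested check; the job name is taken from the EMS example above and will differ per system:
::> event log show -message-name raid.disk.replace.job.failed
::> job history show -name "Disk Replace:2.2.7"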
- RAID-LM logs on both nodes continuously report:
...: ksmfContainerDiskReplace run -disk 2.2.7 -allow-mixing true -reason layout_optimization -replacement 2.1.10: succeeded
...: 2.2.7 needs to be copied to an internal disk.
...: 2.2.1 needs to be copied to an internal disk.
...: 2.2.5 needs to be copied to an internal disk.
...: 2.2.9 needs to be copied to an internal disk.
...: 2.2.3 needs to be copied to an internal disk.
...: ksmfContainerDiskReplace run -disk 2.2.7 -allow-mixing true -reason layout_optimization -replacement 2.1.10: succeeded
...: ksmfContainerDiskReplace run -disk 2.2.8 -allow-mixing true -reason layout_optimization -replacement 2.1.10: succeeded
...: 2.2.8 needs to be copied to an internal disk.
...: 2.2.2 needs to be copied to an internal disk.
...: 2.2.0 needs to be copied to an internal disk.
...: 2.2.6 needs to be copied to an internal disk.
...: 2.2.4 needs to be copied to an internal disk.
...: 2.2.10 needs to be copied to an internal disk.
...: ksmfContainerDiskReplace run -disk 2.2.8 -allow-mixing true -reason layout_optimization -replacement 2.1.10: succeeded
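Because layout optimization keeps selecting the same internal spare (2.1.10) for both nodes, reviewing the spares owned by each node can confirm that only one internal spare is available. This is a suggested check; the -original-owner filter is optional and the node names are placeholders from the example above:
::> storage aggregate show-spare-disks -original-owner node_name-01
::> storage aggregate show-spare-disks -original-owner node_name-02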
- Both nodes repeatedly try to take ownership of the same drive:
[node_name-01: sanown_io: diskown_changingOwner_1:debug]: params: {'diskname': '0c.01.10P1', 'serialno': '0AB12C3DEP001', 'oldownername': 'node_name-01', 'oldownerid': '111111111', 'oldhomeownerid': '111111111', 'olddrhomeownerid': '222222222', 'newownername': 'node_name-02', 'newownerid': '538302348', 'newhomeownerid': '538302348', 'newdrhomeownerid': '222222222', 'thread': 'svc_queue_thread', 'APIname': 'zapi_disk_sanown_assign'}
[node_name-01: sanown_io: diskown_changingOwner_1:debug]: params: {'diskname': '0c.01.10P1', 'serialno': '0AB12C3DEP001', 'oldownername': 'node_name-01', 'oldownerid': '111111111', 'oldhomeownerid': '111111111', 'olddrhomeownerid': '222222222', 'newownername': 'node_name-02', 'newownerid': '538302348', 'newhomeownerid': '538302348', 'newdrhomeownerid': '222222222', 'thread': 'svc_queue_thread', 'APIname': 'zapi_disk_sanown_assign'}
[node_name-02: sanown_io: diskown_changingOwner_1:notice]: params: {'newdrhomeownerid': '222222222', 'diskname': '0c.01.10P1', 'APIname': 'zapi_disk_sanown_assign', 'thread': 'svc_queue_thread', 'serialno': '0AB12C3DEP001', 'oldhomeownerid': '538302348', 'newownername': 'node_name-01', 'newownerid': '111111111', 'oldownerid': '538302348', 'oldownername': 'node_name-02', 'newhomeownerid': '111111111', 'olddrhomeownerid': '222222222'}
[node_name-02: sanown_io: diskown_changingOwner_1:notice]: params: {'newdrhomeownerid': '222222222', 'diskname': '0c.01.10P1', 'APIname': 'zapi_disk_sanown_assign', 'thread': 'svc_queue_thread', 'serialno': '0AB12C3DEP001', 'oldhomeownerid': '538302348', 'newownername': 'node_name-01', 'newownerid': '111111111', 'oldownerid': '538302348', 'oldownername': 'node_name-02', 'newhomeownerid': '111111111', 'olddrhomeownerid': '222222222'}
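Since the data partition of 2.1.10 keeps changing owners, it can also help to review the disk auto-assignment settings on both nodes. This is a general check, not specific to this example:
::> storage disk option show -fields autoassign, autoassign-policy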
- sysconfig -r shows a different ownership for the affected disk partition than the output of storage disk show -partition-ownership.
sysconfig -r on node_name-01 shows:
Pool0 spare disks
RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)     Phys (MB/blks)
---------       ------          ------------- ---- ---- ---- ----- --------------     --------------
Spare disks for block checksum
...
spare           0b.01.10P2      0b    1   11  SA:A   0  FSAS  7200 63849/130764288    63857/130780672 (fast zeroed)
spare           0b.01.10P1      0b    1   11  SA:A   0  FSAS  7200 3743930/7667569664 3743938/7667586048 (fast zeroed)
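The sysconfig -r output shown above can be collected per node from the cluster shell; the node name is a placeholder from the example:
::> system node run -node node_name-01 -command sysconfig -r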
Output of storage disk show -partition-ownership:
::> storage disk show -partition-ownership -disk 2.1.10
Disk     Partition Home              Owner             Home ID     Owner ID
-------- --------- ----------------- ----------------- ----------- -----------
2.1.10   Container node_name-01      node_name-01      5xxxxxxx1   5xxxxxxx1
         Root      node_name-01      node_name-01      5xxxxxxx1   5xxxxxxx1
         Data      node_name-02      node_name-02      5xxxxxxx2   5xxxxxxx2
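To cross-check which aggregate, if any, is using each partition of the shared disk, the disk can also be queried by fields. This is a suggested check; the field list is only an example and can be adjusted:
::> storage disk show -disk 2.1.10 -fields container-type, owner, aggregate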