NetApp Knowledge Base

StorageGRID storage node DECOM stuck due to site not having enough destination nodes for old EC profile


Applies to

StorageGRID Versions 11.5.0.8 and 11.6.0.7 and earlier.

Issue

The customer is unable to complete the decommission of a storage node following an erasure coding (EC) profile change.

An EC decommission job error is reported on the EC leader (Node_Name). ECJM level 1 logging was enabled on the leader and a log bundle was captured. The log excerpt below contains the message "Selecting destination for EC group failed after 5 retries.", which suggests that the decommission is pausing because the old EC profile cannot find enough destination nodes in the storage pool: decommissioning node Node_Name would leave the pool with only 4 nodes.
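The node-count reasoning can be sketched as a simple check. This is only an illustration under stated assumptions: it assumes each fragment of an EC group must be placed on a distinct storage node, so a k+m scheme needs at least k+m nodes in the pool. The 4+1 scheme and the node names are hypothetical, since the old profile's scheme is not given here, and `can_place_ec_group` is not a StorageGRID function.

```python
def can_place_ec_group(pool_nodes, decommissioning, data_fragments, parity_fragments):
    """Return True if the pool keeps enough nodes to hold one fragment per node.

    Assumption: a k+m scheme needs at least k+m distinct storage nodes.
    """
    remaining = set(pool_nodes) - set(decommissioning)
    return len(remaining) >= data_fragments + parity_fragments


# Hypothetical five-node pool; removing one node leaves four.
pool = ["node-1", "node-2", "node-3", "node-4", "node-5"]

# Hypothetical 4+1 scheme: five nodes are needed, four would remain.
print(can_place_ec_group(pool, ["node-5"], data_fragments=4, parity_fragments=1))
```

With these assumed values the check fails, which mirrors the situation described: the job has fragments to move off the departing node but no eligible node left to receive them.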

Dec 9 19:29:01 Node_Name ADE: |21426716 1820442787 ECJM CSRT 2022-12-09T19:29:01.253077| NOTICE 0376 ECJM: EcgDecomJob: '11696086893380218698' ECG: 'DB1B050F-1755-4F86-995C-81085336DC19' VCS: 'DB349EB5-32DE-40C6-BB52-DA99AEF0A607': Selecting possible destination for affectedBytes: 0

...
Dec 9 19:29:01 Node_Name ADE: |21426716 1820442787 ECJM EPRP 2022-12-09T19:29:01.253925| ERROR 1054 PROC: Exception: /build/src/modules/ErasureCoding/EC_JobManager_Module/EcgDecommissionJob.cc(368): Throw in function void erasurecoding::EcgDecommissionJob::selectDestinationNode()#012Dynamic exception type: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >#012std::exception::what: ENFORCE failed: !"Selecting destination for EC group failed after 5 retries."#012

Dec 9 19:29:06 Node_Name ADE: |21426716 1820442641 ECJM CSRT 2022-12-09T19:29:06.397947| ERROR 0112 ECJM: Exception caught during decommissioning ENFORCE failed: 'SUCS' == *jobResult.

Dec 9 19:29:06 Node_Name ADE: |21426716 1820442641 ECJM CSRT 2022-12-09T19:29:06.398057| ERROR 1054 PROC: Exception: /build/src/modules/ErasureCoding/EC_JobManager_Module/NodeDecommissionJob.cc(447): Throw in function CXD_AtomContainer erasurecoding::NodeDecommissionJob::waitForJobCompletions()#012Dynamic exception type: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >#012std::exception::what: ENFORCE failed: 'SUCS' == *jobResult#012
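When a log bundle contains many such lines, a small script can help pick out the ECJM errors. The following is a minimal sketch, not a NetApp tool: the regular expression is inferred only from the layout of the lines shown above, and the field names (`module`, `code`, `severity`, and so on) are labels chosen for this example. The `#012` sequences are treated as syslog-escaped newlines (octal 012).

```python
import re

# Layout inferred from the excerpt above (an assumption, not a documented format):
# "... ADE: |<id> <id> <module> <code> <timestamp>| <SEVERITY> <number> <source>: <message>"
LINE_RE = re.compile(
    r"ADE: \|\d+ \d+ (?P<module>\w+) (?P<code>\w+) (?P<timestamp>\S+)\| "
    r"(?P<severity>[A-Z]+) (?P<number>\d+) (?P<source>\w+): (?P<message>.*)"
)


def parse_ade_line(line):
    """Split one ADE log line into named fields, or return None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    # Turn syslog-escaped newlines back into real line breaks.
    record["message"] = record["message"].replace("#012", "\n").strip()
    return record


def ecjm_errors(lines):
    """Return parsed records for ERROR-level lines logged by the ECJM module."""
    records = (parse_ade_line(line) for line in lines)
    return [r for r in records if r and r["module"] == "ECJM" and r["severity"] == "ERROR"]


# Two lines copied from the excerpt above: one NOTICE and one ERROR.
sample = [
    "Dec 9 19:29:01 Node_Name ADE: |21426716 1820442787 ECJM CSRT 2022-12-09T19:29:01.253077| "
    "NOTICE 0376 ECJM: EcgDecomJob: '11696086893380218698' "
    "ECG: 'DB1B050F-1755-4F86-995C-81085336DC19' "
    "VCS: 'DB349EB5-32DE-40C6-BB52-DA99AEF0A607': "
    "Selecting possible destination for affectedBytes: 0",
    "Dec 9 19:29:06 Node_Name ADE: |21426716 1820442641 ECJM CSRT 2022-12-09T19:29:06.397947| "
    "ERROR 0112 ECJM: Exception caught during decommissioning "
    "ENFORCE failed: 'SUCS' == *jobResult.",
]

for rec in ecjm_errors(sample):
    print(rec["timestamp"], rec["number"], rec["message"])
```

Filtering on both module and severity keeps the NOTICE line out of the result while still letting the caller inspect every field of the matching error.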

 

 


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.