StorageGRID decommission stuck at decommissioning erasure-coded data due to old EC profile
Applies to
- StorageGRID 11.4
- Erasure coding (EC) 4+1
- Storage Pool with 5 nodes
- EC profile 4+1 using this Storage Pool
Issue
- Decommissioning two Storage Nodes at a site with 5 nodes fails, and the bycast.log on the EC Leader node reports:
ECPL SQRT 2021-05-19T00:24:11.119516| NOTICE 0360 ECPL: Found 0 chunk services in 'any' site.
ECPL SQRT 2021-05-19T00:24:11.119592| WARNING 0087 ECPL: Placement request failed due to: Unable to find alternative for 1 out of 1 chunk service(s).
ECJM EPRP 2021-05-19T00:24:11.119633| WARNING 0395 ECJM: EcgDecomJob: '1224304969118050274' ECG: 'C3B9DBC0-E6E5-42FE-984D-68A003C7D142' VCS: '13092439-3B02-4A53-BF5C-A53393562B2C': Unable to select destination nodes: 'FAIL'. Retrying
ECJM EPRP 2021-05-19T00:24:11.119686| WARNING 0059 ECJM: Caught exception 'ENFORCE failed: !"Selecting destination for EC group failed after 5 retries."' when running job xxxxxxxxxxxxxx
ECJM EPRP 2021-05-19T00:24:11.119736| ERROR 1049 PROC: Exception: ../modules/ErasureCoding/EC_JobManager_Module/EcgDecommissionJob.cc(373): Throw in function void erasurecoding::EcgDecommissionJob::selectDestinationNode()#012Dynamic exception type: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >#012std::exception::what: ENFORCE failed: !"Selecting destination for EC group failed after 5 retries."#012
- The CMN task in the GUI reports:
The Storage Node decommission cannot complete. Ensure all nodes are running and connected. Then, review all Erasure Coding
profiles in use to ensure they can still be fulfilled. As required, update the ILM policy and try again. For details about
this error, check bycast.log
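
The arithmetic behind "ensure they can still be fulfilled" can be sketched as follows. This is a minimal illustration, not a StorageGRID API: the function name and parameters are hypothetical, and it assumes each fragment of an erasure-coded group must be placed on a separate Storage Node in the pool, so a 4+1 profile needs at least 5 nodes.

```python
def ec_profile_can_be_fulfilled(data_fragments, parity_fragments,
                                pool_nodes, nodes_to_decommission):
    """Return True if the Storage Pool still has enough nodes for the profile.

    Assumption (not taken from StorageGRID code): every fragment of an EC
    group lives on a distinct Storage Node, so the pool needs at least
    data_fragments + parity_fragments nodes after the decommission.
    """
    required_nodes = data_fragments + parity_fragments
    remaining_nodes = pool_nodes - nodes_to_decommission
    return remaining_nodes >= required_nodes


# Scenario from this article: EC 4+1, pool of 5 nodes, 2 nodes being removed.
# 5 - 2 = 3 remaining nodes, fewer than the 5 the profile requires.
print(ec_profile_can_be_fulfilled(4, 1, 5, 2))  # False
```

Under this assumption, the pool in this article has no spare node to receive the fragments held by a decommissioned node, which is consistent with the "Found 0 chunk services" and "Unable to find alternative" messages in bycast.log.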