NetApp Knowledge Base

EC-rebalance fails with Service unavailable. Error contacting EC Job Manager

Category:
storagegrid
Specialty:
sgrid

Applies to

NetApp StorageGRID

Issue

When attempting an EC rebalance, the operation fails with the message "Service unavailable. Error contacting EC Job Manager". One or more nodes may appear in an Unknown state in the GUI without reporting any errors. The affected nodes eventually return to an online state on their own.

The bycast.log on the EC leader node reports:

Feb  4 12:29:51 <nodename> ADE: |21835099 0110740693 ECJM ???? 2025-02-04T12:29:51.840953| NOTICE   0040 54375572ef20a272 ECJM: Starting job 1581323391038607979: Site Rebalance - Group ID 10.
Feb  4 12:29:51 <nodename> ADE: |21835099 0110740693 ECJM ^RDY 2025-02-04T12:29:51.841797| NOTICE   0106 54375572ef20a272 ECJM: Job status of job 1581323391038607979 is JOBSTATUS_IN_PROGRESS
Feb  4 12:29:51 <nodename> ADE: |21835099 0110740693 ECJM ^RDY 2025-02-04T12:29:51.841824| NOTICE   0112 54375572ef20a272 ECJM: Resuming job 1581323391038607979
Feb  4 12:29:51 <nodename> ADE: |21835099 0110740693 ECJM ^RDY 2025-02-04T12:29:51.841833| NOTICE   0219 54375572ef20a272 ECJM: 1581323391038607979(rebalance 10): Resuming
Feb  4 12:29:51 <nodename> ADE: |21835099 0110740693 ECJM %MDW 2025-02-04T12:29:51.850301| WARNING  1005 54375572ef20a272 ECJM: Volume Info request timed out.
Feb  4 12:29:51 <nodename> ADE: |21835099 0110740693 ECJM %MDW 2025-02-04T12:29:51.850336| NOTICE   0994 54375572ef20a272 ECJM: 1581323391038607979(rebalance 10): Cannot determine if there are Offline Volumes in the Grid.
Feb  4 12:29:51 <nodename> ADE: |21835099 0110740693 ECJM %MDW 2025-02-04T12:29:51.850347| NOTICE   1271 54375572ef20a272 ECJM: 1581323391038607979(rebalance 10): saving state. status: JOBSTATUS_PAUSED
Feb  4 12:29:51 <nodename> ADE: |21835099 0110740693 ECJM ^RDY 2025-02-04T12:29:51.852021| NOTICE   1057 54375572ef20a272 ECJM: 1581323391038607979(rebalance 10): Stopping child jobs.
Feb  4 12:29:51 <nodename> ADE: |21835099 0110740693 ECJM ^RDY 2025-02-04T12:29:51.852132| WARNING  0062 54375572ef20a272 ECJM: Caught exception 'Failed to ensure all volumes are online. pausing job...' when running job 1581323391038607979: Site Rebalance - Group ID 10.
Feb  4 12:29:51 <nodename> ADE: |21835099 0110740691 ECJM _DON 2025-02-04T12:29:51.852204| NOTICE   0934 54375572ef20a272 ECJM: Received job completion message.
Feb  4 12:29:51 <nodename> ADE: |21835099 0110740691 ECJM _DON 2025-02-04T12:29:51.852230| NOTICE   0940 54375572ef20a272 ECJM: Job 1581323391038607979 completed with result GERR.
Feb  4 12:29:51 <nodename> ADE: |21835099 0110740693 ECJM ^RDY 2025-02-04T12:29:51.852232| ERROR    1081 54375572ef20a272 PROC: Exception: Dynamic exception type: std::runtime_error#012std::exception::what: Failed to ensure all volumes are online. pausing job...#012
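
When checking whether a node shows the same pattern, it can help to filter bycast.log down to its WARNING and ERROR entries. The following is a minimal Python sketch, not a NetApp tool. The line layout is inferred only from the excerpt above, the group names (node_id, thread, state, code, event_id, source) are hypothetical labels rather than documented field names, and `#012` is treated as the syslog escape for a newline.

```python
import re

# Assumed layout, inferred from the excerpt above. Group names are
# hypothetical labels, not documented StorageGRID field names.
LINE_RE = re.compile(
    r"\|(?P<node_id>\d+)\s+(?P<thread>\d+)\s+(?P<module>\w+)\s+(?P<state>\S{4})\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T[\d:.]+)\|\s+"
    r"(?P<severity>[A-Z]+)\s+(?P<code>\d{4})\s+(?P<event_id>[0-9a-f]+)\s+"
    r"(?P<source>\w+):\s+(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    entry = match.groupdict()
    # "#012" is the octal escape syslog writes for an embedded newline.
    entry["message"] = entry["message"].replace("#012", "\n").strip()
    return entry


def find_problems(lines, severities=("WARNING", "ERROR")):
    """Keep only the parsed entries whose severity is in `severities`."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e["severity"] in severities]


# Two lines copied from the excerpt above.
sample = [
    "Feb  4 12:29:51 <nodename> ADE: |21835099 0110740693 ECJM ^RDY "
    "2025-02-04T12:29:51.841833| NOTICE   0219 54375572ef20a272 ECJM: "
    "1581323391038607979(rebalance 10): Resuming",
    "Feb  4 12:29:51 <nodename> ADE: |21835099 0110740693 ECJM %MDW "
    "2025-02-04T12:29:51.850301| WARNING  1005 54375572ef20a272 ECJM: "
    "Volume Info request timed out.",
]

for entry in find_problems(sample):
    print(entry["timestamp"], entry["severity"], entry["message"])
```

To scan a real log, pass an open file handle for a copy of bycast.log to `find_problems` in place of `sample`. Lines that do not fit the assumed layout are skipped rather than raising an error.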

