StorageGRID node is less utilized and reports InternalError 500 as it is unable to communicate with other grid nodes
Applies to
StorageGRID 11.7.0
Issue
- StorageGRID node is less utilized in the grid and reports InternalError 500 due to a communication issue with the ADC nodes:
Jul 2 00:32:17 <Nodename> ADE: |12038591 0716178440 S3RQ ^RDY 2024-07-02T00:32:17.354063| NOTICE 0138 3fd527aa22bee2b8 S3RQ: S3 error response: RequestId=1719880277344899, TraceId=3fd527aa22bee2b8, Resource=/<Object_path>, HTTP Method=HEAD, HTTP Status Code=500, X-Forwarded-For: '<>', ErrorMsg=InternalError, ErrorType=Internal, CustomErrorMessage={None}, Details={Failed to query any account server (3 candidates); last error: Failed to connect to Account Server at <ADC_NODE_IP>: Account Server at <ADC_NODE_IP> responded with 0 ().}
- The remaining nodes log network isolation events when they attempt to communicate with the affected node:
/var/local/log/dynip.log
[2024-07-04T03:26:47.152] Dummy-954194 - WARNING -- : heartbeat to <grid_node>/<Grid_IP> failed: <urlopen error timed out>
- The Storage Node cannot connect to any grid node on any grid port, and the IP of the affected node is missing from the element grid_ips in the nft ruleset.
- Verified by running the command:
nft list ruleset
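The check above can also be scripted. The following is a minimal sketch, not a NetApp-provided tool: it assumes that grid_ips appears in the `nft list ruleset` output as a named set with an `elements = { ... }` list, which is the usual nftables layout but is not confirmed by this article. The function names and the sample addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical.

```python
import re
import subprocess


def set_elements(ruleset_text, set_name="grid_ips"):
    """Return the elements of one named set from `nft list ruleset` output.

    Assumption: the set is printed as `set <name> { ... elements = { a, b } }`.
    """
    start = re.search(r"\bset\s+" + re.escape(set_name) + r"\s*\{", ruleset_text)
    if start is None:
        return set()
    # Walk the braces so that only this set's body is searched,
    # never the elements of a later set.
    depth, pos = 1, start.end()
    while pos < len(ruleset_text) and depth > 0:
        if ruleset_text[pos] == "{":
            depth += 1
        elif ruleset_text[pos] == "}":
            depth -= 1
        pos += 1
    body = ruleset_text[start.end():pos - 1]
    match = re.search(r"elements\s*=\s*\{(.*?)\}", body, re.DOTALL)
    if match is None:
        return set()
    # Keep only the address; drop per-element options such as "timeout 1h".
    return {item.split()[0] for item in match.group(1).split(",") if item.strip()}


def ip_in_set(ruleset_text, ip, set_name="grid_ips"):
    """True if `ip` is listed as an element of the named set."""
    return ip in set_elements(ruleset_text, set_name)


def read_live_ruleset():
    """Run `nft list ruleset` on the node (requires root) and return its text."""
    result = subprocess.run(
        ["nft", "list", "ruleset"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical sample output, used here instead of a live node.
SAMPLE = """table inet filter {
    set grid_ips {
        type ipv4_addr
        elements = { 192.0.2.10, 192.0.2.11,
                     192.0.2.12 }
    }
}
"""

for node_ip in ("192.0.2.11", "192.0.2.13"):
    state = "present" if ip_in_set(SAMPLE, node_ip) else "MISSING"
    print(f"{node_ip}: {state} in grid_ips")
```

On a grid node, `ip_in_set(read_live_ruleset(), "<Grid_IP>")` would perform the same check against the live ruleset. A result of `False` for the affected node's Grid IP matches the symptom described in this article.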
