18 minutes delay between object-store going unavailable and available

Last updated

Apr 12, 2022

Category:: storagegrid

Specialty:: sgrid

Last Updated:: 4/12/2022, 7:50:53 AM

Applies to

Fabricpool
StorageGRID
Virtual gateway nodes
HA-groups

Issue

ONTAP EMS logs show a delay of close to 18 minutes between the object store going unavailable and becoming available again:

Sat Apr 10 09:27:07 +0200 [mc-netapp-a01: OscLowPriThreadPool: object.store.unavailable:EMERGENCY]: Unable to connect to the object store "fp-mc-netapp-a-primary" from node 727010ba-db24-11ea-b4cf-d039ea1ef264. Reason: Connection unavailable.

Sat Apr 10 09:44:55 +0200 [mc-netapp-a01: OscLowPriThreadPool: object.store.available:notice]: Able to connect to the object store "fp-mc-netapp-a-primary" from node 727010ba-db24-11ea-b4cf-d039ea1ef264.

Tue Apr 13 20:12:24 +0200 [mc-netapp-a01: OscHighPriThreadPool: object.store.unavailable:EMERGENCY]: Unable to connect to the object store "fp-mc-netapp-a-primary" from node 727010ba-db24-11ea-b4cf-d039ea1ef264. Reason: Connection unavailable.

Tue Apr 13 20:30:10 +0200 [mc-netapp-a01: OscLowPriThreadPool: object.store.available:notice]: Able to connect to the object store "fp-mc-netapp-a-primary" from node 727010ba-db24-11ea-b4cf-d039ea1ef264.
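The gap between each pair of events can be checked with a short script. The following is only a minimal sketch: the function names and the regular expression are illustrative, not part of ONTAP. Because the EMS lines above carry no year, the `year` argument is an assumption (2021 is used as the default because April 10 falls on a Saturday in that year). For the two pairs shown above, the gaps come out to 17 min 48 s and 17 min 46 s.

```python
import re
from datetime import datetime

# Matches the timestamp and the event name in EMS lines like the ones above.
EMS_LINE = re.compile(
    r"^(?P<ts>\w{3} \w{3} \d{1,2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[.*?: (?P<event>object\.store\.(?:un)?available):"
)


def parse_events(lines, year=2021):
    """Return (timestamp, event) tuples for object-store EMS lines.

    The year is NOT present in the log lines; it is an assumed parameter.
    """
    events = []
    for line in lines:
        match = EMS_LINE.match(line.strip())
        if match is None:
            continue  # not an object.store.(un)available line
        ts = datetime.strptime(
            f"{year} {match['ts']}", "%Y %a %b %d %H:%M:%S %z"
        )
        events.append((ts, match["event"]))
    return events


def outage_gaps(events):
    """Pair each 'unavailable' event with the next 'available' event."""
    gaps = []
    down_since = None
    for ts, event in events:
        if event == "object.store.unavailable":
            if down_since is None:
                down_since = ts
        elif down_since is not None:
            gaps.append(ts - down_since)
            down_since = None
    return gaps
```

Calling `outage_gaps(parse_events(open("ems.log")))` on a saved log (the file name is hypothetical) returns one `timedelta` per unavailable/available pair.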

The object store is marked unavailable after 1000 failed S3 operations or after 2 minutes without a response, whichever comes first.

In the latter case, the object store is unavailable for a total of about 20 minutes: 2 minutes without a response before the Unavailable message is logged, plus roughly 18 minutes until the Available message.
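The two thresholds and the resulting 20-minute total can be written out as a small model. This is only an illustration of the arithmetic described above, not ONTAP's actual logic; the names are hypothetical, and treating "after 1000" as "1000 or more" is an assumption.

```python
from datetime import timedelta

# Thresholds as stated in this article.
FAILED_OPS_LIMIT = 1000
NO_RESPONSE_LIMIT = timedelta(minutes=2)


def marked_unavailable(failed_ops: int, silent_for: timedelta) -> bool:
    """True once either threshold is reached, whichever comes first.

    Assumption: the limits are inclusive (>=); the article does not say.
    """
    return failed_ops >= FAILED_OPS_LIMIT or silent_for >= NO_RESPONSE_LIMIT


def total_outage_after_timeout(logged_gap: timedelta) -> timedelta:
    """Total downtime when the 2-minute no-response threshold fired.

    logged_gap is the time between the Unavailable and Available
    EMS messages; the 2 minutes before the first message are added.
    """
    return NO_RESPONSE_LIMIT + logged_gap


print(total_outage_after_timeout(timedelta(minutes=18)))  # 0:20:00
```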

The usual behavior is that a HEAD call is made after the object store goes unavailable. Unless there is a more serious issue, this call succeeds and the object store is marked Available again within 1-2 seconds.
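For a rough reachability check from a client machine, a HEAD request can be sent to the S3 endpoint by hand. The sketch below is not the check ONTAP performs: the function name is hypothetical and the URL must be replaced with your own gateway or HA-group address. Any HTTP response, including a 403 for a request without credentials, is counted as "reachable". Note that an untrusted TLS certificate on an `https://` endpoint is reported as unreachable by this simple version.

```python
import urllib.error
import urllib.request


def head_reachable(url: str, timeout: float = 5.0) -> bool:
    """Send one HEAD request and report whether any HTTP answer came back."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (for example 403),
        # so the network path to the endpoint is working.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, TLS failure, etc.
        return False
```

If this returns `True` within a second or two while ONTAP still reports the object store as unavailable, the endpoint itself is answering HTTP requests from that client.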