CONTAP-545577: Operation Timedout error on FabricPool tiering to ONTAP S3
Issue
This issue affects FabricPool tiering to an intra-cluster ONTAP S3 bucket. The Operation Timedout / object.store.unavailable EMS is triggered by failed S3 HEAD health-check requests, which serve as a liveness probe during idle tiering periods, and not necessarily by failures during active data tiering (GET/PUT). The object store is marked unavailable after five consecutive HEAD failures, so the EMS can fire even when no tiering workload is active.
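The threshold behavior described above can be illustrated with a minimal sketch. This is not ONTAP's implementation: the class name, method name, and the assumption that a single successful HEAD restores availability are hypothetical and used only for illustration. Only the five-consecutive-failures threshold and the EMS event names come from this article.

```python
from dataclasses import dataclass
from typing import Optional

# From the article: the store is marked unavailable after five consecutive HEAD failures.
FAILURE_THRESHOLD = 5


@dataclass
class ObjectStoreHealth:
    """Hypothetical tracker for HEAD probe results (illustration only)."""

    consecutive_failures: int = 0
    available: bool = True

    def record_head_result(self, succeeded: bool) -> Optional[str]:
        """Record one HEAD probe result; return the EMS event name on a state change."""
        if succeeded:
            # A success breaks the run of consecutive failures.
            self.consecutive_failures = 0
            if not self.available:
                # Assumption: one successful probe is enough to mark the store available.
                self.available = True
                return "object.store.available"
            return None

        self.consecutive_failures += 1
        if self.available and self.consecutive_failures >= FAILURE_THRESHOLD:
            self.available = False
            return "object.store.unavailable"
        return None


# Five failed probes followed by one success, with no GET/PUT traffic involved.
health = ObjectStoreHealth()
events = [health.record_head_result(ok) for ok in [False] * 5 + [True]]
```

In this sketch the fifth failed probe produces object.store.unavailable and the next successful probe produces object.store.available, which shows how both events can appear during an idle period with no tiering workload.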
In this case, the HEAD failures were ultimately caused by a TCP connection teardown/reconnect edge case: the server sends a FIN on idle connections, and the client then resets (RST) and reconnects. This sequence interacted with ipfw dynamic rule handling and network-device behavior, leading to dropped packets and timeouts even though the S3 service itself was not down.
The following messages are seen in the EMS logs:
Tue Aug 12 04:09:52 +0800 [cluster1-node2: OscLowPriThreadPool_7: object.store.unavailable:EMERGENCY]:
Unable to connect to the object store "BackupS3Store" from node a1b2c3d4-1234-5678-9abc-def012345678. Reason: Operation Timedout.
...
Tue Aug 12 04:09:52 +0800 [cluster1-node2: OscLowPriThreadPool_2: object.store.available:notice]:
Able to connect to the object store "BackupS3Store" from node a1b2c3d4-1234-5678-9abc-def012345678.
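When reviewing a large EMS log for this pattern, a small script can pull out the object-store availability events. The sketch below assumes the header layout shown in the sample above (`[node: thread: event:severity]`); the regular expression and the function name `object_store_events` are hypothetical and may need adjusting for other log formats.

```python
import re

# Matches the bracketed EMS header seen in the sample lines, for example:
# [cluster1-node2: OscLowPriThreadPool_7: object.store.unavailable:EMERGENCY]
EMS_HEADER = re.compile(
    r"\[(?P<node>[^:\]]+):\s*(?P<thread>[^:\]]+):\s*"
    r"(?P<event>object\.store\.(?:un)?available):(?P<severity>\w+)\]"
)


def object_store_events(lines):
    """Yield (node, event, severity) for each object.store.* header line."""
    for line in lines:
        match = EMS_HEADER.search(line)
        if match:
            yield match.group("node").strip(), match.group("event"), match.group("severity")
```

Feeding the two header lines from the sample through `object_store_events` yields one unavailable event and one available event for cluster1-node2. Frequent pairs of this kind with no tiering activity are consistent with the HEAD health-check behavior described in this article.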
