CONTAP-229226: Slow response to API calls on the takeover/surviving node while the partner node was stuck and failed to reboot
Issue
- One node (Node 1) took over the other node (Node 2)
- Node 2 was not accessible through SSH or REST API calls, even though Node 1 still saw it as up
- Node 1, the takeover/surviving node, continued to serve user data
- Node 1 was slow to respond to SSH and REST API calls, causing health checks by the FSx Control Plane to time out or fail
- Node 2 had to be restarted or sent an NMI from the AWS Console or CLI to recover