ONTAP Select HA experiences random reboots due to network partitioning caused by a flaky Interconnect
Applies to
- NetApp ONTAP Select
- NetApp ONTAP Select Deploy
Issue
- An ONTAP Select (OTS) HA pair experiences random reboots due to network partitioning caused by a flaky Interconnect.
- The first ha.netPartition.other event can be observed as early as 20 September 2021:
Mon Sep 20 18:20:14 +0200 [cluster01-02: nvram_sync: ha.netPartition.other:debug]: Network partition due to other error. Duration 803 msecs, takeover wait 0 msecs; error code 5; status: 0x201081; request id: 162.
- The problem gradually became more severe, until node cluster01-02 was stuck in a boot loop with the error "Waiting for requisite number of local mailboxes..".
- Node cluster01-01 was used to create a single-node cluster so that the customer could continue serving data while a new cluster is set up for migration.
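
To see how the partition events worsen over time, the ha.netPartition.other entries can be pulled out of an exported EMS log and their durations compared. The following is a minimal sketch, not a NetApp tool: the function name parse_partition_events and the regular expression are assumptions modelled only on the single log line quoted above, and other ONTAP Select releases may format the message differently.

```python
import re

# Hypothetical pattern, derived only from the one EMS line quoted in this article.
EVENT_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "  # timestamp
    r"\[(?P<node>[^:\]]+): "                                        # reporting node
    r"[^\]]*ha\.netPartition\.other:\w+\]: "                        # event name and severity
    r".*?Duration (?P<ms>\d+) msecs"                                # partition duration
)


def parse_partition_events(lines):
    """Return (timestamp, node, duration_ms) for each ha.netPartition.other line."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            events.append((match["ts"], match["node"], int(match["ms"])))
    return events


if __name__ == "__main__":
    # The sample line is the one quoted in this article.
    sample = [
        "Mon Sep 20 18:20:14 +0200 [cluster01-02: nvram_sync: "
        "ha.netPartition.other:debug]: Network partition due to other error. "
        "Duration 803 msecs, takeover wait 0 msecs; error code 5; "
        "status: 0x201081; request id: 162.",
    ]
    for ts, node, ms in parse_partition_events(sample):
        print(f"{ts}  {node}  {ms} ms")
```

Lines that do not match the pattern are skipped, so the function can be fed a complete log export. Increasing durations or a rising event count per day would be consistent with the gradual degradation described above.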
