Takeover initiated after no heartbeat was detected from the partner node
Applies to
Issue
- Node experiences an unexpected takeover
[UC_PRD_A400-01: cf_main: cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node.
- Node appears to be stuck in a boot loop during Power-On Self-Test (POST)
PEI start.
CPU PEI initialization.
Wait BMC self-test result.
BMC self-test: OK.
UPI initialization.
CPU initialization.
Running full memory initialization.
CPU reset.
BIOS Version: 16.8
PEI start.
CPU PEI initialization.
Wait BMC self-test result.
BMC self-test: OK.
UPI initialization.
CPU initialization.
Running full memory initialization.
BIOS Version: 16.8
PEI start.
CPU PEI initialization.
Wait BMC self-test result.
BMC self-test: OK.
UPI initialization.
CPU initialization.
Running full memory initialization.
CPU reset.
BIOS Version: 16.8
PEI start.
CPU PEI initialization.
Wait BMC self-test result.
BMC self-test: OK.
UPI initialization.
CPU initialization.
Running full memory initialization.
BIOS Version: 1
- Some Power-On Self-Test cycles in the boot loop end with missing DIMM messages
No firmware was updated, so no need to reboot
DIMM missing in slot DIMM-2
DIMM missing in slot DIMM-13
DIMM missing in slot DIMM-15
System DIMM configuration is not supported by AFF-A400
Halting...
- BMC logs indicate that the I2C bus is busy
May 6 10:50:55 kernel: [16899603.850000] I2C5: (5984897681) Master-Xfer failed. Bus busy count 1, Time (in seconds) : 0 -
May 6 10:50:56 kernel: [16899604.000000] I2C5: (5984897696) Master-Xfer failed. Bus busy count 2, Time (in seconds) : 1 -
May 6 10:50:57 kernel: [16899605.810000] I2C5: (5984897877) Master-Xfer failed. Bus busy count 3, Time (in seconds) : 2 -
May 6 10:50:57 kernel: [16899605.930000] I2C5: (5984897889) Master-Xfer failed. Bus busy count 4, Time (in seconds) : 2 -
May 6 10:51:00 kernel: [16899608.850000] I2C5: (5984898181) Master-Xfer failed. Continuous Bus busy, Normal reset of i2c bus. -
May 6 10:51:00 kernel: [16899608.850000] I2C5: (5984898181) Master-Xfer failed. Bus busy count 0, Time (in seconds) : 16899308 -
- BMC logs show multiple sensor-related errors, including low PSU input voltage and power readings
628 | 05/06/2024 | 19:53:36 | Voltage PSU1_VIN | Lower Non-critical going low | Reading 1.10 < Threshold 93.50 Volts
629 | 05/06/2024 | 19:53:36 | OEM PSU1_PIN | Lower Non-critical going low | Reading 7.10 < Threshold 14.20 Watts
62a | 05/06/2024 | 19:53:41 | Voltage PSU2_VIN | Lower Non-critical going low | Reading 1.10 < Threshold 93.50 Volts
62b | 05/06/2024 | 19:53:41 | OEM PSU2_PIN | Lower Non-critical going low | Reading 7.10 < Threshold 14.20 Watts
62c | 05/06/2024 | 19:53:42 | Voltage PSU1_VIN | Lower Critical going low | Reading 0 < Threshold 90.20 Volts
62d | 05/06/2024 | 19:53:42 | Current PSU1_IOUT | Lower Non-critical going low | Reading 0 < Threshold 0 Amps
62e | 05/06/2024 | 19:53:42 | OEM PSU1_PIN | Lower Critical going low | Reading 7.10 < Threshold 7.10 Watts
62f | 05/06/2024 | 19:53:42 | OEM PSU1_POUT | Lower Non-critical going low | Reading 0 < Threshold 14.20 Watts
630 | 05/06/2024 | 19:53:44 | Voltage PSU1_VOUT | Lower Non-critical going low | Reading 0 < Threshold 11.44 Volts
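As a rough triage aid, the following Python sketch scans a saved console or BMC log capture for the signatures listed above: repeated BIOS banners (POST loop), CPU resets, missing DIMM slots, I2C "Master-Xfer failed" messages, and PSU "going low" SEL events. The function name `triage` and its output keys are hypothetical, and the patterns are copied from the messages quoted in this article, so they may not match other BIOS or BMC firmware versions. Counting BIOS banners is only a heuristic: more than one banner in a single capture suggests repeated POST attempts.

```python
import re
from collections import Counter

# Patterns copied from the console and BMC messages quoted in this article;
# other firmware versions may word these messages differently (assumption).
DIMM_MISSING = re.compile(r"DIMM missing in slot (\S+)")
I2C_FAILED = re.compile(r"(I2C\d+): \(\d+\) Master-Xfer failed\.")
# SEL rows look like: id | date | time | sensor | event | reading
SEL_LOW = re.compile(
    r"^\s*[0-9a-f]+\s*\|[^|\n]*\|[^|\n]*\|\s*(?P<sensor>[^|\n]+?)\s*\|"
    r"\s*(?P<event>Lower (?:Non-critical|Critical) going low)\s*\|",
    re.MULTILINE,
)


def triage(log_text: str) -> dict:
    """Hypothetical helper: summarize symptom signatures in a log capture."""
    return {
        # Each POST attempt prints a new BIOS banner; more than one suggests a loop.
        "post_attempts": log_text.count("BIOS Version:"),
        "cpu_resets": log_text.count("CPU reset."),
        "missing_dimms": DIMM_MISSING.findall(log_text),
        # Number of failed I2C transfers, keyed by bus name (for example I2C5).
        "i2c_failures": Counter(I2C_FAILED.findall(log_text)),
        # Only keep low-threshold events reported by PSU sensors.
        "psu_low_events": [
            (m["sensor"], m["event"])
            for m in SEL_LOW.finditer(log_text)
            if "PSU" in m["sensor"]
        ],
    }


sample = """\
BIOS Version: 16.8
PEI start.
Running full memory initialization.
CPU reset.
BIOS Version: 16.8
DIMM missing in slot DIMM-2
DIMM missing in slot DIMM-13
May 6 10:50:55 kernel: [16899603.850000] I2C5: (5984897681) Master-Xfer failed. Bus busy count 1, Time (in seconds) : 0 -
628 | 05/06/2024 | 19:53:36 | Voltage PSU1_VIN | Lower Non-critical going low | Reading 1.10 < Threshold 93.50 Volts
62c | 05/06/2024 | 19:53:42 | Voltage PSU1_VIN | Lower Critical going low | Reading 0 < Threshold 90.20 Volts
"""

report = triage(sample)
print(report)
```

Plain string counts and regular expressions keep the sketch dependency-free; a capture that shows several POST attempts, missing DIMMs, I2C bus errors, and PSU input readings near zero at the same time points to the combination of symptoms described in this article.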