An error message "Cf.hwassist.missedKeepAlive" appears after SP Firmware update failure
Applies to
- AFF A300
- ONTAP 9
- Service Processor (SP) firmware upgrade from 5.6P2 to 5.8
Issue
- The SP firmware auto-update to version 5.8 fails at 5%.
Example:
[Sat Jun 20 01:23:06.047 2020] netapp::*> system service-processor image update -node *2 -package 308-03991_A0-FAS26X0-FAS8200_5.8_SP_FW.zip
[Sat Jun 20 01:23:24.218 2020]
[Sat Jun 20 01:23:24.218 2020] Note: Firmware update will need to reboot the SP on completion. If your console
[Sat Jun 20 01:23:24.234 2020] connection is through the SP, it will be disconnected
[Sat Jun 20 01:23:24.246 2020] Do you want to proceed with the firmware update ? {y|n}: y
[Sat Jun 20 01:23:34.613 2020] SP firmware update has been successfully scheduled.
[Sat Jun 20 01:23:34.649 2020] 1 entry was acted on.
[Sat Jun 20 01:23:34.665 2020]
[Sat Jun 20 01:23:34.665 2020] netapp::*> system service-processor image update-progress show
[Sat Jun 20 01:23:38.711 2020] In Percent
[Sat Jun 20 01:23:38.751 2020] Node Progress Start Time Done End Time
[Sat Jun 20 01:23:38.751 2020] ---------------- -------- ------------------- ------- -------------------
[Sat Jun 20 01:23:38.755 2020] netapp-01 no 6/20/2020 01:20:53 5 6/20/2020 01:23:32
[Sat Jun 20 01:23:38.755 2020] netapp-02 yes 6/20/2020 01:24:34 1 -
[Sat Jun 20 01:23:38.755 2020] 2 entries were displayed.
[Sat Jun 20 01:41:58.776 2020] netapp::*> system service-processor image update-progress show
[Sat Jun 20 01:42:09.978 2020] In Percent
[Sat Jun 20 01:42:09.978 2020] Node Progress Start Time Done End Time
[Sat Jun 20 01:42:10.002 2020] ---------------- -------- ------------------- ------- -------------------
[Sat Jun 20 01:42:10.002 2020] netapp-01 no 6/20/2020 01:20:53 5 6/20/2020 01:23:32
[Sat Jun 20 01:42:10.030 2020] netapp-02 no 6/20/2020 01:24:34 5 6/20/2020 01:27:13
[Sat Jun 20 01:42:10.038 2020] 2 entries were displayed.
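The stalled state in the output above (update no longer in progress, percent done well below 100) can be picked out of a saved console capture with a short script. The following is a minimal sketch, not a NetApp tool: the function name `stalled_updates` is hypothetical, and the column layout (node, in progress, start time, percent done, end time) is assumed from the example output only.

```python
import re

# Strips a leading console timestamp such as "[Sat Jun 20 01:42:10.002 2020] ".
PREFIX = re.compile(r"^\[[^\]]*\]\s*")

# One data row: node, in-progress flag, start time, percent done, end time or "-".
# Assumption: this column order matches the example output in this article.
ROW = re.compile(
    r"^(?P<node>\S+)\s+(?P<in_progress>yes|no)\s+"
    r"(?P<start>\d+/\d+/\d+\s+\d+:\d+:\d+)\s+"
    r"(?P<percent>\d+)\s+"
    r"(?P<end>-|\d+/\d+/\d+\s+\d+:\d+:\d+)$"
)

def stalled_updates(console_text):
    """Return (node, percent) pairs for updates that stopped before 100%."""
    stalled = []
    for raw in console_text.splitlines():
        line = PREFIX.sub("", raw).strip()
        match = ROW.match(line)
        if match is None:
            continue  # header, separator, prompt, or summary line
        percent = int(match.group("percent"))
        if match.group("in_progress") == "no" and percent < 100:
            stalled.append((match.group("node"), percent))
    return stalled
```

Applied to the second capture above, this sketch would report both netapp-01 and netapp-02 as stopped at 5 percent.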
- "event log show" shows the following event.
Example:
6/20/2020 01:23:32 netapp-01 DEBUG sp.servprocd.upd.unexpt.evts: reason="Unable to transfer SP firmware image using network interface"
- "hwassist stats show" indicates the system didn't receive KeepAlive after the update failure.
Example:
[Sat Jun 20 01:13:51.163 2020] netapp::> hwassist stats show
[Sat Jun 20 01:13:53.185 2020] (storage failover hwassist stats show)
[Sat Jun 20 01:13:53.185 2020]
[Sat Jun 20 01:13:53.201 2020] Node: netapp-01
[Sat Jun 20 01:13:53.201 2020] Local Enabled: true
[Sat Jun 20 01:13:53.201 2020] Partner Inactive Reason: -
[Sat Jun 20 01:13:53.217 2020]
[Sat Jun 20 01:13:53.217 2020] Alert Type Alert Event Count Takeover Last Received
[Sat Jun 20 01:13:53.217 2020] ------------ -------------------- ------ --------- --------------------
[Sat Jun 20 01:13:53.237 2020] system_down power_loss 0 Yes ---
[Sat Jun 20 01:13:53.256 2020] system_down l2_watchdog_reset 0 Yes ---
[Sat Jun 20 01:13:53.256 2020] system_down power_off_via_rlm 0 Yes ---
[Sat Jun 20 01:13:53.273 2020] system_down power_cycle_via_rlm 0 Yes ---
[Sat Jun 20 01:13:53.273 2020] system_down reset_via_rlm 0 Yes ---
[Sat Jun 20 01:13:53.292 2020] system_down power_off_via_sp 0 Yes ---
[Sat Jun 20 01:13:53.292 2020] system_down power_cycle_via_sp 0 Yes ---
[Sat Jun 20 01:13:53.312 2020] system_down reset_via_sp 0 Yes ---
[Sat Jun 20 01:13:53.312 2020] system_down post_error 0 No ---
[Sat Jun 20 01:13:53.328 2020] system_down abnormal_reboot 0 No ---
[Sat Jun 20 01:13:53.328 2020] system_down loss_of_heartbeat 0 No ---
[Sat Jun 20 01:13:53.344 2020] keep_alive periodic_message 130800 No Fri Jun 19 22:59:52 JST 2020
[Sat Jun 20 01:13:53.364 2020] test test 0 No ---
[Sat Jun 20 01:13:53.364 2020] ID_mismatch --- 0 --- ---
[Sat Jun 20 01:13:53.384 2020] Key_mismatch --- 0 --- ---
[Sat Jun 20 01:13:53.384 2020] Unknown --- 0 --- ---
[Sat Jun 20 01:13:53.396 2020] alerts_throttled 0 --- ---
[Sat Jun 20 01:13:53.396 2020]
[Sat Jun 20 01:13:53.396 2020] Node: netapp-02
[Sat Jun 20 01:13:53.400 2020] Local Enabled: true
[Sat Jun 20 01:13:53.400 2020] Partner Inactive Reason: -
[Sat Jun 20 01:13:53.404 2020]
[Sat Jun 20 01:13:53.404 2020] Alert Type Alert Event Count Takeover Last Received
[Sat Jun 20 01:13:53.416 2020] ------------ -------------------- ------ --------- --------------------
[Sat Jun 20 01:13:53.428 2020] system_down power_loss 0 Yes ---
[Sat Jun 20 01:13:53.436 2020] system_down l2_watchdog_reset 0 Yes ---
[Sat Jun 20 01:13:53.448 2020] system_down power_off_via_rlm 0 Yes ---
[Sat Jun 20 01:13:53.456 2020] system_down power_cycle_via_rlm 0 Yes ---
[Sat Jun 20 01:13:53.468 2020] system_down reset_via_rlm 0 Yes ---
[Sat Jun 20 01:13:53.476 2020] system_down power_off_via_sp 0 Yes ---
[Sat Jun 20 01:13:53.484 2020] system_down power_cycle_via_sp 0 Yes ---
[Sat Jun 20 01:13:53.495 2020] system_down reset_via_sp 0 Yes ---
[Sat Jun 20 01:13:53.503 2020] system_down post_error 0 No ---
[Sat Jun 20 01:13:53.515 2020] system_down abnormal_reboot 0 No ---
[Sat Jun 20 01:13:53.529 2020] system_down loss_of_heartbeat 0 No ---
[Sat Jun 20 01:13:53.533 2020] keep_alive periodic_message 130995 No Sat Jun 20 01:14:49 JST 2020
[Sat Jun 20 01:13:53.547 2020] test test 0 No ---
[Sat Jun 20 01:13:53.559 2020] ID_mismatch --- 0 --- ---
[Sat Jun 20 01:13:53.567 2020] Key_mismatch --- 0 --- ---
[Sat Jun 20 01:13:53.575 2020] Unknown --- 0 --- ---
[Sat Jun 20 01:13:53.587 2020] alerts_throttled 0 --- ---
[Sat Jun 20 01:13:53.599 2020] 2 entries were displayed.
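To compare the keep_alive rows of the two nodes without reading the full table, the count and "Last Received" time can be extracted per node. This is a minimal sketch under stated assumptions: `last_keep_alive` is a hypothetical helper, the row layout is taken from the example output above, and the time zone token (JST in the example) is ignored rather than converted.

```python
import re
from datetime import datetime

# Strips a leading console timestamp such as "[Sat Jun 20 01:13:53.344 2020] ".
PREFIX = re.compile(r"^\[[^\]]*\]\s*")
NODE = re.compile(r"^Node:\s*(\S+)")
# keep_alive row: count, takeover flag, then "Fri Jun 19 22:59:52 JST 2020".
KEEP_ALIVE = re.compile(
    r"^keep_alive\s+periodic_message\s+(\d+)\s+\S+\s+"
    r"\w{3}\s+(\w{3})\s+(\d+)\s+(\d+):(\d+):(\d+)\s+\S+\s+(\d{4})$"
)
MONTHS = {
    name: number
    for number, name in enumerate(
        "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1
    )
}

def last_keep_alive(console_text):
    """Map node name -> (keep_alive count, last received time, zone ignored)."""
    result = {}
    node = None
    for raw in console_text.splitlines():
        line = PREFIX.sub("", raw).strip()
        node_match = NODE.match(line)
        if node_match:
            node = node_match.group(1)
            continue
        row = KEEP_ALIVE.match(line)
        if row and node:
            count, mon, day, hh, mm, ss, year = row.groups()
            when = datetime(int(year), MONTHS[mon], int(day),
                            int(hh), int(mm), int(ss))
            result[node] = (int(count), when)
    return result
```

A node whose keep_alive row shows `---` in the "Last Received" column is simply omitted from the result. Subtracting the two returned times gives the gap between the nodes' most recent keep-alive records.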
- "system health alert show" reports an SP Config alert like the following.
Example:
[Sat Jun 20 03:01:44.937 2020] netapp::> system health alert show
[Sat Jun 20 03:01:56.184 2020] Node: netapp-01
[Sat Jun 20 03:01:56.184 2020] Resource: SP Config
[Sat Jun 20 03:01:56.200 2020] Severity: Major
[Sat Jun 20 03:01:56.200 2020] Indication Time: Sat Jun 20 02:56:30 2020
[Sat Jun 20 03:01:56.200 2020] Suppress: false
[Sat Jun 20 03:01:56.216 2020] Acknowledge: false
[Sat Jun 20 03:01:56.216 2020] Probable Cause: Service Processor is not properly configured.
[Sat Jun 20 03:01:56.232 2020] Possible Effect: You might not be able to use the Service Processor to
[Sat Jun 20 03:01:56.232 2020] remotely access, monitor, and troubleshoot your
[Sat Jun 20 03:01:56.256 2020] storage system.
[Sat Jun 20 03:01:56.276 2020] Corrective Actions: 1. Use the "system service-processor network modify" command to configure the network interface of the Service Processor.
[Sat Jun 20 03:01:56.292 2020] 2. Use the "system service-processor image modify -node netapp-01 -autoupdate true" to configure AutoUpdate feature of the Service Processor.
[Sat Jun 20 03:01:56.310 2020] 3. Contact the technical support if the alert persists.
- After the SP network settings are configured again, the SP recovers temporarily.
Example:
[Sat Jun 20 03:06:35.524 2020] netapp::> system service-processor network modify -node netapp-02 -enable true -address-family IPv4 -ip-address 192.168.132.214 -netmask 255.255.255.0 -gateway 192.168.132.1
[Sat Jun 20 03:07:24.333 2020] netapp::> hwassist show
[Sat Jun 20 03:07:26.325 2020] (storage failover hwassist show)
[Sat Jun 20 03:07:26.325 2020] Node
[Sat Jun 20 03:07:26.325 2020] -----------------
[Sat Jun 20 03:07:26.337 2020] netapp-01
[Sat Jun 20 03:07:26.337 2020] Partner: netapp-02
[Sat Jun 20 03:07:26.357 2020] Hwassist Enabled: true
[Sat Jun 20 03:07:26.357 2020] Hwassist IP: 192.168.132.211
[Sat Jun 20 03:07:26.378 2020] Hwassist Port: 4444
[Sat Jun 20 03:07:26.378 2020] Monitor Status: active
[Sat Jun 20 03:07:26.394 2020] Inactive Reason: -
[Sat Jun 20 03:07:26.394 2020] Corrective Action: -
[Sat Jun 20 03:07:26.394 2020] Keep-Alive Status: healthy
[Sat Jun 20 03:07:26.414 2020] netapp-02
[Sat Jun 20 03:07:26.414 2020] Partner: netapp-01
[Sat Jun 20 03:07:26.414 2020] Hwassist Enabled: true
[Sat Jun 20 03:07:26.430 2020] Hwassist IP: 192.168.132.212
[Sat Jun 20 03:07:26.430 2020] Hwassist Port: 4444
[Sat Jun 20 03:07:26.442 2020] Monitor Status: active
[Sat Jun 20 03:07:26.442 2020] Inactive Reason: -
[Sat Jun 20 03:07:26.462 2020] Corrective Action: -
[Sat Jun 20 03:07:26.462 2020] Keep-Alive Status: healthy
[Sat Jun 20 03:07:26.474 2020] 2 entries were displayed.
- After a while, the error occurs again.
Example:
[Sat Jun 20 03:27:07.648 2020] netapp::> hwassist show
[Sat Jun 20 03:27:09.822 2020] (storage failover hwassist show)
[Sat Jun 20 03:27:09.826 2020] Node
[Sat Jun 20 03:27:09.830 2020] -----------------
[Sat Jun 20 03:27:09.830 2020] netapp-01
[Sat Jun 20 03:27:09.834 2020] Partner: netapp-02
[Sat Jun 20 03:27:09.842 2020] Hwassist Enabled: true
[Sat Jun 20 03:27:09.850 2020] Hwassist IP: 192.168.132.211
[Sat Jun 20 03:27:09.860 2020] Hwassist Port: 4444
[Sat Jun 20 03:27:09.868 2020] Monitor Status: active
[Sat Jun 20 03:27:09.876 2020] Inactive Reason: -
[Sat Jun 20 03:27:09.884 2020] Corrective Action: -
[Sat Jun 20 03:27:09.892 2020] Keep-Alive Status: Error: did not receive hwassist keep alive alerts from partner.
[Sat Jun 20 03:27:09.908 2020] netapp-02
[Sat Jun 20 03:27:09.912 2020] Partner: netapp-01
[Sat Jun 20 03:27:09.920 2020] Hwassist Enabled: true
[Sat Jun 20 03:27:09.928 2020] Hwassist IP: 192.168.132.212
[Sat Jun 20 03:27:09.940 2020] Hwassist Port: 4444
[Sat Jun 20 03:27:09.946 2020] Monitor Status: active
[Sat Jun 20 03:27:09.950 2020] Inactive Reason: -
[Sat Jun 20 03:27:09.960 2020] Corrective Action: -
[Sat Jun 20 03:27:09.968 2020] Keep-Alive Status: healthy
[Sat Jun 20 03:27:09.976 2020] 2 entries were displayed.
[Sat Jun 20 03:27:09.980 2020]
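Because the keep-alive status alternates between healthy and error, it can help to check repeated captures of "hwassist show" automatically. The sketch below is illustrative only: `keep_alive_errors` is a hypothetical name, and the rule used to recognize node names (a single token with no colon that is neither the "Node" header nor a dashed separator) is an assumption based on the example output above.

```python
import re

# Strips a leading console timestamp such as "[Sat Jun 20 03:27:09.892 2020] ".
PREFIX = re.compile(r"^\[[^\]]*\]\s*")

def keep_alive_errors(console_text):
    """Map node name -> Keep-Alive Status text for nodes that are not healthy."""
    errors = {}
    node = None
    for raw in console_text.splitlines():
        line = PREFIX.sub("", raw).strip()
        if line.startswith("Keep-Alive Status:"):
            status = line.split(":", 1)[1].strip()
            if node and status != "healthy":
                errors[node] = status
        elif (line and " " not in line and ":" not in line
              and line != "Node" and set(line) != {"-"}):
            # Assumed to be a node name line, for example "netapp-01".
            node = line
    return errors
```

For the capture taken at 03:27 above, this would flag netapp-01 only; for the capture taken at 03:07 it would return an empty result.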
- The SP hangs during the SP update. "SP-LATEST CONFIGURATION" in AutoSupport shows the following output.
Example:
Service Processor Status: Online
Firmware Version: 5.6P2
Mgmt MAC Address: 00:A0:XX:XX:XX:XX
Ethernet Link: down, full duplex, auto-neg complete
Using DHCP: no
IPv4 configuration:
IP Address: unknown
Netmask: unknown
Gateway: unknown
IPv6 configuration: Disabled
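The signs of the hung SP in the output above are the Ethernet link being down and the IPv4 address, netmask, and gateway all reported as unknown. As a rough sketch, assuming the section is available as plain "Key: value" text exactly as shown (the function name `sp_network_problems` is hypothetical), these fields can be checked like this:

```python
def sp_network_problems(sp_config_text):
    """List network problems found in an SP configuration text block."""
    fields = {}
    for line in sp_config_text.splitlines():
        # Split on the first colon only, so MAC addresses stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    problems = []
    if fields.get("Ethernet Link", "").startswith("down"):
        problems.append("Ethernet link is down")
    for key in ("IP Address", "Netmask", "Gateway"):
        if fields.get(key) == "unknown":
            problems.append(f"{key} is unknown")
    return problems
```

An empty list means none of these particular symptoms were found; it does not by itself confirm that the SP is healthy.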