What's cf_hwassist_missedKeepAlive timeout and tolerant period?
Applies to
- NetApp AFF and FAS systems
- ONTAP 9
Answer
- A cf_hwassist_missedKeepAlive event is recorded in EMS 60 seconds after a hw-assist packet is sent and not received
- The hw-assist packets are sent over UDP every 180 seconds:
- There is no retransmission of packets if the packets are sent and not received
- If a UDP packet is dropped, blocked, wedged, redirected, etc., and a node doesn't receive it, the node simply waits 180 seconds until the next packet is sent
- So if a cf_hwassist_recvKeepAlive event appears within 120 seconds after cf_hwassist_missedKeepAlive, it can be safely ignored
[Nodename-02: cf_hwassist: cf.hwassist.missedKeepAlive:debug]: HW-assisted takeover missing keep-alive messages from HA partner (Nodename-01).
[Nodename-02: cf_hwassist: cf.hwassist.recvKeepAlive:debug]: hw_assist: Received hw_assist KeepAlive alert from partner(Nodename-01).
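The timing rule above can be sketched as a small check. This is a minimal illustration, not a NetApp tool: the function name `is_benign_miss` and the constants are hypothetical, and it assumes the 120-second window is simply the 180-second send interval minus the 60-second delay before the missed event is logged.

```python
from datetime import datetime, timedelta

# Values taken from the article text; the subtraction is an assumption
# about how the 120-second window is derived.
KEEPALIVE_INTERVAL_S = 180   # hw-assist packets are sent every 180 seconds
MISSED_LOGGED_AFTER_S = 60   # missedKeepAlive is logged 60 seconds after the packet
TOLERANCE_S = KEEPALIVE_INTERVAL_S - MISSED_LOGGED_AFTER_S  # 120 seconds


def is_benign_miss(missed_at, received_at):
    """Return True if recvKeepAlive followed missedKeepAlive within the window.

    missed_at   -- datetime of the cf.hwassist.missedKeepAlive event
    received_at -- datetime of the next cf.hwassist.recvKeepAlive event,
                   or None if no such event was logged
    """
    if received_at is None:
        return False
    delta = (received_at - missed_at).total_seconds()
    return 0 <= delta <= TOLERANCE_S


# Example with made-up timestamps: recvKeepAlive arrives 95 seconds later.
missed = datetime(2024, 1, 1, 10, 0, 0)
print(is_benign_miss(missed, missed + timedelta(seconds=95)))
```

A missedKeepAlive with no recvKeepAlive inside the window is the case worth investigating further.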
Additional Information
- Cause of cf_hwassist_missedKeepAlive: hw-assist is configured with, and transmits over, an IP address and port on e0M, which passes through the customer network environment. Nearly every instance of this type of failure is therefore due to packets dropped in the network.
- Depending on the platform, the default hwassist port is 4444 or 162 (see the NetApp Knowledge Base article "Hwassist IP address is set to 192.0.2.84 and 192.0.2.85")
- Check the hwassist-health-check-interval with the following command:
aff200-2n-dal-1::> storage failover show -fields hwassist,hwassist-partner-ip,hwassist-partner-port,hwassist-health-check-interval,hwassist-retry-count,hwassist-status
node          hwassist hwassist-partner-ip hwassist-partner-port hwassist-health-check-interval hwassist-retry-count hwassist-status
------------- -------- ------------------- --------------------- ------------------------------ -------------------- ---------------
aff200-dal-1a true     10.128.227.184      4444                  180                            2                    active
aff200-dal-1b true     10.128.227.183      4444                  180                            2                    active
2 entries were displayed.
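If the command output is captured as text, the interval can be read per node with a short script. This is a rough sketch under stated assumptions: the function name `parse_failover_fields` is hypothetical, and it assumes whitespace-separated columns with no spaces inside values, as in the sample output above.

```python
SAMPLE = """\
node hwassist hwassist-partner-ip hwassist-partner-port hwassist-health-check-interval hwassist-retry-count hwassist-status
------------- -------- ------------------- --------------------- ------------------------------ -------------------- ---------------
aff200-dal-1a true 10.128.227.184 4444 180 2 active
aff200-dal-1b true 10.128.227.183 4444 180 2 active
2 entries were displayed.
"""


def parse_failover_fields(output):
    """Turn captured 'storage failover show -fields ...' text into dicts."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    # The header row is the first line whose first token is "node".
    header_idx = next(i for i, ln in enumerate(lines) if ln.split()[0] == "node")
    names = lines[header_idx].split()
    rows = []
    # Skip the header and the dashed separator line beneath it.
    for ln in lines[header_idx + 2:]:
        values = ln.split()
        if len(values) != len(names):
            continue  # ignores trailers such as "2 entries were displayed."
        rows.append(dict(zip(names, values)))
    return rows


for row in parse_failover_fields(SAMPLE):
    print(row["node"], row["hwassist-health-check-interval"])
```

All values are returned as strings; convert the interval with `int()` before doing arithmetic on it.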