CONTAP-182745: MGWD crashes due to memory shortage
Issue
The node experiences one of the following mgwd panics:
- mgwd becomes unresponsive and triggers the node watchdog.
- mgwd runs out of swap space or memory.
Example panic messages:
Process mgwd unresponsive for 182 seconds (mgwd startup: "(55040)") in process nodewatchdog on release 9.13.0P4 (C)
OOM: out of swap space, process mgwd using 1346 MB in process pageout: dom0 on release 9.13.0P4 (C) (C)
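When searching collected logs for these two signatures, a small script can help. The following is a minimal sketch, not a supported tool: the regular expressions are modeled only on the two example strings above, and the function name `classify_panic` is hypothetical. Panic strings on other releases may be worded differently.

```python
import re

# Assumed patterns, derived only from the two example panic strings above.
WATCHDOG_RE = re.compile(r"Process mgwd unresponsive for (\d+) seconds")
OOM_RE = re.compile(r"OOM: out of swap space, process mgwd using (\d+) MB")


def classify_panic(line):
    """Return (kind, value) for a matching panic string, else None.

    kind is "watchdog" (value = seconds unresponsive) or
    "oom" (value = MB used by mgwd).
    """
    match = WATCHDOG_RE.search(line)
    if match:
        return ("watchdog", int(match.group(1)))
    match = OOM_RE.search(line)
    if match:
        return ("oom", int(match.group(1)))
    return None
```

Calling `classify_panic` on each line of a saved log and keeping the results that are not `None` gives a quick list of candidate matches to review by hand.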
Some additional observations:
- Users may lose access to the cluster during this time.
- Depending on which node is being upgraded at the time, cluster applications are reported offline on one node while its HA partner is in partial giveback.
- The issue is mostly seen during an automated nondisruptive upgrade (ANDU).
Cluster::> storage failover show
                              Takeover
Node           Partner        Possible State
-------------- -------------- -------- -------------------------------------
Cluster-01     Cluster-02     true     Connected to Cluster-02. Waiting
                                       for cluster applications to come
                                       online on the local node. Offline
                                       applications: mgmt, vifmgr, scsi
                                       blade, clam.
Cluster-02     Cluster-01     true     Connected to Cluster-01, Partial
                                       giveback
2 entries were displayed.