CONTAP-182745: mgwd crashes during ANDU from memory shortage
Issue
During an ONTAP upgrade, a node panics with one of the following:
- mgwd becomes unresponsive and is flagged by the node watchdog.
- mgwd runs out of swap space or memory.
Example panic messages:
[Process mgwd unresponsive for 182 seconds (mgwd startup: "(55040)") in process nodewatchdog on release 9.13.0P4 (C)]
[OOM: out of swap space, process mgwd using 1346 MB in process pageout: dom0 on release 9.13.0P4 (C) (C)]
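When reviewing many panic strings, it can help to sort them into the two signatures listed above. The following is a minimal sketch, not a NetApp tool: the function name `classify_panic` and the labels `watchdog` and `oom` are hypothetical, and the patterns are derived only from the two example messages in this article, so other releases may word the panic differently.

```python
import re

# Patterns assumed from the two example panic strings in this article only.
WATCHDOG_RE = re.compile(
    r"Process (?P<proc>\S+) unresponsive for (?P<value>\d+) seconds"
)
OOM_RE = re.compile(
    r"OOM: out of swap space, process (?P<proc>\S+) using (?P<value>\d+) MB"
)


def classify_panic(message):
    """Return (kind, process, value) for a known signature, else None.

    value is seconds unresponsive for 'watchdog' and megabytes used for 'oom'.
    """
    for kind, pattern in (("watchdog", WATCHDOG_RE), ("oom", OOM_RE)):
        match = pattern.search(message)
        if match:
            return kind, match.group("proc"), int(match.group("value"))
    return None
```

A result whose process is `mgwd` with either label corresponds to the signatures described in this article; any other result should be investigated separately.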
Additional observations:
- Users may lose access to the cluster during this time.
- Depending on which node is being upgraded at the time, applications are reported offline on one node while its HA partner is in partial giveback. Example output:
[Cluster::> storage failover show]
[                              Takeover]
[Node           Partner        Possible State Description]
[-------------- -------------- -------- -------------------------------------]
[Cluster-01]
[               Cluster-02     true     Connected to]
[                                       Cluster-02. Waiting]
[                                       for cluster applications to come]
[                                       online on the local node. Offline]
[                                       applications: mgmt, vifmgr, scsi]
[                                       blade, clam.]
[Cluster-02]
[               Cluster-01     true     Connected to]
[                                       Cluster-01, Partial giveback]
[2 entries were displayed.]
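Because the State Description column wraps across several lines, the list of offline applications is easy to misread (for example, "scsi blade" is split over two lines). The sketch below shows one way to recover the list from the wrapped text. The function name `offline_applications` is hypothetical, and the wording it matches is assumed from the sample output above only.

```python
import re


def offline_applications(description):
    """Extract offline application names from a wrapped State Description.

    `description` is the State Description text for one node, with line
    wraps still present. Returns an empty list if no offline applications
    are reported.
    """
    # Collapse line wraps and repeated spaces into single spaces.
    text = " ".join(description.split())
    match = re.search(r"Offline applications: (.+?)\.", text)
    if not match:
        return []
    return [app.strip() for app in match.group(1).split(",")]
```

Applied to the Cluster-01 description above, this yields the four applications mgmt, vifmgr, scsi blade, and clam; the Cluster-02 description yields an empty list.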