CONTAP-297050: mgwd process or entire node may restart unexpectedly due to stuck threads in servprocd
Issue
Before the unexpected restart, SP-MGMT-MLOG and/or servprocd.log frequently log messages similar to:
[kern_servprocd:info:XXXX] 0xXXXXXXXXX: 0: ERR: Servprocd::BmcManager: update_bmc_users: Failed to refresh users on the BMC : User configuration error: Failed to send userlist to BMC
[kern_servprocd:info:XXXX] 0xXXXXXXXXX: 0: ERR: Servprocd::sp_mgmt_user: cpmi_bmc_user_sync: Failed to send the user file to the BMC (-1), retrying...
[kern_servprocd:info:XXXX] 0xXXXXXXXXX: 0: ERR: Servprocd::sp_mgmt_user: cpmi_bmc_user_sync: Failed to send the user file to the BMC (-1)
[kern_servprocd:info:XXXX] 0xXXXXXXXXX: 0: ERR: Servprocd::BmcManager: update_bmc_users: Failed to refresh users on the BMC : User configuration error: Failed to send userlist to BMC
[kern_servprocd:info:XXXX] 0xXXXXXXXXX: 0: ERR: Servprocd::sp_mgmt_user: cpmi_bmc_user_sync: Failed to send the user file to the BMC (-1), retrying.XX
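To gauge how often these messages occur in a saved copy of the log, a small script along the following lines may help. This is a minimal sketch, not a supported tool: the function name, the labels, and the idea of passing the log path on the command line are assumptions made for illustration. Only the matched substrings come from the sample messages above.

```python
import re
import sys
from collections import Counter

# Substrings copied from the sample messages above.
# The dictionary keys are illustrative labels, not official identifiers.
PATTERNS = {
    "update_bmc_users": re.compile(
        r"update_bmc_users: Failed to refresh users on the BMC"
    ),
    "cpmi_bmc_user_sync": re.compile(
        r"cpmi_bmc_user_sync: Failed to send the user file to the BMC"
    ),
}


def count_bmc_user_errors(lines):
    """Count log lines matching each of the BMC user-sync error patterns."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python count_bmc_errors.py <path-to-saved-log>
    with open(sys.argv[1], errors="replace") as log_file:
        for label, count in count_bmc_user_errors(log_file).items():
            print(f"{label}: {count}")
```

A high count for either label only indicates that the log matches the pattern described in this article. It does not by itself confirm that a restart was caused by this issue.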
Possible related symptoms:
- ONTAP System Manager might be unresponsive
- Internal process "mgwd" may restart unexpectedly with panic-string:
ucore.panicString: 'mgwd: assertion (_rc == 0) at src/cmr_zr_mixer.cc:XXX failed, raising SIGABRT(6) at RIP 0x81f06ad7a (pid XXXX, uid 0, timestamp XXXXXXXXXXX)'
- ONTAP may restart unexpectedly with panic-string:
PANIC: swap_pager_swapoff_object: read from swap failed: 4 in process init on release 9.XX.XPX (C) on Mon Jan 01 00:01:00 EDT 2024