Azure CVO reboots due to long-running mlx5dump process
Applies to
- Cloud Volumes ONTAP (CVO)
- Microsoft Azure
Issue
- The EMS logs show that the mlx5dump process runs for an extended period:
Thu Jun 15 06:38:25 -0400 [cluster1-01: sched_monitor: mgr.stack.longrun.proc:notice]: Long running process: mlx5dump
Thu Jun 15 06:43:28 -0400 [cluster1-01: sched_monitor: sk.hog.runtime:notice]: Process mlx5dump ran for 15569 milliseconds
- The long-running mlx5dump process then triggers a node reboot:
Thu Jun 15 06:43:36 -0400 [cluster1-01: pha_main000: kern.shutdown.initiator:debug]: SK reboot was initiated by "maytag.ko::fm_handleReserved+763".
Thu Jun 15 06:59:16 -0400 [cluster1-01: sfo_status: callhome.reboot.giveback:notice]: Call home for REBOOT (after giveback)
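To check a saved EMS log for the same pattern, the sequence above can be matched with a short script. The following is a minimal sketch, not a NetApp tool. It assumes the log is available as plain text in the line layout shown in the excerpts above, and the names `parse_ems_line` and `nodes_rebooted_after_mlx5dump` are hypothetical helpers introduced only for illustration. The event names it looks for are the ones that appear in the excerpts.

```python
import re

# Assumed line layout, based only on the excerpts above:
# <timestamp> [<node>: <source>: <event>:<severity>]: <message>
EMS_LINE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:]+): (?P<source>[^:]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]: (?P<message>.*)$"
)

# EMS events from the excerpts that report a long-running process.
LONGRUN_EVENTS = {"mgr.stack.longrun.proc", "sk.hog.runtime"}
REBOOT_EVENT = "kern.shutdown.initiator"


def parse_ems_line(line):
    """Split one EMS line into its fields, or return None if it does not match."""
    match = EMS_LINE.match(line.strip())
    return match.groupdict() if match else None


def nodes_rebooted_after_mlx5dump(lines):
    """Return nodes that log an mlx5dump long-run event and later a reboot event.

    Lines are assumed to be in chronological order, as in the EMS log.
    """
    flagged = set()   # nodes that reported a long-running mlx5dump
    affected = []     # nodes that subsequently logged a reboot initiator
    for line in lines:
        rec = parse_ems_line(line)
        if rec is None:
            continue
        if rec["event"] in LONGRUN_EVENTS and "mlx5dump" in rec["message"]:
            flagged.add(rec["node"])
        elif rec["event"] == REBOOT_EVENT and rec["node"] in flagged:
            if rec["node"] not in affected:
                affected.append(rec["node"])
    return affected


# The log lines from this article, used as sample input.
SAMPLE = [
    'Thu Jun 15 06:38:25 -0400 [cluster1-01: sched_monitor: mgr.stack.longrun.proc:notice]: Long running process: mlx5dump',
    'Thu Jun 15 06:43:28 -0400 [cluster1-01: sched_monitor: sk.hog.runtime:notice]: Process mlx5dump ran for 15569 milliseconds',
    'Thu Jun 15 06:43:36 -0400 [cluster1-01: pha_main000: kern.shutdown.initiator:debug]: SK reboot was initiated by "maytag.ko::fm_handleReserved+763".',
    'Thu Jun 15 06:59:16 -0400 [cluster1-01: sfo_status: callhome.reboot.giveback:notice]: Call home for REBOOT (after giveback)',
]

if __name__ == "__main__":
    print(nodes_rebooted_after_mlx5dump(SAMPLE))
```

The script only reports that the two events occur in order on the same node. It does not prove that one caused the other, so a match should be treated as a prompt to compare the log against this article rather than as a diagnosis.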