How long is the expected runtime for jobs with cluster or node affinity?
Applies to
- Data ONTAP 8
- ONTAP 9
Answer
- Jobs are asynchronous tasks and typically long-running volume operations, such as copying, moving, or mirroring data.
- Jobs are placed in a job queue and are run when resources are available.
- Some jobs start and run to completion without interruption.
- Other jobs initiate some external operation and wait for that operation to complete.
- In the latter case, rather than tie up a thread that does nothing but wait, the job can relinquish the thread while telling the Job Manager that it still has work to do.
- The Job Manager then periodically reschedules the job so that it can check whether the external operation has completed.
- Examples of default jobs expected on any NetApp appliance:
::> job show -fields id,name,affinity
id vserver          name                                    affinity
-- ---------------- --------------------------------------- --------
1  fas2554-2n-ams-1 SnapMirror Service Job                  Node
3  fas2554-2n-ams-1 Certificate Expiry Check                Cluster
4  fas2554-2n-ams-1 SP Certificate Expiry Check Job         Cluster
5  fas2554-2n-ams-1 SP log collection job                   Cluster
6  fas2554-2n-ams-1 CLUSTER BACKUP AUTO 8hour               Cluster
7  fas2554-2n-ams-1 CLUSTER BACKUP AUTO daily               Cluster
8  fas2554-2n-ams-1 CLUSTER BACKUP AUTO weekly              Cluster
11 fas2554-2n-ams-1 Auto Balance Aggregate Analyzer         Cluster
13 fas2554-2n-ams-1 SnapMirror Service Job                  Node
14 fas2554-2n-ams-1 Network Recovery Job - 5min             Cluster
16 fas2554-2n-ams-1 Network Consistency Diagnostic - weekly Node
21 fas2554-2n-ams-1 Vol Reaper                              Cluster
22 fas2554-2n-ams-1 Application Snapshot Reaper             Cluster
25 fas2554-2n-ams-1 Network Consistency Diagnostic - weekly Node
14 entries were displayed.
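The run, yield, and reschedule behavior described above can be illustrated with a small Python sketch. This is not ONTAP code and does not reflect how the Job Manager is actually implemented; the class names `OneShotJob`, `WaitingJob`, and `JobManager`, the `step()` method, and the `polls_needed` counter are all hypothetical and exist only to show the idea of a job giving up its thread while reporting that more work remains.

```python
from collections import deque


class OneShotJob:
    """Hypothetical job that runs to completion the first time it is scheduled."""

    def __init__(self, name):
        self.name = name

    def step(self):
        # All work is done in a single pass, so report "finished".
        return True


class WaitingJob:
    """Hypothetical job that starts an external operation and then only polls it."""

    def __init__(self, name, polls_needed):
        self.name = name
        # Stand-in for the external operation: it "completes" after N polls.
        self.polls_left = polls_needed

    def step(self):
        # Check on the external operation, then give the thread back.
        self.polls_left -= 1
        return self.polls_left <= 0  # False means "more work to do"


class JobManager:
    """Hypothetical manager: a FIFO queue that reschedules unfinished jobs."""

    def __init__(self):
        self.queue = deque()

    def submit(self, job):
        self.queue.append(job)

    def run(self):
        finished = []
        while self.queue:
            job = self.queue.popleft()
            if job.step():
                finished.append(job.name)
            else:
                # The job relinquished the thread; put it back for a later check.
                self.queue.append(job)
        return finished


manager = JobManager()
manager.submit(WaitingJob("mirror", polls_needed=3))
manager.submit(OneShotJob("copy"))
order = manager.run()
print(order)
```

In this sketch the waiting job never blocks the queue: the one-shot job finishes while the waiting job is still being rescheduled.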
Additional Information
- As a rule of thumb, jobs with cluster or node affinity must not be modified or stopped unless instructed by NetApp Support.
- If an issue is suspected with any of these jobs, contact NetApp Technical Support and reference this article for further assistance.
- The following EMS events can be monitored for job-related errors (for all jobs or for a particular job):
mgmtgwd.jobmgr.fetcherr
mgmtgwd.jobmgr.init.err
mgmtgwd.jobmgr.jobcomplete.failure
mgmtgwd.jobmgr.jobrestart
mgmtgwd.jobmgr.nofork
mgmtgwd.jobmgr.private.jobcomplete.failure
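As a minimal sketch of how such monitoring might be scripted, the Python snippet below filters exported event text for the EMS event names listed above. It assumes only that each line contains the event name verbatim; the function name `job_related`, the optional `job_name` filter, and the sample lines are hypothetical placeholders and are not real EMS output or an ONTAP API.

```python
# EMS event names taken from the list above.
JOB_EMS_EVENTS = (
    "mgmtgwd.jobmgr.fetcherr",
    "mgmtgwd.jobmgr.init.err",
    "mgmtgwd.jobmgr.jobcomplete.failure",
    "mgmtgwd.jobmgr.jobrestart",
    "mgmtgwd.jobmgr.nofork",
    "mgmtgwd.jobmgr.private.jobcomplete.failure",
)


def job_related(lines, job_name=None):
    """Return lines that mention a job-related EMS event.

    If job_name is given, keep only lines that also mention that job.
    Assumption: event and job names appear as plain text in each line.
    """
    hits = []
    for line in lines:
        if any(event in line for event in JOB_EMS_EVENTS):
            if job_name is None or job_name in line:
                hits.append(line)
    return hits


# Invented placeholder lines, not real EMS log output.
sample = [
    "placeholder-1 mgmtgwd.jobmgr.jobcomplete.failure job=Vol Reaper",
    "placeholder-2 some.unrelated.event",
    "placeholder-3 mgmtgwd.jobmgr.jobrestart job=SnapMirror Service Job",
]

all_hits = job_related(sample)
vol_reaper_hits = job_related(sample, job_name="Vol Reaper")
print(len(all_hits), len(vol_reaper_hits))
```

Passing `job_name` narrows the result to a particular job; omitting it returns every line that matches any of the listed events.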