How long are cluster- or server-affinity jobs expected to run?
Applies to
- Data ONTAP 8
- ONTAP 9
Answer
Jobs are asynchronous tasks, typically long-running volume operations such as copying, moving, or mirroring data. Jobs are placed in a job queue and run when resources are available.
Some jobs start and run to completion without interruption. Other jobs initiate an external operation and wait for it to complete. In the latter case, to avoid tying up a thread that does nothing but wait, the job can relinquish the thread while telling the Job Manager that there is more work to do. The Job Manager then periodically reschedules the job so that it can check whether the external operation has completed.
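The yield-and-reschedule behavior described above can be illustrated with a minimal Python sketch. This is not how ONTAP implements the Job Manager; it only models the idea with generators. All names here (`simple_job`, `waiting_job`, `run_job_manager`, the `external_op` dictionary, and the `on_tick` callback) are hypothetical and exist only for this illustration.

```python
from collections import deque


def simple_job(name, log):
    """A job that runs to completion without giving up its worker."""
    log.append(f"{name}: finished")
    return
    yield  # unreachable; only makes this function a generator


def waiting_job(name, external_op, log):
    """A job that starts an external operation and polls for completion."""
    log.append(f"{name}: started external operation")
    while not external_op["done"]:
        log.append(f"{name}: still waiting, yielding")
        yield  # relinquish the worker; more work remains
    log.append(f"{name}: finished")


def run_job_manager(jobs, on_tick=None):
    """Run each job until it yields or finishes; requeue jobs that yield."""
    queue = deque(jobs)
    tick = 0
    while queue:
        job = queue.popleft()
        try:
            next(job)          # run until the job yields or returns
            queue.append(job)  # it yielded: reschedule it for a later check
        except StopIteration:
            pass               # the job completed
        tick += 1
        if on_tick:
            on_tick(tick)      # hook used here to simulate the outside world


# Usage: the external operation "completes" after the third scheduling pass.
log = []
op = {"done": False}


def finish_after_three(tick):
    if tick == 3:
        op["done"] = True


run_job_manager([waiting_job("mirror", op, log), simple_job("copy", log)],
                on_tick=finish_after_three)
print("\n".join(log))
```

In this sketch the waiting job never blocks a worker: it returns control each time the external operation is still pending, and the run-to-completion job gets to finish in between.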
The following are some of the default jobs expected on any NetApp appliance:
::> job show -fields id,name,affinity
id vserver          name                                    affinity
-- ---------------- --------------------------------------- --------
1  fas2554-2n-ams-1 SnapMirror Service Job                  Node
3  fas2554-2n-ams-1 Certificate Expiry Check                Cluster
4  fas2554-2n-ams-1 SP Certificate Expiry Check Job         Cluster
5  fas2554-2n-ams-1 SP log collection job                   Cluster
6  fas2554-2n-ams-1 CLUSTER BACKUP AUTO 8hour               Cluster
7  fas2554-2n-ams-1 CLUSTER BACKUP AUTO daily               Cluster
8  fas2554-2n-ams-1 CLUSTER BACKUP AUTO weekly              Cluster
11 fas2554-2n-ams-1 Auto Balance Aggregate Analyzer         Cluster
13 fas2554-2n-ams-1 SnapMirror Service Job                  Node
14 fas2554-2n-ams-1 Network Recovery Job - 5min             Cluster
16 fas2554-2n-ams-1 Network Consistency Diagnostic - weekly Node
21 fas2554-2n-ams-1 Vol Reaper                              Cluster
22 fas2554-2n-ams-1 Application Snapshot Reaper             Cluster
25 fas2554-2n-ams-1 Network Consistency Diagnostic - weekly Node
14 entries were displayed.
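To review which jobs carry cluster or node affinity, output like the listing above can be grouped offline. The following is a rough Python sketch that assumes the output has been saved as plain text, that the first column is the numeric job ID, and that the last column is the affinity. The function name `group_jobs_by_affinity` is hypothetical, and the set of affinity values is limited to the two that appear in the sample; other values may exist.

```python
from collections import defaultdict

# Assumption: only the affinity values seen in the sample output are handled.
KNOWN_AFFINITIES = {"Cluster", "Node"}


def group_jobs_by_affinity(output):
    """Return {affinity: [(job_id, job_name), ...]} from saved 'job show' text."""
    groups = defaultdict(list)
    for line in output.splitlines():
        parts = line.split()
        # Data rows look like: <id> <vserver> <name words...> <affinity>
        if len(parts) < 4 or not parts[0].isdigit():
            continue  # header, separator, or blank line
        if parts[-1] not in KNOWN_AFFINITIES:
            continue  # for example, the "N entries were displayed." footer
        job_id = int(parts[0])
        job_name = " ".join(parts[2:-1])  # the name may contain spaces
        groups[parts[-1]].append((job_id, job_name))
    return dict(groups)
```

Because job names contain spaces, the sketch takes the affinity from the end of each row rather than splitting on a fixed number of columns.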
Additional Information
As a general rule, jobs with cluster or node (server) affinity must not be modified unless NetApp Support instructs you to do so. If you suspect an issue with any of these jobs, contact NetApp Technical Support and reference this article for further assistance.
The following EMS events can be monitored for job-related errors (for all jobs or for a particular job):
mgmtgwd.jobmgr.fetcherr
mgmtgwd.jobmgr.init.err
mgmtgwd.jobmgr.jobcomplete.failure
mgmtgwd.jobmgr.jobrestart
mgmtgwd.jobmgr.nofork
mgmtgwd.jobmgr.private.jobcomplete.failure
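If event messages are exported to a text file or forwarded to an external log collector, the event names above can be matched with a simple filter. The sketch below is only an illustration: the function name `find_job_manager_events` is hypothetical, and it assumes nothing about the line format other than that the event name appears somewhere in the line.

```python
# EMS event names taken from the list above.
JOB_MANAGER_EVENTS = (
    "mgmtgwd.jobmgr.fetcherr",
    "mgmtgwd.jobmgr.init.err",
    "mgmtgwd.jobmgr.jobcomplete.failure",
    "mgmtgwd.jobmgr.jobrestart",
    "mgmtgwd.jobmgr.nofork",
    "mgmtgwd.jobmgr.private.jobcomplete.failure",
)


def find_job_manager_events(lines):
    """Return (event_name, line) pairs for lines that mention a listed event."""
    hits = []
    for line in lines:
        for name in JOB_MANAGER_EVENTS:
            if name in line:
                hits.append((name, line))
                break  # report each line at most once
    return hits
```

To watch a particular job, the returned lines could be filtered further by job name or ID, depending on what the exported message text contains.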