The Network Recovery Job has two major components: an RDB table in MgmtGateway that records failures, and a Job Manager job that responds to those failures.
- Network Recovery Table
Errors tracked by the Network Recovery Manager are typically failures that cannot be recovered from immediately: either a network glitch must clear or a software subsystem must stabilize before corrective action is possible. From the time a failure is observed until it has been recovered, the failure should be tracked in the Network Recovery Table.
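The lifecycle of a table entry (recorded when the failure is observed, removed once it has been recovered) can be illustrated with a minimal in-memory stand-in. This is a sketch only: the real table is an RDB table in MgmtGateway whose schema is not described here, so `RecoveryEntry`, `RecoveryTable`, and every field and method name below are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecoveryEntry:
    # Hypothetical columns; the actual MgmtGateway RDB schema is not specified.
    entry_id: int
    description: str  # what failed, e.g. which subsystem or link
    action: str       # name of the corrective action to attempt later
    recorded_at: float = field(default_factory=time.time)


class RecoveryTable:
    """Hypothetical in-memory stand-in for the Network Recovery Table."""

    def __init__(self) -> None:
        self._rows: Dict[int, RecoveryEntry] = {}
        self._next_id = 1

    def record_failure(self, description: str, action: str) -> RecoveryEntry:
        # Called when a failure is observed that cannot be fixed right away.
        entry = RecoveryEntry(self._next_id, description, action)
        self._rows[entry.entry_id] = entry
        self._next_id += 1
        return entry

    def remove(self, entry_id: int) -> None:
        # Called once the failure has been recovered.
        self._rows.pop(entry_id, None)

    def entries(self) -> List[RecoveryEntry]:
        return list(self._rows.values())
```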
- Network Recovery Job
The Network Recovery Job is scheduled to run once every 5 minutes to look for recovery actions that need to be executed. It iterates over the Network Recovery Table and attempts to apply each recovery action that is no longer considered in-flight. If an action succeeds, its entry is removed from the table.
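One pass of the job can be sketched as follows. The document does not define "in-flight" precisely, so this sketch assumes each entry carries a hypothetical `in_flight_until` timestamp and is skipped until that time has passed. `run_once`, the `actions` registry, and `run_forever` are likewise illustrative names, not the Job Manager's API; in the real system the scheduler, not a sleep loop, would trigger each pass.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

RUN_INTERVAL_SECONDS = 5 * 60  # the job runs once every 5 minutes


@dataclass
class RecoveryEntry:
    entry_id: int
    action: str             # hypothetical: key into the registry of recovery actions
    in_flight_until: float  # hypothetical: entry is in-flight until this timestamp


def run_once(table: List[RecoveryEntry],
             actions: Dict[str, Callable[[RecoveryEntry], bool]],
             now: float) -> List[RecoveryEntry]:
    """Run one pass of the job and return the entries that stay in the table."""
    remaining = []
    for entry in table:
        if now < entry.in_flight_until:
            remaining.append(entry)  # still in-flight: leave it for a later pass
            continue
        try:
            succeeded = actions[entry.action](entry)
        except Exception:
            succeeded = False  # an error (or unknown action) counts as failure
        if not succeeded:
            remaining.append(entry)  # keep it and retry on the next pass
    return remaining


def run_forever(table: List[RecoveryEntry],
                actions: Dict[str, Callable[[RecoveryEntry], bool]]) -> None:
    # Stand-in for the Job Manager schedule; not called in this sketch.
    while True:
        table[:] = run_once(table, actions, time.time())
        time.sleep(RUN_INTERVAL_SECONDS)
```

Keeping failed entries rather than dropping them matches the described behavior: only a successful action removes an entry, so anything else is retried on the next 5-minute pass.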