CONTAP-431967: ONTAP FPolicy configuration replication fails and FPolicy Service Manager restarts repeatedly during bulk delete of policies, events, or engines
Issue
- FPolicy configuration replication fails on a node after bulk deletion of a large number of FPolicy policies, events, or engines
- FPolicy Service Manager process restarts repeatedly on the affected node
- vserver fpolicy show-engine returns no entries and displays an RPC error for the affected node
- Event log shows repeated occurrences of the following events:
{code}
mgmt.fpolicy.replay.failed: FPolicy configuration replication process failed.
{code}
{code}
spm.fpolicy.process.exit: Fpolicy Service Manager process with ID <pid> exited as a result of signal signal 15. The service will attempt to restart.
{code}
vserver fpolicy show-engine output:
{code}
Warning: Unable to list entries on node <nodename>. RPC: Remote system error
[from mgwd on node "<nodename>" (VSID: -1) to fpolicy at <address>]
{code}
- Internal logs show FPolicy database operations timing out during the replication process:
{code}
Error operating on appcfg_db, <Timeout: Operation "fpolicy_appcfg_policy_status_db_iterator::remove_imp()" took longer than 25 seconds to complete>
{code}
{code}
Error operating on appcfg_db, <System busy: 7 requests on table "fpolicy_appcfg_policy_status_db" have been pending for 2172 seconds. The last completed call took 0 seconds.>
{code}
{code}
[replayInternal][catastrophe] Failed to replay [fpolicy_policy_status_db -> fpolicy_policy_status_db_fsm_mirror, fpolicy_appcfg_policy_status_db]. Clearing out retry-count & continuing...
{code}
- FPolicy configuration cache on the node becomes out of sync with the mirror tables
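To check whether collected log text matches the symptoms listed above, the signatures can be searched for line by line. The following is a minimal sketch that assumes you have already exported the event and internal log text to a string; how to collect those logs is not covered here. The signature names (`replay_failed`, `spm_restart`, and so on) and the `match_symptoms` function are hypothetical labels chosen for illustration, and the patterns are derived only from the log excerpts quoted in this article, so they may not match every variant of these messages.

```python
import re

# Hypothetical signature names; patterns are taken from the log excerpts
# quoted in this article and may need adjusting for other ONTAP versions.
SIGNATURES = {
    "replay_failed": re.compile(r"mgmt\.fpolicy\.replay\.failed"),
    "spm_restart": re.compile(r"spm\.fpolicy\.process\.exit: .*signal 15"),
    "rpc_error": re.compile(r"Unable to list entries on node .*RPC: Remote system error"),
    "db_timeout": re.compile(
        r'Timeout: Operation "fpolicy_appcfg_policy_status_db_iterator::remove_imp\(\)"'
    ),
    "db_busy": re.compile(
        r'System busy: \d+ requests on table "fpolicy_appcfg_policy_status_db"'
    ),
    "replay_catastrophe": re.compile(
        r"\[replayInternal\]\[catastrophe\] Failed to replay \[fpolicy_policy_status_db"
    ),
}


def match_symptoms(log_text: str) -> dict:
    """Count how many lines of log_text match each symptom signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "mgmt.fpolicy.replay.failed: FPolicy configuration replication process failed.\n"
        "spm.fpolicy.process.exit: Fpolicy Service Manager process with ID 1234 exited "
        "as a result of signal signal 15. The service will attempt to restart.\n"
    )
    print(match_symptoms(sample))
```

Repeated non-zero counts for `replay_failed` and `spm_restart` together correspond to the event-log symptoms described above; the sketch deliberately reports raw counts rather than a pass/fail verdict, because the article does not define a threshold for "repeated".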
