What is a staging volume and how do you troubleshoot staging volume issues?
Applies to
ONTAP
Answer
This article describes what a staging volume is, how it is used for auditing, and how to troubleshoot issues that can occur with staging volumes.
Staging Volumes for Auditing:
As part of Vserver auditing configuration in clustered Data ONTAP 8.2, a new type of system volume is created on each aggregate before auditing can be enabled. These volumes are called staging volumes.
When auditing is configured for any Storage Virtual Machine (SVM, also known as a Vserver) in the cluster, Data ONTAP creates a 2GB system volume, called a staging volume, on each aggregate in the cluster. These volumes are not owned by any SVM; they are shared among all audit-enabled SVMs that have data volumes on that aggregate. SVM administrators cannot view or modify any properties of a staging volume.
A staging volume is a dedicated volume that temporarily stores audit records of data access on a data volume belonging to an SVM on that aggregate. A consolidation service, which runs for each SVM, consolidates these records into a single log and stores it in the path specified in the audit config (a valid path in the SVM’s namespace). The consolidation service for the SVM is also responsible for freeing space in the staging volumes by deleting the staging files it has consumed, which contain audit records for individual SVMs.
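The consume-and-delete behavior described above can be pictured with a small toy model. This is only a sketch: `StagingVolume`, `consolidate`, the SVM names, and the record strings are hypothetical and are not ONTAP interfaces. It also assumes staging files are tracked per SVM and ignores record ordering and log formats.

```python
from dataclasses import dataclass, field


@dataclass
class StagingVolume:
    """Toy stand-in for one aggregate's staging volume (not an ONTAP API)."""
    # SVM name -> list of staging files; each file is a list of audit records.
    files: dict = field(default_factory=dict)

    def write(self, svm: str, records: list) -> None:
        """Store a new staging file holding audit records for one SVM."""
        self.files.setdefault(svm, []).append(list(records))

    def pending(self) -> int:
        """Number of staging files still occupying space on this volume."""
        return sum(len(per_svm) for per_svm in self.files.values())


def consolidate(svm: str, staging_volumes: list) -> list:
    """Merge one SVM's staging files into a single log and free them."""
    log = []
    for vol in staging_volumes:
        # pop() removes the consumed files, which is what frees staging space.
        for staging_file in vol.files.pop(svm, []):
            log.extend(staging_file)
    return log


# Two aggregates, each with a shared staging volume (hypothetical data).
aggr1, aggr2 = StagingVolume(), StagingVolume()
aggr1.write("svm_a", ["open /f1", "read /f1"])
aggr1.write("svm_b", ["open /g1"])
aggr2.write("svm_a", ["write /f2"])

svm_a_log = consolidate("svm_a", [aggr1, aggr2])
```

After the call, `svm_a_log` holds the records of `svm_a` from both staging volumes, while the staging file for `svm_b` remains on `aggr1` until the consolidation service for `svm_b` consumes it.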
What can go wrong for Clustered Data ONTAP Audit due to Staging Volume issues?
- Audit config creation for an SVM fails if any aggregate in the cluster does not have 2GB of free space for its staging volume. In that case, the cluster administrator should increase the size of the aggregate that lacks space and then retry the audit config creation for the SVM. This applies to every aggregate: even aggregates that hold no data volumes for the SVM being configured must have 2GB of free space.
- When creating the audit config for an SVM, the SVM administrator specifies a valid destination path in the SVM’s namespace, where the final consolidated audit logs are stored in the specified format. The administrator should ensure that the destination path has sufficient space to hold the consolidated audit logs. If the destination path runs out of space, the consolidation service cannot continue, so it can no longer consume and delete the staging files on the staging volume. Data access for that SVM keeps generating new audit records, however, so the staging volumes can fill up, which results in client access failure. Because staging volumes are shared, a faulty audit config on one SVM can cause client access failure for another SVM.
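The free-space precondition in the first point above can be expressed as a simple pre-check. This is a minimal sketch, not an ONTAP tool: the aggregate names and free-space figures are made up, the function name is hypothetical, and 2GB is taken as 2 * 1024**3 bytes, which is an assumption about the unit.

```python
# Staging volume size per aggregate, as stated in the article.
# Interpreting "2GB" as 2 * 1024**3 bytes is an assumption.
STAGING_VOLUME_BYTES = 2 * 1024**3


def aggregates_blocking_audit(free_bytes_by_aggregate: dict) -> list:
    """Return the aggregates with too little free space for a staging volume.

    Every aggregate in the cluster is checked, including those that hold
    no data volumes for the SVM being configured.
    """
    return sorted(
        name
        for name, free in free_bytes_by_aggregate.items()
        if free < STAGING_VOLUME_BYTES
    )


# Hypothetical free-space figures; in practice these come from the cluster.
free_space = {
    "aggr1": 500 * 1024**3,
    "aggr2": 1 * 1024**3,  # below 2GB: audit config creation would fail
    "aggr3": 2 * 1024**3,  # exactly 2GB: treated as sufficient in this sketch
}
blocking = aggregates_blocking_audit(free_space)
```

An empty result means no aggregate would block audit config creation on space grounds; any name returned is an aggregate the cluster administrator would need to grow before retrying.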
What needs to be checked when client access fails for an audit-enabled SVM?
- The SVM administrator should first check whether the path specified in the audit config for consolidated logs has sufficient space. If it does not, the SVM administrator should increase the size of that volume.
- If the destination path has sufficient space and client access still fails, a staging volume might be out of space (for example, because of a faulty audit config on another SVM). The auditing subsystem generates an EMS event when an audit event cannot be generated because a staging volume has insufficient space. In this case, the SVM administrator must contact the cluster administrator to check whether any staging volume has run out of space. If one has, the cluster administrator can increase the size of that staging volume.
- Client access also fails if the staging volumes have been temporarily set offline (only the cluster administrator can do this).
- The consolidation service rotates the consolidated log in the destination path when the active log reaches the configured threshold (either the configured log rotate size or the log rotate time schedule). The audit config also has a rotate-limit parameter, which controls how many rotated audit logs are retained in the destination path. The default is 0, which means all rotated logs are retained. If rotate-limit is set to a non-zero value, only that many of the latest rotated log files are retained. This parameter helps ensure that the staging volume does not fill up because of insufficient space in the destination volume.
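The rotate-limit behavior described in the last point can be illustrated as follows. This is a sketch of the retention rule only, under the assumption that the rotated logs are listed oldest first; the function name and the file names are hypothetical.

```python
def apply_rotate_limit(rotated_logs: list, rotate_limit: int) -> list:
    """Return the rotated logs that are retained in the destination path.

    rotated_logs is assumed to be ordered oldest first.
    A rotate_limit of 0 (the default) retains every rotated log.
    """
    if rotate_limit < 0:
        raise ValueError("rotate_limit must be 0 or a positive integer")
    if rotate_limit == 0:
        return list(rotated_logs)
    # Keep only the newest rotate_limit files.
    return list(rotated_logs[-rotate_limit:])


# Hypothetical rotated log names, oldest first.
logs = ["audit_1.log", "audit_2.log", "audit_3.log", "audit_4.log"]
kept_default = apply_rotate_limit(logs, 0)
kept_two = apply_rotate_limit(logs, 2)
```

With the default of 0 nothing is pruned, so the destination volume keeps growing; a non-zero limit bounds the space used by rotated logs, which is why it helps keep the consolidation service, and therefore the staging volumes, from stalling.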
Operations permitted on staging volumes:
- Staging volumes can be deleted under admin privileges.
Note: Staging volumes should never be deleted without the direction of Engineering, because deletion leaves remnants of the staging volume in the metadata SMF tables. If an aggregate needs to be removed and the deletion fails because the aggregate has volumes on it, the failure is caused by the data volumes, not the system volumes. When the aggregate is deleted, its staging volumes are deleted and the auditing metadata is removed from the SMF tables.
- Staging volumes can be set offline under admin privileges.
The above actions result in client access denial when data access occurs on a volume residing on the aggregate whose staging volume has been deleted or taken offline (if the SVM has auditing enabled).
Additional Information
- Staging volumes are created only once, on all the aggregates of the cluster, when an audit configuration is first created for any SVM in the cluster. When auditing is configured for subsequent SVMs, the staging volumes are not created again, because they already exist and are shared among SVMs.
- Staging volume names begin with MDV_aud_, which indicates a metadata volume used for auditing. These volumes can be viewed only by cluster administrators.
- If staging volumes become full, client access failures occur because clustered Data ONTAP auditing is guaranteed: if the audit log cannot be generated, client access is denied.
- If an SVM does not have auditing enabled, its client access is not affected when the staging volumes become full, are taken offline, or are otherwise unavailable.
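The guaranteed-auditing rule in the last two points can be summarized as a small decision function. This is a simplified sketch, not ONTAP logic: `StagingState` and `client_access_allowed` are hypothetical names, and "has space" is reduced to a free-byte count greater than zero.

```python
from dataclasses import dataclass


@dataclass
class StagingState:
    """Simplified view of the staging volume on the aggregate being accessed."""
    exists: bool = True    # False if the staging volume was deleted
    online: bool = True    # False if the staging volume was set offline
    free_bytes: int = 1    # 0 models a full staging volume


def client_access_allowed(auditing_enabled: bool, staging: StagingState) -> bool:
    """Apply the guaranteed-auditing rule from the article.

    An SVM without auditing is never affected by the staging volume.
    An audit-enabled SVM is denied access whenever the audit record
    cannot be written to the staging volume.
    """
    if not auditing_enabled:
        return True
    return staging.exists and staging.online and staging.free_bytes > 0


full_volume = StagingState(free_bytes=0)
audited_result = client_access_allowed(True, full_volume)
unaudited_result = client_access_allowed(False, full_volume)
```

Here `audited_result` is `False` and `unaudited_result` is `True`: the same full staging volume blocks the audit-enabled SVM but leaves the SVM without auditing untouched.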