NetApp Knowledge Base

Azure CVO panics due to WAFL inconsistencies

Applies to

  • Cloud Volumes ONTAP (CVO)
  • Microsoft Azure 

Issue

The root volume of a single-node Azure CVO is in a WAFL-inconsistent state, causing the node to operate on AUTOROOT pending recovery:
Fri Sep 19 07:12:00 +0000 [CVO_System-02: wafl_exempt13: sk.panic:alert]: Panic String: Unrecoverable metadata block (file -1, block 21198419, fbn 23246, level 0, file type 1) in volume vol0. WAFL inconsistent. Contact NetApp technical support. in SK process wafl_exempt13 on release 9.16.1P5 (C)
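
When triaging, it can help to pull the individual fields out of the panic string, for example to confirm that the affected volume is the root volume. The following is a minimal sketch that assumes the message wording matches the sample above; the function name `parse_wafl_panic` and the returned field names are hypothetical and not part of any NetApp tool:

```python
import re

# Pattern modeled on the single sample panic string in this article.
# Other ONTAP releases may word the message differently (assumption).
PANIC_RE = re.compile(
    r"Panic String: Unrecoverable metadata block "
    r"\(file (?P<file>-?\d+), block (?P<block>\d+), fbn (?P<fbn>\d+), "
    r"level (?P<level>\d+), file type (?P<file_type>\d+)\) "
    r"in volume (?P<volume>[\w-]+)\."
    r".*?on release (?P<release>\S+)"
)

NUMERIC_FIELDS = ("file", "block", "fbn", "level", "file_type")


def parse_wafl_panic(line):
    """Return the panic fields as a dict, or None if the line does not match."""
    match = PANIC_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for name in NUMERIC_FIELDS:
        fields[name] = int(fields[name])
    return fields


if __name__ == "__main__":
    sample = (
        "Fri Sep 19 07:12:00 +0000 [CVO_System-02: wafl_exempt13: sk.panic:alert]: "
        "Panic String: Unrecoverable metadata block (file -1, block 21198419, "
        "fbn 23246, level 0, file type 1) in volume vol0. WAFL inconsistent. "
        "Contact NetApp technical support. in SK process wafl_exempt13 "
        "on release 9.16.1P5 (C)"
    )
    print(parse_wafl_panic(sample))
```
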

Cause

  • CVO systems deployed with Azure page blob disks are susceptible to I/O throttling from Azure, resulting in “AZURE_SERVER_BUSY” responses
  • This throttling can lead to delayed or failed disk operations, causing:
    • Loss of access to critical ONTAP metadata blocks
    • WAFL inconsistencies (especially in root or system volumes)
    • Controller failover issues and node panic events
    • Bad block and checksum verification errors
  • Error seen in server logs:

Mon Sep 15 14:27:08 +0000 [jazsacvo100c-01: OscHighPriThreadPool_126: pha.obj.throttle:notice]: PHA Obj throttled 7393 times on the blob 01RootBlob-cvo_system in the container blobcontainer due to the "AZURE_SERVER_BUSY".
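
To gauge how often and on which blobs throttling occurs, the `pha.obj.throttle` events can be tallied from exported log text. The sketch below is illustrative only: it assumes the message format matches the sample line above, and the helper names `parse_throttle_event` and `throttles_per_blob` are hypothetical. Whitespace before "in the container" is treated as optional.

```python
import re
from collections import Counter

# Pattern modeled on the single sample line in this article.
# Other ONTAP releases may word the message differently (assumption).
THROTTLE_RE = re.compile(
    r"\[(?P<node>[^:\]]+): [^:\]]+: pha\.obj\.throttle:(?P<severity>\w+)\]: "
    r"PHA Obj throttled (?P<count>\d+) times on the blob (?P<blob>.+?)\s*"
    r'in the container (?P<container>\S+) due to the "(?P<reason>[A-Z_]+)"'
)


def parse_throttle_event(line):
    """Return one throttle event as a dict, or None if the line does not match."""
    match = THROTTLE_RE.search(line)
    if match is None:
        return None
    event = match.groupdict()
    event["count"] = int(event["count"])
    return event


def throttles_per_blob(lines):
    """Sum the reported throttle counts per (container, blob) pair."""
    totals = Counter()
    for line in lines:
        event = parse_throttle_event(line)
        if event is not None:
            totals[(event["container"], event["blob"])] += event["count"]
    return totals


if __name__ == "__main__":
    sample = (
        "Mon Sep 15 14:27:08 +0000 [jazsacvo100c-01: OscHighPriThreadPool_126: "
        "pha.obj.throttle:notice]: PHA Obj throttled 7393 times on the blob "
        "01RootBlob-cvo_system in the container blobcontainer due to the "
        '"AZURE_SERVER_BUSY".'
    )
    print(throttles_per_blob([sample]))
```

A high total on the root blob, as in the sample, points to the throttling described in the Cause section rather than to a fault inside ONTAP itself.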

Solution

  • Contact Azure Support to investigate the issue
  • Redeploy the CVO to migrate the instance to new hardware in Azure's data center
    • If the issue is related to the Azure CVO being deployed with page blobs, the following can mitigate it in the future:
      • Migrate all data to a new CVO deployment that is using Azure managed disks
        • Managed disk-based CVO systems are not subject to “AZURE_SERVER_BUSY” throttling and provide improved performance and reliability
        • This requires deploying a new CVO instance with managed disks and using SnapMirror or other migration tools to move data

Additional Information

  • Page blob disks have lower IOPS and throughput limits than managed disks
  • Throttling is more likely under high load or with large datasets
  • Upsizing the VM instance may not resolve the issue, because page blob limits are enforced at the storage account and disk level, not at the VM level
  • See: CVO architecture: Page Blob vs Managed Disks
