FlexCache volume becomes full and NFS client requests hang
Applies to
- ONTAP 9.5 and later
- FlexCache
- NFS
Issue
- The FlexCache volume reached 99% full, which caused client requests to hang.
- Operations such as "ls" and "cd" hang on the NFS client.
- The origin volume was NOT full, and the following errors appear in the EMS logs:
Wed Mar 29 11:27:54 -0700 [nodeA: wafl_exempt10: wafl.vol.full:alert]: Insufficient space on volume vol__0001@vserver:a385f57a-afbd-11ed-91c0-00a098ba0334 to perform operation. 432KB was requested but only 384KB was available.
Wed Mar 29 11:27:55 -0700 [nodeA: wafl_spcd_main: monitor.volume.full:debug]: Volume vol__0001@vserver:a385f57a-afbd-11ed-91c0-00a098ba0334 is full (using or reserving 99% of space and 7% of inodes).
Wed Mar 29 11:27:56 -0700 [nodeA: FgGroupListTimer: fg.space.member.full:alert]: Constituent 1099 in FlexGroup vol (fg-uuid b5a85457-b48e-11ed-948e-00a098dec0b4) is out of space.
Wed Mar 29 11:37:54 -0700 [nodeA: wafl_exempt13: wafl.vol.full:alert]: Insufficient space on volume vol__0001@vserver:a385f57a-afbd-11ed-91c0-00a098ba0334 to perform operation. 424KB was requested but only 380KB was available.
- NFS operations failed:
Wed Mar 29 11:28:12 -0700 [nodeA: kernel: Nblade.dBladeNoResponse.NFS:error]: File operation timed out because there was no response from the data-serving node. Node UUID: 858edac4-7bd1-11ed-a6ec-00a098dec0b4, file operation protocol: NFS, client IP address: 10.1.2.3, RPC procedure: 17.
- sktrace shows the following at the time of the issue:
2023-03-29T18:27:55Z 14646667110509780 [13:0] WAFLREMOTE_EXCEPTION: store cache 1089.4389 of origin 2156655294.1853 snapid 0: debt enospc (error 292)
2023-03-29T18:27:55Z 14646667110610864 [13:0] WAFLREMOTE_EXCEPTION: store cache 1089.4389 of origin 2156655294.1850 snapid 0: debt enospc (error 292)
- Ideally, a FlexCache volume runs a scrub job and evicts cached data when the volume is 90% full. In this case, however, the volume reached 99% full.
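
The EMS timestamps above are in local time (-0700), while the sktrace timestamps are in UTC (Z), so the two sets of logs line up only after conversion. The following Python sketch is one possible way to check this and to compute the space shortfall reported by a wafl.vol.full event. It is an illustration only, not a NetApp tool: the function names, the regular expression, and the returned field names are assumptions made for this example, and the year (2023) is taken from the sktrace timestamp because the EMS line does not carry one.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern written for this example; it matches only the
# wafl.vol.full message text quoted in this article.
VOL_FULL_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:]+): [^:]+: wafl\.vol\.full:alert\]: "
    r"Insufficient space on volume (?P<volume>\S+) to perform operation\. "
    r"(?P<requested>\d+)KB was requested but only (?P<available>\d+)KB was available\."
)

# First wafl.vol.full line and first sktrace timestamp from the Issue section.
SAMPLE_EMS = (
    "Wed Mar 29 11:27:54 -0700 [nodeA: wafl_exempt10: wafl.vol.full:alert]: "
    "Insufficient space on volume "
    "vol__0001@vserver:a385f57a-afbd-11ed-91c0-00a098ba0334 to perform operation. "
    "432KB was requested but only 384KB was available."
)
SAMPLE_SKTRACE_TS = "2023-03-29T18:27:55Z"


def parse_vol_full(line, year=2023):
    """Return a dict describing a wafl.vol.full event, or None if no match.

    The EMS line has no year, so the caller supplies it (assumption: 2023,
    taken from the sktrace output).
    """
    m = VOL_FULL_RE.match(line)
    if m is None:
        return None
    local = datetime.strptime(f"{m['ts']} {year}", "%a %b %d %H:%M:%S %z %Y")
    requested = int(m["requested"])
    available = int(m["available"])
    return {
        "node": m["node"],
        "volume": m["volume"],
        "time_utc": local.astimezone(timezone.utc),
        "requested_kb": requested,
        "available_kb": available,
        "shortfall_kb": requested - available,
    }


def parse_sktrace_ts(ts):
    """Parse an sktrace timestamp of the form YYYY-MM-DDTHH:MM:SSZ as UTC."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    event = parse_vol_full(SAMPLE_EMS)
    gap = parse_sktrace_ts(SAMPLE_SKTRACE_TS) - event["time_utc"]
    print(event["volume"], "short by", event["shortfall_kb"], "KB")
    print("sktrace entry follows the EMS alert by", gap.total_seconds(), "seconds")
```

For the first alert shown above, this gives a shortfall of 48 KB (432 KB requested, 384 KB available) and places the sktrace "debt enospc" entries one second after the EMS alert, which is consistent with both logs describing the same event.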