What is the AltaVault eviction process?


Category:: altavault

Specialty:: legacy


Applies to

AltaVault
AVA400
AVA800
AVA-v

Answer

AltaVault has a local disk cache that receives backup data as it is encoded and subsequently replicated to the cloud
The eviction process is responsible for maintaining free space on the disk cache (datastore) and keeping it from filling up entirely
Deployments are typically sized such that the new data ingest rate will rotate the disk cache in about 30 days (default), which allows fast decoding (reading) of recently encoded (written) backup jobs
This local cache feature is a key differentiator for AltaVault and ensures a low recovery time objective (RTO) for the most recent backups
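The 30-day rotation target is simple arithmetic: cache capacity divided by daily ingest. The minimal Python sketch below shows it under stated assumptions. The helper names are hypothetical (not an AltaVault tool), capacity and ingest must use the same units, and "ingest" is taken to mean the daily volume of new data landing on the disk cache.

```python
# Back-of-the-envelope cache sizing. Hypothetical helpers, not an AltaVault tool.
# Assumption: capacity and ingest use the same units, and ingest is the daily
# volume of new data landing on the local disk cache.

def cache_rotation_days(cache_tb: float, daily_ingest_tb: float) -> float:
    """Approximate number of days before new ingest has rotated the whole cache."""
    if daily_ingest_tb <= 0:
        raise ValueError("daily ingest must be positive")
    return cache_tb / daily_ingest_tb


def required_cache_tb(daily_ingest_tb: float, target_days: float = 30) -> float:
    """Cache capacity needed to keep roughly target_days of recent backups local."""
    return daily_ingest_tb * target_days


# Hypothetical figures: a 60 TB cache taking 2 TB/day rotates in about 30 days.
print(cache_rotation_days(60, 2))  # 30.0
print(required_cache_tb(1.5))      # 45.0
```

In practice the ingest rate varies from day to day, so the result is only a rough guide to how much recent data stays local.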
The evicter process uses three watermarks:
- Eviction threshold (evicter.maxpctused):
  - This is the utilization percentage that the evicter works to maintain
  - If utilization exceeds this value, the evicter starts deleting slab files until utilization drops back below it by a small margin
- Eviction alarm (evicter.maxpctused + evicter.alarmwindow):
  - This is the utilization percentage where an alarm will be raised, indicating that the utilization of the local disk cache is too high
- Eviction upperbound (evicter.upperbound):
  - This is the utilization percentage at which AltaVault stops accepting write requests, to prevent the physical disk from filling up entirely
  - At this level, the filesystem will return a 'no space on filesystem' error to the front-end protocols (OST/SMB/NFS)
When this occurs, log messages similar to the following are seen:

Apr  2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [megamount/vnode.ERR] (30318) encode failed: no space on filesystem

Apr  2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [megamount/vnode.ERR] (31432) encode failed: no space on filesystem

Apr  2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [write_cache.ERR] (30318)  flush failed: no space on filesystem

Apr  2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [megamount/vnode.ERR] (30318) failed to commit write: no space on filesystem

Apr  2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [write_cache.ERR] (30318) destructor flush failed :no space on filesystem

Apr  2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [encoder.ERR] (30318) Encoder 0x299e71018 destructor, aborting

Apr  2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [write_cache.ERR] (31432)  flush failed: no space on filesystem

Apr  2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [megamount/vnode.ERR] (31432) failed to commit write: no space on filesystem

Apr  2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [write_cache.ERR] (31432) destructor flush failed :no space on filesystem

Apr  2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [encoder.ERR] (31432) Encoder 0x28f726798 destructor, aborting

Apr  2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [megastore.NOTICE] (30318) aborting transaction 149/14340392 (29 resources)

Apr  2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [megastore.NOTICE] (31432) aborting transaction 149/14349236 (1668 resources)

Apr  2 02:34:02 ss3030-dmz-nk2 rfsd[8609]: [megastore.NOTICE] (30318) transaction 149/14340392 aborted

Apr  2 02:34:02 ss3030-dmz-nk2 rfsd[8609]: [megamount/vnode.ERR] (30318) error flushing write cache no space on filesystem
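To gauge how widespread these failures are in a log, a small Python sketch like the following can count the "no space on filesystem" messages per rfsd module tag. It assumes lines in the syslog-style format shown above. The function name, and how the lines are read in, are hypothetical rather than part of any AltaVault tooling.

```python
import re
from collections import Counter

# Matches rfsd syslog lines like the ones above, for example:
#   "... rfsd[8609]: [write_cache.ERR] (30318)  flush failed: no space on filesystem"
LINE_RE = re.compile(r"rfsd\[\d+\]: \[(?P<module>[^\]]+)\] \(\d+\)\s+(?P<msg>.*)$")


def count_no_space_errors(lines):
    """Count 'no space on filesystem' messages per rfsd module tag."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and "no space on filesystem" in match.group("msg"):
            counts[match.group("module")] += 1
    return counts


sample = [
    "Apr  2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [megamount/vnode.ERR] (30318) encode failed: no space on filesystem",
    "Apr  2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [write_cache.ERR] (30318)  flush failed: no space on filesystem",
    "Apr  2 02:34:01 ss3030-dmz-nk2 rfsd[8609]: [encoder.ERR] (30318) Encoder 0x299e71018 destructor, aborting",
]
print(count_no_space_errors(sample))
```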

The "low space" alarm is raised when /data partition usage climbs above 93% with the default settings, i.e. evicter.maxpctused + evicter.alarmwindow (90 + 3)
Note that Percent used (evicter.pctused) is the current disk cache utilization; it is a measurement, not a configurable parameter
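The interaction of the three watermarks with the measured evicter.pctused can be sketched as follows. Defaults are the values shown in the stats_megastore output in this article (90, 3 and 95). The class and function names are hypothetical, and the exact boundary comparisons (> versus >=) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvicterSettings:
    maxpctused: int = 90   # eviction threshold
    alarmwindow: int = 3   # alarm is raised above maxpctused + alarmwindow
    upperbound: int = 95   # writes are refused from this point


def evicter_state(pctused: float, s: EvicterSettings = EvicterSettings()) -> str:
    """Return the most severe state for a disk cache utilization (evicter.pctused).

    The boundary comparisons used here are assumptions, not documented behavior.
    """
    if pctused >= s.upperbound:
        return "read-only: no space on filesystem"
    if pctused > s.maxpctused + s.alarmwindow:
        return "alarm: utilization too high"
    if pctused > s.maxpctused:
        return "evicting"
    return "ok"


for pct in (88, 91, 94, 95):
    print(pct, evicter_state(pct))
```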

Additional Information

There are two scenarios known to result in the system going read-only and printing the error messages above:

The rate of new data being written to the AltaVault exceeds the rate at which the evicter can free up space
- Lowering the eviction threshold can help by leaving more free space (in GB) for backups, but only if there are periods of low activity during which the slower-moving evicter can catch up with the data ingested by the encoder
The value assigned to evicter.maxpctused is greater than evicter.upperbound, which causes the device to enter read-only mode before utilization reaches the threshold at which the evicter would start freeing up space
- This effectively deadlocks the evicter
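The second scenario is a configuration error that can be caught before it bites. Below is a minimal sanity check written in Python; the function name is hypothetical, and the check covers only the condition described in this article.

```python
def check_evicter_config(maxpctused: int, upperbound: int) -> list[str]:
    """Return a list of problems found in the evicter watermark settings."""
    problems = []
    if maxpctused > upperbound:
        # Writes stop at upperbound, but eviction only starts above maxpctused,
        # so utilization never reaches the point where the evicter kicks in.
        problems.append(
            f"evicter.maxpctused ({maxpctused}) is greater than "
            f"evicter.upperbound ({upperbound}): evicter deadlock"
        )
    return problems


print(check_evicter_config(90, 95))  # []
print(check_evicter_config(97, 95))
```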

Note:

The appropriate values will depend upon the size of the local disk cache, and smaller appliances will have more conservative values

Run the following command from within a system dump to see the evicter parameters:

$ grep evicter stats_megastore
evicter.alarmwindow: 3
evicter.maxpctused: 90
evicter.pctused: 88
evicter.upperbound: 95

The full output contains more parameters than shown above; these are the important ones
They can also be found in output/rfsd.xml, which is the 'running config' of the rfsd service

<evicter max_local_capacity="0" max_used_pct="90" upper_bound_pct="95" evicter_num_threads="512" />
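When reviewing many system dumps, the same values can be pulled out of rfsd.xml programmatically. The Python sketch below is one way to do it. It assumes only that the file contains an `<evicter>` element like the one above; its nesting inside rfsd.xml is an assumption. The mapping of max_used_pct and upper_bound_pct to evicter.maxpctused and evicter.upperbound is inferred from the matching values (90 and 95) in this article. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def evicter_settings_from_xml(xml_text: str) -> dict:
    """Return the attributes of the first <evicter> element, converting integers."""
    root = ET.fromstring(xml_text)
    # iter() includes the root itself, so this works whether <evicter> is the
    # document root or nested deeper (its position in rfsd.xml is an assumption).
    elem = next(root.iter("evicter"), None)
    if elem is None:
        raise ValueError("no <evicter> element found")
    return {k: int(v) if v.isdigit() else v for k, v in elem.attrib.items()}


snippet = ('<evicter max_local_capacity="0" max_used_pct="90" '
           'upper_bound_pct="95" evicter_num_threads="512" />')
settings = evicter_settings_from_xml(snippet)
if settings["max_used_pct"] > settings["upper_bound_pct"]:
    print("max_used_pct exceeds upper_bound_pct: evicter deadlock risk")
else:
    print(settings)
```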

These values are also recorded every 60 seconds in the collect_stats/rfsctl-a.log file and its archives:

$ egrep "TIMESTAMP|evicter.pctused" collect_stats/rfsctl-a.log | less
TIMESTAMP: 2015-04-14 12:21:00
evicter.pctused: 88
TIMESTAMP: 2015-04-14 12:22:00
evicter.pctused: 88
TIMESTAMP: 2015-04-14 12:23:00
evicter.pctused: 88
TIMESTAMP: 2015-04-14 12:24:00
evicter.pctused: 88
TIMESTAMP: 2015-04-14 12:25:00
evicter.pctused: 88
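To see how close utilization came to the watermarks over time, the TIMESTAMP and evicter.pctused lines can be paired into a time series. The Python sketch below assumes each evicter.pctused line follows its own TIMESTAMP line, as in the egrep output above, and ignores all other lines. The function name is hypothetical.

```python
from datetime import datetime


def pctused_series(lines):
    """Pair each evicter.pctused sample with the TIMESTAMP line that precedes it."""
    series, current_ts = [], None
    for raw in lines:
        line = raw.strip()
        if line.startswith("TIMESTAMP:"):
            value = line.split(":", 1)[1].strip()
            current_ts = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        elif line.startswith("evicter.pctused:") and current_ts is not None:
            series.append((current_ts, int(line.split(":", 1)[1])))
    return series


sample = """TIMESTAMP: 2015-04-14 12:21:00
evicter.pctused: 88
TIMESTAMP: 2015-04-14 12:22:00
evicter.pctused: 88""".splitlines()

series = pctused_series(sample)
peak_ts, peak = max(series, key=lambda item: item[1])
print(f"{len(series)} samples, peak {peak}% at {peak_ts}")
```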

Run the following command to display the evicter parameters on a live AltaVault (the service must be running):

av730-rtp # rfsctl exec evicter
evicter.alarmwindow: 3
evicter.maxpctused: 90
evicter.pctused: 88
evicter.upperbound: 95

Again, some parameters have been omitted for brevity

Evicter verification of cloud data and the "Inconsistent Cloud Data" alarm

Before deleting a slab, the evicter compares the slab's md5sum with the value stored in the cloud object's metadata
The cloud does not calculate the md5sum; it just returns the stored metadata value, so this check confirms the revision of the cloud slab rather than its integrity
Slabs can be modified and enqueued to be re-uploaded to the cloud; this check ensures that the copy in the cloud matches the local slab
We don't want to delete a slab that hasn't yet been replicated to the cloud
If the md5sum check fails, the "Inconsistent Cloud Data" alarm is raised
We have seen some cloud vendors send empty (null) responses or old data
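The decision described above can be illustrated with a minimal Python sketch. It is not AltaVault's implementation: `cloud_digest` is a hypothetical stand-in for the md5 value returned in the cloud object's metadata (for example, by the HEAD request mentioned below), and both function names are made up for illustration.

```python
import hashlib
from typing import Optional


def local_md5(data: bytes) -> str:
    """md5sum of the local slab contents, as a hex string."""
    return hashlib.md5(data).hexdigest()


def may_delete_slab(local_digest: str, cloud_digest: Optional[str]) -> bool:
    """Return True only if the cloud copy reports the same md5sum as the local slab.

    cloud_digest stands for the value stored in the cloud object's metadata.
    The cloud does not recompute it, so a match confirms the revision, not the
    integrity, of the cloud copy. An empty/None answer or a stale value means
    the local slab must be kept.
    """
    if not cloud_digest:
        return False
    return local_digest.lower() == cloud_digest.lower()


slab = b"example slab contents"
print(may_delete_slab(local_md5(slab), local_md5(slab)))        # True
print(may_delete_slab(local_md5(slab), local_md5(b"old rev")))  # False: the case where the alarm is raised
print(may_delete_slab(local_md5(slab), None))                   # False
```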
Evicter can pause replication
If the evicter is not getting prompt responses to its HEAD requests against the cloud slab objects, it temporarily pauses replication

Example:

Sep 28 03:01:20 altavault01 rfsd[6599]: [evicter.INFO] (8921) Evicter will pause replication
Sep 28 03:01:41 altavault01 rfsd[6599]: [evicter.INFO] (8921) Evicter will resume replication
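To measure how long replication was held back, the pause/resume pairs can be matched up from the log. The Python sketch below assumes the message format shown above. Because syslog timestamps carry no year, it supplies a placeholder year. The function name is hypothetical.

```python
import re
from datetime import datetime

EVENT_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r".*Evicter will (?P<event>pause|resume) replication"
)


def pause_intervals(lines, year=2000):
    """Return (paused_at, resumed_at, seconds) for each pause/resume pair.

    Syslog timestamps have no year, so one is supplied (a leap year, so that
    Feb 29 parses); intervals spanning New Year are not handled.
    """
    intervals, paused_at = [], None
    for line in lines:
        match = EVENT_RE.search(line)
        if not match:
            continue
        stamp = " ".join(match.group("ts").split())  # collapse "Apr  2" padding
        ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if match.group("event") == "pause":
            paused_at = ts
        elif paused_at is not None:
            intervals.append((paused_at, ts, (ts - paused_at).total_seconds()))
            paused_at = None
    return intervals


log = [
    "Sep 28 03:01:20 altavault01 rfsd[6599]: [evicter.INFO] (8921) Evicter will pause replication",
    "Sep 28 03:01:41 altavault01 rfsd[6599]: [evicter.INFO] (8921) Evicter will resume replication",
]
for start, end, secs in pause_intervals(log):
    print(f"replication paused {secs:.0f} s starting {start:%b %d %H:%M:%S}")
```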

Average Evicted Age

The evicter tracks the age of the slabs it deletes; this statistic is a good way to determine whether the local disk cache is too small and rotating too quickly
- This statistic also bears on RTO: restoring files older than the average evicted age will likely be slower because the data must be downloaded from the cloud
The average evicted age can be viewed in the WebUI under Reports

Example:

$ /support/bin/stats_evicted.py

timestamp    evicted_bytes evicted_age
Jul 01 04:38   3.1 GB 48 day
Jul 01 04:39   3.8 GB 48 day
Jul 01 04:40   3.6 GB 48 day
Jul 01 04:41   3.3 GB 48 day
<snip>
Jul 01 07:18   4.3 GB 47 day
Jul 01 07:19   2.6 GB 47 day
Jul 01 07:20   2.5 GB 47 day
Jul 01 07:21   2.3 GB 47 day

Min: 47 day, Max: 48 day, Avg: 48 day, StdDev: 4 hour

If the average evicted age is less than 30 days (adjustable), the average evicted age alarm triggers
This value is shown in the WebUI under AltaVault: Reports > Eviction, but the graph may not be accurate outside the 5-minute view because values can be averaged down during aggregation
For example, AltaVault calculates the average evicted age every five seconds, and periods when the evicter is not running are recorded as 0

avg_evicted_age: 30 days
sample interval: 5 sec

As the data points age, they are 'rolled up' into larger periods; after five minutes, the values are averaged into 60-second intervals
- For this example, assume that the evicter runs only for 20 seconds of the 60 seconds
The average value for the aggregated 60-second period would be ((30 days * 20 sec) + (0 * 40 sec)) / 60 sec = 10 days
This is an undesirable side effect of this aggregation method
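The worked example above can be reproduced in a few lines of Python. This only illustrates the arithmetic: the sample values are in seconds, to match the alarm thresholds. `rollup_ignoring_idle` is a hypothetical alternative included for comparison, not something AltaVault does.

```python
SECONDS_PER_DAY = 86_400


def rollup(samples):
    """Plain average over the bucket, matching the roll-up described above."""
    return sum(samples) / len(samples)


def rollup_ignoring_idle(samples):
    """Hypothetical alternative: average only samples where the evicter ran."""
    active = [s for s in samples if s > 0]
    return sum(active) / len(active) if active else 0.0


# One 60-second bucket = twelve 5-second samples (values in seconds).
# The evicter runs for 20 s (4 samples of 30 days) and is idle for 40 s (8 zeros).
bucket = [30 * SECONDS_PER_DAY] * 4 + [0] * 8
print(rollup(bucket) / SECONDS_PER_DAY)                # 10.0
print(rollup_ignoring_idle(bucket) / SECONDS_PER_DAY)  # 30.0
```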
The avg_evicted_age alarm will trigger by default if the value drops below 30 days (2592000 seconds), which can be adjusted to suit the customer's needs

# show alarm avg_evicted_age
Alarm Id: avg_evicted_age
Alarm Description: Datastore Eviction
Enabled: no
Alarm State: (enabled)
Error threshold: 2592000
Clear threshold: 3024000
Rate limit bucket counts: (email) { 5, 20, 50 }
Rate limit bucket windows: (email) { 3600, 86400, 604800 }
Rate limit bucket counts: (snmp) { 5, 20, 50 }
Rate limit bucket windows: (snmp) { 3600, 86400, 604800 }
Last checked at: 2015/06/16 11:41:39
Last checked value: 4294967295
Last error at:
Last clear at:

The alarm can be enabled, or disabled entirely with the "no" form of the command:

(config) # alarm avg_evicted_age enable

(config) # no alarm avg_evicted_age enable

For example, to raise the alarm at 14 days and clear it at 15 days:

(config) # alarm avg_evicted_age error-threshold 1209600

(config) # alarm avg_evicted_age clear-threshold 1296000
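The thresholds are expressed in seconds, so each value is just days × 86400. For example, the defaults of 2592000 and 3024000 are 30 and 35 days. The Python sketch below generates the two config-mode commands shown above for any pair of day values. The helper names are hypothetical. Requiring the clear threshold to be above the error threshold is an assumption drawn from the default and example values.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def days_to_seconds(days: int) -> int:
    return days * SECONDS_PER_DAY


def threshold_commands(error_days: int, clear_days: int) -> list[str]:
    """Build the avg_evicted_age threshold commands for the given day values."""
    if clear_days <= error_days:
        # Assumption: the alarm fires when the age drops below the error
        # threshold, so it should clear at a higher age.
        raise ValueError("clear threshold should be above the error threshold")
    return [
        f"alarm avg_evicted_age error-threshold {days_to_seconds(error_days)}",
        f"alarm avg_evicted_age clear-threshold {days_to_seconds(clear_days)}",
    ]


for cmd in threshold_commands(14, 15):
    print(cmd)
```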

Effect of average eviction age on cloud sparsity

After a file is deleted, the slabs that it references are inspected to see whether they are referenced by any other files stored on the appliance
If any part of a slab is still referenced, the slab cannot be deleted; however, if it is in the local cache and more than 50% of it is unused, it can be compacted and then re-enqueued for replication to the cloud, where it overwrites the older, larger version
The important part to remember is that slab compaction can only occur when the slab is in the local cache
Cloud-only slabs can only be deleted once they are 100% unused
- Amazon Glacier uses a larger object called a package, which is a bundle of 64 slabs
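The rules above can be summarized as a small decision function. This is a minimal Python sketch of the logic as described in this article, not AltaVault's code. The names are hypothetical, and Glacier's 64-slab packages are not modeled.

```python
from enum import Enum


class SlabAction(Enum):
    DELETE = "delete"
    COMPACT = "compact and re-enqueue for replication"
    KEEP = "keep as is"


def slab_action(unused_fraction: float, in_local_cache: bool) -> SlabAction:
    """Apply the rules above to one slab after a file deletion."""
    if not 0.0 <= unused_fraction <= 1.0:
        raise ValueError("unused_fraction must be between 0 and 1")
    if unused_fraction == 1.0:
        return SlabAction.DELETE   # nothing references it any more
    if in_local_cache and unused_fraction > 0.5:
        return SlabAction.COMPACT  # compaction is only possible while local
    return SlabAction.KEEP         # cloud-only slabs wait until 100% unused


print(slab_action(0.7, in_local_cache=True))   # SlabAction.COMPACT
print(slab_action(0.7, in_local_cache=False))  # SlabAction.KEEP
```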
Most backup strategies maintain a set of rotating schedules; for example, daily backups are kept for a month, weekly backups for a year, monthly backups for 5 years, and so on
If backups expire and are deleted at an age below the average eviction age, the slabs they reference are likely to be local and therefore available for compaction
If the average eviction age is less than the age of a typical backup at the time of deletion, some slabs that could have been compacted will not be, because they are cloud-only
Unused data accumulating in cloud objects is called cloud sparsity; environments that rotate their local cache quickly, with a short average eviction age, are more likely to develop cloud sparsity
Users of Amazon Glacier are more susceptible to cloud sparsity because of the larger size of the data objects stored in the cloud (packages = 64 slabs)
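A quick way to apply this reasoning to a given retention schedule is to compare each tier's retention age with the average evicted age. The Python sketch below does that. The function name and the wording of the labels are hypothetical; the retention values follow the example schedule above, and 48 days matches the sample stats_evicted.py output earlier in this article. It is a rough heuristic, since actual slab placement depends on how data is shared between backups.

```python
def sparsity_risk(retention_days: dict[str, int], avg_evicted_age_days: float) -> dict[str, str]:
    """Flag retention tiers whose backups are likely cloud-only by the time they expire."""
    return {
        tier: "low (slabs likely local, can be compacted)"
        if days < avg_evicted_age_days
        else "higher (slabs likely cloud-only, not compacted)"
        for tier, days in retention_days.items()
    }


schedule = {"daily": 30, "weekly": 365, "monthly": 5 * 365}
for tier, risk in sparsity_risk(schedule, avg_evicted_age_days=48).items():
    print(f"{tier:8} {risk}")
```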