NetApp Knowledge Base

What are the important changes to RAID scrub in Data ONTAP 8.3.x or later

Applies to

ONTAP 9

Answer

  • Generally higher CPU and disk utilization may be observed, especially during nighttime hours.
  • A possible reason is the change to the RAID scrub schedule introduced in Data ONTAP 8.3.
    • The default RAID scrub schedule changed in Data ONTAP 8.3: on non-AFF systems, scrubs now run every day.
      • For more information on raid.scrub.schedule, refer to Storage raid-options Commands.
        • Note: If no specific value is defined, the default schedule will apply.
  • This option specifies the weekly schedule (day, time, and duration) for scrubs started automatically.

    • On a non-AFF system, the default schedule is daily at 1 a.m. for a duration of 4 hours, except on Sunday, when it is 12 hours.

    • On an AFF system, the default schedule is weekly at 1 a.m. on Sunday for a duration of 6 hours.

  • By default, scrub runs for 4 hours every day on a non-AFF system (6 days × 4 hours plus 12 hours on Sunday, about 36 hours of scrub time per week), so the overall scrub runtime is higher and scans complete more frequently than on releases prior to Data ONTAP 8.3.
    • It is expected behavior that the system will have higher CPU and disk activity during this time.
    • If this is an issue during the week, the schedule can be defined to run at specific times and for specific durations, as shown in the format sketch and example below.
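Based on the example below, the schedule value is a comma-separated list of entries in the form duration@day@start_hour: the duration is given in minutes (m) or hours (h), the day is a three-letter weekday abbreviation, and the start hour uses 24-hour notation (the full grammar is described in Storage raid-options Commands):
  duration{m|h}@day@start_hour[,duration{m|h}@day@start_hour,...]
  240m@tue@2,8h@sat@22   (240 minutes on Tuesday at 2 a.m., plus 8 hours on Saturday at 10 p.m.)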
Example:
  • Use the storage raid-options show command to check the current settings:
cluster::> storage raid-options show -name raid.scrub.schedule
Node       Option                              Value        Constraint
---------- ----------------------------------- ------------ -----------
cluster-01 raid.scrub.schedule                              none
cluster-02 raid.scrub.schedule                              none
2 entries were displayed.
  • Use the storage raid-options modify command to change the schedule as required:
    • NOTE
      • The following command schedules two weekly RAID scrubs. 
      • The first scrub is for 240 minutes (four hours) every Tuesday starting at 2 a.m. 
      • The second scrub is for eight hours every Saturday starting at 10 p.m.
cluster::> storage raid-options modify -node cluster-01 -name raid.scrub.schedule 240m@tue@2,8h@sat@22
Specified scrub schedule added
  • Use the storage raid-options show command to verify the change to the schedule:
cluster::> storage raid-options show -name raid.scrub.schedule
Node       Option                              Value        Constraint
---------- ----------------------------------- ------------ -----------
cluster-01 raid.scrub.schedule                 240m@tue@2   none
cluster-02 raid.scrub.schedule                              none
2 entries were displayed.
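  • Note that the schedule is set per node: in the output above, cluster-02 still has no schedule defined. If the same scrub windows are wanted on the second node as well (an assumption for this example), the same modify command can be run against it:
cluster::> storage raid-options modify -node cluster-02 -name raid.scrub.schedule 240m@tue@2,8h@sat@22
Specified scrub schedule added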
Verification through the Event log:
  • You can also verify and search for the related messages using the event log show command:
    • NOTE: ONTAP 9 requires elevating to the diagnostic privilege level with the set diag command (see the sketch below).
  • In this example, the scrub resumes at 1 a.m., as per the default schedule.
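A minimal sketch of elevating the privilege level first (the exact warning text may vary by release):
Cluster-01::> set diag
Warning: These diagnostic commands are for use by NetApp personnel only.
Do you want to continue? {y|n}: y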
Cluster-01::> event log show -messagename raid.rg.scrub.resume
[?] Tue May 24 01:00:12 CEST [cluster: config_thread: raid.rg.scrub.resume:notice]: /aggr_ssc_dc1_ds11_b_sata_root/plex0
/rg0: resuming scrub at stripe 578657472 (89% complete)
  • To check when the scrub pauses, search for a suspend message:
    • In this example, it suspends at 5 a.m. after 4 hours of runtime, as per the default schedule.
Cluster-01::> event log show -messagename raid.rg.scrub.suspend
[?] Tue May 24 05:00:01 CEST [cluster: config_thread: raid.scrub.suspended:notice]: Disk scrub suspended. 
  • To check the scrub summary, run:
Cluster-01::> event log show -messagename raid.rg.scrub.summary 
[?] Tue May 24 05:00:01 CEST [cluster: config_thread: raid.rg.scrub.summary.lw:notice]: Scrub found 0 RAID write
signature inconsistencies in /aggr_ssc_dc1_ds11_b_sata_data_01/plex0/rg0.
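  • Outside of the event log, the overall progress of a scrub can also be checked from the nodeshell; a sketch, assuming the classic aggr scrub status command is available on your release (it typically reports the percent complete per RAID group):
Cluster-01::> node run -node cluster-01 -command aggr scrub status -v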

Additional Information

statit will show greads during this time, and the disks may show up to 100% busy; this is normal, because RAID scrub runs as a background activity at the disk level:

Cluster::> set advanced
Cluster::*> node run -node node_1 -command statit -b
Cluster::*> node run -node node_1 -command statit -e
...
                       Disk Statistics (per second)
        ut% is the percent of time the disk was busy.
        xfers is the number of data-transfer commands issued per second.
        xfers = ureads + writes + cpreads + greads + gwrites
        chain is the average number of 4K blocks per command.
        usecs is the average disk round-trip time per 4K block.

disk             ut%  xfers  ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
/data_aggr1/plex0/rg0:    
0a.00.4           79 280.18    0.00   ....     .   0.00   ....     .   0.00   ....     . 280.18  64.00   60   0.00   ....     .
0a.00.18          84 280.18    0.00   ....     .   0.00   ....     .   0.00   ....     . 280.18  64.00   67   0.00   ....     .
0a.00.10          82 280.18    0.00   ....     .   0.00   ....     .   0.00   ....     . 280.18  64.00   64   0.00   ....     .
0a.00.19          87 280.18    0.00   ....     .   0.00   ....     .   0.00   ....     . 280.18  64.00   71   0.00   ....     .
0a.00.12          86 280.18    0.00   ....     .   0.00   ....     .   0.00   ....     . 280.18  64.00   69   0.00   ....     .
0a.00.17          90 280.18    0.00   ....     .   0.00   ....     .   0.00   ....     . 280.18  64.00   75   0.00   ....     .
0a.00.16          91 280.18    0.00   ....     .   0.00   ....     .   0.00   ....     . 280.18  64.00   77   0.00   ....     .
0a.00.2           91 280.18    0.00   ....     .   0.00   ....     .   0.00   ....     . 280.18  64.00   76   0.00   ....     .
0a.00.3           92 280.18    0.00   ....     .   0.00   ....     .   0.00   ....     . 280.18  64.00   78   0.00   ....     .
0a.00.5           94 280.18    0.00   ....     .   0.00   ....     .   0.00   ....     . 280.18  64.00   82   0.00   ....     .
0a.00.6           95 280.18    0.00   ....     .   0.00   ....     .   0.00   ....     . 280.18  64.00   85   0.00   ....     .
0a.00.7           95 280.18    0.00   ....     .   0.00   ....     .   0.00   ....     . 280.18  64.00   87   0.00   ....     .
0a.00.13          96 280.18    0.00   ....     .   0.00   ....     .   0.00   ....     . 280.18  64.00   89   0.00   ....     .
0a.00.15          96 280.18    0.00   ....     .   0.00   ....     .   0.00   ....     . 280.18  64.00   92   0.00   ....     .
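In the output above, for example, disk 0a.00.4 is 79% busy servicing roughly 280 greads per second with an average chain length of 64 blocks (about 256 KB per command), while handling no user reads or writes; the scrub traffic alone accounts for the high ut% values.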

 

NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.