NetApp Knowledge Base

Maintenance Center Frequently Asked Questions

Applies to

  • ONTAP 9
  • Data ONTAP 8.2 (7-mode)

Answer

What is the Maintenance Center?

The purpose of the Maintenance Center is to improve storage reliability by reducing the number of unnecessary disk returns to NetApp due to transient errors.

The Maintenance Center provides a new disk diagnostics capability built into Data ONTAP. It automatically manages disk failures through a systematic failure-verification process while the failing disk is still in the customer's system. When the current health management system identifies a disk as a potential failure, the disk is not failed and no AutoSupport Return Merchandise Authorization (RMA) case is generated. Instead, the disk is removed from its current aggregate and sent to the Maintenance Center. User data is migrated from the disk onto a spare, through reconstruction or Rapid RAID Recovery, depending on the type of errors being received. The process occurs without user intervention, and only a few messages are sent to the console reporting the action.

Once in the Maintenance Center, the disk is tested in the background, without disrupting the other operations of the system. If the transient errors can be repaired, the disk will be returned to the spares pool. If not, the disk is failed. In many cases, the testing provided can correct errors that would have previously caused a drive to be failed, or would have caused system interruption, for example, a WAFL hang panic.

What are the key customer benefits of the Maintenance Center?

The Maintenance Center improves the customer experience with NetApp disk drives by significantly reducing the number of unnecessary disk returns. Customers will have lower lifetime management costs stemming from fewer component failures and increased system reliability.

How does a drive get selected to go into the Maintenance Center?

Data ONTAP has a defined set of errors and thresholds that are used to select disks for maintenance. This set may vary between releases, because the errors and thresholds are modified based on new information. Disks that report known fatal errors do not go into maintenance testing; they are failed.

Currently the list includes:

  • A significant number of recovered or un-recovered disk errors in a short time
  • A large number of recovered or un-recovered disk errors over several days
  • Repeated recovered or un-recovered disk errors at the same location
  • Repeated disk command timeouts on one disk
  • Disk reported hardware errors that are not fatal
  • Health triggers which are based on recommendations from disk drive manufacturers to warn of potential problems

The errors and error thresholds will evolve with new disk technologies and information gathered from the current release.

Why is the Maintenance Center set to On by default?

The Maintenance Center is a key supportability feature of Data ONTAP and enhances NetApp storage reliability. Therefore, it is set to On by default.

How does the customer know when a disk enters the Maintenance Center?

When a disk enters the Maintenance Center, an Event Management System (EMS) event is posted. There is another EMS event when a disk completes testing successfully, fails testing, or when testing is aborted. All Maintenance Center EMS events have a syslog message. The CLI commands 'vol status -r' and 'sysconfig -r' show disks that are in the Maintenance Center. The 'disk maint status' command can be used to list drives that are being maintenance tested and to display test progress.

Can I turn off the Maintenance Center feature and what is the impact?

Yes, the following command can be executed:

options disk.maint_center.enable off

Please see the Disk performance and health section of the Storage Management Guide for more details. The Maintenance Center improves overall disk reliability. When the Maintenance Center is turned off, a problematic disk will be automatically failed instead of being tested.

Will the Maintenance Center affect the performance of my NetApp appliance?

The Maintenance Center has minimal performance impact on the NetApp appliance. Many of the Maintenance Center diagnostic tests are executed directly by the drive instead of requiring CPU resources from the NetApp appliance.

How many disks can be in the Maintenance Center at a time?

The Maintenance Center supports concurrent diagnostics of up to 84 disks. You can limit the number of disks running Maintenance Center tests with the following command:

options disk.maint_center.max_disks max_disks

where max_disks can be from 1 to 84.
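
As an illustration of the range described above, the following Python sketch validates a value before building the command string. It is not a NetApp tool: the function name max_disks_command is hypothetical, and the sketch only formats the text of the command shown above without running it on a storage system.

```python
MAX_CONCURRENT_DISKS = 84  # upper limit stated in this article


def max_disks_command(max_disks: int) -> str:
    """Return the options command text for a valid max_disks value.

    Hypothetical helper for illustration; it only formats a string.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_disks, bool) or not isinstance(max_disks, int):
        raise TypeError("max_disks must be an integer")
    if not 1 <= max_disks <= MAX_CONCURRENT_DISKS:
        raise ValueError(f"max_disks must be from 1 to {MAX_CONCURRENT_DISKS}")
    return f"options disk.maint_center.max_disks {max_disks}"


if __name__ == "__main__":
    print(max_disks_command(12))
```

Rejecting out-of-range values before the command is issued keeps a scripted change from sending a value the option is not documented to accept.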

For optimal Maintenance Center operation, does NetApp recommend a minimum number of disks in the spares pool?

NetApp recommends a minimum of 2 disks in the spares pool. The current release of the Maintenance Center will continue to operate and test drives even if this minimum is not met. Future releases of the Maintenance Center will prevent drives from entering the Maintenance Center if the spares pool minimum is not met.

How often can the same drive enter the Maintenance Center?

Once. The first time a disk exhibits transient errors, it enters the Maintenance Center and is marked accordingly. If it is returned to the spares pool and subsequently exhibits transient errors again, the disk is failed and an ASUP is sent for a replacement disk. The current rule is only one visit to the Maintenance Center for each disk.
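
The one-visit rule can be sketched as a small decision function, combined with two statements made earlier in this article: known fatal errors bypass testing, and a problematic disk is failed when the feature is turned off. This is an illustrative Python sketch, not Data ONTAP code. The names Action, DiskState, and on_error_threshold_crossed are hypothetical, and the actual errors and thresholds are internal to Data ONTAP.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    FAIL_DISK = "fail the disk"
    SEND_TO_MAINTENANCE_CENTER = "send the disk to the Maintenance Center"


@dataclass
class DiskState:
    # Hypothetical record; the real on-disk marking is not described here.
    serial: str
    visited_maintenance_center: bool = False


def on_error_threshold_crossed(
    disk: DiskState,
    fatal_error: bool,
    maint_center_enabled: bool = True,
) -> Action:
    """Decide what happens to a disk that the health system has flagged."""
    if fatal_error:
        # Known fatal errors never go into maintenance testing.
        return Action.FAIL_DISK
    if not maint_center_enabled:
        # With the feature off, the disk is failed instead of tested.
        return Action.FAIL_DISK
    if disk.visited_maintenance_center:
        # Only one visit per disk: a second round of errors fails it.
        return Action.FAIL_DISK
    # First visit: mark the disk and send it for testing.
    disk.visited_maintenance_center = True
    return Action.SEND_TO_MAINTENANCE_CENTER
```

In this sketch the mark is set when the disk enters testing, so a disk that passes and returns to the spares pool is failed the next time it crosses a threshold.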

What type of data does the Maintenance Center collect?

The Maintenance Center does not collect any customer data. The Maintenance Center collects only NetApp disk-specific information such as:

  • Reasons that the disk was sent to Maintenance Center
  • Disk serial number
  • Tests that were run and the results
  • Test time duration
  • Test output and whether specific errors were detected, such as medium errors

What is the relationship between AutoSupport (ASUP) and the Maintenance Center?

AutoSupport is a notification tool built into Data ONTAP that enables you to set up specific notifications to both yourself and the NetApp Global Support Center. The Maintenance Center uses AutoSupport to transport its findings back to NetApp as a part of the weekly data log.

Where can I get more information about the Maintenance Center?

Please see the ONTAP release notes and the Storage Management Guide for more information about the Maintenance Center.

What is a maintenance disks pool?

The maintenance disks pool refers to the disks being tested by the Maintenance Center. The output of sysconfig -r may list maintenance disks while some disks are being tested.

How long will it take before Maintenance Center makes a decision to either return the disk to service or fail it out and generate a support case for disk replacement?

The Maintenance Center fails the drive on the first test that fails. When a test fails, the drive is failed out and an ASUP is generated. If all the tests run successfully, the drive returns to the spares pool at the end of the cycle. The total time depends on the size and type of the disk, but it is approximately 2.5 times the zeroing time for that disk.
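
The rule of thumb above can be turned into a quick estimate. The following Python sketch is only arithmetic on the stated factor of roughly 2.5; the function name estimated_test_hours and the four-hour zeroing time in the example are hypothetical, and actual durations depend on the size and type of the disk.

```python
def estimated_test_hours(zeroing_hours: float, factor: float = 2.5) -> float:
    """Estimate a full Maintenance Center test cycle from the zeroing time.

    The default factor is the approximate multiplier given in this article.
    """
    if zeroing_hours <= 0:
        raise ValueError("zeroing_hours must be positive")
    return zeroing_hours * factor


if __name__ == "__main__":
    # Hypothetical disk that takes 4 hours to zero.
    print(estimated_test_hours(4))  # prints 10.0
```

A drive that fails a test leaves the cycle early, so the estimate is an upper bound for a disk that passes every test.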

Will a disk in the Maintenance Center have any effect on an ANDU (Automated Non-Disruptive Upgrade) of ONTAP?

A disk in the Maintenance Center is not considered a failed disk by the OS and, as a result, will not cause a veto during the storage failover giveback processes that are required during an ONTAP 9 upgrade.

NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

 
