BIOS updates for memory reliability and the PPR feature
Applies to
- Platforms:
  - AFF A900 / FAS9500
  - AFF A800 / AFF C800
  - AFF A700 / FAS9000
  - AFF A700s
  - AFF A400 / AFF C400 / FAS8300 / FAS8700
- Post Package Repair (PPR)
Answer
What products include the PPR feature?
Products | BIOS Version (minimum) | Bundled with ONTAP | RFE Report |
---|---|---|---|
AFF A700, FAS9000 | 10.9 | 9.5P15, 9.6P12, 9.7P8, 9.8 and later (BIOS 10.9 and later also requires one of these ONTAP releases) | 1278330 |
AFF A700s | 12.8 | 9.5P15, 9.6P12, 9.7P9, 9.8 and later | 1354656 |
AFF A800 | 13.10 | 9.5P18, 9.6P15, 9.7P14, 9.8P4 and later | 1371369 |
AFF A400, FAS8700, FAS8300 | 16.3 | 9.7P12, 9.8P2 and later | 1373545 |
AFF A900, FAS9500 | 18.3 | 9.10.1RC2, 9.10.1 and later | N/A |
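To check whether an installed BIOS already meets the PPR minimum in the table, the version numbers must be compared field by field, because a naive string or float comparison would treat 13.10 as older than 13.9. The following is a minimal sketch under that assumption; `PPR_MIN_BIOS`, `parse_bios_version`, and `bios_supports_ppr` are hypothetical names, not a NetApp tool or API, and the sketch checks only the BIOS column, not the ONTAP requirement.

```python
# Minimum BIOS versions with PPR support, transcribed from the table above.
# Hypothetical helper for illustration; not a NetApp tool or API.
PPR_MIN_BIOS = {
    "AFF A700": "10.9",
    "FAS9000": "10.9",
    "AFF A700s": "12.8",
    "AFF A800": "13.10",
    "AFF A400": "16.3",
    "FAS8700": "16.3",
    "FAS8300": "16.3",
    "AFF A900": "18.3",
    "FAS9500": "18.3",
}


def parse_bios_version(version: str) -> tuple[int, ...]:
    """Split a dotted BIOS version such as '13.10' into integers.

    Comparing integer tuples avoids the pitfall where '13.10' would
    sort before '13.9' as a string or a float.
    """
    return tuple(int(part) for part in version.split("."))


def bios_supports_ppr(platform: str, installed_bios: str) -> bool:
    """Return True if the installed BIOS is at or above the PPR minimum.

    Only the BIOS version is checked; the ONTAP release requirement
    from the table must be verified separately.
    """
    minimum = PPR_MIN_BIOS.get(platform)
    if minimum is None:
        raise KeyError(f"No PPR BIOS minimum listed for {platform!r}")
    return parse_bios_version(installed_bios) >= parse_bios_version(minimum)


if __name__ == "__main__":
    print(bios_supports_ppr("AFF A800", "13.9"))   # False: 13.9 is older than 13.10
    print(bios_supports_ppr("AFF A800", "13.10"))  # True
```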
What are the BIOS update and Post Package Repair (PPR) enhancements for?
Recent BIOS updates improve memory event handling on a per-platform basis. Because NetApp systems use different Intel CPU chipsets, each platform has its own BIOS update content.
NetApp is introducing Post Package Repair (PPR) into its products to improve the overall operational experience. PPR is a memory capability that works in conjunction with new features added to ONTAP. Together, these features let NetApp use PPR-enabled memory to proactively address memory issues, reducing the need to replace DIMMs after memory errors are detected. NetApp is also adopting new BIOS updates that improve the handling of memory-related errors (correctable and uncorrectable ECC errors).
- The newer memory technologies NetApp uses, beginning with DDR4, include PPR capability.
- When PPR-capable memory is combined with a PPR-enabled controller and operating system, the system can map out a bad memory row and use a spare row on the DIMM instead.
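The row remapping described above can be pictured with a toy model. This is a conceptual sketch only: the class `PprBankModel` and its methods are hypothetical, the spare-row count is an assumed parameter (real DRAM devices provide a small, device-specific number of spare rows), and the model does not reflect how the BIOS or the DRAM actually issues repair commands.

```python
class PprBankModel:
    """Toy model of one DRAM bank with a limited pool of spare rows.

    Hypothetical illustration only; not a NetApp, Intel, or JEDEC interface.
    """

    def __init__(self, spare_rows: int) -> None:
        self.spares_left = spare_rows      # assumed spare-row budget
        self.remap: dict[int, int] = {}    # bad row -> spare row index
        self._next_spare = 0

    def repair(self, bad_row: int) -> bool:
        """Map a failing row onto a spare row; return False if no spares remain."""
        if bad_row in self.remap:
            return True                    # row was already repaired
        if self.spares_left == 0:
            return False                   # cannot repair: DIMM needs replacing
        self.remap[bad_row] = self._next_spare
        self._next_spare += 1
        self.spares_left -= 1
        return True

    def resolve(self, row: int) -> tuple[str, int]:
        """Return which physical row an access to `row` actually uses."""
        if row in self.remap:
            return ("spare", self.remap[row])
        return ("main", row)


if __name__ == "__main__":
    bank = PprBankModel(spare_rows=1)
    print(bank.repair(42), bank.resolve(42))  # True ('spare', 0)
    print(bank.repair(7), bank.resolve(7))    # False ('main', 7)
```

Once the spare pool is exhausted, `repair()` returns False, which corresponds to the case where the memory cannot be repaired and a DIMM replacement is needed.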
Why are these updates important and why should I upgrade?
NetApp's newest systems have substantially more memory capacity and faster memory than older models. These systems use DDR4 memory and have from 4x to 12x the memory of older systems, but memory quality has remained steady. Because these systems contain more DIMMs, the system mean time between failures (MTBF) decreases, which can lead to more maintenance for memory issues.
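The MTBF point can be made concrete with a standard reliability approximation. This assumes each of N DIMMs fails independently at the same constant rate λ and that any single DIMM failure counts as a system memory failure (a series model); real failure rates vary, so treat this as illustrative rather than a NetApp figure.

```latex
\lambda_{\text{sys}} = \sum_{i=1}^{N} \lambda_i = N\lambda,
\qquad
\text{MTBF}_{\text{sys}} = \frac{1}{\lambda_{\text{sys}}} = \frac{1}{N\lambda} = \frac{\text{MTBF}_{\text{DIMM}}}{N}
```

Under these assumptions, a system with four times as many DIMMs has one quarter of the memory-related MTBF, even though each DIMM is exactly as reliable as before.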
Upgrading the system BIOS incrementally reduces the need to replace DIMMs and to service memory-related failures on the system.
- As Intel updates its BIOS with additional memory testing or memory error handling fixes, NetApp tests these fixes and provides them on the NetApp Support site.
- BIOS updates are platform-specific, and each revision carries incremental improvements, fixes, or new features, such as PPR functionality. NetApp provides regular updates to improve the overall system experience.
- Initial PPR functionality is enabled per platform (see platform-specific functionality). Future updates will add more failure mode detection capabilities and further reduce the need to replace DIMMs.
How will the PPR feature change the behavior of my systems?
- When an uncorrectable memory error (UECC) is encountered, the system panics.
- In an HA configuration, the partner takes over and continues to provide services.
- When the system reboots, the BIOS begins a PPR memory test.
The PPR test can take several minutes to complete; the results are displayed on the system console.
What action is needed once the PPR test has completed?
- Replacement not required: If PPR can detect the problematic memory segment, it repairs it.
  - If the system can recover, it reports the event on the system console:
    PPR:Sequence PASS.
  - No further action is required.
- Replacement required: If the memory fails or cannot be repaired, the system does not boot ONTAP and the DIMM must be replaced.
- If the same DIMM experiences a second UECC error and panic, you can choose to replace the DIMM. Contact NetApp to order a replacement DIMM.
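The follow-up actions above can be summarized as a small decision function. This is a hedged sketch, not a NetApp procedure: `DimmAction`, `action_after_ppr`, and the per-DIMM panic counter are hypothetical, and it assumes (as an interpretation of the guidance above) that the "second UECC error on the same DIMM" case applies after PPR has passed and ONTAP has booted.

```python
from enum import Enum


class DimmAction(Enum):
    """Hypothetical labels for the follow-up described in this article."""
    NO_ACTION = "No further action required"
    OPTIONAL_REPLACEMENT = "Replacement optional; contact NetApp to order a DIMM"
    REPLACE = "DIMM replacement required"


def action_after_ppr(ppr_passed: bool, uecc_panics_on_dimm: int) -> DimmAction:
    """Map a PPR test outcome to a follow-up action.

    ppr_passed: True if the console reported 'PPR:Sequence PASS.' and
        ONTAP booted; False if the memory failed or could not be repaired.
    uecc_panics_on_dimm: number of UECC panics traced to the same DIMM,
        including the current one (an assumed counter you track yourself).
    """
    if not ppr_passed:
        # Failed or unrepairable memory: ONTAP does not boot.
        return DimmAction.REPLACE
    if uecc_panics_on_dimm >= 2:
        # Repaired, but the same DIMM has panicked a second time.
        return DimmAction.OPTIONAL_REPLACEMENT
    return DimmAction.NO_ACTION


if __name__ == "__main__":
    print(action_after_ppr(True, 1).value)   # No further action required
    print(action_after_ppr(False, 1).value)  # DIMM replacement required
```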
What is being planned in future BIOS/PPR updates?
Future updates will add more failure mode detection capabilities to further reduce the need to replace memory DIMMs.
Additional Information
For general information on troubleshooting uncorrectable ECC memory errors, see: How to troubleshoot uncorrectable memory errors on AFF and FAS systems