AIQUM upgrade hangs during certificate processing due to expired intermediate CA certificate in truststore
Applies to
- Active IQ Unified Manager 9.6+ (AIQUM)
- All OS platforms
- Upgrades between any major/minor versions
Issue
- AIQUM upgrade hangs indefinitely during the "Setting Keystore and Truststore using keystoresetup JEP" phase
- The upgrade process appears frozen — no progress, no error displayed on screen
- CPU usage remains elevated (Java process spinning)
- jep.log shows repeated processing of the same certificate alias with no termination:
  - Linux/OVA: /var/log/ocum/jep.log
  - Windows: \ProgramData\NetApp\OnCommandAppData\ocum\log\jep.log
- Example of the repeating entry:
    INFO [main] [com.netapp.jeps.keystoresetup.Main] Fetching aliases of parent certificate for certificate of alias <alias_name>
- The log entry repeats for the same alias indefinitely without progressing to the next certificate or completing
- If the upgrade is killed and retried, it hangs at the same point
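To confirm the hang signature from the command line on Linux/OVA, the tail of jep.log can be tallied by alias. This is a minimal sketch: the function name count_fetch_aliases and the 200-line window are illustrative assumptions, not part of AIQUM.

```bash
#!/usr/bin/env bash
# Sketch: tally "Fetching aliases of parent certificate" entries per alias in the
# last N lines of jep.log, most frequent first. One alias filling most of the
# window, with no other progress, matches the hang signature.
# The 200-line default window is an arbitrary assumption.
# Usage: count_fetch_aliases LOGFILE [LINES]
count_fetch_aliases() {
  local log="$1" lines="${2:-200}"
  tail -n "$lines" "$log" \
    | grep -F 'Fetching aliases of parent certificate for certificate of alias' \
    | sed 's/.*for certificate of alias //' \
    | sort | uniq -c | sort -rn
}
```

For example, run count_fetch_aliases /var/log/ocum/jep.log a few times a minute apart. If the same alias dominates the output on every run while the upgrade shows no progress, that alias marks where the chain walk is stuck and is a starting point for the truststore review in the Solution.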
Cause
- The AIQUM upgrade process runs keystoresetup.jar, which iterates through all certificates in server.truststore to rebuild the certificate chain hierarchy
- The internal method AliasUtils.fetchAliasesOfCertificateChain() validates each certificate and follows the chain to its root CA
- When an expired intermediate CA certificate is encountered, the validation returns an EXPIRED status, but the loop logic only breaks on NO_TRUST_ANCHOR, so it continues iterating indefinitely on the expired certificate
- This is a known defect (burt 1519525 / CPE-276) that has not been fixed
- The expired certificate was likely imported into the truststore when a cluster with a certificate signed by that intermediate CA was added to AIQUM, and the intermediate CA has since expired
Solution
- Revert to the pre-upgrade VM snapshot, or kill the hung upgrade process:
  - Linux/OVA:
    kill $(pgrep -f keystoresetup)
  - Windows: In Task Manager, end the java.exe process that is running keystoresetup
- List all certificates in the truststore and identify expired ones (the filters below keep the alias, Owner, Issuer, validity dates and CA flag of each entry):
  - Linux/OVA (AIQUM below 9.14):
    keytool -list -v -keystore /opt/netapp/essentials/jboss/server/onaro/cert/server.truststore -storepass changeit | grep -E "Alias name:|Owner:|Issuer:|Valid from:|CA:true"
  - Linux/OVA (AIQUM 9.14+):
    . /opt/netapp/essentials/bin/commonfunctions.sh && set_javahome
    CS=/opt/netapp/essentials/jboss/standalone/data/jbossCredStore.cs
    TOKEN=$(grep 'truststore.token' /opt/netapp/essentials/conf/server.properties | sed 's/^truststore.token=//')
    PASS=$(java -jar /opt/netapp/ocum/lib/jeps/credentialstore.jar -p "$CS" -a jboss -t "$TOKEN")
    keytool -list -v -keystore /opt/netapp/essentials/jboss/server/onaro/cert/server.truststore -storepass "$PASS" | grep -E "Alias name:|Owner:|Issuer:|Valid from:|CA:true"
  - Windows (AIQUM below 9.14):
    keytool -list -v -keystore "C:\Program Files\NetApp\ocum\jboss\server\onaro\cert\server.truststore" -storepass changeit | findstr /C:"Alias name:" /C:"Owner:" /C:"Issuer:" /C:"Valid from:" /C:"CA:true"
  - Windows (AIQUM 9.14+): Extract the truststore password from the credential store using the same credentialstore.jar approach with Windows paths ("C:\Program Files\NetApp\ocum\jboss\bin\jboss-cli.bat"), or see What are the notable log files and their respective locations for AIQUM for path details; then run the Windows command above with that password
- Identify the expired intermediate CA certificates (neither self-signed roots nor leaf certificates). Look for entries where:
  - The "until:" date on the "Valid from: ... until: ..." line is in the past
  - The certificate is an intermediate CA: Issuer differs from Owner (the subject), so it is not a self-signed root, and the extensions include BasicConstraints CA:true, so it is not a leaf certificate
- Remove each expired intermediate CA certificate by alias:
  - Linux/OVA (use "$PASS" from the previous step on 9.14+, or changeit below 9.14):
    keytool -delete -alias "<alias_name>" -keystore /opt/netapp/essentials/jboss/server/onaro/cert/server.truststore -storepass "$PASS"
  - Windows:
    keytool -delete -alias "<alias_name>" -keystore "C:\Program Files\NetApp\ocum\jboss\server\onaro\cert\server.truststore" -storepass "<password>"
  Note: Only remove expired intermediate CA certificates, not the root CA or leaf certificates for active clusters
- Verify the removal:
  - Linux/OVA:
    keytool -list -keystore /opt/netapp/essentials/jboss/server/onaro/cert/server.truststore -storepass "$PASS" | grep -i "<alias_name>"
  - Windows:
    keytool -list -keystore "C:\Program Files\NetApp\ocum\jboss\server\onaro\cert\server.truststore" -storepass "<password>" | findstr /I "<alias_name>"
  The command should return no results
- Retry the upgrade
Note: After upgrade completes, if any cluster that was using the removed intermediate CA has since renewed its certificate with a valid chain, re-adding or rediscovering the cluster will import the new valid chain automatically.
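On Linux/OVA, the removal and verification steps can be wrapped in a small helper that also keeps a copy of the truststore first. This is a sketch under several assumptions: remove_truststore_aliases is a hypothetical helper, the timestamped backup is an extra precaution this article does not prescribe, keytool must be on the PATH (for example after set_javahome on 9.14+), and the check relies on keytool -list -alias exiting non-zero once the alias no longer exists.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): back up a truststore, delete the given aliases,
# then confirm each one is gone. Requires keytool on the PATH.
# Usage: remove_truststore_aliases TRUSTSTORE PASSWORD ALIAS [ALIAS...]
remove_truststore_aliases() {
  local store="$1" pass="$2" a rc=0 backup
  shift 2

  # Timestamped copy so the original truststore can be restored if needed.
  backup="${store}.bak.$(date +%Y%m%d%H%M%S)"
  cp -p -- "$store" "$backup" || return 1
  echo "Backup written to $backup"

  for a in "$@"; do
    if ! keytool -delete -alias "$a" -keystore "$store" -storepass "$pass"; then
      echo "DELETE FAILED: $a"
      rc=1
      continue
    fi
    # Assumption: keytool -list -alias exits non-zero when the alias is absent.
    if keytool -list -alias "$a" -keystore "$store" -storepass "$pass" >/dev/null 2>&1; then
      echo "STILL PRESENT: $a"
      rc=1
    else
      echo "Removed: $a"
    fi
  done
  return "$rc"
}
```

Example: remove_truststore_aliases /opt/netapp/essentials/jboss/server/onaro/cert/server.truststore "$PASS" "<alias_name>". A non-zero return means at least one alias could not be deleted or is still listed, so check the output before retrying the upgrade.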
Partner Notes
- This issue can be identified pre-upgrade by checking for expired certificates in the truststore before starting the upgrade process
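- Pre-flight check (Linux/OVA, requires GNU date; server.truststore is the truststore path and <password> its password, as in the Solution steps). It prints the alias and end date of every expired entry:
    keytool -list -v -keystore server.truststore -storepass <password> 2>/dev/null |
      grep -E '^Alias name:|until:' |
      while read -r line; do
        case "$line" in
          "Alias name: "*) name="${line#Alias name: }" ;;
          *) exp_date="${line##*until: }"
             if [ "$(date -d "$exp_date" +%s)" -lt "$(date +%s)" ]; then echo "EXPIRED: $name (until $exp_date)"; fi ;;
        esac
      done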
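The pre-flight check above reports every expired entry, including self-signed roots and leaf certificates that the Solution says to keep. The sketch below narrows the output to entries matching the removal criteria: expired, Owner different from Issuer, and BasicConstraints CA:true. It assumes GNU date, one certificate per alias (trustedCertEntry, as in server.truststore), and English keytool -list -v labels; find_expired_intermediates is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: read `keytool -list -v` output on stdin and print "alias<TAB>until-date"
# for each entry that looks like an expired intermediate CA:
#   - its "until:" date is in the past (parsed with GNU date),
#   - Owner (subject) differs from Issuer, so it is not a self-signed root,
#   - its extensions contain "CA:true", so it is not a leaf certificate.
# Assumes one certificate per alias, as with trustedCertEntry entries.
find_expired_intermediates() {
  local now expiry name owner issuer notafter ca
  now=$(date +%s)
  # Collapse each keystore entry into one tab-separated record.
  awk '
    function emit() {
      if (name != "") print name "\t" owner "\t" issuer "\t" notafter "\t" ca
    }
    /^Alias name: / { emit(); name = substr($0, 13); owner = issuer = notafter = ""; ca = 0; next }
    /^Owner: /      { owner = substr($0, 8); next }
    /^Issuer: /     { issuer = substr($0, 9); next }
    /^Valid from: / { sub(/.* until: /, ""); notafter = $0; next }
    /CA:true/       { ca = 1 }
    END             { emit() }
  ' | while IFS=$'\t' read -r name owner issuer notafter ca; do
    # Skip entries whose end date GNU date cannot parse.
    expiry=$(date -d "$notafter" +%s 2>/dev/null) || continue
    if [ "$expiry" -lt "$now" ] && [ "$owner" != "$issuer" ] && [ "$ca" = 1 ]; then
      printf '%s\t%s\n' "$name" "$notafter"
    fi
  done
}
```

Example: keytool -list -v -keystore server.truststore -storepass "$PASS" | find_expired_intermediates. Each alias printed is a candidate for the keytool -delete step; review it against the Note in the Solution before removing it.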
Additional Information
- What are the notable log files and their respective locations for AIQUM — complete log file path reference for all platforms
- Active IQ Unified Manager upgrade fails due to corrupt keystores — if upgrade fails (not hangs) with "Response Code: 72"
- AIQ Unified Manager: acquisition fails for all clusters after upgrade — if acquisition fails after a completed upgrade
- Expired intermediate CA certificates are typically imported when ONTAP clusters using certificates signed by an enterprise PKI are added to AIQUM — when the intermediate CA expires, the certificate remains in the AIQUM truststore
Internal Notes
- Defect: burt 1519525 / CPE-276 (2022, never fixed)
- Root cause in code: AliasUtils.fetchAliasesOfCertificateChain() in keystoresetup.jar:
  - Bug 1: The loop condition only breaks on PKIXReason.NO_TRUST_ANCHOR; an EXPIRED result does NOT break the loop, causing infinite iteration
  - Bug 2: The result variable holding the validated certificate is never reset to null between loop iterations; stale data from a previous iteration is reused, which prevents the loop from terminating naturally
- Cases affected:
- 2010685358 (Alarm.com) — 9.16P2 → 9.18 upgrade hang, expired ADC-HQ-SubCA intermediate cert
- 2010601779 (MSCI) — same defect, different customer
- The hang can consume 100% of a CPU core indefinitely — the only exit is to kill the process
- jep.log shows a single "Fetching aliases of parent certificate for certificate of alias X" line repeated; this repetition distinguishes the infinite loop from a legitimately slow operation
- This is distinct from the "Response Code: 72" error in related KBs — that error occurs when keystoresetup encounters a corrupt keystore file and exits with an error. The infinite loop scenario never exits or produces an error code.
