NetApp Knowledge Base

AIQUM upgrade hangs during certificate processing due to expired intermediate CA certificate in truststore

Category:
active-iq-unified-manager
Specialty:
om

Applies to

  • Active IQ Unified Manager 9.6+ (AIQUM)
  • All OS platforms
  • Upgrades between any major/minor versions

Issue

  • AIQUM upgrade hangs indefinitely during the "Setting Keystore and Truststore using keystoresetup JEP" phase
  • The upgrade process appears frozen — no progress, no error displayed on screen
  • CPU usage remains elevated (Java process spinning)
  • jep.log shows repeated processing of the same certificate alias with no termination:
    • Linux/OVA: /var/log/ocum/jep.log
    • Windows: \ProgramData\NetApp\OnCommandAppData\ocum\log\jep.log
INFO [main] [com.netapp.jeps.keystoresetup.Main] Fetching aliases of parent certificate for certificate of alias <alias_name>
  • The log entry repeats for the same alias indefinitely without progressing to the next certificate or completing
  • If the upgrade is killed and retried, it hangs at the same point
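
One way to tell the loop apart from a slow but progressing run is to count how often each alias appears in the "Fetching aliases of parent certificate" message. The following is a minimal sketch, not a NetApp tool: the function name count_alias_fetches and the default threshold of 100 are assumptions made for illustration, and the message text is taken from the log excerpt above.

```shell
# count_alias_fetches LOGFILE [THRESHOLD]
# Prints "<count> <alias>" for every alias whose "Fetching aliases of parent
# certificate" line appears at least THRESHOLD times in LOGFILE.
# THRESHOLD defaults to 100, an arbitrary illustrative value.
count_alias_fetches() {
    log_file=$1
    threshold=${2:-100}
    awk -v min="$threshold" '
        /Fetching aliases of parent certificate for certificate of alias / {
            # Keep only the text after the fixed message prefix.
            name = $0
            sub(/.*certificate of alias /, "", name)
            count[name]++
        }
        END {
            for (a in count)
                if (count[a] >= min)
                    printf "%d %s\n", count[a], a
        }
    ' "$log_file"
}
```

For example, count_alias_fetches /var/log/ocum/jep.log 100 lists aliases logged 100 or more times. A single alias with a very high and still growing count is consistent with the hang described here.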

Cause

  • The AIQUM upgrade process runs keystoresetup.jar which iterates through all certificates in server.truststore to rebuild the certificate chain hierarchy
  • The internal method AliasUtils.fetchAliasesOfCertificateChain() validates each certificate and follows the chain to its root CA
  • When an expired intermediate CA certificate is encountered, the validation returns an EXPIRED status but the loop logic only breaks on NO_TRUST_ANCHOR — it continues iterating indefinitely on expired certificates
  • This is a known defect (burt 1519525 / CPE-276) that has not been fixed
  • The expired certificate was likely imported into the truststore when a cluster with a certificate signed by that intermediate CA was added to AIQUM, and the intermediate CA has since expired

Solution

  1. Revert to the pre-upgrade VM snapshot, or kill the hung upgrade process:
    • Linux/OVA: kill $(pgrep -f keystoresetup)
    • Windows: End the java.exe process running keystoresetup via Task Manager
  2. List all certificates in the truststore and identify expired ones:

    Linux/OVA (AIQUM below 9.14):
    keytool -list -v -keystore /opt/netapp/essentials/jboss/server/onaro/cert/server.truststore -storepass changeit | grep -E "Alias name:|Owner:|Issuer:|Valid from:"

    Linux/OVA (AIQUM 9.14+):
    . /opt/netapp/essentials/bin/commonfunctions.sh && set_javahome
    CS=/opt/netapp/essentials/jboss/standalone/data/jbossCredStore.cs
    TOKEN=$(grep 'truststore.token' /opt/netapp/essentials/conf/server.properties | sed 's/^truststore.token=//')
    PASS=$(java -jar /opt/netapp/ocum/lib/jeps/credentialstore.jar -p "$CS" -a jboss -t "$TOKEN")
    keytool -list -v -keystore /opt/netapp/essentials/jboss/server/onaro/cert/server.truststore -storepass "$PASS" | grep -E "Alias name:|Owner:|Issuer:|Valid from:"


    Windows (AIQUM below 9.14):
    keytool -list -v -keystore "C:\Program Files\NetApp\ocum\jboss\server\onaro\cert\server.truststore" -storepass changeit | findstr /C:"Alias name:" /C:"Owner:" /C:"Issuer:" /C:"Valid from:"

    Windows (AIQUM 9.14+):
    Extract the truststore password from the credential store using the same credentialstore.jar approach as on Linux/OVA, substituting Windows paths (see "C:\Program Files\NetApp\ocum\jboss\bin\jboss-cli.bat"), or consult the KB article "What are the notable log files and their respective locations for AIQUM" for path details
  3. Identify expired intermediate CA certificates (not self-signed, not leaf certs) — look for certificates where:
    • The Valid from ... until end date is in the past
    • The certificate is an intermediate CA certificate: Issuer differs from Owner (keytool's label for the Subject) and the extensions include BasicConstraints with CA:true
  4. Remove each expired intermediate CA certificate by alias:
    Linux/OVA (for AIQUM below 9.14, use -storepass changeit instead of "$PASS"):
    keytool -delete -alias "<alias_name>" -keystore /opt/netapp/essentials/jboss/server/onaro/cert/server.truststore -storepass "$PASS"
    Windows:
    keytool -delete -alias "<alias_name>" -keystore "C:\Program Files\NetApp\ocum\jboss\server\onaro\cert\server.truststore" -storepass "<password>"
    Note: Only remove expired intermediate CA certificates, not the root CA or leaf certificates for active clusters
  5. Verify the removal:
    Linux/OVA (for AIQUM below 9.14, use -storepass changeit instead of "$PASS"):
    keytool -list -keystore /opt/netapp/essentials/jboss/server/onaro/cert/server.truststore -storepass "$PASS" | grep -i "<alias_name>"
    Windows:
    keytool -list -keystore "C:\Program Files\NetApp\ocum\jboss\server\onaro\cert\server.truststore" -storepass "<password>" | findstr /I "<alias_name>"
    The command should return no results.
  6. Retry the upgrade
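
Because keytool -delete modifies the truststore in place, it can be prudent to copy the file before removing anything. The sketch below wraps steps 4 and 5 in one function under that assumption. The function name remove_alias_with_backup, the .bak.<timestamp> suffix, and the KEYTOOL override variable are illustrative choices, not part of AIQUM. The verification relies on keytool -list -alias exiting with a non-zero status when the alias does not exist.

```shell
# remove_alias_with_backup TRUSTSTORE STOREPASS ALIAS
# Copies the truststore, deletes one alias, then confirms the alias is gone.
# Set KEYTOOL to use a keytool binary that is not on PATH (hypothetical knob).
remove_alias_with_backup() {
    store=$1
    pass=$2
    cert_alias=$3
    keytool_bin=${KEYTOOL:-keytool}

    # Timestamped copy so the original truststore can be restored if needed.
    backup="$store.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$store" "$backup" || return 1
    echo "Backup written to $backup"

    # Step 4: remove the certificate by alias.
    "$keytool_bin" -delete -alias "$cert_alias" -keystore "$store" -storepass "$pass" || return 1

    # Step 5: listing the alias should now fail because it no longer exists.
    if "$keytool_bin" -list -alias "$cert_alias" -keystore "$store" -storepass "$pass" >/dev/null 2>&1; then
        echo "Alias $cert_alias is still present" >&2
        return 1
    fi
    echo "Alias $cert_alias removed"
}
```

On Linux/OVA this could be called as remove_alias_with_backup /opt/netapp/essentials/jboss/server/onaro/cert/server.truststore "$PASS" "<alias_name>", once per expired intermediate CA alias.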

Note: After upgrade completes, if any cluster that was using the removed intermediate CA has since renewed its certificate with a valid chain, re-adding or rediscovering the cluster will import the new valid chain automatically.

Partner Notes

  • This issue can be identified pre-upgrade by checking for expired certificates in the truststore before starting the upgrade process
  • Pre-flight check (Linux, requires GNU date): keytool -list -v -keystore server.truststore -storepass <password> 2>/dev/null | grep "until:" | while read -r line; do exp_date=${line##*until: }; if [ "$(date -d "$exp_date" +%s)" -lt "$(date +%s)" ]; then echo "EXPIRED: $line"; fi; done
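
The pre-flight check above reports the validity line of an expired certificate but not its alias, and it does not separate intermediate CAs from roots or leaf certificates. A possible refinement is sketched below. It assumes the English-language layout that keytool -list -v typically prints for trustedCertEntry entries ("Alias name:", "Owner:", "Issuer:", "Valid from: ... until: ...", and "CA:true" under BasicConstraints). The function name list_expired_ca_certs and its optional NOW argument are hypothetical. The time zone in the expiry date is ignored, so results within a few hours of expiry may be off.

```shell
# list_expired_ca_certs [NOW]
# Reads `keytool -list -v` output on stdin and prints one line per expired,
# non-self-signed CA certificate. NOW is a YYYYMMDDHHMMSS timestamp and
# defaults to the current UTC time.
list_expired_ca_certs() {
    now=${1:-$(date -u +%Y%m%d%H%M%S)}
    awk -v now="$now" '
        # Emit the previous entry if it qualifies, then reset per-entry state.
        function report() {
            if (name != "" && is_ca && owner != issuer && expiry != "" && expiry < now)
                printf "EXPIRED CA: %s (until %s)\n", name, until_text
            name = ""; owner = ""; issuer = ""; expiry = ""; until_text = ""; is_ca = 0
        }
        BEGIN {
            split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
            for (i = 1; i <= 12; i++) month[m[i]] = sprintf("%02d", i)
        }
        /^Alias name: / { report(); name = substr($0, 13) }
        /^Owner: /      { owner = substr($0, 8) }
        /^Issuer: /     { issuer = substr($0, 9) }
        /^Valid from: .* until: / {
            until_text = $0
            sub(/.* until: /, "", until_text)
            # Expected shape: Sat Jan 01 00:00:00 UTC 2022
            n = split(until_text, f, " ")
            t = f[4]; gsub(/:/, "", t)
            # Build a sortable YYYYMMDDHHMMSS key; the time zone field is ignored.
            expiry = f[n] month[f[2]] sprintf("%02d", f[3]) t
        }
        /CA:true/       { is_ca = 1 }
        END { report() }
    '
}
```

Usage: keytool -list -v -keystore server.truststore -storepass <password> 2>/dev/null | list_expired_ca_certs. Each reported alias is a candidate for removal before the upgrade. It is still worth confirming the Owner and Issuer in the full keytool output before deleting anything.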

Additional Information

Internal Notes

  • Defect: burt 1519525 / CPE-276 (2022, never fixed)
  • Root cause in code: AliasUtils.fetchAliasesOfCertificateChain() in keystoresetup.jar:
    • Bug 1: The loop condition breaks only on PKIXReason.NO_TRUST_ANCHOR; an EXPIRED result does NOT break the loop, causing infinite iteration
    • Bug 2: The result variable holding the validated certificate is never reset to null between loop iterations — stale data from a previous iteration is reused, preventing the loop from naturally terminating
  • Cases affected:
    • 2010685358 (Alarm.com) — 9.16P2 → 9.18 upgrade hang, expired ADC-HQ-SubCA intermediate cert
    • 2010601779 (MSCI) — same defect, different customer
  • The hang can consume 100% of a CPU core indefinitely — the only exit is to kill the process
  • jep.log will show a single "Fetching aliases of parent certificate for certificate of alias X" line repeated; this distinguishes the infinite loop from a legitimate slow operation
  • This is distinct from the "Response Code: 72" error in related KBs — that error occurs when keystoresetup encounters a corrupt keystore file and exits with an error. The infinite loop scenario never exits or produces an error code.

