NetApp Knowledgebase

What are the common causes of High Availability "Takeover Impossible" events?

Applies to

  • Clustered Data ONTAP 8
  • Data ONTAP 7 and earlier

Answer

Data ONTAP will not attempt a partner takeover when it can determine prior to the takeover attempt that the takeover will fail.

The resulting ASUP will automatically open a customer support case with "TAKEOVER IMPOSSIBLE" in the symptom field.  The case symptom text will be of the form:

CLTFLT: Cluster Notification from (PARTNER DOWN, TAKEOVER IMPOSSIBLE) ERROR

This article describes how to diagnose five common causes of takeover impossible events, and the actions required to correct the issues found.  The focus is on remote diagnosis from ASUP logs, primarily the MESSAGES and CLUSTER-MONITOR logs.

Approximately 70% of NetApp FAS3000, FAS3100 and FAS6000 systems are deployed as High Availability (HA) configurations.  Proper configuration of HA systems requires installing all necessary HA hardware, enabling cluster software licenses, setting HA related options, and more.

If the HA system is not configured properly and takeover by the partner system is not possible, in many instances an alert message is posted to the console every hour.  The messages are of the form: "statd:ALERT Cluster is licensed but takeover of partner is disabled."

Five common types of statd:ALERT messages are described below:

Cluster is licensed but takeover of partner is disabled
The ASUP MESSAGES log will have hourly messages of the form:

[: statd:ALERT]: Cluster is licensed but takeover of partner is disabled.

The most common reason for this message is that an operator has manually disabled takeover by entering cf disable at the console command line.  Entering cf enable re-enables takeover and clears the hourly ALERT message.

To confirm that takeover has been disabled by the operator, check the ASUP CLUSTER-MONITOR log.  The fifth entry in the log begins with "takeoverByPartner".  If takeover has been manually disabled, the entry will contain the text string:

"NVRAM_DOWN,CLUSTER_DISABLE"

Example:
===== CLUSTER MONITOR =====
cf: Current monitor status (28Jun2009 00:00:02):
partner 'NetApp1' VIA Interconnect is up (link 0 up, link 1 up)
state UP, time 90788045660, event CHECK_FSM, elem ChkMbValid (12)
mirrorConsistencyRequired TRUE
takeoverByPartner 0x2041      <<< look here
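
The check described above can be scripted when many ASUPs need to be reviewed. The following is a minimal Python sketch, not a NetApp tool. The function names are hypothetical. It assumes the CLUSTER-MONITOR log is available as plain text and that the "NVRAM_DOWN,CLUSTER_DISABLE" string, when present, appears on the same line as "takeoverByPartner".

```python
# Text string the article associates with a manual "cf disable".
DISABLE_MARKER = "NVRAM_DOWN,CLUSTER_DISABLE"


def takeover_by_partner_entry(log_text):
    """Return the first log line that begins with 'takeoverByPartner', or None."""
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("takeoverByPartner"):
            return stripped
    return None


def manually_disabled(log_text):
    """True if the takeoverByPartner entry contains the disable marker.

    Assumption: the marker sits on the same line as the entry name.
    """
    entry = takeover_by_partner_entry(log_text)
    return entry is not None and DISABLE_MARKER in entry
```

If the entry is missing, manually_disabled returns False, so a False result means only that the marker was not found and not that takeover is enabled.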

Cluster is licensed but takeover of partner is disabled due to reason: interconnect error

The ASUP MESSAGES log will have hourly entries of the form: 

[: statd:ALERT]: Cluster is licensed but takeover of partner is disabled due to reason : interconnect error 

The interconnect link status is shown as the second line in the CLUSTER-MONITOR log.  In the examples below, the interconnect is not present or both links are down.

===== CLUSTER MONITOR =====
cf: Current monitor status (28Jun2009 00:00:01):
partner 'NetApp1', Interconnect not present  <<< look here

===== CLUSTER MONITOR =====
cf: Current monitor status (28Jun2009 00:00:02):
partner 'NetApp1', VIA Interconnect is down (link 0 down, link 1 down)   <<< look here

Another common abnormal condition shows the "partner" as "unknown".

===== CLUSTER MONITOR =====
cf: Current monitor status (28Jun2009 00:00:02):
partner 'unknown', VIA Interconnect is down (link 0 down, link 1 down)   <<< look here

The corrective action is to verify that the interconnect cables/links are connected and active.  When the partner is reported as 'unknown', verify that the partner filer/platform is present and active.  If no partner system is present, the system was likely once part of an HA pair and was improperly reconfigured as standalone.  See the documentation (Removing an active/active configuration) for more information about how to properly split a cluster and clear the 'unknown' partner messages.
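
The interconnect conditions shown in the examples above can be detected from the partner line of the CLUSTER-MONITOR log. The following Python sketch is illustrative only. The function name and the finding strings are hypothetical, and the pattern is derived solely from the sample lines in this article, so other Data ONTAP releases may format the line differently.

```python
import re

# Matches lines such as:
#   partner 'NetApp1', VIA Interconnect is down (link 0 down, link 1 down)
PARTNER_LINE = re.compile(r"^partner '(?P<name>[^']*)',?\s*(?P<status>.*)$")


def interconnect_findings(log_text):
    """Return (partner_name, findings) for the first partner line, or (None, [])."""
    for raw in log_text.splitlines():
        # Drop the "<<< look here" annotations used in this article.
        line = raw.split("<<<")[0].strip()
        match = PARTNER_LINE.match(line)
        if match is None:
            continue
        name = match.group("name")
        status = match.group("status")
        findings = []
        if name == "unknown":
            findings.append("partner unknown")
        if "Interconnect not present" in status:
            findings.append("interconnect not present")
        elif "Interconnect is down" in status:
            findings.append("interconnect down")
        return name, findings
    return None, []
```

An empty findings list for a named partner corresponds to the healthy case, where the line reports that the interconnect is up.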

Cluster is licensed but takeover of partner is disabled due to reason: partner mailbox disks not accessible or invalid

The ASUP MESSAGES log will have hourly entries of the form:

[ statd:ALERT]: Cluster is licensed but takeover of partner is disabled due to reason : partner mailbox disks not accessible or invalid

The status of the mailbox disks is shown approximately 15 lines from the top of the CLUSTER-MONITOR log.  A normal entry will show the disk paths for all of the mailbox disks.  An example below is provided for illustration.  The disk identifiers (4a.17, 4a.29, 8b.34, 8b.35 in the example) will vary depending on the system configuration.

mailbox disks:
Disk 4a.17 is a primary mailbox disk
Disk 4a.29 is a primary mailbox disk
Disk 8b.34 is a partner mailbox disk
Disk 8b.35 is a partner mailbox disk

Two common abnormal conditions:

  1. No partner disk entries.  Instead, log contains 'No partner disks attached!'

    mailbox disks:
    Disk 8a.20 is a local mailbox disk
    Disk 8a.19 is a local mailbox disk
    No partner disks attached!   <<< look here

  2. Some partner disks shown with path as '?.?'.

    mailbox disks:
    Disk 4a.17 is a primary mailbox disk
    Disk 4a.29 is a primary mailbox disk
    Disk ?.? is a partner mailbox disk  <<< look here
    Disk ?.? is a partner mailbox disk  <<< look here
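
Both abnormal conditions above can be spotted mechanically in the mailbox section of the CLUSTER-MONITOR log. The following Python sketch is a rough illustration. The function name and the finding strings are hypothetical, and the line pattern is inferred from the examples in this article only.

```python
import re

# Matches lines such as "Disk 8b.34 is a partner mailbox disk".
DISK_LINE = re.compile(r"^Disk (?P<path>\S+) is a (?P<role>\w+) mailbox disk")


def mailbox_findings(log_text):
    """Return a list of problems found in the mailbox disk entries."""
    findings = []
    unresolved = 0
    for raw in log_text.splitlines():
        # Drop the "<<< look here" annotations used in this article.
        line = raw.split("<<<")[0].strip()
        if line.startswith("No partner disks attached"):
            findings.append("no partner disks attached")
            continue
        match = DISK_LINE.match(line)
        if match and match.group("role") == "partner" and match.group("path") == "?.?":
            unresolved += 1
    if unresolved:
        findings.append("%d partner mailbox disk(s) with unknown path" % unresolved)
    return findings
```

An empty list means that neither of the two conditions was found. It does not prove that the mailbox disks are healthy.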

To correct these fault conditions, first check that the partner system is present and active.  Then check the FC adapters in the filers/platforms and shelf cabling to each of the mailbox disk shelves.

If the problem continues, verify that the 'partner-sysid' variable shown by printenv is set to the system ID of the partner:

CFE> printenv
Variable Name        Value
-------------------- --------------------------------------------------
BOOT_CONSOLE         rlm0a
fcal-host-id         7
partner-sysid        0101183784


Then attempt the following steps on both HA controllers:

  1. Disable clustering by typing cf disable.
  2. Reboot
  3. Press Ctrl-C during the boot sequence to go to the special boot menu.
  4. Select option 5 to go into Maintenance mode.
  5. Type: mailbox destroy local
  6. Type: mailbox destroy partner
  7. Type: halt
  8. Reboot the head.
  9. Type: cf enable
  10. Type: ic stats error -v

Note: A possibly stale mailbox instance on the local or remote site results in the following message on the storage system: [ds-dt01terra: fmmbx_instanceWorke:info]: missing lock disks, possibly stale mailbox. This can occur when no mailbox disks are visible after the drives have been reassigned during an upgrade. In that case, the local and remote instances of the mailbox disks need to be re-initialized. Perform steps 1 to 10 above on both nodes.

A useful tool for diagnosing disk pathing issues is Config Advisor (formerly WireGauge), which is available from the NOW ToolChest.

WireGauge can be run remotely by entering an ASUP ID (select "File > Get ASUP").  Comparing WireGauge results from both HA partners will often indicate the cause of the mailbox disk path issue.

Cluster is licensed but takeover of partner is disabled due to reason: CFO not licensed

The ASUP MESSAGES log will have hourly entries of the form:

[: statd:ALERT]: Cluster is licensed but takeover of partner is disabled due to reason : CFO not licensed

If the CLUSTER-MONITOR log contains the following message, the cluster license is not enabled.

===== CLUSTER MONITOR =====
Clustered failover is now unlicensed
cf: option 'monitor' requires that cluster licensing is enabled

Re-enabling the cluster license will clear this error.  See Enabling licenses for more details.

A common cause is that the system was once part of a High Availability pair and was improperly reconfigured as standalone.  See Removing an active/active configuration for more information about how to properly split a High Availability pair.

Cluster is licensed but takeover of partner is disabled due to reason: unsynchronized log

The ASUP MESSAGES log will have hourly entries of the form:

[: statd:ALERT]: Cluster is licensed but takeover of partner is disabled due to reason : unsynchronized log 

This is usually associated with problems with the interconnect cabling.

First, verify that the interconnect cables are not cross-connected.  On FAS3000 and FAS6000 systems, the two interconnect ports are on the NVRAM card.  Verify Port 0 is connected to Port 0, and Port 1 to Port 1, on each system in the HA pair.

In some instances, momentarily unplugging and reseating each interconnect cable will clear this error.  Breaking and reestablishing the interconnect link will force the logs to re-synchronize. 
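
Each of the five alert forms described above points to a different first check. The following Python sketch shows one way to triage lines from the MESSAGES log. The function name is hypothetical, the hint strings only summarize this article, and the pattern assumes the message text appears exactly as in the examples.

```python
import re

ALERT = re.compile(
    r"statd:ALERT\]?:?\s*Cluster is licensed but takeover of partner is disabled"
    r"(?: due to reason\s*:\s*(?P<reason>.+?))?\.?\s*$"
)

# First check suggested by this article for each reason (None = no reason given).
FIRST_CHECK = {
    None: "takeover may have been disabled with cf disable; cf enable re-enables it",
    "interconnect error": "verify interconnect cables/links and that the partner is present",
    "partner mailbox disks not accessible or invalid":
        "check the partner, FC adapters and shelf cabling to the mailbox disk shelves",
    "CFO not licensed": "re-enable the cluster license",
    "unsynchronized log": "check that interconnect cables are not cross-connected; reseat them",
}


def triage(line):
    """Return (reason, first_check) for a statd:ALERT line, or None if it does not match."""
    match = ALERT.search(line)
    if match is None:
        return None
    reason = match.group("reason")
    return reason, FIRST_CHECK.get(reason, "reason not covered by this article")
```

Reasons that are not among the five covered here still return the parsed reason text, so they can be counted and reviewed separately.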

Changes in High Availability 'Takeover Impossible' events in Data ONTAP 8.x

  1. There are additional EMS messages that describe the reason that takeover is impossible. The message names start with 'ha.takeoverImp'.
    ha.takeoverImpIC:warning]: Takeover of the partner node is impossible because of interconnect errors.
    ha.takeoverImpNotDef:warning]: Takeover of the partner node is impossible due to reason status of backup mailbox is uncertain.
    ha.takeoverImpNotDef:warning]: Takeover of the partner node is impossible due to reason partner booting.
    ha.takeoverImpUnsync:warning]: Takeover of the partner node is impossible due to lack of partner NVRAM data.
    ha.takeoverImpNotDef:warning]: Takeover of the partner node is impossible due to reason partner halted in notakeover mode.
  2. The hourly takeover disabled message changed in Data ONTAP 8 (see the Syslog Translator). It now begins with "Controller Failover" instead of "Cluster":

    • Controller Failover is licensed but takeover of partner is disabled due to reason : Controller Failover not initialized
    • Controller Failover is licensed but takeover of partner is disabled due to reason : Controller Failover not licensed
    • Controller Failover is licensed but takeover of partner is disabled due to reason : interconnect error
    • Controller Failover is licensed but takeover of partner is disabled due to reason : local halt in progress
    • Controller Failover is licensed but takeover of partner is disabled due to reason : NVRAM size mismatch
    • Controller Failover is licensed but takeover of partner is disabled due to reason : partner booting
    • Controller Failover is licensed but takeover of partner is disabled due to reason : partner halted in notakeover mode
    • Controller Failover is licensed but takeover of partner is disabled due to reason : partner mailbox disks not accessible or invalid
    • Controller Failover is licensed but takeover of partner is disabled due to reason : status of backup mailbox is uncertain
    • Controller Failover is licensed but takeover of partner is disabled due to reason : takeover disabled by partner
    • Controller Failover is licensed but takeover of partner is disabled due to reason : unsynchronized log
    • Controller Failover is licensed but takeover of partner is disabled due to reason : version mismatch
    • Controller Failover is licensed but takeover of partner is disabled due to reason : waiting for partner to recover
    • Controller Failover is licensed but takeover of partner is disabled: partner identification not accessible or invalid
  3. The name of one of the ASUP logs changed in 8.x: CLUSTER-MONITOR is now CF-MONITOR.
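
Scripts that scan logs need to account for the Data ONTAP 8.x changes listed above. The following Python sketch extracts the event name from 'ha.takeoverImp' EMS lines and the reason from the hourly message in either its 7.x or 8.x wording. Both function names are hypothetical, and the patterns are based only on the message text quoted in this article.

```python
import re

# Example: "ha.takeoverImpIC:warning]: Takeover of the partner node is impossible ..."
EMS = re.compile(r"(?P<event>ha\.takeoverImp\w*):(?P<severity>\w+)\]:\s*(?P<text>.*\S)")

# Accepts both prefixes, and both "due to reason : X" and ": X" separators.
HOURLY = re.compile(
    r"(?:Cluster|Controller Failover) is licensed but takeover of partner is disabled"
    r"(?:\s+due to reason\s*:\s*|\s*:\s*)?(?P<reason>.*?)\.?\s*$"
)


def parse_ems(line):
    """Return (event, severity, text) for an ha.takeoverImp* line, or None."""
    match = EMS.search(line)
    if match is None:
        return None
    return match.group("event"), match.group("severity"), match.group("text")


def hourly_reason(line):
    """Return the stated reason, 'unspecified' if none is given, or None if no match."""
    match = HOURLY.search(line)
    if match is None:
        return None
    return match.group("reason") or "unspecified"
```

The second separator form covers the "partner identification not accessible or invalid" message, which omits the words "due to reason".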
