NetApp Knowledge Base

Unable to perform giveback after motherboard replacement on FAS62xx/FAS80xx due to no disks attached

Category:
fas-systems
Specialty:
hw
Applies to

  • FAS62xx
  • FAS80xx
  • AFF8080
  • Motherboard replacement
  • NVRAM replacement
  • non-partitioned drives

Issue

  • Giveback fails because no root volume is found. This is caused by the HA interconnect ports being down on the takeover node:
WARNING: there do not appear to be any disks attached to the system.
No root volume found.
Rebooting... (press ctrl-c during boot to break reboot loop)
  • The interconnect links are down on the takeover node; the NVRAM card on the takeover node has most likely entered a hung state.
  • In a controller-IOXM (CI) configuration, the physical ports show as down on both ends (a loopback test shows that both interconnect links on the card are down).


  • After takeover, the following EMS messages might be logged on the takeover node:


Wed Dec 06 12:37:27 GMT [n2: ib_nap_tx_2: connectx.shoutTimeout:debug]: Node advertisement send timed out on Port ib0b.
Wed Dec 06 12:37:29 GMT [n2: ib_nap_tx_1: connectx.shoutTimeout:debug]: Node advertisement send timed out on Port ib0a.
Wed Dec 06 12:37:37 GMT [n2: cfdisk_config: cf.diskinventory.sendFailed:debug]: params: {'errorCode': '1', 'reason': 'HA Interconnect down'}
Wed Dec 06 12:37:40 GMT [n2: ib_nap_tx_2: connectx.shout.portDisabled:critical]: Node advertisement send timed out on Port ib0b. ConnectX registers have been dumped to the /etc/ConnectX_regdump file.
Wed Dec 06 12:37:40 GMT [n2: mlx4_intr_handler: mlx4.link.statusChange:info]: InfiniBand port ib0b: Link down.
Wed Dec 06 12:37:41 GMT [n2: ib_nap_tx_2: ems.engine.suppressed:debug]: Event 'rdma.rdr.opFailed' suppressed 5 times in last 29618503 seconds.
Wed Dec 06 12:37:41 GMT [n2: ib_nap_tx_2: rdma.rdr.opFailed:debug]: RDR operation get_entity_property failed on error 7005.
Wed Dec 06 12:37:42 GMT [n2: ib_nap_tx_1: connectx.shout.portDisabled:critical]: Node advertisement send timed out on Port ib0a. ConnectX registers have been dumped to the /etc/ConnectX_regdump file.
Wed Dec 06 12:37:42 GMT [n2: mlx4_intr_handler: mlx4.link.statusChange:info]: InfiniBand port ib0a: Link down.
Wed Dec 06 12:37:44 GMT [n2: ib_mad2_wq: ems.engine.suppressed:debug]: Event 'ic.rdma.qpDisconnected' suppressed 4 times in last 29618502 seconds.
Wed Dec 06 12:37:44 GMT [n2: ib_mad2_wq: ic.rdma.qpDisconnected:debug]: kstat is disconnected.
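When reviewing a longer EMS log for this condition, it can help to pull out only the events that point to disabled interconnect ports. The following is a minimal Python sketch, assuming the log has already been exported as plain text in the line format shown above. find_downed_ib_ports is a hypothetical helper, not a NetApp tool; it reports only connectx.shout.portDisabled events and mlx4.link.statusChange "Link down" events, grouped by node.

```python
import re

# Hypothetical helper (assumption): matches EMS lines in the format shown above,
# "<timestamp> [<node>: <thread>: <event.name>:<severity>]: <message>".
EMS_LINE = re.compile(
    r"^(?P<ts>.+?) \[(?P<node>[^:\]]+): (?P<thread>[^:\]]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]: (?P<msg>.*)$"
)
# Port names as they appear in the messages, e.g. "Port ib0b" or "port ib0a".
IB_PORT = re.compile(r"[Pp]ort (ib\w+)")


def find_downed_ib_ports(log_text):
    """Return {node: set_of_ports} for IB ports reported as disabled or link-down."""
    hits = {}
    for line in log_text.splitlines():
        m = EMS_LINE.match(line.strip())
        if not m:
            continue  # not an EMS line in the expected format
        event, msg = m.group("event"), m.group("msg")
        disabled = event == "connectx.shout.portDisabled"
        link_down = event == "mlx4.link.statusChange" and "Link down" in msg
        if disabled or link_down:
            port = IB_PORT.search(msg)
            if port:
                hits.setdefault(m.group("node"), set()).add(port.group(1))
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        "Wed Dec 06 12:37:40 GMT [n2: ib_nap_tx_2: connectx.shout.portDisabled:critical]: "
        "Node advertisement send timed out on Port ib0b. ConnectX registers have been "
        "dumped to the /etc/ConnectX_regdump file.",
        "Wed Dec 06 12:37:42 GMT [n2: mlx4_intr_handler: mlx4.link.statusChange:info]: "
        "InfiniBand port ib0a: Link down.",
    ])
    result = find_downed_ib_ports(sample)
    # Sort the ports so the printed output is stable.
    print({node: sorted(ports) for node, ports in result.items()})
    # prints {'n2': ['ib0a', 'ib0b']}
```

Plain connectx.shoutTimeout debug events are deliberately ignored, because on their own they only indicate a timed-out advertisement and not a disabled port.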

  • When attempting a giveback, the takeover node does not show the partner as waiting for giveback:


Example:

7-mode (partner is taken over, but not shown as waiting for giveback):

n2(takeover)> cf status
n2 has taken over n1.

Cluster-mode (storage failover show output):


n2             n1             false    In takeover
n1             n2             -        Unknown      <---- Should be "Waiting for giveback"

  • Checking the interconnect status shows that the interconnect is down:

7-mode:

n2*> ic status
        Link 0: down
        Link 1: down
        IC RDMA connection : down


Cluster-mode:


cluster::*> storage failover interconnect show-link local
Node          Port Number      Link State
------------------------------------------------------------------------------
n2
              0                down
              1                down
2 entries were displayed.
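If this interconnect check is scripted across several HA pairs, both outputs above can be parsed the same way. This is a minimal sketch, assuming the command output is captured as text in exactly the layouts shown above. The regular expressions and all_links_down are hypothetical and may need adjusting for other ONTAP releases or column widths.

```python
import re

# Hypothetical patterns (assumption): they parse the command output as shown above.
# "Link 0: down" lines from 7-mode "ic status".
IC_STATUS_LINK = re.compile(r"^[ \t]*Link[ \t]+(\d+):[ \t]*(\w+)", re.MULTILINE)
# "<port number>   <link state>" rows from
# "storage failover interconnect show-link local" (node name is on its own line).
SHOW_LINK_ROW = re.compile(r"^[ \t]+(\d+)[ \t]+(up|down)[ \t]*$", re.MULTILINE | re.IGNORECASE)


def all_links_down(output, pattern):
    """Return True only if at least one link was parsed and every link reports 'down'."""
    states = [state.lower() for _, state in pattern.findall(output)]
    return bool(states) and all(state == "down" for state in states)


if __name__ == "__main__":
    ic_status = """n2*> ic status
        Link 0: down
        Link 1: down
        IC RDMA connection : down
"""
    show_link = """Node          Port Number      Link State
------------------------------------------------------------------------------
n2
              0                down
              1                down
2 entries were displayed.
"""
    print(all_links_down(ic_status, IC_STATUS_LINK))  # True
    print(all_links_down(show_link, SHOW_LINK_ROW))   # True
```

The helper returns False when no link rows are parsed, so an unexpected output format is not mistaken for an all-links-down result.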


  • Physically, if the controllers are in a controller-IOXM (CI) configuration, the HA interconnect ports show no link light. If you loop back the HA interconnect ports (cable port 0 to port 1 on the same controller) while the down node is waiting for giveback, link lights come on at the down controller but not at the takeover node.


  • Attempting to bring the interconnect port up manually fails with the following error:


7-mode:


n2(takeover)*> ic link on 0
Error: Failed to perform requested operation on port 0 due to an internal error.
The port has been disabled. To re-enable the port, reboot the system.



Cluster-mode:


cluster::*> interconnect link on -node n2 -link 0
  (system ha interconnect link on)
Error: command failed: Failed to perform requested operation on link 0 due to
       an internal error. The port has been disabled. To re-enable the port,
       reboot the system.


  • If this error is observed on the takeover node, the NVRAM card on that node has most likely entered a hung state.



NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.