NetApp Knowledgebase

Unable to perform giveback after motherboard replacement on FAS62xx/FAS80xx due to no disks attached

Category: fas-systems
Specialty: hw

Applies to

  • FAS62xx
  • FAS80xx
  • Motherboard replacement
  • NVRAM replacement
  • non-partitioned drives

Issue

  • Giveback cannot complete because the node being given back finds no root volume. The cause is that the HA interconnect ports are down on the takeover node. The console of the node being given back shows:
WARNING: there do not appear to be any disks attached to the system.
No root volume found.
Rebooting... (press ctrl-c during boot to break reboot loop)
  • The interconnect links are down on the takeover node. It is likely that the NVRAM card on the takeover node has gone into a hung state.
  • In a controller-IOXM (CI) configuration, the physical ports show as down on both ends, and a loopback test shows that both interconnect links on the takeover node's card are down.

 

  • After takeover, the following EMS messages might be logged on the takeover node:


Wed Dec 06 12:37:27 GMT [n2: ib_nap_tx_2: connectx.shoutTimeout:debug]: Node advertisement send timed out on Port ib0b.
Wed Dec 06 12:37:29 GMT [n2: ib_nap_tx_1: connectx.shoutTimeout:debug]: Node advertisement send timed out on Port ib0a.
Wed Dec 06 12:37:37 GMT [n2: cfdisk_config: cf.diskinventory.sendFailed:debug]: params: {'errorCode': '1', 'reason': 'HA Interconnect down'}
Wed Dec 06 12:37:40 GMT [n2: ib_nap_tx_2: connectx.shout.portDisabled:critical]: Node advertisement send timed out on Port ib0b. ConnectX registers have been dumped to the /etc/ConnectX_regdump file.
Wed Dec 06 12:37:40 GMT [n2: mlx4_intr_handler: mlx4.link.statusChange:info]: InfiniBand port ib0b: Link down.
Wed Dec 06 12:37:41 GMT [n2: ib_nap_tx_2: ems.engine.suppressed:debug]: Event 'rdma.rdr.opFailed' suppressed 5 times in last 29618503 seconds.
Wed Dec 06 12:37:41 GMT [n2: ib_nap_tx_2: rdma.rdr.opFailed:debug]: RDR operation get_entity_property failed on error 7005.
Wed Dec 06 12:37:42 GMT [n2: ib_nap_tx_1: connectx.shout.portDisabled:critical]: Node advertisement send timed out on Port ib0a. ConnectX registers have been dumped to the /etc/ConnectX_regdump file.
Wed Dec 06 12:37:42 GMT [n2: mlx4_intr_handler: mlx4.link.statusChange:info]: InfiniBand port ib0a: Link down.
Wed Dec 06 12:37:44 GMT [n2: ib_mad2_wq: ems.engine.suppressed:debug]: Event 'ic.rdma.qpDisconnected' suppressed 4 times in last 29618502 seconds.
Wed Dec 06 12:37:44 GMT [n2: ib_mad2_wq: ic.rdma.qpDisconnected:debug]: kstat is disconnected.
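The key signatures in this log are the `connectx.shout.portDisabled` events and the `mlx4.link.statusChange` "Link down" events for the InfiniBand HA interconnect ports. The following minimal Python sketch scans EMS text for those two events and reports the affected ports. It assumes the line layout seen in the sample above (`<timestamp> [<node>: <thread>: <event>:<severity>]: <message>`); the regular expressions and the helper name `interconnect_failures` are hypothetical and are not a NetApp tool.

```python
import re

# Hypothetical pattern inferred from the sample EMS lines in this article:
# <timestamp> [<node>: <thread>: <event.name>:<severity>]: <message>
EMS_LINE = re.compile(
    r"^(?P<ts>.+?) \[(?P<node>[^:]+): (?P<thread>[^:]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]: (?P<msg>.*)$"
)
# Port names in the sample messages look like "Port ib0b" or "port ib0a".
PORT = re.compile(r"[Pp]ort (ib\w+)")


def interconnect_failures(lines):
    """Return {port: [event names]} for HA interconnect ports that were
    disabled (connectx.shout.portDisabled) or reported "Link down"
    (mlx4.link.statusChange). Lines that do not parse are ignored."""
    failures = {}
    for line in lines:
        m = EMS_LINE.match(line.strip())
        if not m:
            continue
        event, msg = m.group("event"), m.group("msg")
        disabled = event == "connectx.shout.portDisabled"
        link_down = event == "mlx4.link.statusChange" and "Link down" in msg
        if disabled or link_down:
            p = PORT.search(msg)
            if p:
                failures.setdefault(p.group(1), []).append(event)
    return failures
```

Run against the messages above, both `ib0a` and `ib0b` would be reported with a port-disabled event followed by a link-down event, which matches the "both interconnect links down" symptom. Events such as `connectx.shoutTimeout` are deliberately not counted, because a timeout alone does not mean that the port was disabled.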

  • When you attempt a giveback, the takeover node does not show the partner as waiting for giveback:


Example:

7-mode (node is in takeover, but the partner is not shown as waiting for giveback):

n2(takeover)> cf status
n2 has taken over n1.

Cluster-mode:

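cluster::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
n2             n1             false    In takeover
n1             n2             -        Unknown   <---- Should be "Waiting for giveback"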

  • Checking the interconnect shows that both links are down:

7-mode:

n2*> ic status
        Link 0: down
        Link 1: down
        IC RDMA connection : down

 

Cluster-mode:


cluster::*> storage failover interconnect show-link local
Node          Port Number      Link State
------------------------------------------------------------------------------
n2
              0                down
              1                down
2 entries were displayed.
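Both outputs carry the same information: one state per interconnect link, and the failure signature is that every link is down. The following minimal Python sketch extracts the link states from either the 7-mode `ic status` text or the clustered ONTAP `storage failover interconnect show-link` text and checks whether all links are down. The line patterns are assumptions inferred only from the sample output above, and `link_states` and `all_links_down` are hypothetical helper names.

```python
import re

# 7-mode "ic status" lines look like "        Link 0: down" (inferred from the sample).
SEVEN_MODE_LINK = re.compile(r"^\s*Link (\d+):\s*(\w+)\s*$")
# Clustered ONTAP show-link rows are indented "<port> <state>" (inferred from the sample).
CDOT_LINK_ROW = re.compile(r"^\s+(\d+)\s+(up|down)\s*$", re.IGNORECASE)


def link_states(output):
    """Map link/port number -> lowercase state from either output format.
    Lines such as "IC RDMA connection : down" or headers are ignored."""
    states = {}
    for line in output.splitlines():
        m = SEVEN_MODE_LINK.match(line) or CDOT_LINK_ROW.match(line)
        if m:
            states[int(m.group(1))] = m.group(2).lower()
    return states


def all_links_down(output):
    """True only if at least one link was found and every link is down."""
    states = link_states(output)
    return bool(states) and all(s == "down" for s in states.values())
```

Requiring at least one parsed link avoids reporting "all down" for empty or unrecognized output, which would otherwise be vacuously true.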

 

 

  • Physically, if the controllers are in a controller-IOXM (CI) configuration, the HA interconnect ports show no link light. If you loop back the HA interconnect ports (cable port 0 to port 1 on the same controller) while the down node is waiting for giveback, the link lights come on at the down controller but not at the takeover node.

 

  • Attempting to bring the interconnect port up manually fails with the following error:


7-mode:


n2(takeover)*> ic link on 0
Error: Failed to perform requested operation on port 0 due to an internal error.
The port has been disabled. To re-enable the port, reboot the system.


 

Cluster-mode:


cluster::*> interconnect link on -node n2 -link 0
  (system ha interconnect link on)
Error: command failed: Failed to perform requested operation on link 0 due to
       an internal error. The port has been disabled. To re-enable the port,
       reboot the system.

 

  • If this error is observed on the takeover node, it is likely that the NVRAM card has gone into a hung state.
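The reasoning in this article combines two observations on the takeover node: both HA interconnect links are down, and a manual "link on" is refused because the port has been disabled. The following minimal Python sketch encodes that heuristic. It is an illustration of the article's reasoning, not a NetApp diagnostic; the function name `likely_nvram_hang` is hypothetical, and the message fragments it checks are taken from the 7-mode and clustered ONTAP errors shown above. Whitespace is normalized first because the clustered ONTAP error is wrapped across several indented lines.

```python
import re


def _normalize(text):
    # Collapse line wrapping and indentation so the 7-mode and clustered
    # ONTAP forms of the error compare the same way.
    return re.sub(r"\s+", " ", text).strip()


def likely_nvram_hang(both_links_down, link_on_error):
    """Heuristic only: True when both HA interconnect links are down AND the
    manual link-on attempt was refused with the "port has been disabled"
    error, which this article associates with a hung NVRAM card."""
    msg = _normalize(link_on_error)
    refused = (
        "Failed to perform requested operation on" in msg
        and "The port has been disabled" in msg
        and "reboot the system" in msg
    )
    return both_links_down and refused
```

Requiring both conditions mirrors the article: a down link alone can have other causes, while the refusal to re-enable the port until a reboot points at the card itself.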

 
