NetApp Knowledge Base

Network interface failover policies - behavior and uses

Applies to

ONTAP 9

Answer

Components of ONTAP interface failover

Broadcast domains, failover groups, and failover policies work together to determine which port will take over when the node or port on which a network interface is configured fails.

Broadcast Domain
  • A Broadcast Domain is an ONTAP grouping of ports on the same Ethernet broadcast domain.
    • Commonly referred to as a LAN or a VLAN, a broadcast domain provides data link layer connectivity between all ports in the domain.
  • Any Ethernet broadcast frame sent from one port is seen by all other ports of the domain.
    • By default, a network switch has one broadcast domain and all devices connected to it will be on the same broadcast domain.
    • Group ports into broadcast domains according to the VLAN of the switch ports they connect to.
  • By explicitly defining a broadcast domain grouping, ONTAP is able to validate reachability and prevent interfaces from migrating to an unreachable port.
Failover Group
  • Failover groups define the ports within a broadcast domain that provide interface failover coverage for each other.
  • Each broadcast domain has one failover group that includes all its ports.
  • Failover of network interfaces utilizes GARP (Gratuitous ARP) to update other devices in the broadcast domain.
    • Failover Groups therefore cannot be larger than the broadcast domain, as GARP only functions within the same broadcast domain.
    • Failover Groups may be smaller than the broadcast domain, such as a failover group containing ports that have the same link speed within a broadcast domain.
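The containment rule above (a failover group may be smaller than its broadcast domain but never larger) can be illustrated with a small sketch. This is only a conceptual model, not ONTAP code: the function name and the node and port names are hypothetical and chosen for the example.

```python
# Conceptual model only; not an ONTAP API. All names here are hypothetical.
def ports_outside_domain(group_ports, broadcast_domain_ports):
    """Return the failover group ports that are not in the broadcast domain."""
    return sorted(set(group_ports) - set(broadcast_domain_ports))

broadcast_domain = {"node1:e0c", "node1:e0d", "node2:e0c", "node2:e0d"}

# A subset of the domain (for example, ports with the same link speed) is valid.
same_speed_group = {"node1:e0c", "node2:e0c"}

# A group reaching outside the domain is invalid, because GARP would not
# reach devices beyond the broadcast domain.
oversized_group = {"node1:e0c", "node3:e0e"}

print(ports_outside_domain(same_speed_group, broadcast_domain))  # []
print(ports_outside_domain(oversized_group, broadcast_domain))   # ['node3:e0e']
```

An empty result means every port in the group belongs to the broadcast domain.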
Failover Policy
  • A failover policy dictates how a LIF uses the ports of a failover group when a node or port goes down.
  • Consider the failover policy as a type of filter that is applied to a failover group. The failover targets for a LIF (the set of ports to which the LIF can fail over) are determined by applying the LIF's failover policy to the LIF's failover group in the broadcast domain.
  • The policy dictates which ports in the failover group are selected as possible targets for a given LIF failover, and the order in which the target list is traversed.
Failover policies:
  • local-only: This type denotes that the targets should be restricted to the local or home node of the interface. NetApp recommends this type as a best practice when you need to ensure that no I/O is served over a remote path.
  • sfo-partner-only: This type denotes that the target ports should be from the home node and its storage failover (SFO) partner only, excluding any other nodes in the cluster.
  • broadcast-domain-wide: This type denotes that all ports owned by the same broadcast domain are candidates for failover. If maximum LIF availability is the most important consideration, NetApp recommends this type as a best practice.
  • system-defined: This policy is the default for data LIFs. It prioritizes failover to ports that are not on the storage failover (SFO) partner. In a 4-node cluster, for example, LIFs homed to ports on node 1 would fail over to node 3 (assuming node 2 is the SFO partner of node 1).
  • disabled: This type denotes that failover has been disabled. Use it only when the LIF is intentionally not meant to fail over.
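The "policy as a filter over the failover group" idea can be sketched in a few lines. This is a simplified model for illustration, not how ONTAP is implemented. The HA pairing, the node and port names, and the function name are assumptions, and so is the choice of which non-partner node system-defined picks (here, the first in sorted order). Target ordering is not modeled.

```python
# Conceptual model only; not ONTAP code. The HA pairs below are assumed.
SFO_PARTNER = {"node1": "node2", "node2": "node1",
               "node3": "node4", "node4": "node3"}

def failover_targets(policy, home_node, group_ports):
    """Filter a failover group's (node, port) pairs by failover policy."""
    if policy == "disabled":
        return []
    if policy == "broadcast-domain-wide":
        return list(group_ports)
    allowed = {home_node}
    if policy == "sfo-partner-only":
        allowed.add(SFO_PARTNER[home_node])
    elif policy == "system-defined":
        # Home node plus one non-partner node when one exists.
        # Picking the first in sorted order is an assumption of this sketch.
        excluded = {home_node, SFO_PARTNER[home_node]}
        others = sorted({node for node, _ in group_ports} - excluded)
        if others:
            allowed.add(others[0])
        # If no non-partner node exists, this sketch keeps only the
        # home-node ports; the article does not describe that case.
    elif policy != "local-only":
        raise ValueError(f"unknown failover policy: {policy}")
    return [(node, port) for node, port in group_ports if node in allowed]

# One port per node in a hypothetical 4-node cluster.
group = [(f"node{i}", "e0c") for i in range(1, 5)]

for policy in ("local-only", "sfo-partner-only", "system-defined",
               "broadcast-domain-wide", "disabled"):
    nodes = [node for node, _ in failover_targets(policy, "node1", group)]
    print(f"{policy}: {nodes}")
```

For a LIF homed on node1, the system-defined case yields node1 and node3, which matches the 4-node example in the list above.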
Default policies by interface type:
LIF type | Default failover policy | Description
---------|-------------------------|------------
Cluster | local-only | LIF fails over to ports on the same node only.
Intercluster | local-only | LIF fails over to ports on the same node only.
Node management | local-only | LIF fails over to ports on the same node only.
Cluster management | broadcast-domain-wide | LIF fails over to ports in the same broadcast domain.
NAS data | system-defined | LIF fails over to one other node that is not the SFO partner.
SAN data (iSCSI LIF in ONTAP 9.11.1 or later on All-Flash SAN Array (ASA) platforms) | sfo-partner-only | LIF fails over to ports on its SFO partner.
SAN data (all other instances) | disabled | LIF does not fail over to another port.
BGP | disabled | LIF does not fail over to another port.
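For scripting or documentation checks, the table above can be expressed as a simple lookup. The sketch below is illustrative only: the dictionary keys, the function name, and the `iscsi_on_asa_9_11_1_plus` flag are hypothetical labels invented for this example and are not ONTAP identifiers.

```python
# Lookup built from the table above. Keys and names are hypothetical labels.
DEFAULT_FAILOVER_POLICY = {
    "cluster": "local-only",
    "intercluster": "local-only",
    "node-management": "local-only",
    "cluster-management": "broadcast-domain-wide",
    "nas-data": "system-defined",
    "san-data": "disabled",
    "bgp": "disabled",
}

def default_failover_policy(lif_type, iscsi_on_asa_9_11_1_plus=False):
    """Return the default failover policy for a LIF type.

    The flag covers the single exception in the table: iSCSI LIFs in
    ONTAP 9.11.1 or later on ASA platforms.
    """
    if lif_type == "san-data" and iscsi_on_asa_9_11_1_plus:
        return "sfo-partner-only"
    return DEFAULT_FAILOVER_POLICY[lif_type]

print(default_failover_policy("cluster-management"))  # broadcast-domain-wide
print(default_failover_policy("san-data"))            # disabled
print(default_failover_policy("san-data", iscsi_on_asa_9_11_1_plus=True))  # sfo-partner-only
```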

Additional Information

  • network interface create
    [-failover-policy {system-defined|local-only|sfo-partner-only|disabled|broadcast-domain-wide}] - Failover Policy

    Use this parameter to specify the failover policy for the LIF.

    • system-defined - The system determines appropriate failover targets for the LIF. The default behavior is that failover targets are chosen from the LIF’s current hosting node and also from one other non-partner node when possible.

    • local-only - The LIF fails over to a port on the local or home node of the LIF.

    • sfo-partner-only - The LIF fails over to a port on the home node or SFO partner only.

    • broadcast-domain-wide - The LIF fails over to a port in the same broadcast domain as the home port.

    • disabled - Failover is disabled for the LIF.

    The failover policy for cluster logical interfaces is local-only and cannot be changed. The default failover policy for data logical interfaces is system-defined. This value can be changed.

  • A failover group is automatically created when a new broadcast domain is created. This automatically created failover group is associated with the broadcast domain and contains all the ports that belong to the broadcast domain. The failover group is named the same as the broadcast domain.  Any subsequent failover groups associated with the broadcast domain can be created manually.
  • Manually created failover groups must be associated with a broadcast domain. All ports that are added to a failover group must be in the same broadcast domain. Removing a port from a broadcast domain removes the port from all failover groups in the broadcast domain. When the last port in the failover group is removed, the failover group is deleted. Deleting a broadcast domain automatically deletes all its failover groups.
  • When ports are added to or removed from a broadcast domain, they are also added to or removed from the broadcast domain's auto-generated failover group.
  • Because failover policies ultimately determine LIF failover behavior, evaluate them carefully on a case-by-case basis before deploying a non-default configuration.
  • Bug ID 1295599: Manual takeover will return an error if LIF migration fails.
  • Bug ID 1182625: Allow more failover targets on additional node in case of two down nodes scenario.
  • Wikipedia: Broadcast domain
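The failover group lifecycle rules listed above (automatic creation with the broadcast domain, membership restricted to the domain, port removal cascading to every group, and deletion of a group when it becomes empty) can be summarized in a small model. This is a sketch for reasoning about the rules, not ONTAP code: the class, its methods, and the port and group names are all hypothetical.

```python
# Conceptual model only; not ONTAP code. All names are hypothetical.
class BroadcastDomain:
    def __init__(self, name, ports):
        self.name = name
        self.ports = set(ports)
        # Auto-created failover group: named after the domain, holds all ports.
        self.failover_groups = {name: set(ports)}

    def create_failover_group(self, group_name, ports):
        """Manually create a group; every port must be in this domain."""
        outside = set(ports) - self.ports
        if outside:
            raise ValueError(f"ports not in broadcast domain: {sorted(outside)}")
        self.failover_groups[group_name] = set(ports)

    def add_port(self, port):
        """Adding a port to the domain also adds it to the auto-created group."""
        self.ports.add(port)
        self.failover_groups.setdefault(self.name, set()).add(port)

    def remove_port(self, port):
        """Removing a port from the domain removes it from all its groups."""
        self.ports.discard(port)
        for group_name in list(self.failover_groups):
            members = self.failover_groups[group_name]
            members.discard(port)
            if not members:
                # The last port left the group, so the group is deleted.
                del self.failover_groups[group_name]

bd = BroadcastDomain("Default", ["node1:e0c", "node1:e0d", "node2:e0c"])
bd.create_failover_group("fg_fast", ["node1:e0d"])
bd.add_port("node2:e0d")
bd.remove_port("node1:e0d")  # empties fg_fast, which is then deleted

print(sorted(bd.failover_groups))             # ['Default']
print(sorted(bd.failover_groups["Default"]))  # ['node1:e0c', 'node2:e0c', 'node2:e0d']
```

Deleting the broadcast domain corresponds to discarding the whole object, which removes every failover group with it.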

 

NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.