NetApp Knowledge Base

AFF-A320: Takeover disabled (unsynchronized log) with Pause Frames and Rx Bus Overruns on Cluster/HA ports

Category:
ontap-9
Specialty:
HW

Applies to

  • AFF A320
  • Cisco cluster switches
  • ONTAP 9
  • Priority Flow Control (PFC)

Issue

  • The following alerts appear frequently in the EMS event log:

Tue Oct 31 10:17:51 +0530 [Node-01: irq191: e0d: mirror.stream.qp.error:debug]: params: {'mirror': 'HA Partner', 'qp_name': 'WAFL', 'error': 'NVMM_ERR_MIRROR_COMPLETION'}
Tue Oct 31 10:17:51 +0530 [Node-01: irq191: e0d: mirror.stream.qp.error:debug]: params: {'mirror': 'HA Partner', 'qp_name': 'WAFL', 'error': 'NVMM_ERR_STREAM'}
Tue Oct 31 10:17:51 +0530 [Node-01: mcc_cfd_rnic: mirror.stream.qp.error:debug]: params: {'mirror': 'HA Partner', 'qp_name': 'RAID', 'error': 'NVMM_ERR_STREAM'}
Tue Oct 31 10:17:51 +0530 [Node-01: mcc_cfd_rnic: mirror.stream.qp.error:debug]: params: {'mirror': 'HA Partner', 'qp_name': 'MISC', 'error': 'NVMM_ERR_STREAM'}
Tue Oct 31 10:17:51 +0530 [Node-01: nvmm_error: nvmm.mirror.offlined:debug]: params: {'mirror': 'HA_PARTNER'}
Tue Oct 31 10:17:52 +0530 [Node-01: cf_main: cf.fsm.takeoverByPartnerDisabled:debug]: Failover monitor: takeover of Node-01 by Node-02 disabled (unsynchronized log).
Tue Oct 31 10:17:54 +0530 [Node-01: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_LAYOUT_SYNCING to NVMM_MIRROR_LAYOUT_SYNCED and took 1 msecs.
Tue Oct 31 10:17:54 +0530 [Node-01: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_LAYOUT_SYNCED to NVMM_MIRROR_SYNCING_START and took 0 msecs.
Tue Oct 31 10:17:54 +0530 [Node-01: nvmm_mirror_sync: nvmm.mirror.aborting:debug]: mirror of sysid 1, partner_type HA Partner and mirror state NVMM_MIRROR_SYNCING_START is aborted because of reason NVMM_ERR_STREAM_MAP.
Tue Oct 31 10:17:54 +0530 [Node-01: nvmm_error: nvmm.mirror.aborting:debug]: mirror of sysid 1, partner_type HA Partner and mirror state NVMM_MIRROR_OFFLINE is aborted because of reason NVMM_ABORT_SYNCING_MIRROR.
Tue Oct 31 10:17:55 +0530 [Node-01: ib_cm_13: rdma.rlib.connected:debug]: wafl:HA:A QP is now connected.
Tue Oct 31 10:17:55 +0530 [Node-01: ib_cm_13: rdma.rlib.connected:debug]: raid:HA:A QP is now connected.
Tue Oct 31 10:17:55 +0530 [Node-01: ib_cm_13: rdma.rlib.connected:debug]: misc:HA:A QP is now connected.
Tue Oct 31 10:17:55 +0530 [Node-01: ib_cm_12: rdma.rlib.connected:debug]: wafl:HA:A QP is now connected.
Tue Oct 31 10:17:55 +0530 [Node-01: ib_cm_12: rdma.rlib.connected:debug]: raid:HA:A QP is now connected.
Tue Oct 31 10:17:55 +0530 [Node-01: ib_cm_12: rdma.rlib.connected:debug]: misc:HA:A QP is now connected.
Tue Oct 31 10:17:55 +0530 [Node-01: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_SYNCING_START to NVMM_MIRROR_CP1_START and took 27 msecs.
Tue Oct 31 10:17:55 +0530 [Node-01: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_CP1_START to NVMM_MIRROR_WAFL_INIT and took 8 msecs.
Tue Oct 31 10:17:55 +0530 [Node-01: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_WAFL_INIT to NVMM_MIRROR_CP2_FINISH and took 15 msecs.
Tue Oct 31 10:17:56 +0530 [Node-01: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_CP2_FINISH to NVMM_MIRROR_WAFL_HEADER and took 426 msecs.
Tue Oct 31 10:17:56 +0530 [Node-01: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_WAFL_HEADER to NVMM_MIRROR_SYNCING_OTHER and took 12 msecs.
Tue Oct 31 10:17:56 +0530 [Node-01: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_SYNCING_OTHER to NVMM_MIRROR_ONLINE and took 133 msecs.
Tue Oct 31 10:17:56 +0530 [Node-01: nvmm_mirror_sync: nvmm.mirror.onlined:debug]: params: {'mirror': 'HA_PARTNER'}
Tue Oct 31 10:17:57 +0530 [Node-01: cf_main: cf.fsm.takeoverByPartnerEnabled:debug]: Failover monitor: takeover of Node-01 by Node-02 enabled
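The sequence above runs from cf.fsm.takeoverByPartnerDisabled to cf.fsm.takeoverByPartnerEnabled. As a rough triage aid, the following Python sketch measures how long takeover stayed disabled for each such pair in an exported EMS log. It is an illustration, not a NetApp tool: the function names are hypothetical, the line layout is assumed to match the excerpt above, and because the log lines carry no year, a placeholder year is supplied only so the timestamps can be parsed.

```python
import re
from datetime import datetime

# Two lines copied from the EMS excerpt above (the events that bracket the outage window).
SAMPLE = """\
Tue Oct 31 10:17:52 +0530 [Node-01: cf_main: cf.fsm.takeoverByPartnerDisabled:debug]: Failover monitor: takeover of Node-01 by Node-02 disabled (unsynchronized log).
Tue Oct 31 10:17:57 +0530 [Node-01: cf_main: cf.fsm.takeoverByPartnerEnabled:debug]: Failover monitor: takeover of Node-01 by Node-02 enabled
"""

# Assumed layout: "<dow> <mon> <day> <time> <tz> [<node>: <source...>: <event>:<severity>]: ..."
LINE_RE = re.compile(r"^\w{3} (\w{3} +\d+ \d{2}:\d{2}:\d{2}) [+-]\d{4} \[([^\]]+)\]")


def parse_line(line, year):
    """Return (timestamp, node, event) for an EMS line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
    fields = [field.strip() for field in match.group(2).split(":")]
    if len(fields) < 2:
        return None
    # The first field is the node name; the event name precedes the severity.
    return timestamp, fields[0], fields[-2]


def takeover_disabled_windows(lines, year=2000):
    """Pair each 'disabled' event with the next 'enabled' event on the same node.

    `year` is a placeholder (the log has no year); it does not affect
    durations that fall within a single calendar year.
    """
    windows = []
    disabled_at = {}
    for line in lines:
        parsed = parse_line(line, year)
        if parsed is None:
            continue
        timestamp, node, event = parsed
        if event == "cf.fsm.takeoverByPartnerDisabled":
            disabled_at.setdefault(node, timestamp)
        elif event == "cf.fsm.takeoverByPartnerEnabled" and node in disabled_at:
            start = disabled_at.pop(node)
            windows.append((node, (timestamp - start).total_seconds()))
    return windows


if __name__ == "__main__":
    for node, seconds in takeover_disabled_windows(SAMPLE.splitlines()):
        print(f"{node}: takeover disabled for {seconds:.0f} s")
```

Run against the two sample lines, the script reports a 5-second window for Node-01, which matches the 10:17:52 and 10:17:57 timestamps in the excerpt.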

  • The ifstat output for the affected port shows transmitted pause frames (TRANSMIT Pause) and receive bus overruns (RECEIVE Bus overruns):

::> system node run -node <node_name> -command ifstat <port>

   -- interface  e0a  (17 days, 9 hours, 10 minutes, 28 seconds) --              
   
   RECEIVE
   Total frames:    42258m | Frames/second:   28138  | Total bytes:       198t
   Bytes/second:      132m | Total errors:        0  | Errors/minute:       0 
   Total discards:   2906k | Discards/minute:   116  | Multi/broadcast: 16226k
   Non-primary u/c:     0  | Errored frames:      0  | Unsupported Op:      0 
   CRC errors:          0  | Runt frames:         0  | Fragment:            0 
   Long frames:         0  | Jabber:              0  | Length errors:       0 
   Alignment errors:    0  | No buffer:           0  | Pause:               0 
   Jumbo:           23587m | Error symbol:        0  | Bus overruns:     2906k
   Queue drops:         0  | LRO segments:    24581m | LRO bytes:         197t
   LRO6 segments:       0  | LRO6 bytes:          0  | Bad UDP cksum:       0 
   Bad UDP6 cksum:      0  | Bad TCP cksum:       0  | Bad TCP6 cksum:      0 
   Mcast v6 solicit:    0  | Lagg errors:         0  | Lacp errors:         0 
   Lacp PDU errors:     0 
   TRANSMIT
   Total frames:    34805m | Frames/second:   23176  | Total bytes:       123t
   Bytes/second:    82286k | Total errors:        0  | Errors/minute:       0 
   Total discards:      0  | Queue overflow:      0  | Multi/broadcast:  3130k
   Collisions:          0  | Pause:             523k | Jumbo:           32575m
   Cfg Up to Downs:     4  | TSO segments:     1744m | TSO bytes:         101t
   TSO6 segments:       0  | TSO6 bytes:          0  | HW UDP cksums:    1748k
   HW UDP6 cksums:      0  | HW TCP cksums:   25077m | HW TCP6 cksums:      0 
   Mcast v6 solicit:    0  | Lagg drops:          0  | Lagg no buffer:      0 
   Lagg no entries:     0 
   DEVICE
   Mcast addresses:     7  | Rx MBuf Sz:       9216 
   LINK INFO
   Speed:             100G | Duplex:            full | Flowcontrol:       none
   Media state:     active | Up to downs:          3 | HW assist:        5655 

  • On the switch ports connected to the controller's cluster/HA ports, the PFC operational status (Oper) is "Off":

SW-01# show interface priority-flow-control
slot  1
=======
============================================================
Port               Mode Oper(VL bmap)  RxPPP      TxPPP     
============================================================
Ethernet1/1        Auto Off           0          0          
Ethernet1/2        Auto Off           0          0          
Ethernet1/3        Auto Off           0          0          
Ethernet1/4        Auto Off           0          0          
Ethernet1/5        Auto Off           0          0          

SW-01# show interface priority-flow-control detail                                                               
Ethernet1/4                                                                                                      
Admin Mode: Auto                                                                                                 
Oper Mode: Off                                                                                                   
VL bitmap:                                                                                                       
Total Rx PFC Frames: 0                                                                                           
Total Tx PFC Frames: 0                                                                                           
-----------------------------------------------------------------------------------------------------------------
    |  Priority0  |  Priority1  |  Priority2  |  Priority3  |  Priority4  |  Priority5  |  Priority6  |  Priority7  |
-----------------------------------------------------------------------------------------------------------------
Rx  |0            |0            |0            |0            |0            |0            |0            |0            |
-----------------------------------------------------------------------------------------------------------------
Tx  |0            |0            |0            |0            |0            |0            |0            |0            |
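To check the summary output of "show interface priority-flow-control" across many ports, the rows whose Oper column reads "Off" can be filtered with a few lines of Python. This is a minimal sketch that assumes the whitespace-separated column layout shown in the summary above; the function name is hypothetical, and rows in any other layout are simply ignored rather than interpreted.

```python
# Rows copied from the "show interface priority-flow-control" summary above.
SAMPLE = """\
Port               Mode Oper(VL bmap)  RxPPP      TxPPP
============================================================
Ethernet1/1        Auto Off           0          0
Ethernet1/2        Auto Off           0          0
Ethernet1/3        Auto Off           0          0
"""


def pfc_off_ports(output):
    """Return the names of ports whose PFC operational status column is 'Off'."""
    ports = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows look like: <port> <admin mode> <oper status> <RxPPP> <TxPPP>
        if len(fields) >= 3 and fields[0].startswith("Ethernet") and fields[2] == "Off":
            ports.append(fields[0])
    return ports


if __name__ == "__main__":
    for port in pfc_off_ports(SAMPLE):
        print(f"{port}: PFC operational status is Off")
```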
 
