Motherboard status degraded after ONTAP upgrade
Applies to
- ONTAP 9
- Cluster Network Switch
Issue
- After the ONTAP upgrade, the system health check reports the Motherboard subsystem as degraded.
::> system health status show
Status
---------------
degraded
::> system health subsystem show
Subsystem         Health
----------------- ------------------
SAS-connect       ok
Environment       ok
Memory            ok
Service-Processor ok
Switch-Health     ok
CIFS-NDO          ok
Motherboard       degraded
IO                ok
MetroCluster      ok
MetroCluster_Node ok
FHM-Switch        ok
FHM-Bridge        ok
SAS-connect_Cluster ok
13 entries were displayed.
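When this output is captured from several clusters, for example as part of a post-upgrade check, a small script can pick out the rows that are not ok. The following is a minimal Python sketch, not a NetApp tool: the function name non_ok_subsystems is hypothetical, and it assumes the two-column "Subsystem Health" layout shown above.

```python
def non_ok_subsystems(output: str) -> dict[str, str]:
    """Hypothetical helper: return {subsystem: health} for every row of
    captured 'system health subsystem show' output whose health is not 'ok'."""
    result: dict[str, str] = {}
    for line in output.splitlines():
        parts = line.split()
        # Data rows have exactly two columns (name, health). Skip the command
        # line, the header row, the dashed separator, and the footer line.
        if len(parts) != 2 or parts[0] == "Subsystem" or set(parts[0]) == {"-"}:
            continue
        name, health = parts
        if health != "ok":
            result[name] = health
    return result
```

Run against the output above, it would flag only the Motherboard row.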
- A NodeIfInErrorsWarnAlert health alert is reported for port e0c on both node1 and node2.
::> system health alert show
Node: node2
Alert ID: NodeIfInErrorsWarnAlert
Resource: e0c
Severity: Major
Indication Time: Thu Mar 27 18:33:07 2025
Suppress: false
Acknowledge: false
Probable Cause: The percentage of inbound packet errors of node
"node2" on interface "e0c" is above the
warning threshold.
Possible Effect: Communication from this node to the cluster might be
degraded
Corrective Actions: 1) Migrate any cluster LIF that uses this connection to another port connected to a cluster switch.
For example, if cluster LIF "clus1" is on port e0a and the other LIF is on e0b,
run the following command to move "clus1" to e0b:
"network interface migrate -vserver vs1 -lif clus1 -sourcenode node1 -destnode node1 -dest-port e0b"
2) Replace the network cable with a known-good cable.
If errors are corrected, stop. No further action is required.
Otherwise, continue to Step 3.
3) Move the network cable to another port on the node (if available).
Migrate the cluster LIF to the new port.
If errors are corrected, contact technical support to troubleshoot the original node port.
Otherwise, continue to Step 4.
4) Move the network cable to another available cluster switch port.
Migrate the cluster LIF back to the original port.
If errors are corrected, contact technical support to troubleshoot the original switch port.
If errors persist, contact technical support for
further assistance.
Node: node1
Alert ID: NodeIfInErrorsWarnAlert
Resource: e0c
Severity: Major
Indication Time: Thu Mar 27 18:33:01 2025
Suppress: false
Acknowledge: false
Probable Cause: The percentage of inbound packet errors of node
"node1" on interface "e0c" is above the
warning threshold.
Possible Effect: Communication from this node to the cluster might be
degraded
Corrective Actions: 1) Migrate any cluster LIF that uses this connection to another port connected to a cluster switch.
For example, if cluster LIF "clus1" is on port e0a and the other LIF is on e0b,
run the following command to move "clus1" to e0b:
"network interface migrate -vserver vs1 -lif clus1 -sourcenode node1 -destnode node1 -dest-port e0b"
2) Replace the network cable with a known-good cable.
If errors are corrected, stop. No further action is required.
Otherwise, continue to Step 3.
3) Move the network cable to another port on the node (if available).
Migrate the cluster LIF to the new port.
If errors are corrected, contact technical support to troubleshoot the original node port.
Otherwise, continue to Step 4.
4) Move the network cable to another available cluster switch port.
Migrate the cluster LIF back to the original port.
If errors are corrected, contact technical support to troubleshoot the original switch port.
If errors persist, contact technical support for
further assistance.
2 entries were displayed
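Both alerts point at the same port (e0c) on the two nodes, which is easier to see from a condensed summary than from the full records. The following minimal Python sketch assumes the "Field: value" layout of the captured output above; summarize_alerts and FIELDS are hypothetical names, and only the single-line fields are parsed (multi-line fields such as Corrective Actions are ignored).

```python
# Hypothetical: the single-line fields to keep from each alert record.
FIELDS = ("Node", "Alert ID", "Resource", "Severity")


def summarize_alerts(output: str) -> list[tuple[str, ...]]:
    """Condense captured 'system health alert show' output into
    (node, alert_id, resource, severity) tuples, one per alert."""
    records: list[dict[str, str]] = []
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in FIELDS:
            continue  # not one of the fields we track
        if key == "Node":
            records.append({})  # each alert record begins with a "Node:" line
        if records:
            records[-1][key] = value.strip()
    return [tuple(r.get(f, "") for f in FIELDS) for r in records]
```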
- The NodeIfInErrorsWarnAlert alerts are raised because of a rising number of CRC errors on cluster port e0c of node1 and node2.
EMS
The percentage of inbound packet errors of node "node1" on interface "e0c" is above the warning threshold.
The percentage of inbound packet errors of node "node2" on interface "e0c" is above the warning threshold.
[node1: vifmgr: vifmgr.cluscheck.ctdpktloss:alert]: Continued packet loss when pinging from cluster lif node2_clus2 (node node2) to cluster lif node1 (node node1).
[node1: vifmgr: callhome.clus.net.degraded:alert]: Call home for CLUSTER NETWORK DEGRADED: Large MTU Packet Loss - Ping failures detected between node2 ( 169.XXX.XX.217 ) on node2 and node1_clus1 ( 169.XXX.XX.173 ) on node1
ifconfig -v
node2
-- interface e0c (16 hours, 4 minutes, 52 seconds) --
RECEIVE
Total frames: 354m | Frames/second: 6130 | Total bytes: 499g
Bytes/second: 8631k | Total errors: 32176k | Errors/minute: 33348
Total discards: 4 | Discards/minute: 0 | Multi/broadcast: 1545k
Non-primary u/c: 0 | Errored frames: 0 | Unsupported Op: 0
CRC errors: 28157k | Runt frames: 0 | Fragment: 111k
Long frames: 221 | Jabber: 4 | Length errors: 0
Alignment errors: 0 | No buffer: 0 | Pause: 0
Jumbo: 36975k | Error symbol: 3907k | Bus overruns: 4
Queue drops: 0 | LRO segments: 291m | LRO bytes: 481g
LRO6 segments: 0 | LRO6 bytes: 0 | Bad UDP cksum: 0
Bad UDP6 cksum: 0 | Bad TCP cksum: 0 | Bad TCP6 cksum: 0
Mcast v6 solicit: 0 | Lagg errors: 0 | Lacp errors: 0
Lacp PDU errors: 0
node1
-- interface e0c (8 hours, 25 minutes, 21 seconds) --
RECEIVE
Total frames: 157m | Frames/second: 5185 | Total bytes: 130g
Bytes/second: 4288k | Total errors: 14338k | Errors/minute: 28374
Total discards: 0 | Discards/minute: 0 | Multi/broadcast: 114k
Non-primary u/c: 0 | Errored frames: 0 | Unsupported Op: 0
CRC errors: 13012k | Runt frames: 0 | Fragment: 59563
Long frames: 365 | Jabber: 0 | Length errors: 0
Alignment errors: 0 | No buffer: 0 | Pause: 0
Jumbo: 5423k | Error symbol: 1266k | Bus overruns: 0
Queue drops: 0 | LRO segments: 112m | LRO bytes: 121g
LRO6 segments: 0 | LRO6 bytes: 0 | Bad UDP cksum: 0
Bad UDP6 cksum: 0 | Bad TCP cksum: 0 | Bad TCP6 cksum: 0
Mcast v6 solicit: 0 | Lagg errors: 0 | Lacp errors: 0
Lacp PDU errors: 0
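The alert text refers to the percentage of inbound packet errors, which can be approximated from the receive counters above. The following minimal Python sketch is a rough cross-check, not the formula or threshold ONTAP uses for NodeIfInErrorsWarnAlert (neither is stated here). It assumes that ifconfig -v abbreviates counters with decimal suffixes (k = 10^3, m = 10^6, g = 10^9); parse_counters and error_percentages are hypothetical names.

```python
import re

# Assumption: k/m/g are decimal multipliers. Counters are already rounded by
# ifconfig, so the resulting percentages are approximate.
SUFFIX = {"": 1, "k": 10**3, "m": 10**6, "g": 10**9}


def parse_counters(text: str) -> dict[str, int]:
    """Turn 'Name: value | Name: value' lines into {name: integer count}."""
    counters: dict[str, int] = {}
    for line in text.splitlines():
        for field in line.split("|"):
            name, sep, value = field.partition(":")
            match = re.fullmatch(r"(\d+)([kmg]?)", value.strip())
            if sep and match:
                digits, suffix = match.groups()
                counters[name.strip()] = int(digits) * SUFFIX[suffix]
    return counters


def error_percentages(counters: dict[str, int]) -> dict[str, float]:
    """Express total and CRC receive errors as a percentage of received frames."""
    frames = counters["Total frames"]
    return {
        "total_errors_pct": 100 * counters["Total errors"] / frames,
        "crc_errors_pct": 100 * counters["CRC errors"] / frames,
    }
```

Under the decimal-suffix assumption, the node2 counters above work out to roughly 9.1% of received frames with errors, most of them CRC errors (about 8.0%); node1 is similar (about 9.1% and 8.3%).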