Storage port goes down after node reboot
Applies to
- AFF A900
- Cisco Nexus 9336 switch with RCF1.11 installed
- ONTAP 9
- X91153A
Issue
- EMS raises the following alert after the node reboots, and a storage port goes down at random:
Health Monitor process nchm:NoPathToNSMA_Alert…
- In the sysconfig -a output, System Storage Configuration has changed from Quad-Path HA to Single-Path HA, and e2a is reported as auto-unknown-fd-down:
Node: node-1
NetApp Release 9.14.1P4: Thu Apr 18 11:59:10 EDT 2024
System ID: XXXXXXXXXX (node-1); partner ID: XXXXXXXXXX (node-2)
System Serial Number: XXXXXXXXXXXX (node-1)
System Rev: C0
System Storage Configuration: Single-Path HA
System ACP Connectivity: Inband Active
SAS2/SAS3 Mixed Stack Support: all
All-Flash Optimized: true
Capacity Optimized: false
All SAN Array: false
Backplane Part Number: 111-02392
Backplane Rev: C0
Backplane Serial Number: XXXXXXXXXXXX
slot 0: System Board 2.2 GHz (System Board XXIX B4)
Model Name: AFF-A900
Part Number: 111-04824
Revision: B4
Serial Number: XXXXXXXXXXXX
BIOS version: 18.9
Loader version: 8.1.0
Boot Flash: Primary
Processors: 64
・
・
・
slot 2: Dual 40G/100G/200G Ethernet Controller CX6
e2a MAC Address: XX:00:XX:11:XX:XX (auto-unknown-fd-down)
QSFP Vendor: Amphenol
QSFP Part Number: 112-00574
QSFP Serial Number: APFXXXXXXXXXXXX
e2b MAC Address: XX:22:XX:33:XX:XX (auto-unknown-fd-down)
QSFP Vendor:
QSFP Part Number:
QSFP Serial Number:
Device Type: CX6 PSID(NAP0000000018)
Firmware Version: 20.30.1004
Part Number: 111-04739
Hardware Revision: A2
- EMS logs show link up/down on storage ports e2a/e10b upon reboot.
[?] Thu May 09 10:12:45 +0900 [node-1: kernel: netif.linkDown:info]: Ethernet e2a: Link down, check cable.
[?] Thu May 09 10:23:07 +0900 [node-1: kernel: netif.linkDown:info]: Ethernet e10b: Link down, check cable.
[?] Thu May 09 10:12:45 +0900 [node-2: kernel: netif.linkDown:info]: Ethernet e2a: Link down, check cable.
[?] Thu May 09 10:23:07 +0900 [node-2: kernel: netif.linkDown:info]: Ethernet e10b: Link down, check cable.
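To see at a glance which node and port pairs logged link-down events, the EMS lines above can be tallied with a short script. This is only a sketch: the regular expression is inferred from the four sample lines shown here, and the function name `count_link_down` is hypothetical, not part of any NetApp tool.

```python
import re
from collections import Counter

# Pattern inferred from the sample EMS lines above (an assumption, not an
# official EMS format specification).
LINK_DOWN = re.compile(
    r"\[(?P<node>[^:\]]+): kernel: netif\.linkDown:info\]: "
    r"Ethernet (?P<port>\w+): Link down"
)

def count_link_down(lines):
    """Return a Counter keyed by (node, port) of netif.linkDown events."""
    counts = Counter()
    for line in lines:
        match = LINK_DOWN.search(line)
        if match:
            counts[(match.group("node"), match.group("port"))] += 1
    return counts

sample = [
    "[?] Thu May 09 10:12:45 +0900 [node-1: kernel: netif.linkDown:info]: Ethernet e2a: Link down, check cable.",
    "[?] Thu May 09 10:23:07 +0900 [node-1: kernel: netif.linkDown:info]: Ethernet e10b: Link down, check cable.",
    "[?] Thu May 09 10:12:45 +0900 [node-2: kernel: netif.linkDown:info]: Ethernet e2a: Link down, check cable.",
    "[?] Thu May 09 10:23:07 +0900 [node-2: kernel: netif.linkDown:info]: Ethernet e10b: Link down, check cable.",
]
link_down_counts = count_link_down(sample)
```

Grouping by node and port makes it easier to tell whether the events follow one port, one node, or both storage ports on both nodes, as they do in this sample.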
- Switch logs report the transceiver on the connected ports as not supported, although Hardware Universe confirms that it is supported.
[2024-05-09 19:15:36.280] 2024 May 9 01:27:10 switch02 %ETHPORT-5-IF_HARDWARE: Interface Ethernet1/11, hardware type changed to 100G
[2024-05-09 19:15:36.280] 2024 May 9 01:27:10 switch02 %ETHPORT-3-IF_UNSUPPORTED_TRANSCEIVER: Transceiver on interface Ethernet1/11 is not supported
[2024-05-09 19:15:36.311] 2024 May 9 01:27:11 switch02 %ETHPORT-5-IF_HARDWARE: Interface Ethernet1/12, hardware type changed to 100G
[2024-05-09 19:15:36.311] 2024 May 9 01:27:11 switch02 %ETHPORT-3-IF_UNSUPPORTED_TRANSCEIVER: Transceiver on interface Ethernet1/12 is not supported
[2024-05-09 19:21:17.984] 2024-05-09T09:02:43.200667000+00:00 [M 1] [ethpm] E_DEBUG Ifindex (Ethernet1/12)0x1a000000, SFP security check: unsupported vendor id 0x54
[2024-05-09 19:21:18.001] 2024-05-09T09:02:37.697065000+00:00 [M 1] [ethpm] E_DEBUG Ifindex (Ethernet1/11)0x1a000000, SFP security check: unsupported vendor id 0x54
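The switch messages above can be reduced to a list of affected interfaces and the vendor id that the switch rejected. The sketch below assumes the two message shapes seen in this sample (`IF_UNSUPPORTED_TRANSCEIVER` and the `SFP security check` debug line); the helper name `unsupported_transceivers` is hypothetical, and other NX-OS releases may word these messages differently.

```python
import re

# Both patterns are inferred from the sample switch log lines above only.
UNSUPPORTED = re.compile(
    r"IF_UNSUPPORTED_TRANSCEIVER: Transceiver on interface (\S+) is not supported"
)
VENDOR_ID = re.compile(
    r"Ifindex \(([^)]+)\)\S*, SFP security check: "
    r"unsupported vendor id (0x[0-9a-fA-F]+)"
)

def unsupported_transceivers(lines):
    """Map each flagged interface to the set of rejected vendor ids seen for it."""
    flagged = {}
    for line in lines:
        match = UNSUPPORTED.search(line)
        if match:
            flagged.setdefault(match.group(1), set())
        match = VENDOR_ID.search(line)
        if match:
            flagged.setdefault(match.group(1), set()).add(match.group(2))
    return flagged

sample = [
    "[2024-05-09 19:15:36.280] 2024 May 9 01:27:10 switch02 %ETHPORT-3-IF_UNSUPPORTED_TRANSCEIVER: Transceiver on interface Ethernet1/11 is not supported",
    "[2024-05-09 19:21:18.001] 2024-05-09T09:02:37.697065000+00:00 [M 1] [ethpm] E_DEBUG Ifindex (Ethernet1/11)0x1a000000, SFP security check: unsupported vendor id 0x54",
]
flagged_interfaces = unsupported_transceivers(sample)
```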
- storage port show shows that one of the storage ports goes down at random after reboot:
                                      Speed                     VLAN
Node               Port Type  Mode    (Gb/s) State    Status    ID
------------------ ---- ----- ------- ------ -------- --------- ----
node-1
                   e10a ENET  network -      -        -         -
                   e10b ENET  storage 100    enabled  online    30
                   e11a ENET  network -      -        -         -
                   e11b ENET  network -      -        -         -
                   e2a  ENET  storage 0      enabled  offline   30
                   e2b  ENET  network -      -        -         -
node-2
                   e10a ENET  network -      -        -         -
                   e10b ENET  storage 100    enabled  online    30
                   e11a ENET  network -      -        -         -
                   e11b ENET  network -      -        -         -
                   e2a  ENET  storage 100    enabled  online    30
                   e2b  ENET  network -      -        -         -
12 entries were displayed.
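Because the failing port changes from one reboot to the next, it can help to pull the offline storage ports out of captured `storage port show` output automatically. The following is a minimal sketch that assumes the column order shown above (Port, Type, Mode, Speed, State, Status, VLAN ID) and splits on whitespace, so it does not depend on column alignment; `offline_storage_ports` is a hypothetical helper name.

```python
def offline_storage_ports(output):
    """Return (node, port) pairs whose mode is 'storage' and status is 'offline'."""
    node = None
    offline = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 1:
            # A line with a single token is treated as a node name.
            node = fields[0]
        elif len(fields) == 7 and fields[1] == "ENET":
            port, _type, mode, _speed, _state, status, _vlan = fields
            if mode == "storage" and status == "offline":
                offline.append((node, port))
    return offline

sample = """\
node-1
e10b ENET storage 100 enabled online 30
e2a ENET storage 0 enabled offline 30
e2b ENET network - - - -
node-2
e10b ENET storage 100 enabled online 30
e2a ENET storage 100 enabled online 30
2 entries were displayed.
"""
offline_ports = offline_storage_ports(sample)
```

The sample string is an abbreviated copy of the output above, used only to exercise the function.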
- ifstat shows no errors or discards, but the Up to downs counter under LINK INFO is 8, indicating that the link flapped 8 times.
-- interface e2a (1 day, 20 hours, 12 minutes, 38 seconds) --
RECEIVE
Total frames: 5938 | Frames/second: 0 | Total bytes: 1721k
Bytes/second: 11 | Total errors: 0 | Errors/minute: 0
Total discards: 0 | Discards/minute: 0 | Multi/broadcast: 4773
Non-primary u/c: 0 | Errored frames: 0 | Unsupported Op: 0
CRC errors: 0 | Runt frames: 0 | Fragment: 0
Long frames: 0 | Jabber: 0 | Length errors: 0
Alignment errors: 0 | No buffer: 0 | Pause: 0
Jumbo: 0 | Error symbol: 0 | Bus overruns: 0
Queue drops: 0 | LRO segments: 0 | LRO bytes: 0
LRO6 segments: 887 | LRO6 bytes: 122k | Bad UDP cksum: 0
Bad UDP6 cksum: 0 | Bad TCP cksum: 0 | Bad TCP6 cksum: 0
Mcast v6 solicit: 3691 | Lagg errors: 0 | Lacp errors: 0
Lacp PDU errors: 0
TRANSMIT
Total frames: 2420 | Frames/second: 0 | Total bytes: 242k
Bytes/second: 2 | Total errors: 0 | Errors/minute: 0
Total discards: 0 | Queue overflow: 0 | Multi/broadcast: 950
Collisions: 0 | Pause: 0 | Jumbo: 0
Cfg Up to Downs: 2 | TSO segments: 0 | TSO bytes: 0
TSO6 segments: 0 | TSO6 bytes: 0 | HW UDP cksums: 0
HW UDP6 cksums: 105k | HW TCP cksums: 0 | HW TCP6 cksums: 1413
Mcast v6 solicit: 106k | Lagg drops: 0 | Lagg no buffer: 0
Lagg no entries: 0
DEVICE
Mcast addresses: 7 | Rx MBuf Sz: 9216
LINK INFO
Speed: 100G | Duplex: full | Flowcontrol: none
Media state: active | Up to downs: 8 | HW assist: 514k
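The flap count sits in the LINK INFO section, while a similarly named counter (Cfg Up to Downs) appears under TRANSMIT, so it is worth reading the counters by section. The sketch below parses `ifstat` text into a dictionary keyed by (section, counter name). It assumes the section headings and the `name: value | name: value` layout seen in the output above; `parse_ifstat` is a hypothetical helper name.

```python
SECTIONS = {"RECEIVE", "TRANSMIT", "DEVICE", "LINK INFO"}

def parse_ifstat(output):
    """Parse ifstat text into {(section, counter_name): value_string}."""
    counters = {}
    section = None
    for line in output.splitlines():
        text = line.strip()
        if text in SECTIONS:
            section = text
            continue
        for cell in text.split("|"):
            name, sep, value = cell.partition(":")
            # Ignore anything before the first section heading (the banner line).
            if sep and section:
                counters[(section, name.strip())] = value.strip()
    return counters

sample = """\
-- interface e2a (1 day, 20 hours, 12 minutes, 38 seconds) --
RECEIVE
Total frames: 5938 | Frames/second: 0 | Total bytes: 1721k
TRANSMIT
Cfg Up to Downs: 2 | TSO segments: 0 | TSO bytes: 0
LINK INFO
Speed: 100G | Duplex: full | Flowcontrol: none
Media state: active | Up to downs: 8 | HW assist: 514k
"""
ifstat_counters = parse_ifstat(sample)
flaps = int(ifstat_counters[("LINK INFO", "Up to downs")])
```

Values are kept as strings because some counters carry suffixes such as `k`; only the flap counter is converted to an integer here.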