Unexpected LIF down on RoCE port
Applies to
- ONTAP 9.13.1 and later
- NFS over RDMA / RoCE
- Mellanox/NVIDIA CX5 / CX6 / CX6-LX 10/25GbE or 40/100GbE NICs
Issue
- If more than 127 NFS data LIFs are already hosted on a single RoCE-capable port:
  - A LIF failover or migrate may leave LIFs in an operationally down state without reporting an error
  - LIF create succeeds, but the LIF is operationally down and an error is logged in vifmgr.log
clustershell::> network interface create -vserver vs0 -lif vs0_test -service-policy default-data-files -address 10.75.140.127 -netmask 255.255.255.0 -home-node node-02 -home-port e4a

Info: LIF "vs0_test" on Vserver "vs0" was created successfully but could not be successfully configured on either its home port or any of its failover targets. The LIF's operational status will be reported as "down" until one or more failover targets becomes available. Use the "network interface show -vserver vs0 -lif vs0_test -failover" command to review the LIF's current failover configuration.
- Error in vifmgr.log
Example:
(03/26/2024 16:41:03): > [Net::LifStackAdapter::installLif] vserverId=3, lifId=1278, address=10.95.86.122, portName=e3a, lifProtocols=0x1
(03/26/2024 16:41:03): > [SkStackMgr::addLif] PARAM lifId 1278, portName e3a, address 10.95.86.122, ipspaceId 4294967295, vserverId 3, lifUuid 98cd9a48-ea28-11ee-ad09-d039eaa9ecf3, isMccRequest false, lifProtocols 0x001, serviceMask 0x000000013D000804, homeNode perfqa-vino-03
(03/26/2024 16:41:03): > [NbladeWriter::addLif] PARAM: lifId: 1278, address 10.95.86.122, netmask 255.255.255.0, ipspaceId: 4294967295, vserverId: 3, portName: e3a, isMccRequest: false, protocolMask: 00000001, serviceMask: 0x000000013D000804, homeNode^I: perfqa-vino-03(ccdcca33-ea25-11ee-ad09-d039eaa9ecf3)
(03/26/2024 16:41:03): > [NbladeWriter::nitroPcpRpcCall] procNum=3, isIdemp=false
(03/26/2024 16:41:03): > [DelayTracker::add_sample] ENTRY: object=nblade, delay_ms=53
(03/26/2024 16:41:03): < [DelayTracker::add_sample] EXIT: object=nblade, state=NORMAL
(03/26/2024 16:41:03): < [NbladeWriter::nitroPcpRpcCall] elapsed time: 0s)
(03/26/2024 16:41:03): [NbladeWriter::ScopedNitroRequest::sendRequest] RPC for procedure 3 completed, but returned error: NbladeWriter Error type unknown: 12046
(03/26/2024 16:41:03): < [NbladeWriter::ScopedNitroRequest::sendRequest]
(03/26/2024 16:41:03): < [NbladeWriter::addLif] retval: NbladeWriter Error type unknown: 12046
(03/26/2024 16:41:03): [SkStackMgr::addLif] Unexpected error adding the LIF to the stack: NbladeWriter Error type unknown: 12046
(03/26/2024 16:41:03): < [SkStackMgr::addLif] complete, returning Unexpected error "NbladeWriter Error type unknown: 12046" encountered as a result of adding the LIF.
(03/26/2024 16:41:03): [Net::LifStackAdapter::installLif] Failed to add the requested LIF: Unexpected error "NbladeWriter Error type unknown: 12046" encountered as a result of adding the LIF.
(03/26/2024 16:41:03): [Net::AbortableHandle::commit] Caught an unexpected exception: Unexpected error "NbladeWriter Error type unknown: 12046" encountered as a result of adding the LIF.
(03/26/2024 16:41:03): ERR{ commit() at src/framework/objects/base/AbortableHandle.cc:65 }
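The log pattern above can be searched for mechanically. The following is a minimal Python sketch, not a NetApp tool: it assumes vifmgr.log lines have the same layout as the example above, and the function name find_failed_lif_installs is hypothetical. It pairs each installLif entry with a subsequent "Failed to add the requested LIF" line and reports the LIF ID, address, port, and NbladeWriter error code.

```python
import re

# Matches the installLif entry line as it appears in the example above.
INSTALL_RE = re.compile(
    r"\[Net::LifStackAdapter::installLif\] vserverId=(\d+), lifId=(\d+), "
    r"address=([\d.]+), portName=(\w+)"
)
# Matches the installLif failure line and captures the NbladeWriter error code.
ERROR_RE = re.compile(
    r"\[Net::LifStackAdapter::installLif\] Failed to add the requested LIF: "
    r".*NbladeWriter Error type unknown: (\d+)"
)


def find_failed_lif_installs(lines):
    """Return one dict per installLif attempt that ended in a failure line.

    Assumption: a failure line belongs to the most recent installLif entry
    seen before it, as in the example log above.
    """
    failures = []
    pending = None
    for line in lines:
        install = INSTALL_RE.search(line)
        if install:
            pending = {
                "vserver_id": int(install.group(1)),
                "lif_id": int(install.group(2)),
                "address": install.group(3),
                "port": install.group(4),
            }
            continue
        error = ERROR_RE.search(line)
        if error and pending is not None:
            failures.append({**pending, "error_code": int(error.group(1))})
            pending = None
    return failures


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python scan_vifmgr.py vifmgr.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for failure in find_failed_lif_installs(log):
            print(failure)
```

An entry with error_code 12046 on a RoCE-capable port matches the symptom described in this article.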
- The port is on a NIC with RoCE offload capabilities (e.g. Mellanox/NVIDIA CX5 / CX6 / CX6-LX)
Example:
::> network port show -node node-02 -fields rdma-protocols
node     port rdma-protocols
-------- ---- --------------
node-02  e0M  -
node-02  e1a  roce
node-02  e1b  roce
node-02  e3a  roce
node-02  e3b  roce
node-02  e3c  roce
node-02  e3d  roce
7 entries were displayed.
- NFS server has RDMA enabled (default in ONTAP 9.10.1 and later)
Note: To check whether RDMA is enabled on the NFS server, run vserver nfs show -fields rdma
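
To check whether a port is at or near the 127-LIF condition described under Issue, the LIFs per port can be counted from saved CLI output. The following Python sketch is illustrative only. It assumes the input is the text output of network interface show -fields curr-node,curr-port with four whitespace-separated columns (vserver, lif, curr-node, curr-port); verify the column layout on your release before relying on it. The function names are hypothetical. The sketch counts every LIF in the supplied output, so restrict the output to NFS data LIFs beforehand.

```python
from collections import Counter

# Threshold taken from the Issue section: more than 127 LIFs on one port.
LIF_THRESHOLD = 127


def count_lifs_per_port(show_output):
    """Count LIFs per (node, port) from assumed four-column CLI output.

    Assumed columns: vserver, lif, curr-node, curr-port.
    Header, separator, and summary lines are skipped.
    """
    counts = Counter()
    for line in show_output.splitlines():
        stripped = line.strip()
        if not stripped or stripped.endswith("displayed."):
            continue
        fields = stripped.split()
        if len(fields) != 4:
            continue
        if fields[0] == "vserver" or set(fields[0]) == {"-"}:
            continue
        _vserver, _lif, node, port = fields
        counts[(node, port)] += 1
    return counts


def ports_over_threshold(counts, threshold=LIF_THRESHOLD):
    """Return the sorted (node, port) pairs hosting more than threshold LIFs."""
    return sorted(key for key, total in counts.items() if total > threshold)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python count_lifs.py lif_output.txt
    with open(sys.argv[1], encoding="utf-8") as saved_output:
        lif_counts = count_lifs_per_port(saved_output.read())
    for node, port in ports_over_threshold(lif_counts):
        print(f"{node}:{port} hosts {lif_counts[(node, port)]} LIFs")
```

Any port reported by this sketch should then be checked against the network port show -fields rdma-protocols output to confirm that it is RoCE capable.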