NetApp Knowledge Base

CONTAP-497562: Unexpected reboot when a large number of cluster sessions are rapidly created and destroyed

Category:
ontap-9
Specialty:
core

Issue

A rare race condition in cluster connection handling can cause an unexpected reboot if a delayed response to a closed connection is matched to a new session with the same identifier. The likelihood of this issue increases during rapid session creation and teardown, such as after a network outage.
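The hazard described above, a late response being matched to a newer session that reuses the same identifier, can be illustrated with a small sketch. This is not ONTAP code: `Session`, `SessionTable`, and the generation counter are hypothetical names chosen for illustration, and the sketch shows only the general identifier-reuse pattern, not the actual crash mechanism or NetApp's fix.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Session:
    """Hypothetical session record; not an ONTAP structure."""
    session_id: int
    generation: int  # incremented on every open, so reused ids differ


class SessionTable:
    """Toy lookup table keyed by a reusable session identifier."""

    def __init__(self) -> None:
        self._sessions: Dict[int, Session] = {}
        self._next_generation = 0

    def open(self, session_id: int) -> Session:
        self._next_generation += 1
        session = Session(session_id, self._next_generation)
        self._sessions[session_id] = session
        return session

    def close(self, session_id: int) -> None:
        self._sessions.pop(session_id, None)

    def match_unsafe(self, session_id: int) -> Optional[Session]:
        # Matches on the identifier alone: a late response for a closed
        # session is handed to whichever session now owns that identifier.
        return self._sessions.get(session_id)

    def match_checked(self, session_id: int, generation: int) -> Optional[Session]:
        # Also compares the generation recorded when the request was sent,
        # so a response addressed to an older session is rejected.
        session = self._sessions.get(session_id)
        if session is None or session.generation != generation:
            return None
        return session


# Rapid teardown and re-creation reuses identifier 7.
demo = SessionTable()
first = demo.open(7)
demo.close(7)
second = demo.open(7)

# A delayed response that belonged to `first` now arrives.
wrongly_matched = demo.match_unsafe(7)                  # returns `second`
rejected = demo.match_checked(7, first.generation)      # returns None
```

The more often sessions are created and destroyed in a short window, the more often an identifier is reused while an old response is still in flight, which is consistent with the increased likelihood after a network outage noted above.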
Example Backtrace:
#0 0xffffffff8a7fd701 in dumpcore (pmsg=0xffffffff93f0abc0 <kmod_dumper.buf> "page fault (supervisor read data, page not present) on VA 0x57649 cs:rip 0x20:0xffffffff844491b3 rflags 0x10206 in process CSM_BTLS on release 9.15.1P2 (C)", force=0) at prod/common/core/core_dumper.c:2066
#1 0xffffffff8a807f98 in kmod_dumper (kdpanic=<optimized out>) at prod/kmod/common/core/kmod_core.c:502
#2 0xffffffff80a0fddc in call_on_new_stack (func=0xfffffe01d1522c00, arg1=<optimized out>, new_sp=4) at src/core_stack.c:100
#3 dumpcore_on_dumpstack (dumpcoord=0xffffffff88f69290 <ontap_dumpcore>, arg1=<optimized out>) at src/core_stack.c:135
#4 0xffffffff804d5590 in doadump (textdump=<optimized out>) at ../../../../src/sys/kern/kern_shutdown.c:677
#5 0xffffffff804d53ee in kern_reboot (howto=260) at ../../../../src/sys/kern/kern_shutdown.c:854
#6 0xffffffff804d5e29 in vpanic (fmt=0xffffffff80dfa62d "page fault (%s %s %s, %s) on VA %#lx cs:rip %#lx:%#lx rflags %#lx", ap=0xfffffe0370a5ccd0) at ../../../../src/sys/kern/kern_shutdown.c:1513
#7 0xffffffff804d5652 in panic (fmt=0xffffffff93f02fd0 <trap_regs> "") at ../../../../src/sys/kern/kern_shutdown.c:1276
#8 0xffffffff80be4340 in trap_fatal (frame=0xfffffe0370a5ce90, eva=357961) at ../../../../src/sys/amd64/amd64/trap.c:1159
#9 0xffffffff804a73fd in trap (frame=<optimized out>) at ../../../../src/sys/amd64/amd64/trap.c:398
#10 0xffffffff80c6daef in <signal handler called> () at ../../../../src/sys/amd64/amd64/exception.S:317
#11 SpinNPSessionInt::isInterCluster (this=0x0) at src/include/Session/session.h:1469
#12 CsmActiveOpenUser::activeOpenRsp (this=0xfffff80c20a1a4a8, errorCode=CSM_EXISTS, connection=0xfffff81aaaafa828) at src/Csm/CsmActiveOpenUser.cc:349
#13 0xffffffff84465fb4 in CtActiveOpen::activeOpenRsp (this=0xfffff80938751828, errorCode=CSM_EXISTS) at src/Ct/CtConnection.cc:364
#14 0xffffffff84469ac6 in CtState::triggerActiveOpenErrorRsp (conn=<optimized out>, error=CSM_EXISTS, this=<optimized out>) at src/Ct/CtState.cc:151
#15 CtBindSent::rcv (this=<optimized out>, conn=<optimized out>, gbuf=<optimized out>, ctHdr=..., recordLen=20, gItemLen=<optimized out>, err=0xfffffe0370a5d1e4) at src/Ct/CtState.cc:336
#16 0xffffffff847166e3 in non-virtual thunk to CtConnection::loRcv(GbufTable_t*, GbufTable_t*) () at src/Ct/CtConnection.cc:1099
#17 0xffffffff8447a3f9 in CtLoSocket::Connection::receiveMbufs (this=0xfffff80f512e0828, m=0xfffff80c24281e00) at src/CtLo/CtLoSocket.cc:3611
#18 0xffffffff8446edc8 in CtLoBtlsFilter::recvProcB (this=0xfffff8089a10baa8, bytes=20) at src/CtLo/CtLoBtlsFilter.cc:212
#19 0xffffffff844deaae in BTLS::handleUpcalls (this=<optimized out>, upcalls=10) at src/BTLS.cc:351
#20 0xffffffff844e38f8 in TLS::ThreadPool::threadBody (this=0xfffff80245117100, me=0xfffff80297ea3300) at src/TLS.cc:558
#21 0xffffffff84714ac0 in util::Thread::_body (p=0xfffff80297ea3300) at common_util/Thread.cc:181
#22 0xffffffff805a8486 in fork_exit (callout=0xffffffff84714ab0 <util::Thread::_body(void*)>, arg=0xfffff80297ea3300, frame=0xfffffe0370a5d480) at ../../../../src/sys/kern/kern_fork.c:1315
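When triaging a core, it can help to compare its backtrace against the example above. The following is a minimal sketch, assuming the backtrace is available as plain text in the same gdb-style layout. The matching criteria (process `CSM_BTLS` plus the `SpinNPSessionInt::isInterCluster` and `CsmActiveOpenUser::activeOpenRsp` frames) are an assumption drawn from the example backtrace only, and the function names `summarize` and `looks_like_example` are hypothetical. A match is a hint, not a confirmation of this bug.

```python
import re

# Fields taken from the panic string in frame #0 of the example backtrace.
PANIC_RE = re.compile(
    r"page fault \((?P<kind>[^)]*)\) on VA (?P<va>0x[0-9a-fA-F]+)"
    r".*?in process (?P<process>\S+) on release (?P<release>\S+)"
)

# "#N [0xADDR in ]function ..." at the start of each frame line.
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+(?:0x[0-9a-fA-F]+ in )?(?P<func>[\w:~<>]+)",
    re.MULTILINE,
)

# Assumed signature frames, copied from frames #11 and #12 of the example.
SIGNATURE_FRAMES = (
    "SpinNPSessionInt::isInterCluster",
    "CsmActiveOpenUser::activeOpenRsp",
)


def summarize(backtrace: str) -> dict:
    """Extract the panic fields and the ordered list of frame functions."""
    panic = PANIC_RE.search(backtrace)
    return {
        "process": panic.group("process") if panic else None,
        "release": panic.group("release") if panic else None,
        "fault_va": panic.group("va") if panic else None,
        "functions": [m.group("func") for m in FRAME_RE.finditer(backtrace)],
    }


def looks_like_example(backtrace: str) -> bool:
    """Return True if the text shares the assumed signature of the example."""
    info = summarize(backtrace)
    return info["process"] == "CSM_BTLS" and all(
        frame in info["functions"] for frame in SIGNATURE_FRAMES
    )


# Abbreviated lines from the example backtrace, used as sample input.
SAMPLE = '''#0 0xffffffff8a7fd701 in dumpcore (pmsg=0xffffffff93f0abc0 <kmod_dumper.buf> "page fault (supervisor read data, page not present) on VA 0x57649 cs:rip 0x20:0xffffffff844491b3 rflags 0x10206 in process CSM_BTLS on release 9.15.1P2 (C)", force=0) at prod/common/core/core_dumper.c:2066
#11 SpinNPSessionInt::isInterCluster (this=0x0) at src/include/Session/session.h:1469
#12 CsmActiveOpenUser::activeOpenRsp (this=0xfffff80c20a1a4a8, errorCode=CSM_EXISTS, connection=0xfffff81aaaafa828) at src/Csm/CsmActiveOpenUser.cc:349
'''

sample_info = summarize(SAMPLE)
sample_matches = looks_like_example(SAMPLE)
```

In the example, frame #11 is called with `this=0x0`, which means the session pointer was null when the response was processed.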


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.