Data access problems and csm.badConnection events after upgrade to 9.12.1
Applies to
- ONTAP 9.12.1
- Cluster Session Manager (CSM)
- Cluster Peering Policy
Issue
After upgrade to 9.12.1:
- During the last giveback, the state in the storage failover show output remains in "waiting for partner lock synchronization" for a very long time.
- SAN connections can be intermittent.
- All NAS clients that access data via a LIF on a node other than the node owning the addressed volume fail to access data.
- NFS showmount using a data LIF on one node and a vserver root volume on another node times out.
- NFS mounts using a data LIF on one node and a junction-path pointing to a volume on a different node fail or time out.
- Output from event log show indicates bad cross-node communication and CSM errors. Examples:
3/31/2023 11:40:12 node-02 DEBUG hamsg.connectFail: remoteID="9b824c9e921411ed9866d039eaa500fc", status="10", scope="5", scope_err="68"
3/31/2023 11:40:04 node-02 ALERT csm.badConnection: ONTAP received a CSM connection with unrecognizable content at local address 169.254.87.48 local port 7700, from remote address 169.254.86.166 remote port 53376, via IPspace -2.
3/31/2023 11:40:04 node-02 DEBUG ems.engine.suppressed: Event 'csm.badConnection' suppressed 11935 times in last 121 seconds.
3/31/2023 11:39:54 node-01 DEBUG ems.engine.suppressed: Event 'csm.stickyState' suppressed 4 times in last 259 seconds.
3/31/2023 11:39:32 node-02 DEBUG csm.stickyState: localBladeUUID="node-02:dblade", remoteBladeUUID="9b824c9e-9214-11ed-9866-d039eaa500fc", uniquifier="-87206161", filename="src/Csm/CSMImpl.cc", lineno="1145"
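When many of these events are logged, it can help to tally them per node. The following is a minimal Python sketch, not a NetApp tool. It assumes the event log show output has been saved as plain text with one event per line, in the same shape as the examples above. The names summarize_events, LINE_RE, and SUPPRESSED_RE are hypothetical and chosen only for illustration. Counts reported by ems.engine.suppressed lines are added to the tally of the event they refer to.

```python
import re
from collections import Counter

# Assumed line shape (taken from the examples above):
#   <date> <time> <node> <SEVERITY> <event.name>: <message>
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{4})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<node>\S+)\s+(?P<severity>[A-Z]+)\s+(?P<event>[\w.]+):\s*(?P<message>.*)$"
)
# ems.engine.suppressed messages name the suppressed event and how often it repeated.
SUPPRESSED_RE = re.compile(r"Event '(?P<event>[\w.]+)' suppressed (?P<count>\d+) times")


def summarize_events(lines):
    """Return a Counter keyed by (node, event name), including suppressed repeats."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip headers, blank lines, or wrapped output
        node, event, message = match["node"], match["event"], match["message"]
        if event == "ems.engine.suppressed":
            suppressed = SUPPRESSED_RE.search(message)
            if suppressed:
                counts[(node, suppressed["event"])] += int(suppressed["count"])
        else:
            counts[(node, event)] += 1
    return counts


# The example events from this article, used as sample input.
SAMPLE = """\
3/31/2023 11:40:12 node-02 DEBUG hamsg.connectFail: remoteID="9b824c9e921411ed9866d039eaa500fc", status="10", scope="5", scope_err="68"
3/31/2023 11:40:04 node-02 ALERT csm.badConnection: ONTAP received a CSM connection with unrecognizable content at local address 169.254.87.48 local port 7700, from remote address 169.254.86.166 remote port 53376, via IPspace -2.
3/31/2023 11:40:04 node-02 DEBUG ems.engine.suppressed: Event 'csm.badConnection' suppressed 11935 times in last 121 seconds.
3/31/2023 11:39:54 node-01 DEBUG ems.engine.suppressed: Event 'csm.stickyState' suppressed 4 times in last 259 seconds.
3/31/2023 11:39:32 node-02 DEBUG csm.stickyState: localBladeUUID="node-02:dblade", remoteBladeUUID="9b824c9e-9214-11ed-9866-d039eaa500fc", uniquifier="-87206161", filename="src/Csm/CSMImpl.cc", lineno="1145"
"""

counts = summarize_events(SAMPLE.splitlines())
for (node, event), total in sorted(counts.items()):
    print(f"{node} {event} {total}")
```

A large csm.badConnection total on one node, as in the sample, matches the cross-node communication symptoms listed in this article.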