NetApp Knowledge Base

Site expansion stuck on Cassandra rebuild

Applies to

StorageGRID

Issue

Site expansion is stuck because the expansion process picked a sub-optimal source site for the Cassandra nodetool rebuild.

Cassandra system.log shows:

 INFO [RMI TCP Connection(22710)-127.0.0.1] 2024-03-07 05:30:08,752 RangeStreamer.java (line 127) Rebuild: range (7543867250329265734,7544102375703298946] exists on /<IP_3H> for keyspace accounts
 INFO [RMI TCP Connection(22710)-127.0.0.1] 2024-03-07 05:30:08,753 RangeStreamer.java (line 127) Rebuild: range (7543867250329265734,7544102375703298946] exists on /<IP_3I> for keyspace accounts
 WARN [RMI TCP Connection(22710)-127.0.0.1] 2024-03-07 05:30:08,753 StorageService.java (line 1503) Parameter error while rebuilding node
java.lang.IllegalStateException: Unable to find sufficient sources for streaming range (-6636921090683170249,-6636701783084431689] in keyspace accounts
at org.apache.cassandra.dht.RangeStreamer.handleSourceNotFound(RangeStreamer.java:306)
at org.apache.cassandra.dht.RangeStreamer.getRangeFetchMap(RangeStreamer.java:285)
at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:129)
at org.apache.cassandra.service.StorageService.rebuild(StorageService.java:1429)
at org.apache.cassandra.service.StorageService.rebuild(StorageService.java:1343)
at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
 INFO [RMI TCP Connection(22716)-127.0.0.1] 2024-03-07 05:30:20,425 StorageService.java (line 1402) starting rebuild for (All keyspaces), (All tokens), RESET_NO_SNAPSHOT,  included DCs: group20
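
The two signals in the excerpt above (the source DC named on the "starting rebuild" line and the "Unable to find sufficient sources" exception) can be pulled out of a system.log programmatically. The following is a minimal sketch, not a NetApp or Cassandra tool: the function name scan_rebuild_log is hypothetical, and the patterns assume the log wording matches this excerpt exactly, which may differ between Cassandra versions.

```python
import re

# Patterns copied from the wording of the log excerpt; treat them as assumptions.
REBUILD_START = re.compile(r"starting rebuild for .*included DCs:\s*(?P<dcs>.+)$")
NO_SOURCES = re.compile(
    r"Unable to find sufficient sources for streaming range "
    r"\((?P<start>-?\d+),(?P<end>-?\d+)\] in keyspace (?P<keyspace>\S+)"
)


def scan_rebuild_log(lines):
    """Return (source_dcs, failures) found in an iterable of system.log lines.

    source_dcs: the "included DCs" value of each rebuild start line.
    failures:   (keyspace, range_start, range_end) for each streaming failure.
    """
    source_dcs = []
    failures = []
    for line in lines:
        match = REBUILD_START.search(line)
        if match:
            source_dcs.append(match.group("dcs").strip())
            continue
        match = NO_SOURCES.search(line)
        if match:
            failures.append(
                (match.group("keyspace"), match.group("start"), match.group("end"))
            )
    return source_dcs, failures


# Two lines taken from the excerpt above, used here as sample input.
sample_lines = [
    "java.lang.IllegalStateException: Unable to find sufficient sources for "
    "streaming range (-6636921090683170249,-6636701783084431689] in keyspace accounts",
    " INFO [RMI TCP Connection(22716)-127.0.0.1] 2024-03-07 05:30:20,425 "
    "StorageService.java (line 1402) starting rebuild for (All keyspaces), "
    "(All tokens), RESET_NO_SNAPSHOT,  included DCs: group20",
]
source_dcs, failures = scan_rebuild_log(sample_lines)
```

To scan a real log, pass an open file object instead of sample_lines; the location of system.log on a given node is not stated in this article.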

The nodetool status command, run from the expansion site (site 4 / group40), shows the nodes in site 2 (group20) as "DS" (Down/Stopped).
However, the nodes in site 2 (group20) are up and running and are accessible from other sites.

Datacenter: group10
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving/Stopped
--  Address       Load       Tokens       Owns (effective)  Host ID    Rack
UN  <IP_1A>       1.77 TiB   256          50.9%             <UUID_1A>    unknown
UN  <IP_1B>       1.52 TiB   256          49.0%             <UUID_1B>    unknown
UN  <IP_1C>       1.48 TiB   256          49.2%             <UUID_1C>    unknown
UN  <IP_1D>       1.68 TiB   256          49.5%             <UUID_1D>    unknown
UN  <IP_1E>       1.77 TiB   256          51.4%             <UUID_1E>    unknown
UN  <IP_1F>       1.56 TiB   256          49.9%             <UUID_1F>    unknown

Datacenter: group20
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving/Stopped
--  Address       Load       Tokens       Owns (effective)  Host ID    Rack
DS  <IP_2A>       3.06 TiB   256          100.0%            <UUID_2A>    unknown
DS  <IP_2B>       3.13 TiB   256          100.0%            <UUID_2B>    unknown
DS  <IP_2C>       2.99 TiB   256          100.0%            <UUID_2C>    unknown

Datacenter: group30
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving/Stopped
--  Address       Load       Tokens       Owns (effective)  Host ID    Rack
UN  <IP_3A>       1.18 TiB   256          33.4%             <UUID_3A>    unknown
UN  <IP_3B>       1.03 TiB   256          33.6%             <UUID_3B>    unknown
UN  <IP_3C>       1.1 TiB    256          34.8%             <UUID_3C>    unknown
UN  <IP_3D>       1.03 TiB   256          31.4%             <UUID_3D>    unknown
UN  <IP_3E>       1.02 TiB   256          34.1%             <UUID_3E>    unknown
UN  <IP_3F>      964.94 GiB  256          32.1%             <UUID_3F>    unknown
UN  <IP_3G>       1.02 TiB   256          34.7%             <UUID_3G>    unknown
UN  <IP_3H>      969.99 GiB  256          31.6%             <UUID_3H>    unknown
UN  <IP_3I>       1.1 TiB    256          34.3%             <UUID_3I>    unknown

Datacenter: group40
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving/Stopped
--  Address       Load       Tokens       Owns (effective)  Host ID    Rack
UN  <IP_4A>       10.35 GiB  256          37.1%             <UUID_4A>    unknown
UN  <IP_4B>       7.62 GiB   256          35.7%             <UUID_4B>    unknown
UN  <IP_4C>       4.83 GiB   256          39.9%             <UUID_4C>    unknown
UN  <IP_4D>       11.75 GiB  256          40.1%             <UUID_4D>    unknown
UN  <IP_4E>       10.69 GiB  256          38.8%             <UUID_4E>    unknown
UN  <IP_4F>       6.12 GiB   256          35.2%             <UUID_4F>    unknown
UN  <IP_4G>       8.06 GiB   256          37.0%             <UUID_4G>    unknown
UN  <IP_4H>       14.86 GiB  256          36.0%             <UUID_4H>    unknown
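
To compare what different sites report, the status output above can be reduced to the nodes that are not shown as "UN", grouped by datacenter. The sketch below assumes the column layout shown in this article (a two-letter status code followed by the address); the function name non_un_nodes_by_datacenter is hypothetical and is not part of nodetool or StorageGRID.

```python
def non_un_nodes_by_datacenter(status_text):
    """Map datacenter name -> list of (status, address) for nodes not reported as UN."""
    result = {}
    datacenter = None
    for raw_line in status_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Datacenter:"):
            datacenter = line.split(":", 1)[1].strip()
            continue
        fields = line.split()
        # Node rows start with a two-letter code: U or D, then N, L, J, M or S.
        is_node_row = (
            len(fields) >= 2
            and len(fields[0]) == 2
            and fields[0][0] in "UD"
            and fields[0][1] in "NLJMS"
        )
        if is_node_row and fields[0] != "UN" and datacenter is not None:
            result.setdefault(datacenter, []).append((fields[0], fields[1]))
    return result


# A shortened copy of the output above, used as sample input.
sample_status = """\
Datacenter: group20
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving/Stopped
--  Address       Load       Tokens       Owns (effective)  Host ID    Rack
DS  <IP_2A>       3.06 TiB   256          100.0%            <UUID_2A>    unknown
DS  <IP_2B>       3.13 TiB   256          100.0%            <UUID_2B>    unknown

Datacenter: group40
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving/Stopped
--  Address       Load       Tokens       Owns (effective)  Host ID    Rack
UN  <IP_4A>       10.35 GiB  256          37.1%             <UUID_4A>    unknown
"""
not_up = non_un_nodes_by_datacenter(sample_status)
```

Running the same function on output captured from a node at another site would show whether the "DS" view of group20 is specific to the expansion site, as the description above suggests.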
