Is this from an official release? I fixed this in CVS head about two weeks ago.
This is from the JBoss Cache release 1.2.4_SP1_FINAL and JGroups 2.2.8.
I am using JBoss Cache release 1.2.4_SP1_FINAL and JGroups 2.2.9 (beta).
I have faced the same problem many times; here are a couple of scenarios:
1) Whenever I start all the containers at the same time I get this ReplicationException. I could stop the exception by adding some delay between the container deployments.
2) If I take down one container and start it again after the others have accumulated a cache state of 10MB or more, I see the warning "WARN STABLE: ResumeTask resumed message garbage collection - this should be done by a RESUME_STABLE event; check why this event was not received (or increase max_suspend_time for large state transfers)" and it throws a ReplicationException. After going through the forums I found a post (http://www.jboss.com/index.html?module=bb&op=viewtopic&t=72678) where Bela suggested using a newer version of JGroups and replacing FD with FD_SOCK. None of the suggested changes resolved the ReplicationException.
3) Now that I have changed from FD to FD_SOCK, if I take down one container the views on the other containers do not reflect the change for some time (because socket-based failure detection here has a timeout measured in minutes), and if I start the same container again the new view contains more nodes than there are actual containers. Any replication message to these stale members throws a ReplicationException.
Any help in resolving these problems would be greatly appreciated.
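The warning in scenario 2 itself points at one knob: raising max_suspend_time on the pbcast.STABLE protocol so that message garbage collection stays suspended for the duration of a large state transfer. As an illustration only (the attribute values below are placeholder assumptions, not tuned recommendations), a STABLE entry in the JGroups stack might look like:

```xml
<!-- Illustrative sketch: max_suspend_time (ms) raised to cover a large
     state transfer, per the warning text. Values are assumptions. -->
<pbcast.STABLE desired_avg_gossip="20000"
               max_suspend_time="60000"
               up_thread="false" down_thread="false"/>
```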
I see that some of the problems regarding ReplicationException have been fixed; if so, when will the fixed version be released?
This has been fixed. It will be part of 1.3, which will be released by the end of March.
If I need these changes earlier, can I get them from the CVS repository?
Do you suggest we use FD_SOCK instead of FD in the JGroups config?
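For what it's worth, FD_SOCK and FD are often stacked together rather than swapped outright: the socket-based protocol catches crashed members almost immediately, while heartbeat-based FD below it catches members that are still up at the TCP level but unresponsive. A sketch (the timeout and retry values are illustrative assumptions, not recommendations):

```xml
<!-- Sketch: FD_SOCK detects closed sockets immediately; FD beneath it
     catches hung members via heartbeats. Values are assumptions. -->
<FD_SOCK down_thread="false" up_thread="false"/>
<FD timeout="10000" max_tries="5" shun="true"
    down_thread="false" up_thread="false"/>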
We are using JBoss Cache jboss-cache-dist-1.3.0.SP1.zip / JGroups JGroups-220.127.116.11.bin.zip and WebSphere App Server.
We have defined 3 caches, E1, E2, and E3, on startup. The server is started and the cache is built. When a new TreeCache service
is started on another machine with the same config file, we would expect it to identify the caches already set up, but it doesn't,
and it throws the exception below. We have also tried to use <FD_SOCK> without luck. After a few tries the network doesn't
identify the other service at all.
Here is the configuration setting:
<UDP mcast_addr="18.104.22.168" mcast_port="48866"
     loopback="true" enable_bundling="false" bind_to_all_interfaces="true"/>
<PING timeout="2000" num_initial_members="3"/>
<MERGE2 min_interval="10000" max_interval="20000"/>
<FD_SOCK down_thread="false" up_thread="false"/>
<FD timeout="100000" max_tries="3" shun="true" down_thread="false" up_thread="false"/>
<pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
     max_xmit_size="8192" up_thread="false" down_thread="false"/>
<UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"/>
<pbcast.GMS join_timeout="5000" join_retry_timeout="2000"/>
<pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
org.jboss.cache.ReplicationException: rsp=sender=xxx.xxx.xxx.xx:1166, retval=null, received=false, suspected=false