Is this from an official release? I fixed this in CVS head about two weeks ago.
This is from the JBoss Cache release 1.2.4_SP1_FINAL and JGroups 2.2.8.
I am using JBoss Cache release 1.2.4_SP1_FINAL and JGroups 2.2.9 (beta).
I have faced the same problem many times; here are a couple of scenarios:
1) Whenever I start all the containers at the same time I get this ReplicationException. I could stop the exception by adding some delay between the container deployments.
2) If I take down one container and start it again after the others have accumulated a cache state of 10MB or more, I see the warning "WARN STABLE: ResumeTask resumed message garbage collection - this should be done by a RESUME_STABLE event; check why this event was not received (or increase max_suspend_time for large state transfers)" and it throws a ReplicationException. After going through the forums I found a post (http://www.jboss.com/index.html?module=bb&op=viewtopic&t=72678) where Bela suggested using a newer version of JGroups and replacing FD with FD_SOCK. None of the suggested changes resolved the ReplicationException.
3) Now that I have changed from FD to FD_SOCK, if I take down one container the views on the other containers do not reflect the change for some time (because socket-based failure detection here has a timeout measured in minutes), and if I start the same container again the new view contains more nodes than there are actual containers. Any replication message to these stale members throws a ReplicationException.
Any help in resolving these problems would be greatly appreciated.
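The warning in scenario 2 itself points at one knob: raising max_suspend_time on the pbcast.STABLE protocol so that message garbage collection stays suspended for the duration of a large state transfer. As an illustration only (the attribute values below are placeholder assumptions, not tuned recommendations), a STABLE entry in the JGroups stack might look like:

```xml
<!-- Illustrative sketch: max_suspend_time (ms) raised to cover a large
     state transfer, per the warning text. Values are assumptions. -->
<pbcast.STABLE desired_avg_gossip="20000"
               max_suspend_time="60000"
               up_thread="false" down_thread="false"/>
```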
I see that some of the problems regarding ReplicationException have been fixed; if so, when will the fixed version be released?
This has been fixed. It will be part of 1.3, which will be released by the end of March.
If I need these changes earlier, can I get them from the CVS repository?
Do you suggest we use FD_SOCK instead of FD in the JGroups config?
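For what it's worth, FD_SOCK and FD are often stacked together rather than swapped outright: the socket-based protocol catches crashed members almost immediately, while heartbeat-based FD below it catches members that are still up at the TCP level but unresponsive. A sketch (the timeout and retry values are illustrative assumptions, not recommendations):

```xml
<!-- Sketch: FD_SOCK detects closed sockets immediately; FD beneath it
     catches hung members via heartbeats. Values are assumptions. -->
<FD_SOCK down_thread="false" up_thread="false"/>
<FD timeout="10000" max_tries="5" shun="true"
    down_thread="false" up_thread="false"/>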
We are using JBoss Cache jboss-cache-dist-1.3.0.SP1.zip / JGroups JGroups-220.127.116.11.bin.zip and WebSphere App Server.
We have defined 3 caches, E1, E2, and E3, on startup. The server is started and the cache is built. When a new TreeCache service
is started on another machine with the same config file, we would expect it to identify the caches already set up, but it doesn't,
and it throws the exception below. We have also tried to use <FD_SOCK> without luck. After a few tries the network doesn't
identify the other service at all.
Here is the configuration setting:
<UDP mcast_addr="18.104.22.168" mcast_port="48866"
     loopback="true" enable_bundling="false" bind_to_all_interfaces="true"/>
<PING timeout="2000" num_initial_members="3"/>
<MERGE2 min_interval="10000" max_interval="20000"/>
<FD_SOCK down_thread="false" up_thread="false"/>
<FD timeout="100000" max_tries="3" shun="true" down_thread="false" up_thread="false"/>
<pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
     max_xmit_size="8192" up_thread="false" down_thread="false"/>
<UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"/>
<pbcast.GMS join_timeout="5000" join_retry_timeout="2000"/>
<pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
org.jboss.cache.ReplicationException: rsp=sender=xxx.xxx.xxx.xx:1166, retval=null, received=false, suspected=false