7 Replies Latest reply on May 11, 2006 7:00 AM by Deepa Iyer

    ReplicationException when starting a new replicated cache se

    Eric Piper Newbie

      Hi,

      I get this error after a new TreeCache service is started and has joined an existing service (the new service is started on the same machine as the other one):

      org.jboss.cache.ReplicationException: rsp=sender=xxx.xxx.xxx.xx:1166, retval=null, received=false, suspected=false
      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:3290)
      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:3311)
      at org.jboss.cache.interceptors.ReplicationInterceptor.handleReplicatedMethod(ReplicationInterceptor.java:122)
      at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:87)
      at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:4124)
      at org.jboss.cache.TreeCache.put(TreeCache.java:2868)
      at org.jboss.cache.TreeCache.put(TreeCache.java:2809)

      The cluster configuration for the service looks like this:

      UDP(ip_mcast=true;mcast_addr=229.1.2.10;mcast_port=5556;ip_ttl=2;loopback=true;mcast_recv_buf_size=80000;mcast_send_buf_size=150000;ucast_recv_buf_size=80000;ucast_send_buf_size=150000;enable_bundling=false;bind_to_all_interfaces=true):PING(down_thread=false;num_initial_members=3;timeout=1500;up_thread=false):MERGE2(down_thread=false;max_interval=10000;min_interval=5000):FD(max_tries=2;timeout=1000):VERIFY_SUSPECT(down_thread=false;timeout=1500):pbcast.NAKACK(down_thread=false;gc_lag=50;max_xmit_size=60000;retransmit_timeout=300,600,1200,2400,4800;use_mcast_xmit=true):UNICAST(down_thread=false;timeout=300,600,1200,2400,3600):pbcast.STABLE(desired_avg_gossip=5000;down_thread=false;max_bytes=250000;stability_delay=1000):pbcast.GMS(down_thread=false;join_retry_timeout=2000;join_timeout=3000;print_local_addr=true;shun=true):FC(down_thread=false;max_credits=1000000;min_threshold=0.10):FRAG(down_thread=false;frag_size=60000;up_thread=true):COMPRESS(compression_level=3;down_thread=false;min_size=500;up_thread=true):pbcast.STATE_TRANSFER(down_thread=false;up_thread=false)
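
      A stack string this long is hard to read by eye. The following is a minimal sketch, assuming only the `NAME(key=value;key=value):NAME(...)` shape visible above, that splits it into one protocol per entry. `StackStringParser` is a hypothetical helper written for this thread, not a JGroups class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for reading a stack string; it is not part of JGroups.
final class StackStringParser {

    // Splits "A(x=1;y=2):B(z=3)" into protocol name -> properties, in stack order.
    static Map<String, Map<String, String>> parse(String stack) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (String part : splitTopLevel(stack)) {
            int open = part.indexOf('(');
            int close = part.lastIndexOf(')');
            String name = (open < 0 ? part : part.substring(0, open)).trim();
            Map<String, String> props = new LinkedHashMap<>();
            if (open >= 0 && close > open) {
                for (String kv : part.substring(open + 1, close).split(";")) {
                    int eq = kv.indexOf('=');
                    if (eq > 0) {
                        props.put(kv.substring(0, eq).trim(), kv.substring(eq + 1).trim());
                    }
                }
            }
            if (!name.isEmpty()) {
                result.put(name, props);
            }
        }
        return result;
    }

    // Splits on ':' only when not inside parentheses.
    private static List<String> splitTopLevel(String s) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (char c : s.toCharArray()) {
            if (c == '(') depth++;
            if (c == ')') depth--;
            if (c == ':' && depth == 0) {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        // Shortened excerpt of the stack string above.
        String stack = "UDP(ip_mcast=true;mcast_addr=229.1.2.10;mcast_port=5556):"
                + "FD(max_tries=2;timeout=1000):"
                + "VERIFY_SUSPECT(down_thread=false;timeout=1500)";
        for (Map.Entry<String, Map<String, String>> e : parse(stack).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

      Printing one protocol per line makes it easier to compare settings such as the FD and VERIFY_SUSPECT timeouts between nodes.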

        • 1. Re: ReplicationException when starting a new replicated cach
          Bela Ban Master

          Is this from an official release? I ask because I fixed this in CVS head about 2 weeks ago.

          • 2. Re: ReplicationException when starting a new replicated cach
            Eric Piper Newbie

            This is from the JBoss Cache release 1.2.4_SP1_FINAL and JGroups 2.2.8.

            /Eric

            • 3. Re: ReplicationException when starting a new replicated cach
              sree praveen Newbie

              I am using JBoss Cache release 1.2.4_SP1_FINAL and JGroups 2.2.9 (beta).

              I have hit the same problem many times. Here are a couple of scenarios:

              1) Whenever I start all the containers at the same time, I get this ReplicationException. I could avoid the exception by adding some delay between the deployments.
              2) If I take down one container and start it again after the others have accumulated a cache state of 10 MB or more, I see the warning "WARN STABLE: ResumeTask resumed message garbage collection - this should be done by a RESUME_STABLE event; check why this event was not received (or increase max_suspend_time for large state transfers)" and a ReplicationException is thrown. After going through the forums, I found a post (http://www.jboss.com/index.html?module=bb&op=viewtopic&t=72678) where Bela suggested using a newer version of JGroups and replacing FD with FD_SOCK. None of the suggested changes resolved the ReplicationException problem.
              3) Now that I have changed from FD to FD_SOCK, if I take down one container, the view in the other containers does not reflect this for some time (because it's socket-based failure detection, the timeout is in minutes), and if I start the same container again, the new view contains more nodes than there are actual containers. Any replication message to these old containers throws a ReplicationException.

              Any help in resolving these problems would be greatly appreciated.
              I see that some of the problems regarding ReplicationException are fixed. If so, when will the fixed version be released?

              • 4. Re: ReplicationException when starting a new replicated cach
                Bela Ban Master

                This has been fixed. It will be part of 1.3, which will be released by the end of March.

                • 5. Re: ReplicationException when starting a new replicated cach
                  Eric Piper Newbie

                  If I need these changes earlier, can I get them from the CVS repository?

                  Do you suggest that we use FD_SOCK instead of FD in the JGroups config?

                  • 6. Re: ReplicationException when starting a new replicated cach
                    Bela Ban Master

                    Yes, 1.3 is fairly stable and we will release a beta this week or next. Here's info (http://wiki.jboss.org/wiki/Wiki.jsp?page=FDVersusFD_SOCK) on FD_SOCK versus FD.

                    • 7. Re: ReplicationException when starting a new replicated cach
                      Deepa Iyer Newbie

                      We are using JCache jboss-cache-dist-1.3.0.SP1.zip / JGroups JGroups-2.2.9.1.bin.zip and WebSphere Application Server.
                      We have defined 3 caches (E1, E2, E3) on startup. The server is started and the cache is built. When a new TreeCache service is started on another machine with the same config file, we would expect it to identify the caches already set up, but it doesn't, and it throws the following exception. We have also tried using <FD_SOCK>, without luck. After a few tries, the network doesn't identify the other service at all.

                      Here is the configuration setting:


                      <UDP mcast_addr="228.1.2.3" mcast_port="48866"
                      ip_ttl="64" ip_mcast="true"
                      mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
                      ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
                      loopback="true" enable_bundling="false" bind_to_all_interfaces="true"/>
                      <PING timeout="2000" num_initial_members="3"
                      up_thread="false" down_thread="false"/>
                      <MERGE2 min_interval="10000" max_interval="20000"/>
                      <FD_SOCK down_thread="false" up_thread="false"/>
                      <FD timeout="100000" max_tries="3" shun="true" down_thread="false" up_thread="false"/>

                      <VERIFY_SUSPECT timeout="1500"
                      up_thread="false" down_thread="false"/>
                      <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
                      max_xmit_size="8192" up_thread="false" down_thread="false"/>
                      <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
                      down_thread="false"/>
                      <pbcast.STABLE desired_avg_gossip="20000"
                      up_thread="false" down_thread="false"/>
                      <FRAG frag_size="8192"
                      down_thread="false" up_thread="false"/>
                      <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
                      shun="true" print_local_addr="true"/>
                      <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>


                      Exception:
                      org.jboss.cache.ReplicationException: rsp=sender=xxx.xxx.xxx.xx:1166, retval=null, received=false, suspected=false
                      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:3290)
                      at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:3311)
                      at org.jboss.cache.interceptors.ReplicationInterceptor.handleReplicatedMethod(ReplicationInterceptor.java:122)
                      at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:87)
                      at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:4124)
                      at org.jboss.cache.TreeCache.put(TreeCache.java:2868)
                      at org.jboss.cache.TreeCache.put(TreeCache.java:2809)
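
                      For comparing the FD settings quoted in this thread, here is a rough back-of-the-envelope sketch. It assumes that FD suspects a member after about timeout × max_tries milliseconds without a heartbeat reply, and that VERIFY_SUSPECT then waits up to its own timeout before the suspicion is confirmed. Both are assumptions about protocol behavior, not something stated in the thread, and `FailureDetectionDelay` is a hypothetical helper, not a JGroups class.

```java
// Hypothetical helper; the formulas are rough rules of thumb, not JGroups API calls.
final class FailureDetectionDelay {

    // Assumption: FD suspects a member after max_tries unanswered heartbeats,
    // each waited on for `timeout` milliseconds.
    static long fdSuspectMillis(long timeoutMillis, int maxTries) {
        return timeoutMillis * maxTries;
    }

    // Assumption: VERIFY_SUSPECT waits up to its own timeout before confirming the suspicion.
    static long exclusionMillis(long fdTimeoutMillis, int maxTries, long verifySuspectMillis) {
        return fdSuspectMillis(fdTimeoutMillis, maxTries) + verifySuspectMillis;
    }

    public static void main(String[] args) {
        // First configuration in the thread: FD(max_tries=2;timeout=1000), VERIFY_SUSPECT(timeout=1500)
        System.out.println("first config: " + exclusionMillis(1000, 2, 1500) + " ms");
        // Last configuration: <FD timeout="100000" max_tries="3"/>, <VERIFY_SUSPECT timeout="1500"/>
        System.out.println("last config:  " + exclusionMillis(100000, 3, 1500) + " ms");
    }
}
```

                      Under these assumptions the first configuration would exclude an unresponsive member after roughly 3.5 seconds, while the last one would take about 5 minutes through FD alone. FD_SOCK, which is also present in the last configuration, detects closed sockets separately and is not modeled here.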