11 Replies Latest reply on Nov 2, 2007 3:17 PM by Tony Lauro

    Initial State Transfer Failure

    Tony Lauro Newbie

      Hi - I'm new to JBossCache and we are having an initial state transfer failure. Once the clustered caches are up - they talk fine and new objects or updates replicate perfectly. The problem is that if one box goes down, when it comes back up, the cache coming up doesn't get the old data from the other node. New data replicates or any objects that get updated. We have a pretty strict firewall setup with 48866 open for multicast traffic. We're using JBoss 4.0.4 and JBossCache 2.x.

      Here's our config and some relevant logging. I wonder if our Eviction Policy is the problem.

      <?xml version="1.0" encoding="UTF-8"?>

      <!-- ===================================================================== -->
      <!-- Service to deploy and configure a replicating cache -->
      <!-- This is the default file used for installations. -->
      <!-- ===================================================================== -->


      <!-- ==================================================================== -->
      <!-- Defines TreeCache configuration -->
      <!-- ==================================================================== -->



      jboss:service=Naming
      jboss:service=TransactionManager
      jboss.aop:service=AspectDeployer

      <!--
      Configure the TransactionManager
      -->
      org.jboss.cache.BatchModeTransactionManagerLookup

      <!--
      Isolation level : SERIALIZABLE
      REPEATABLE_READ (default)
      READ_COMMITTED
      READ_UNCOMMITTED
      NONE
      -->
      REPEATABLE_READ

      <!--
      Valid modes are LOCAL
      REPL_ASYNC
      REPL_SYNC
      INVALIDATION_ASYNC
      INVALIDATION_SYNC
      -->
      REPL_ASYNC

      <!--
      Just used for async repl: use a replication queue
      -->
      false

      <!--
      Replication interval for replication queue (in ms)
      -->
      0

      <!--
      Max number of elements which trigger replication
      -->
      0

      <!-- Name of cluster. Needs to be the same for all clusters, in order
      to find each other
      -->
      TreeCache-Cluster

      <!-- JGroups protocol stack properties. Can also be a URL,
      e.g. file:/home/bela/default.xml

      -->



      <UDP mcast_addr="228.1.2.3" mcast_port="48866"
      ip_ttl="64" ip_mcast="true"
      mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
      ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
      loopback="false" bind_addr="192.168.181.190" />
      <PING timeout="2000" num_initial_members="3"
      up_thread="false" down_thread="false"
      initial_hosts="192.168.181.191[48866]"/>
      <MERGE2 min_interval="10000" max_interval="20000"/>
      <FD_SOCK/>
      <VERIFY_SUSPECT timeout="1500"
      up_thread="false" down_thread="false"/>
      <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
      max_xmit_size="8192" up_thread="false" down_thread="false"/>
      <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
      down_thread="false"/>
      <pbcast.STABLE desired_avg_gossip="20000"
      up_thread="false" down_thread="false"/>
      <FRAG frag_size="8192"
      down_thread="false" up_thread="false"/>
      <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
      shun="true" print_local_addr="true"/>
      <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>



      <!--
      Whether or not to fetch state on joining a cluster
      NOTE this used to be called FetchStateOnStartup and has been renamed to be more descriptive.
      -->
      true

      <!--
      The max amount of time (in milliseconds) we wait until the
      initial state (ie. the contents of the cache) are retrieved from
      existing members in a clustered environment
      -->
      20000

      <!--
      Number of milliseconds to wait until all responses for a
      synchronous call have been received.
      -->
      20000

      <!-- Max number of milliseconds to wait for a lock acquisition -->
      15000

      <!-- Eviction policy defines how long nodes in defined regions stay -->
      <!-- in cache before being removed. In this case a cache entry can -->
      <!-- stay in cache w/out any activity for the specified number of -->
      <!-- seconds. BAM will update timestamp on the record every time -->
      <!-- an interium accounting record is received. Likewise the record -->
      <!-- will be removed when a accounting stop record is received. -->
      <!-- So, this policy is used to remove nodes in cases where an -->
      <!-- accounting stop record is lost. -->
      <!-- -->
      <!-- NOTE: the timeToLiveSeconds must be greater than the time -->
      <!-- used to send interium accouning records. Also /SessionKeys -->
      <!-- and Sessions must have a timeToLiveSeconds > /SessionKeyIndex -->
      <!-- as the indexs have to be removed before the acutal data to -->
      <!-- avoid a race condition. -->



      5

      <!-- See notes above -->

      0
      3900


      <!-- See notes above, time must be > than /SessionKeyuIndex -->

      0
      3901


      <!-- See notes above, time must be > than /sessionKeyIndex -->

      0
      3902


      <!-- Default Eviction policy for all other Nodes in cache -->

      0
      0




      <!--
      Indicate whether to use marshalling or not. Set this to true if you are running under a scoped
      class loader, e.g., inside an application server. Default is "false".
      -->
      false
      false



      2007-11-01 08:39:39,009 DEBUG [org.jgroups.conf.ClassConfigurator] mapping is:
      1: class org.jgroups.stack.IpAddress
      2: class org.jgroups.protocols.CAUSAL$CausalHeader
      3: class org.jgroups.protocols.FD$FdHeader
      4: class org.jgroups.protocols.FD_PID$FdHeader
      5: class org.jgroups.protocols.FD_PROB$FdHeader
      6: class org.jgroups.protocols.FD_SOCK$FdHeader
      7: class org.jgroups.protocols.FragHeader
      8: class org.jgroups.protocols.MERGE$MergeHeader
      9: class org.jgroups.protocols.NakAckHeader
      10: class org.jgroups.protocols.PARTITIONER$PartitionerHeader
      11: class org.jgroups.protocols.PerfHeader
      12: class org.jgroups.protocols.PIGGYBACK$PiggybackHeader
      13: class org.jgroups.protocols.PingHeader
      14: class org.jgroups.protocols.TcpHeader
      15: class org.jgroups.protocols.TOTAL$Header
      16: class org.jgroups.protocols.TOTAL_OLD$TotalHeader
      17: class org.jgroups.protocols.TOTAL_TOKEN$TotalTokenHeader
      18: class org.jgroups.protocols.TOTAL_TOKEN$RingTokenHeader
      19: class org.jgroups.protocols.TunnelHeader
      20: class org.jgroups.protocols.UdpHeader
      21: class org.jgroups.protocols.UNICAST$UnicastHeader
      22: class org.jgroups.protocols.VERIFY_SUSPECT$VerifyHeader
      23: class org.jgroups.protocols.WANPIPE$WanPipeHeader
      24: class org.jgroups.protocols.pbcast.GMS$GmsHeader
      25: class org.jgroups.protocols.pbcast.NakAckHeader
      26: class org.jgroups.protocols.pbcast.PbcastHeader
      27: class org.jgroups.protocols.pbcast.STABLE$StableHeader
      28: class org.jgroups.protocols.pbcast.STATE_TRANSFER$StateHeader
      29: class org.jgroups.protocols.SMACK$SmackHeader
      30: class org.jgroups.Message
      31: class org.jgroups.View
      32: class org.jgroups.ViewId
      33: class org.jgroups.util.List
      34: interface org.jgroups.Address
      35: class [Lorg.jgroups.Address;
      36: class org.jgroups.protocols.PingRsp
      37: class [Ljava.lang.Object;
      38: class java.util.Vector
      39: class org.jgroups.protocols.pbcast.JoinRsp
      40: class org.jgroups.protocols.pbcast.Digest
      41: class java.util.Hashtable
      50: class org.jgroups.service.lease.LeaseRequestHeader
      51: class org.jgroups.service.lease.LeaseResponseHeader
      52: class org.jgroups.service.lease.DenyResponseHeader
      53: class org.jgroups.protocols.COMPRESS$CompressHeader
      54: class org.jgroups.protocols.FC$FcHeader
      55: class org.jgroups.protocols.WanPipeAddress

      2007-11-01 08:39:39,024 DEBUG [org.jgroups.protocols.AUTOCONF] frag_size=64000
      2007-11-01 08:39:39,061 DEBUG [org.jgroups.protocols.pbcast.GMS] changed role to org.jgroups.protocols.pbcast.ClientGmsImpl
      2007-11-01 08:39:39,091 DEBUG [org.jboss.cache.marshall.VersionAwareMarshaller] Initialised with version 1.4.0 and versionInt 14
      2007-11-01 08:39:39,091 DEBUG [org.jboss.cache.marshall.VersionAwareMarshaller] Using default marshaller class org.jboss.cache.marshall.TreeCacheMarshaller140
      2007-11-01 08:39:39,091 DEBUG [org.jboss.cache.aop.TreeCacheAop] Created jboss.cache:service=TomcatClusteringCache
      2007-11-01 08:39:39,091 DEBUG [org.jboss.cache.aop.TreeCacheAop] Starting jboss.cache:service=TomcatClusteringCache
      2007-11-01 08:39:39,173 DEBUG [org.jgroups.protocols.UDP] creating sockets and starting threads
      2007-11-01 08:39:39,173 INFO [org.jgroups.protocols.UDP] sockets will use interface 192.168.181.191
      2007-11-01 08:39:39,187 INFO [org.jgroups.protocols.UDP] socket information:
      local_addr=192.168.181.191:37990, mcast_addr=228.1.2.3:48866, bind_addr=/192.168.181.191, ttl=64
      sock: bound to 192.168.181.191:37990, receive buffer size=80000, send buffer size=131071
      mcast_recv_sock: bound to 192.168.181.191:48866, send buffer size=131071, receive buffer size=80000
      mcast_send_sock: bound to 192.168.181.191:37991, send buffer size=131071, receive buffer size=80000
      2007-11-01 08:39:39,188 DEBUG [org.jgroups.protocols.PING] adding my address (192.168.181.191:37990) to initial_hosts; initial_hosts=[[192.168.181.190:48866 ] [192.168.181.191:37990 ] ]
      2007-11-01 08:39:39,190 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is 192.168.181.191:37990
      -------------------------------------------------------
      2007-11-01 08:39:39,190 DEBUG [org.jgroups.protocols.UDP] created unicast receiver thread
      2007-11-01 08:39:41,193 DEBUG [org.jgroups.protocols.PingWaiter] initial mbrs are []
      2007-11-01 08:39:41,194 DEBUG [org.jgroups.protocols.pbcast.ClientGmsImpl] initial_mbrs are []
      2007-11-01 08:39:41,194 DEBUG [org.jgroups.protocols.pbcast.ClientGmsImpl] no initial members discovered: creating group as first member
      2007-11-01 08:39:41,200 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=192.168.181.191:37990] view is [192.168.181.191:37990|0] [192.168.181.191:37990]
      2007-11-01 08:39:41,204 INFO [org.jboss.cache.TreeCache] TreeCache local address is 192.168.181.191:37990
      2007-11-01 08:39:41,206 DEBUG [org.jgroups.protocols.pbcast.STATE_TRANSFER] GET_STATE: first member (no state)
      2007-11-01 08:39:41,206 INFO [org.jboss.cache.TreeCache] viewAccepted(): [192.168.181.191:37990|0] [192.168.181.191:37990]
      2007-11-01 08:39:41,212 DEBUG [org.jboss.cache.TreeCache] transferred state is null (may be first member in cluster)
      2007-11-01 08:39:41,216 DEBUG [org.jgroups.protocols.pbcast.GMS] 192.168.181.191:37990 changed role to org.jgroups.protocols.pbcast.CoordGmsImpl
      2007-11-01 08:39:41,216 DEBUG [org.jgroups.protocols.pbcast.GMS] 192.168.181.191:37990 changed role to org.jgroups.protocols.pbcast.CoordGmsImpl
      2007-11-01 08:39:41,217 DEBUG [org.jgroups.protocols.pbcast.ClientGmsImpl] created group (first member). My view is [192.168.181.191:37990|0], impl is org.jgroups.protocols.pbcast.CoordGmsImpl
      2007-11-01 08:39:41,217 DEBUG [org.jgroups.protocols.FD_SOCK] VIEW_CHANGE received: [192.168.181.191:37990]
      2007-11-01 08:39:41,217 DEBUG [org.jgroups.protocols.FD_SOCK] first member; cache is empty
      2007-11-01 08:39:41,217 DEBUG [org.jgroups.protocols.MERGE2] merge task started as I'm the coordinator
      2007-11-01 08:39:41,217 INFO [org.jboss.cache.TreeCache] State could not be retrieved (we are the first member in group)
      2007-11-01 08:39:41,217 DEBUG [org.jboss.cache.eviction.RegionManager] Starting eviction timer
      2007-11-01 08:39:41,218 INFO [org.jboss.cache.TreeCache] parseConfig(): PojoCacheConfig is empty
      2007-11-01 08:39:41,218 DEBUG [org.jboss.cache.aop.TreeCacheAop] Started jboss.cache:service=TomcatClusteringCache
      2007-11-01 08:39:41,235 DEBUG [org.jboss.cache.invalidation.InvalidationManager] Creating jboss.cache:service=InvalidationManager
      2007-11-01 08:39:41,235 DEBUG [org.jboss.cache.invalidation.InvalidationManager] Created jboss.cache:service=InvalidationManager
      2007-11-01 08:39:41,236 DEBUG [org.jboss.cache.invalidation.InvalidationManager] Starting jboss.cache:service=InvalidationManager
      2007-11-01 08:39:41,236 DEBUG [org.jboss.cache.invalidation.InvalidationManager] Starting Invalidation Manager jboss.cache:service=InvalidationManager
      2007-11-01 08:39:41,236 DEBUG [org.jboss.cache.invalidation.InvalidationManager] Started jboss.cache:service=InvalidationManager
      2007-11-01 08:39:48,625 DEBUG [org.jgroups.protocols.pbcast.CoordGmsImpl] sender=192.168.181.190:37863, merge_id=[192.168.181.190:37863|1193927988632]
      2007-11-01 08:39:48,626 DEBUG [org.jgroups.protocols.pbcast.CoordGmsImpl] response=GmsHeader[MERGE_RSP]: view=[192.168.181.191:37990|0] [192.168.181.191:37990], digest=[192.168.181.191:37990: [0 : 0], merge_rejected=false, merge_id=[192.168.181.190:37863|1193927988632]
      2007-11-01 08:39:48,630 WARN [org.jgroups.protocols.pbcast.NAKACK] 192.168.181.191:37990] discarded message from non-member 192.168.181.190:37863
      2007-11-01 08:39:48,632 DEBUG [org.jgroups.protocols.FD_SOCK] [SUSPECT] hdr=SUSPECT, mbrs=[192.168.181.191:37990]
      2007-11-01 08:39:48,632 DEBUG [org.jgroups.protocols.FD_SOCK] member is 192.168.181.191:37990
      2007-11-01 08:39:48,633 DEBUG [org.jgroups.protocols.pbcast.GMS] mcasting view {[192.168.181.190:37863|3] [192.168.181.190:37863, 192.168.181.191:37990]} (2 mbrs)

      2007-11-01 08:39:48,633 DEBUG [org.jgroups.protocols.pbcast.CoordGmsImpl] view=[192.168.181.190:37863|3] [192.168.181.190:37863, 192.168.181.191:37990], digest=[192.168.181.190:37863: [433 : 486, 192.168.181.191:37990: [0 : 0]
      2007-11-01 08:39:48,633 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=192.168.181.191:37990] view is [192.168.181.190:37863|3] [192.168.181.190:37863, 192.168.181.191:37990]
      2007-11-01 08:39:48,633 DEBUG [org.jgroups.protocols.FD_SOCK] VIEW_CHANGE received: [192.168.181.190:37863, 192.168.181.191:37990]
      2007-11-01 08:39:48,634 DEBUG [org.jgroups.protocols.FD_SOCK] determinePingDest()=192.168.181.190:37863, pingable_mbrs=[192.168.181.190:37863, 192.168.181.191:37990]
      2007-11-01 08:39:48,634 INFO [org.jboss.cache.TreeCache] viewAccepted(): [192.168.181.190:37863|3] [192.168.181.190:37863, 192.168.181.191:37990]
      2007-11-01 08:39:48,634 DEBUG [org.jgroups.protocols.pbcast.GMS] 192.168.181.191:37990 changed role to org.jgroups.protocols.pbcast.ParticipantGmsImpl
      2007-11-01 08:39:48,937 DEBUG [org.jgroups.protocols.FD_SOCK] could not create socket to 192.168.181.190:37863; suspecting 192.168.181.190:37863
      2007-11-01 08:39:48,937 DEBUG [org.jgroups.protocols.FD_SOCK] suspecting 192.168.181.190:37863 (own address is 192.168.181.191:37990)
      2007-11-01 08:39:48,937 DEBUG [org.jgroups.protocols.FD_SOCK] mbr=192.168.181.190:37863 (size=1)
      2007-11-01 08:39:48,938 DEBUG [org.jgroups.protocols.FD_SOCK] determinePingDest()=null, pingable_mbrs=[192.168.181.191:37990]
      2007-11-01 08:39:48,938 DEBUG [org.jgroups.protocols.FD_SOCK] pinger thread terminated
      2007-11-01 08:39:48,938 DEBUG [org.jgroups.protocols.FD_SOCK] [SUSPECT] hdr=SUSPECT, mbrs=[192.168.181.190:37863]
      2007-11-01 08:39:48,939 DEBUG [org.jgroups.protocols.FD_SOCK] member is 192.168.181.190:37863
      2007-11-01 08:39:58,068 DEBUG [org.jgroups.protocols.pbcast.ParticipantGmsImpl] view=[192.168.181.190:37863|3] [192.168.181.190:37863, 192.168.181.191:37990]
      2007-11-01 08:39:58,068 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=192.168.181.191:37990] view is [192.168.181.190:37863|3] [192.168.181.190:37863, 192.168.181.191:37990]
      2007-11-01 08:39:58,068 DEBUG [org.jgroups.protocols.pbcast.GMS] [192.168.181.191:37990] received view <= current view; discarding it (current vid: [192.168.181.190:37863|3], new vid: [192.168.181.190:37863|3])
      2007-11-01 08:39:58,627 DEBUG [org.jgroups.protocols.pbcast.CoordGmsImpl] cancelling merge (merge_id=[192.168.181.190:37863|1193927988632])
      2007-11-01 08:40:01,234 DEBUG [org.jboss.cache.eviction.BaseEvictionAlgorithm] Adding element /SessionKeyIndex/IP_ADDR/70.1.2.3 for a node that doesn't exist yet. Process as an add.
      2007-11-01 08:40:01,235 DEBUG [org.jboss.cache.eviction.BaseEvictionAlgorithm] Adding element /SessionKeyIndex/DEVICE_IP_LIST/10.10.10.10 for a node that doesn't exist yet. Process as an add.
      2007-11-01 08:40:01,235 DEBUG [org.jboss.cache.eviction.BaseEvictionAlgorithm] Adding element /SessionKeyIndex/USER_NAME/testuser2 for a node that doesn't exist yet. Process as an add.
      2007-11-01 08:40:01,235 DEBUG [org.jboss.cache.eviction.BaseEvictionAlgorithm] Adding element /SessionKeyIndex/SERVICE_LIST/internet2 for a node that doesn't exist yet. Process as an add.
      2007-11-01 08:40:01,235 DEBUG [org.jboss.cache.eviction.BaseEvictionAlgorithm] Adding element /Sessions/42E2E6FCD2EC4345 for a node that doesn't exist yet. Process as an add.
      2007-11-01 08:40:01,235 DEBUG [org.jboss.cache.eviction.BaseEvictionAlgorithm] Adding element /SessionKeys/42E2E6FCD2EC4345 for a node that doesn't exist yet. Process as an add.

        • 1. Re: Initial State Transfer Failure
          Tony Lauro Newbie

          Sorry the config file didn't get posted correctly


          <!-- ==================================================================== -->
          <!-- Defines TreeCache configuration -->
          <!-- ==================================================================== -->



          jboss:service=Naming
          jboss:service=TransactionManager
          jboss.aop:service=AspectDeployer

          <!--
          Configure the TransactionManager
          -->
          org.jboss.cache.BatchModeTransactionManagerLookup

          <!--
          Isolation level : SERIALIZABLE
          REPEATABLE_READ (default)
          READ_COMMITTED
          READ_UNCOMMITTED
          NONE
          -->
          REPEATABLE_READ

          <!--
          Valid modes are LOCAL
          REPL_ASYNC
          REPL_SYNC
          INVALIDATION_ASYNC
          INVALIDATION_SYNC
          -->
          REPL_ASYNC

          <!--
          Just used for async repl: use a replication queue
          -->
          false

          <!--
          Replication interval for replication queue (in ms)
          -->
          0

          <!--
          Max number of elements which trigger replication
          -->
          0

          <!-- Name of cluster. Needs to be the same for all clusters, in order
          to find each other
          -->
          TreeCache-Cluster

          <!-- JGroups protocol stack properties. Can also be a URL,
          e.g. file:/home/bela/default.xml

          -->



          <UDP mcast_addr="228.1.2.3" mcast_port="48866"
          ip_ttl="64" ip_mcast="true"
          mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
          ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
          loopback="false" bind_addr="192.168.181.190" />
          <PING timeout="2000" num_initial_members="3"
          up_thread="false" down_thread="false"
          initial_hosts="192.168.181.191[48866]"/>
          <MERGE2 min_interval="10000" max_interval="20000"/>
          <FD_SOCK/>
          <VERIFY_SUSPECT timeout="1500"
          up_thread="false" down_thread="false"/>
          <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
          max_xmit_size="8192" up_thread="false" down_thread="false"/>
          <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
          down_thread="false"/>
          <pbcast.STABLE desired_avg_gossip="20000"
          up_thread="false" down_thread="false"/>
          <FRAG frag_size="8192"
          down_thread="false" up_thread="false"/>
          <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
          shun="true" print_local_addr="true"/>
          <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>



          <!--
          Whether or not to fetch state on joining a cluster
          NOTE this used to be called FetchStateOnStartup and has been renamed to be more descriptive.
          -->
          true

          <!--
          The max amount of time (in milliseconds) we wait until the
          initial state (ie. the contents of the cache) are retrieved from
          existing members in a clustered environment
          -->
          20000

          <!--
          Number of milliseconds to wait until all responses for a
          synchronous call have been received.
          -->
          20000

          <!-- Max number of milliseconds to wait for a lock acquisition -->
          15000

          <!-- Eviction policy defines how long nodes in defined regions stay -->
          <!-- in cache before being removed. In this case a cache entry can -->
          <!-- stay in cache w/out any activity for the specified number of -->
          <!-- seconds. BAM will update timestamp on the record every time -->
          <!-- an interium accounting record is received. Likewise the record -->
          <!-- will be removed when a accounting stop record is received. -->
          <!-- So, this policy is used to remove nodes in cases where an -->
          <!-- accounting stop record is lost. -->
          <!-- -->
          <!-- NOTE: the timeToLiveSeconds must be greater than the time -->
          <!-- used to send interium accouning records. Also /SessionKeys -->
          <!-- and Sessions must have a timeToLiveSeconds > /SessionKeyIndex -->
          <!-- as the indexs have to be removed before the acutal data to -->
          <!-- avoid a race condition. -->



          5

          <!-- See notes above -->

          0
          3900


          <!-- See notes above, time must be > than /SessionKeyuIndex -->

          0
          3901


          <!-- See notes above, time must be > than /sessionKeyIndex -->

          0
          3902


          <!-- Default Eviction policy for all other Nodes in cache -->

          0
          0




          <!--
          Indicate whether to use marshalling or not. Set this to true if you are running under a scoped
          class loader, e.g., inside an application server. Default is "false".
          -->
          false
          false


          • 2. Re: Initial State Transfer Failure
            Tony Lauro Newbie

            3rd try's a charm:



            &server>
            <!-- ==================================================================== -->
            <!-- Defines TreeCache configuration -->
            <!-- ==================================================================== -->



            jboss:service=Naming
            jboss:service=TransactionManager
            jboss.aop:service=AspectDeployer

            <!--
            Configure the TransactionManager
            -->
            org.jboss.cache.BatchModeTransactionManagerLookup

            <!--
            Isolation level : SERIALIZABLE
            REPEATABLE_READ (default)
            READ_COMMITTED
            READ_UNCOMMITTED
            NONE
            -->
            REPEATABLE_READ

            <!--
            Valid modes are LOCAL
            REPL_ASYNC
            REPL_SYNC
            INVALIDATION_ASYNC
            INVALIDATION_SYNC
            -->
            REPL_ASYNC

            <!--
            Just used for async repl: use a replication queue
            -->
            false

            <!--
            Replication interval for replication queue (in ms)
            -->
            0

            <!--
            Max number of elements which trigger replication
            -->
            0

            <!-- Name of cluster. Needs to be the same for all clusters, in order
            to find each other
            -->
            TreeCache-Cluster

            <!-- JGroups protocol stack properties. Can also be a URL,
            e.g. file:/home/bela/default.xml

            -->



            <UDP mcast_addr="228.1.2.3" mcast_port="48866"
            ip_ttl="64" ip_mcast="true"
            mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
            ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
            loopback="false" bind_addr="192.168.181.190" />
            <PING timeout="2000" num_initial_members="3"
            up_thread="false" down_thread="false"
            initial_hosts="192.168.181.191[48866]"/>
            <MERGE2 min_interval="10000" max_interval="20000"/>
            <FD_SOCK/>
            <VERIFY_SUSPECT timeout="1500"
            up_thread="false" down_thread="false"/>
            <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
            max_xmit_size="8192" up_thread="false" down_thread="false"/>
            <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
            down_thread="false"/>
            <pbcast.STABLE desired_avg_gossip="20000"
            up_thread="false" down_thread="false"/>
            <FRAG frag_size="8192"
            down_thread="false" up_thread="false"/>
            <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
            shun="true" print_local_addr="true"/>
            <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>



            <!--
            Whether or not to fetch state on joining a cluster
            NOTE this used to be called FetchStateOnStartup and has been renamed to be more descriptive.
            -->
            true

            <!--
            The max amount of time (in milliseconds) we wait until the
            initial state (ie. the contents of the cache) are retrieved from
            existing members in a clustered environment
            -->
            20000

            <!--
            Number of milliseconds to wait until all responses for a
            synchronous call have been received.
            -->
            20000

            <!-- Max number of milliseconds to wait for a lock acquisition -->
            15000

            <!-- Eviction policy defines how long nodes in defined regions stay -->
            <!-- in cache before being removed. In this case a cache entry can -->
            <!-- stay in cache w/out any activity for the specified number of -->
            <!-- seconds. BAM will update timestamp on the record every time -->
            <!-- an interium accounting record is received. Likewise the record -->
            <!-- will be removed when a accounting stop record is received. -->
            <!-- So, this policy is used to remove nodes in cases where an -->
            <!-- accounting stop record is lost. -->
            <!-- -->
            <!-- NOTE: the timeToLiveSeconds must be greater than the time -->
            <!-- used to send interium accouning records. Also /SessionKeys -->
            <!-- and Sessions must have a timeToLiveSeconds > /SessionKeyIndex -->
            <!-- as the indexs have to be removed before the acutal data to -->
            <!-- avoid a race condition. -->



            5

            <!-- See notes above -->

            0
            3900


            <!-- See notes above, time must be > than /SessionKeyuIndex -->

            0
            3901


            <!-- See notes above, time must be > than /sessionKeyIndex -->

            0
            3902


            <!-- Default Eviction policy for all other Nodes in cache -->

            0
            0




            <!--
            Indicate whether to use marshalling or not. Set this to true if you are running under a scoped
            class loader, e.g., inside an application server. Default is "false".
            -->
            false
            false




            • 4. Re: Initial State Transfer Failure
              Tony Lauro Newbie

              and now i'm totally embarrassed. thank you!

              <server>
               <!-- ==================================================================== -->
               <!-- Defines TreeCache configuration -->
               <!-- ==================================================================== -->
              
               <mbean code="org.jboss.cache.aop.TreeCacheAop"
               name="jboss.cache:service=TomcatClusteringCache">
              
               <depends>jboss:service=Naming</depends>
               <depends>jboss:service=TransactionManager</depends>
               <depends>jboss.aop:service=AspectDeployer</depends>
              
               <!--
               Configure the TransactionManager
               -->
               <attribute name="TransactionManagerLookupClass">org.jboss.cache.BatchModeTransactionManagerLookup</attribute>
              
               <!--
               Isolation level : SERIALIZABLE
               REPEATABLE_READ (default)
               READ_COMMITTED
               READ_UNCOMMITTED
               NONE
               -->
               <attribute name="IsolationLevel">REPEATABLE_READ</attribute>
              
               <!--
               Valid modes are LOCAL
               REPL_ASYNC
               REPL_SYNC
               INVALIDATION_ASYNC
               INVALIDATION_SYNC
               -->
               <attribute name="CacheMode">REPL_ASYNC</attribute>
              
               <!--
               Just used for async repl: use a replication queue
               -->
               <attribute name="UseReplQueue">false</attribute>
              
               <!--
               Replication interval for replication queue (in ms)
               -->
               <attribute name="ReplQueueInterval">0</attribute>
              
               <!--
               Max number of elements which trigger replication
               -->
               <attribute name="ReplQueueMaxElements">0</attribute>
              
               <!-- Name of cluster. Needs to be the same for all clusters, in order
               to find each other
               -->
               <attribute name="ClusterName">TreeCache-Cluster</attribute>
              
               <!-- JGroups protocol stack properties. Can also be a URL,
               e.g. file:/home/bela/default.xml
               <attribute name="ClusterProperties"></attribute>
               -->
              
               <attribute name="ClusterConfig">
               <config>
               <UDP mcast_addr="228.1.2.3" mcast_port="48866"
               ip_ttl="64" ip_mcast="true"
               mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
               ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
               loopback="false" bind_addr="192.168.181.190" />
               <PING timeout="2000" num_initial_members="3"
               up_thread="false" down_thread="false"
               initial_hosts="192.168.181.191[48866]"/>
               <MERGE2 min_interval="10000" max_interval="20000"/>
               <FD_SOCK/>
               <VERIFY_SUSPECT timeout="1500"
               up_thread="false" down_thread="false"/>
               <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
               max_xmit_size="8192" up_thread="false" down_thread="false"/>
               <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
               down_thread="false"/>
               <pbcast.STABLE desired_avg_gossip="20000"
               up_thread="false" down_thread="false"/>
               <FRAG frag_size="8192"
               down_thread="false" up_thread="false"/>
               <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
               shun="true" print_local_addr="true"/>
               <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
               </config>
               </attribute>
              
               <!--
               Whether or not to fetch state on joining a cluster
               NOTE this used to be called FetchStateOnStartup and has been renamed to be more descriptive.
               -->
               <attribute name="FetchInMemoryState">true</attribute>
              
               <!--
               The max amount of time (in milliseconds) we wait until the
               initial state (ie. the contents of the cache) are retrieved from
               existing members in a clustered environment
               -->
               <attribute name="InitialStateRetrievalTimeout">20000</attribute>
              
               <!--
               Number of milliseconds to wait until all responses for a
               synchronous call have been received.
               -->
               <attribute name="SyncReplTimeout">20000</attribute>
              
               <!-- Max number of milliseconds to wait for a lock acquisition -->
               <attribute name="LockAcquisitionTimeout">15000</attribute>
              
               <!-- Eviction policy defines how long nodes in defined regions stay -->
               <!-- in cache before being removed. In this case a cache entry can -->
               <!-- stay in cache w/out any activity for the specified number of -->
               <!-- seconds. BAM will update timestamp on the record every time -->
               <!-- an interium accounting record is received. Likewise the record -->
               <!-- will be removed when a accounting stop record is received. -->
               <!-- So, this policy is used to remove nodes in cases where an -->
               <!-- accounting stop record is lost. -->
               <!-- -->
               <!-- NOTE: the timeToLiveSeconds must be greater than the time -->
               <!-- used to send interium accouning records. Also /SessionKeys -->
               <!-- and Sessions must have a timeToLiveSeconds > /SessionKeyIndex -->
               <!-- as the indexs have to be removed before the acutal data to -->
               <!-- avoid a race condition. -->
              
               <attribute name="EvictionPolicyConfig">
               <config>
               <attribute name="wakeUpIntervalSeconds">5</attribute>
              
               <!-- See notes above -->
               <region name="/SessionKeyIndex" policyClass="org.jboss.cache.eviction.LRUPolicy">
               <attribute name="maxNodes">0</attribute>
               <attribute name="timeToLiveSeconds">3900</attribute>
               </region>
              
               <!-- See notes above, time must be > than /SessionKeyuIndex -->
               <region name="/SessionKeys" policyClass="org.jboss.cache.eviction.LRUPolicy">
               <attribute name="maxNodes">0</attribute>
               <attribute name="timeToLiveSeconds">3901</attribute>
               </region>
              
               <!-- See notes above, time must be > than /sessionKeyIndex -->
               <region name="/Sessions" policyClass="org.jboss.cache.eviction.LRUPolicy">
               <attribute name="maxNodes">0</attribute>
               <attribute name="timeToLiveSeconds">3902</attribute>
               </region>
              
               <!-- Default Eviction policy for all other Nodes in cache -->
               <region name="/_default_" policyClass="org.jboss.cache.eviction.LRUPolicy">
               <attribute name="maxNodes">0</attribute>
               <attribute name="timeToLiveSeconds">0</attribute>
               </region>
               </config>
               </attribute>
              
               <!--
               Indicate whether to use marshalling or not. Set this to true if you are running under a scoped
               class loader, e.g., inside an application server. Default is "false".
               -->
               <attribute name="UseRegionBasedMarshalling">false</attribute>
               <attribute name="InactiveOnStartup">false</attribute>
               </mbean>
              </server>
              


              • 5. Re: Initial State Transfer Failure
                Manik Surtani Master

                Are you using JBoss Cache 2.x in your own application, or as a part of HTTP or EJB3 session replication?

                • 6. Re: Initial State Transfer Failure
                  Tony Lauro Newbie

                  I'm sorry I'm a bit new to this stuff so I don't know if I understand what you are asking. We create a web app with EJBs etc and deploy it into a JBoss container. The cache service is configured via an xml file that deploys as a service in the container. We have created an abstract cache class that we extend for each cache region. The abstract class implements methods that uses a TreeCacheMBean object to acces cached objects.

                  • 7. Re: Initial State Transfer Failure
                    Tony Lauro Newbie

                    Manik,

                    I think I understand more what you are asking - it's EJB session replication

                    • 8. Re: Initial State Transfer Failure
                      Brian Stansberry Master

                      I think your problem is in PING:

                      initial_hosts="192.168.181.191[48866]


                      If you are going to use this, the port associated with the peer is not the UDP.mcast_port (48866), it is the UDP.bind_port, which you would need to specify in UDP. If you specify PING.initial_hosts, PING sends UDP unicast messages to each listed member; it does not send a multicast. For the unicast message it needs to know the UDP.bind_port. If you don't specify bind_port, UDP picks an ephemeral port, which obviously you can't list in initial_hosts.

                      With your config, what happens is the initial group discovery fails and each node forms a group by itself. No initial state transfer when you are in a group by yourself. Then MERGE2 kicks in and the groups merge into a single cluster. But a merge does not trigger an initial state transfer.

                      Why are you specifying initial_hosts? Don't if you don't really need to; it's not the usual way to use PING.



                      • 9. Re: Initial State Transfer Failure
                        Tony Lauro Newbie

                        we put initial_hosts to try and fix the problem - when we didn't specify initial_hosts we still had the same problem. so if i understand what you are saying i need to:

                        1. remove the initial_hosts from PING
                        2. specify bind_addr in UDP

                        and that should solve the problem, yes? OK - i will try it and post again after testing.

                        thank you very much!

                        • 10. Re: Initial State Transfer Failure
                          Brian Stansberry Master

                          OK, let us know. If you didn't have initial_hosts in there before and it still failed you may have some other issue.

                          • 11. Re: Initial State Transfer Failure
                            Tony Lauro Newbie

                            Brian and Manik,

                            The bind_addr was the thing - and we did find the other problem. The ejbs were loading after the cache service was loading and so there were failures. We have the cache start up after the ejbs have loaded and all is good!

                            Thank you very much!

                            Tony