0 Replies Latest reply on Feb 15, 2005 8:42 PM by Zalle

    Please Help!!!

    Zalle Newbie

      Hello there,

      This is my first time dealing with JBoss clustering using JGroups. I was recently handed a live application running on JBoss 4.0 that uses JBossCache and Hibernate. It is deployed in a clustered environment with 4 nodes, each running on a separate Windows 2000 Advanced Server box.

      Everything was running fine until I changed the IP address mapping of the SMTP host (used by the nodes) in the hosts files and restarted the machines. Since then, I keep getting these messages in the JBoss log files:

      2005-02-16 01:08:46,500 WARN [caw.util.hibernate.cache.JBossTreeCacheService] No transaction manager lookup class has been defined. TX will be null
      2005-02-16 01:08:46,531 INFO [caw.util.hibernate.cache.JBossTreeCacheService] interceptor chain is:
      class org.jboss.cache.interceptors.CallInterceptor
      class org.jboss.cache.interceptors.ReplicationInterceptor
      2005-02-16 01:08:46,531 INFO [caw.util.hibernate.cache.JBossTreeCacheService] cache mode is REPL_ASYNC
      2005-02-16 01:08:47,500 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is primary:1079
      -------------------------------------------------------
      2005-02-16 01:08:54,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
      2005-02-16 01:09:03,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
      2005-02-16 01:09:12,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
      2005-02-16 01:09:21,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
      2005-02-16 01:09:30,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
      2005-02-16 01:09:39,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
      2005-02-16 01:09:48,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
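
      For what it is worth, the warnings are exactly 9 seconds apart. That happens to equal the PING timeout (2000) plus the GMS join_timeout (5000) and join_retry_timeout (2000) from the configuration. I am only assuming that each join attempt waits for these three in turn; I have not confirmed it in the JGroups source. The small sketch below just does the arithmetic; the class and method names are made up for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical helper: compares the gap between two log lines with the
// timeouts configured in the protocol stack.
final class JoinRetryInterval {

    // Milliseconds between two time-of-day stamps in the log's layout,
    // e.g. "01:08:54,593". Assumes both stamps fall on the same day.
    static long intervalMillis(String earlier, String later) {
        SimpleDateFormat fmt = new SimpleDateFormat("HH:mm:ss,SSS");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // avoid DST surprises
        try {
            return fmt.parse(later).getTime() - fmt.parse(earlier).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a log timestamp: " + e.getMessage());
        }
    }

    // Assumption (not verified against JGroups): one join attempt costs
    // discovery timeout + join timeout + retry sleep.
    static long configuredCycleMillis(long pingTimeout, long joinTimeout, long joinRetryTimeout) {
        return pingTimeout + joinTimeout + joinRetryTimeout;
    }

    public static void main(String[] args) {
        long observed = intervalMillis("01:08:54,593", "01:09:03,593");
        long configured = configuredCycleMillis(2000, 5000, 2000);
        System.out.println("observed " + observed + " ms, configured sum " + configured + " ms");
    }
}
```

      If that assumption holds, the node would be finding someone to send its join request to but never getting an answer back, rather than finding nobody at all.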



      Below is the JBossCache and cluster configuration:

      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE server>







      <depends>jboss:service=Naming</depends>
      <depends>jboss:service=TransactionManager</depends>

      java:/TreeCache

      <!--
      Configure the TransactionManager
      Note for Isolation level : NONE - use no TransactionManagerLookupClass (comment attribute out)
      -->
      <!-- attribute name="TransactionManagerLookupClass">org.jboss.cache.DummyTransactionManagerLookup</attribute -->
      <!--
      Isolation level : SERIALIZABLE
      REPEATABLE_READ (default)
      READ_COMMITTED
      READ_UNCOMMITTED
      NONE
      -->
      <!-- TODO - This should be set to READ_COMMITTED, but cannot because of bug in hibernate -->
      <attribute name="IsolationLevel">NONE</attribute>

      <!-- Valid modes are LOCAL, REPL_ASYNC and REPL_SYNC -->
      <attribute name="CacheMode">REPL_ASYNC</attribute>

      <!-- Cache Loader -->
      <!-- ??? attribute name="CacheLoaderClass"></attribute-->
      <!-- ??? attribute name="CacheLoaderConfig"></attribute-->
      <!-- ??? attribute name="CacheLoaderShared"></attribute-->
      <!-- ??? attribute name="CacheLoaderPreload"></attribute-->
      <!-- ??? attribute name="CacheLoaderFetchPersistentState"></attribute-->
      <!-- ??? attribute name="CacheLoaderFetchTransientState"></attribute-->
      <!-- ??? attribute name="CacheLoaderFetchTransientState"></attribute-->

      <!-- Just used for async repl: use a replication queue -->
      <attribute name="UseReplQueue">false</attribute>

      <!-- Replication interval for replication queue (in ms) -->
      <attribute name="ReplQueueInterval">1000</attribute>

      <!-- Max number of elements which trigger replication -->
      <attribute name="ReplQueueMaxElements">20</attribute>

      <!-- Whether or not to fetch state on joining a cluster -->
      <attribute name="FetchStateOnStartup">true</attribute>

      <!--
      The max amount of time (in milliseconds) we wait until the
      initial state (ie. the contents of the cache) are retrieved from
      existing members in a clustered environment
      -->
      <attribute name="InitialStateRetrievalTimeout">15000</attribute>

      <!--
      Number of milliseconds to wait until all responses for a
      synchronous call have been received.
      -->
      <attribute name="SyncReplTimeout">10000</attribute>

      <!-- Max number of milliseconds to wait for a lock acquisition -->
      <attribute name="LockAcquisitionTimeout">15000</attribute>

      <!-- Name of cluster. Needs to be the same for all clusters, in order
      to find each other
      -->
      <attribute name="ClusterName">TreeCache-Cluster</attribute>

      <!-- JGroups protocol stack properties. Can also be a URL,
      e.g. file:/home/bela/default.xml
      * Set the cluster properties. If the cache is to use the new properties,
      * it has to be redeployed

      -->



      <attribute name="ClusterConfig">
      <config>
      <!-- UDP: if you have a multihomed machine,
      set the bind_addr attribute to the appropriate NIC IP address -->
      <!-- UDP: On Windows machines, because of the media sense feature
      being broken with multicast (even after disabling media sense)
      set the loopback attribute to true -->
      <!-- UDP mcast_addr="228.1.2.3" mcast_port="12233" - Oscar Production
      UDP mcast_addr="228.8.8.7" mcast_port="12234" - Oscar UAT
      -->
      <UDP mcast_addr="228.1.2.3" mcast_port="12233"
      ip_ttl="64" ip_mcast="true"
      mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
      ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
      bind_addr="primary.nic.b3xt.host"
      loopback="true"/>
      <PING timeout="2000" num_initial_members="3"
      />
      <MERGE2 min_interval="10000" max_interval="20000"/>
      <!-- <FD shun="true" up_thread="true" down_thread="true" />-->
      <FD_SOCK/>
      <VERIFY_SUSPECT timeout="1500"
      up_thread="false" down_thread="false"/>
      <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
      max_xmit_size="8192" up_thread="false" down_thread="false"/>
      <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
      down_thread="false"/>
      <pbcast.STABLE desired_avg_gossip="20000"
      up_thread="false" down_thread="false"/>
      <FRAG frag_size="8192"
      down_thread="false" up_thread="false"/>
      <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
      shun="true" print_local_addr="true"/>
      <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
      </config>
      </attribute>









      It seems the restarted nodes are trying to join the cluster but cannot. Can someone point me in the right direction to narrow this problem down further? I need to have this up and running before morning, so I am pretty desperate.
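
      One thing I can check myself is whether the hosts-file edit changed what the bind_addr name resolves to. This is only a guess at the cause. The sketch below resolves a host name and reports whether each resulting address belongs to a network interface on the local machine. The class and method names are made up for illustration; it uses only java.net calls.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

// Hypothetical diagnostic: does the bind_addr host name still resolve to
// an address that is configured on one of this machine's NICs?
final class BindAddrCheck {

    // True if some local network interface carries this address.
    static boolean isLocalAddress(InetAddress addr) {
        try {
            return NetworkInterface.getByInetAddress(addr) != null;
        } catch (SocketException e) {
            return false; // interfaces could not be queried
        }
    }

    // One line per resolved address, tagged as local or not.
    static String describe(String host) {
        InetAddress[] addrs;
        try {
            addrs = InetAddress.getAllByName(host);
        } catch (UnknownHostException e) {
            return host + " does not resolve\n";
        }
        StringBuffer out = new StringBuffer();
        for (int i = 0; i < addrs.length; i++) {
            out.append(host).append(" -> ").append(addrs[i].getHostAddress());
            out.append(isLocalAddress(addrs[i]) ? " (local NIC)" : " (NOT a local NIC)");
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Defaults to the bind_addr value from the configuration above.
        String host = args.length > 0 ? args[0] : "primary.nic.b3xt.host";
        System.out.print(describe(host));
    }
}
```

      Running this on each node with the bind_addr value should show whether the name now points at an address the box does not own.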

      Please Help!!

      Many thanks
      Zalle