0 Replies Latest reply on Feb 15, 2005 8:42 PM by Zalle

    Please Help!!!

    Zalle Newbie

      Hello there,

      This is the first time I am dealing with JBoss clustering using JGroups. I have recently been handed a live application running on the JBoss 4.0 platform that uses JBoss Cache and Hibernate. The application is deployed in a clustered environment with 4 nodes, each running on a separate Win 2000 Adv. Server box.

      Everything was running fine until I changed the IP address mapping of the SMTP host (used by the nodes) in the hosts files and restarted the machines. Since then I keep getting these messages in the JBoss log files:

      2005-02-16 01:08:46,500 WARN [caw.util.hibernate.cache.JBossTreeCacheService] No transaction manager lookup class has been defined. TX will be null
      2005-02-16 01:08:46,531 INFO [caw.util.hibernate.cache.JBossTreeCacheService] interceptor chain is:
      class org.jboss.cache.interceptors.CallInterceptor
      class org.jboss.cache.interceptors.ReplicationInterceptor
      2005-02-16 01:08:46,531 INFO [caw.util.hibernate.cache.JBossTreeCacheService] cache mode is REPL_ASYNC
      2005-02-16 01:08:47,500 INFO [STDOUT]
      GMS: address is primary:1079
      2005-02-16 01:08:54,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
      2005-02-16 01:09:03,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
      2005-02-16 01:09:12,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
      2005-02-16 01:09:21,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
      2005-02-16 01:09:30,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
      2005-02-16 01:09:39,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
      2005-02-16 01:09:48,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
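      To check whether these join failures are a network-level multicast problem rather than a JBoss Cache configuration problem, something like the following could be run between two of the nodes. This is only a sketch: the class names and flags are assumed from the test utilities shipped in the JGroups 2.x distribution, and the jar path and multicast address/port are example placeholders, not our real production values.

      ```shell
      # Assumed JGroups 2.x test utilities; jgroups.jar path and the
      # multicast address/port below are examples, not the real values.

      # On node A, start a receiver listening on a multicast group:
      java -cp jgroups.jar org.jgroups.tests.McastReceiverTest -mcast_addr 228.1.2.3 -port 12233

      # On node B, send test packets to the same group; if they never
      # arrive on node A, the problem is at the network/NIC level,
      # not in JBoss Cache or the cache configuration:
      java -cp jgroups.jar org.jgroups.tests.McastSenderTest -mcast_addr 228.1.2.3 -port 12233
      ```

      If multicast traffic does flow between the boxes, the next suspect would be the JGroups bind address on the multihomed/renamed hosts.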

      Below is the JBoss Cache and cluster configuration:

      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE server>

      <!-- Configure the TransactionManager.
           Note for isolation level NONE: use no TransactionManagerLookupClass
           (comment the attribute out) -->
      <!-- attribute name="TransactionManagerLookupClass">org.jboss.cache.DummyTransactionManagerLookup</attribute -->
      <!-- Isolation level: SERIALIZABLE or
           REPEATABLE_READ (default) -->
      <!-- TODO - This should be set to READ_COMMITTED, but cannot because of a bug in Hibernate -->

      <!-- Valid modes are LOCAL, REPL_ASYNC and REPL_SYNC -->

      <!-- Cache loader -->
      <!-- ??? attribute name="CacheLoaderClass"></attribute -->
      <!-- ??? attribute name="CacheLoaderConfig"></attribute -->
      <!-- ??? attribute name="CacheLoaderShared"></attribute -->
      <!-- ??? attribute name="CacheLoaderPreload"></attribute -->
      <!-- ??? attribute name="CacheLoaderFetchPersistentState"></attribute -->
      <!-- ??? attribute name="CacheLoaderFetchTransientState"></attribute -->

      <!-- Just used for async repl: use a replication queue -->

      <!-- Replication interval for the replication queue (in ms) -->

      <!-- Max number of elements which trigger replication -->

      <!-- Whether or not to fetch state on joining a cluster -->

      <!-- The max amount of time (in milliseconds) we wait until the
           initial state (i.e. the contents of the cache) is retrieved from
           existing members in a clustered environment -->

      <!-- Number of milliseconds to wait until all responses for a
           synchronous call have been received -->

      <!-- Max number of milliseconds to wait for a lock acquisition -->

      <!-- Name of the cluster. Needs to be the same on all nodes, in order
           for them to find each other -->

      <!-- JGroups protocol stack properties. Can also be a URL,
           e.g. file:/home/bela/default.xml
           Set the cluster properties. If the cache is to use the new properties,
           it has to be redeployed -->

      <!-- UDP: if you have a multihomed machine,
           set the bind_addr attribute to the appropriate NIC IP address -->
      <!-- UDP: on Windows machines, because the media sense feature
           is broken with multicast (even after disabling media sense),
           set the loopback attribute to true -->
      <!-- UDP mcast_addr="" mcast_port="12233" - Oscar Production
           UDP mcast_addr="" mcast_port="12234" - Oscar UAT -->
      <UDP mcast_addr="" mcast_port="12233"
           ip_ttl="64" ip_mcast="true"
           mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
           ucast_send_buf_size="150000" ucast_recv_buf_size="80000"/>
      <PING timeout="2000" num_initial_members="3"/>
      <MERGE2 min_interval="10000" max_interval="20000"/>
      <!-- <FD shun="true" up_thread="true" down_thread="true"/> -->
      <VERIFY_SUSPECT timeout="1500"
           up_thread="false" down_thread="false"/>
      <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
           max_xmit_size="8192" up_thread="false" down_thread="false"/>
      <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"/>
      <pbcast.STABLE desired_avg_gossip="20000"
           up_thread="false" down_thread="false"/>
      <FRAG frag_size="8192"
           down_thread="false" up_thread="false"/>
      <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
           shun="true" print_local_addr="true"/>
      <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
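      Following the UDP comments in the config, a sketch of what that element might look like with the multihomed and Windows workarounds applied. The bind_addr value here is a placeholder for each node's actual NIC address (not our real value), and loopback="true" follows the Windows media-sense note; I have left mcast_addr as it appears in our config.

      ```xml
      <!-- Sketch only: bind_addr is a per-node placeholder;
           loopback="true" per the Windows media-sense note above -->
      <UDP mcast_addr="" mcast_port="12233"
           bind_addr="192.168.0.1" loopback="true"
           ip_ttl="64" ip_mcast="true"
           mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
           ucast_send_buf_size="150000" ucast_recv_buf_size="80000"/>
      ```

      Would pinning bind_addr like this be the right way to rule out the nodes binding to the wrong interface after the hosts-file change?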

      It seems like the restarted nodes are trying to join the cluster but are unable to. Can someone point me in the right direction to localize this problem further? I need to have this up and running before morning, so I am pretty desperate.

      Please Help!!

      Many thanks