Please Help!!!
zalle_cool Feb 15, 2005 8:42 PM
Hello there,
This is the first time I am dealing with JBoss clustering using JGroups. I was recently handed a live application running on JBoss 4.0 that uses JBoss Cache and Hibernate. The application is deployed in a clustered environment with 4 nodes, each running on a separate Windows 2000 Advanced Server box.
Everything was running fine until I changed the IP address mapping of the SMTP host (used by the nodes) in the hosts files and restarted the machines. Since then I keep getting these messages in the JBoss log files:
2005-02-16 01:08:46,500 WARN [caw.util.hibernate.cache.JBossTreeCacheService] No transaction manager lookup class has been defined. TX will be null
2005-02-16 01:08:46,531 INFO [caw.util.hibernate.cache.JBossTreeCacheService] interceptor chain is:
class org.jboss.cache.interceptors.CallInterceptor
class org.jboss.cache.interceptors.ReplicationInterceptor
2005-02-16 01:08:46,531 INFO [caw.util.hibernate.cache.JBossTreeCacheService] cache mode is REPL_ASYNC
2005-02-16 01:08:47,500 INFO [STDOUT]
-------------------------------------------------------
GMS: address is primary:1079
-------------------------------------------------------
2005-02-16 01:08:54,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
2005-02-16 01:09:03,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
2005-02-16 01:09:12,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
2005-02-16 01:09:21,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
2005-02-16 01:09:30,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
2005-02-16 01:09:39,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
2005-02-16 01:09:48,593 WARN [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(primary:1079) failed, retrying
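The retries look evenly spaced, so here is a small sketch that measures the gap between the warnings and compares it with the timeouts in the configuration further down. The class name `JoinRetrySpacing` and the helper `gapsMillis` are made up for this post. The comparison rests on my assumption that one join attempt costs roughly the PING `timeout`, plus `join_timeout`, plus `join_retry_timeout`; I have not confirmed that against the JGroups source.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

class JoinRetrySpacing {
    // Gap in milliseconds between each pair of consecutive log timestamps.
    static long[] gapsMillis(String[] stamps) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
        // Fixed zone so a daylight-saving change cannot distort the gaps.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        long[] gaps = new long[Math.max(0, stamps.length - 1)];
        for (int i = 1; i < stamps.length; i++) {
            gaps[i - 1] = fmt.parse(stamps[i]).getTime()
                        - fmt.parse(stamps[i - 1]).getTime();
        }
        return gaps;
    }

    public static void main(String[] args) throws ParseException {
        // Timestamps copied from the first handleJoin warnings above.
        String[] stamps = {
            "2005-02-16 01:08:54,593",
            "2005-02-16 01:09:03,593",
            "2005-02-16 01:09:12,593",
            "2005-02-16 01:09:21,593"
        };
        // Values copied from the PING and pbcast.GMS entries in my config.
        long pingTimeout = 2000;
        long joinTimeout = 5000;
        long joinRetryTimeout = 2000;
        // Assumption: one failed join attempt takes about this long.
        long assumedCycle = pingTimeout + joinTimeout + joinRetryTimeout;

        for (long gap : gapsMillis(stamps)) {
            System.out.println("gap " + gap + " ms, assumed cycle " + assumedCycle + " ms");
        }
    }
}
```

If the gaps match the assumed cycle, that would suggest each attempt waits out the full `join_timeout` without getting an answer, rather than failing immediately.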
Below is the JBoss Cache and cluster configuration:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE server>
jboss:service=Naming
jboss:service=TransactionManager
java:/TreeCache
<!--
Configure the TransactionManager
Note for Isolation level : NONE - use no TransactionManagerLookupClass (comment attribute out)
-->
<!-- attribute name="TransactionManagerLookupClass">org.jboss.cache.DummyTransactionManagerLookup</attribute -->
<!--
Isolation level : SERIALIZABLE
REPEATABLE_READ (default)
READ_COMMITTED
READ_UNCOMMITTED
NONE
-->
<!-- TODO - This should be set to READ_COMMITTED, but cannot because of bug in hibernate -->
NONE
<!-- Valid modes are LOCAL, REPL_ASYNC and REPL_SYNC -->
REPL_ASYNC
<!-- Cache Loader -->
<!-- ??? attribute name="CacheLoaderClass"></attribute-->
<!-- ??? attribute name="CacheLoaderConfig"></attribute-->
<!-- ??? attribute name="CacheLoaderShared"></attribute-->
<!-- ??? attribute name="CacheLoaderPreload"></attribute-->
<!-- ??? attribute name="CacheLoaderFetchPersistentState"></attribute-->
<!-- ??? attribute name="CacheLoaderFetchTransientState"></attribute-->
<!-- ??? attribute name="CacheLoaderFetchTransientState"></attribute-->
<!-- Just used for async repl: use a replication queue -->
false
<!-- Replication interval for replication queue (in ms) -->
1000
<!-- Max number of elements which trigger replication -->
20
<!-- Whether or not to fetch state on joining a cluster -->
true
<!--
The max amount of time (in milliseconds) we wait until the
initial state (ie. the contents of the cache) are retrieved from
existing members in a clustered environment
-->
15000
<!--
Number of milliseconds to wait until all responses for a
synchronous call have been received.
-->
10000
<!-- Max number of milliseconds to wait for a lock acquisition -->
15000
<!-- Name of cluster. Needs to be the same for all clusters, in order
to find each other
-->
TreeCache-Cluster
<!-- JGroups protocol stack properties. Can also be a URL,
e.g. file:/home/bela/default.xml
* Set the cluster properties. If the cache is to use the new properties,
* it has to be redeployed
-->
<!-- UDP: if you have a multihomed machine,
set the bind_addr attribute to the appropriate NIC IP address -->
<!-- UDP: On Windows machines, because of the media sense feature
being broken with multicast (even after disabling media sense)
set the loopback attribute to true -->
<!-- UDP mcast_addr="228.1.2.3" mcast_port="12233" - Oscar Production
UDP mcast_addr="228.8.8.7" mcast_port="12234" - Oscar UAT
-->
<UDP mcast_addr="228.1.2.3" mcast_port="12233"
ip_ttl="64" ip_mcast="true"
mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
bind_addr="primary.nic.b3xt.host"
loopback="true"/>
<PING timeout="2000" num_initial_members="3"
/>
<MERGE2 min_interval="10000" max_interval="20000"/>
<!-- <FD shun="true" up_thread="true" down_thread="true" />-->
<FD_SOCK/>
<VERIFY_SUSPECT timeout="1500"
up_thread="false" down_thread="false"/>
<pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
max_xmit_size="8192" up_thread="false" down_thread="false"/>
<UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
down_thread="false"/>
<pbcast.STABLE desired_avg_gossip="20000"
up_thread="false" down_thread="false"/>
<FRAG frag_size="8192"
down_thread="false" up_thread="false"/>
<pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
shun="true" print_local_addr="true"/>
<pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
It seems the restarted nodes are trying to join the cluster but are unable to. Can someone point me in the right direction to narrow this problem down? I need to have this up and running before morning, so I am pretty desperate.
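Since `bind_addr` in the UDP entry is a host name and the only thing I changed was the hosts files, one thing I can check on each node is whether that name still resolves to an address that belongs to a local NIC. A minimal sketch of that check, using only standard `java.net` calls; the class name `BindAddrCheck` and the helper `isLocalAddress` are made up for this post, and I am only assuming that a bad mapping could be related to the failed joins.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

class BindAddrCheck {
    // True if the address is assigned to a network interface on this machine.
    static boolean isLocalAddress(InetAddress addr) throws SocketException {
        return NetworkInterface.getByInetAddress(addr) != null;
    }

    public static void main(String[] args) throws SocketException {
        // Defaults to the bind_addr value from my config; pass another name to test it.
        String host = args.length > 0 ? args[0] : "primary.nic.b3xt.host";
        try {
            InetAddress addr = InetAddress.getByName(host);
            System.out.println(host + " resolves to " + addr.getHostAddress());
            System.out.println("loopback address: " + addr.isLoopbackAddress());
            System.out.println("assigned to a local NIC: " + isLocalAddress(addr));
        } catch (UnknownHostException e) {
            System.out.println(host + " does not resolve: " + e.getMessage());
        }
    }
}
```

Running it on every node and comparing the printed addresses should at least show whether the hosts file edit changed what the bind name points to.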
Please Help!!
Many thanks
Zalle