TreeCache error when node leaves network
knatarajan Dec 22, 2005 7:27 PM

Hi all,
Each developer on my team runs a single-node cluster instance (JBoss 4.0.1 on Windows XP) on his/her machine, specifying a unique JBoss partition name and a unique JBoss cluster name. The JBoss console on each developer's instance confirms that the number of cluster members is 1. We also see messages on node A that it is discarding messages from node B, and vice versa.
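For reference, we set the unique partition name roughly like this (a sketch based on the stock ClusterPartition mbean in cluster-service.xml from JBoss 4.x; the mbean name and the example partition value shown are illustrative, not our exact file):

```xml
<mbean code="org.jboss.ha.framework.server.ClusterPartition"
       name="jboss:service=${jboss.partition.name:DefaultPartition}">
  <!-- Each developer overrides this with a different value, e.g. by
       passing -Djboss.partition.name=KNPartition on the command line -->
  <attribute name="PartitionName">${jboss.partition.name:DefaultPartition}</attribute>
</mbean>
```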
However, when any machine on the network that is running such a JBoss instance disconnects from the network, all the other servers see the following exception (the sender host name, which the forum has stripped from the log below, is the name of the machine that left):
19:28:43,300 INFO [STDOUT] CacheException while treeCache.put rsp=sender=:4387, retval=null, received=false, suspected=false
19:28:43,300 INFO [STDOUT] org.jboss.cache.lock.TimeoutException: rsp=sender=:4387, retval=null, received=false, suspected=false
19:28:43,300 INFO [STDOUT] at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:2235)
19:28:43,300 INFO [STDOUT] at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:2257)
19:28:43,300 INFO [STDOUT] at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:103)
19:28:43,300 INFO [STDOUT] at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:3132)
19:28:43,300 INFO [STDOUT] at org.jboss.cache.TreeCache.put(TreeCache.java:1812)
19:28:43,300 INFO [STDOUT] at org.jboss.cache.TreeCache.put(TreeCache.java:1795)
19:28:43,300 INFO [STDOUT] at edu.yale.its.tp.cas.ticket.ServiceTicketCache.storeTicket(ServiceTicketCache.java:139)
19:28:43,300 INFO [STDOUT] at edu.yale.its.tp.cas.ticket.ActiveTicketCache.addTicket(ActiveTicketCache.java:33)
19:28:43,300 INFO [STDOUT] at edu.yale.its.tp.cas.servlet.Login.grantForService(Login.java:201)
19:28:43,300 INFO [STDOUT] at edu.yale.its.tp.cas.servlet.Login.doGet(Login.java:167)
19:28:43,300 INFO [STDOUT] at edu.yale.its.tp.cas.servlet.Login.doPost(Login.java:86)
19:28:43,300 INFO [STDOUT] at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
Our cluster configuration is as follows:
<!-- UDP: if you have a multihomed machine,
set the bind_addr attribute to the appropriate NIC IP address, e.g bind_addr="192.168.0.2"
-->
<!-- UDP: On Windows machines, because of the media sense feature
being broken with multicast (even after disabling media sense)
set the loopback attribute to true -->
<UDP mcast_addr="230.1.2.3" mcast_port="45577"
ip_ttl="64" ip_mcast="true"
mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
loopback="false"/>
<PING timeout="2000" num_initial_members="3"
up_thread="false" down_thread="false"/>
<MERGE2 min_interval="10000" max_interval="20000"/>
<FD_SOCK/>
<VERIFY_SUSPECT timeout="1500"
up_thread="false" down_thread="false"/>
<pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
max_xmit_size="8192" up_thread="false" down_thread="false"/>
<UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
down_thread="false"/>
<pbcast.STABLE desired_avg_gossip="20000"
up_thread="false" down_thread="false"/>
<FRAG frag_size="8192"
down_thread="false" up_thread="false"/>
<pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
shun="true" print_local_addr="true"/>
<pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
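One detail I notice in the stack above: every developer's instance shares the same mcast_addr/mcast_port, so all the JGroups channels see each other's traffic at the transport level even though the group names differ (which would explain the "discarding messages" warnings). A sketch of what isolating each machine onto its own multicast group might look like (the addresses and ports here are made-up examples; loopback is set to true per the Windows media-sense comment in our own config, which currently has loopback="false"):

```xml
<!-- Developer A's UDP element (other attributes unchanged) -->
<UDP mcast_addr="230.1.2.10" mcast_port="45610"
     ip_ttl="64" ip_mcast="true" loopback="true"/>

<!-- Developer B's UDP element (other attributes unchanged) -->
<UDP mcast_addr="230.1.2.11" mcast_port="45611"
     ip_ttl="64" ip_mcast="true" loopback="true"/>
```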
Any pointers on why this would happen, even though each JBoss instance uses a different partition and a different cluster name?
Thanks,
K