0 Replies Latest reply on Oct 1, 2010 9:59 AM by raj jha

    Failure while joining jboss cache group/cluster

    raj jha Newbie



      I am using JBoss Cache in clustered mode so that the cache can be shared across applications deployed on the same or different JBoss AS instances.

      All nodes use the same multicast IP address to connect to the group (the cluster for the caching service). Each application connects to the caching service while it bootstraps.


      The problem occurs when one of the nodes (an application/JBoss AS instance) is restarted and tries to rejoin the group for the caching service. It hangs at this point and the application does not come up. I can see JBoss AS pinging again and again to get access to the group, but it fails.


      Following is the configuration with which every application starts up and forms a cluster on the given multicast IP address.


      <attribute name="ClusterConfig">
                      <!-- Transport protocol: http://wiki.jboss.org/wiki/JGroupsUDP -->
                      <UDP mcast_addr="239.xxx.xxx.x" mcast_port="9011" ip_ttl="1" ip_mcast="true" mcast_send_buf_size="150000"
                          mcast_recv_buf_size="80000" ucast_send_buf_size="150000" ucast_recv_buf_size="80000" loopback="false" />
                      <!-- Discovery protocols: http://wiki.jboss.org/wiki/JGroupsPING -->
                      <PING timeout="2000" num_initial_members="3" up_thread="false" down_thread="false" />


                      <!--  Merging Protocols: http://wiki.jboss.org/wiki/JGroupsMERGE2  -->
                      <MERGE2 min_interval="10000" max_interval="20000" />


                      <!-- Failure Detection Protocols: http://wiki.jboss.org/wiki/JGroupsFD_SOCK -->
                      <FD_SOCK />


                      <!-- Failure Detection Protocols: http://wiki.jboss.org/wiki/JGroupsVERIFY_SUSPECT -->
                      <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false" />


                      <!-- Reliable Message Transmission Protocols: http://wiki.jboss.org/wiki/JGroupsPbcastNAKACK -->
                      <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800" max_xmit_size="8192" up_thread="false"
                          down_thread="false" />


                      <!-- Reliable Message Transmission Protocols: http://wiki.jboss.org/wiki/JGroupsUNICAST-->
                      <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10" down_thread="false" />


                      <!-- Reliable Message Transmission Protocols: http://wiki.jboss.org/wiki/JGroupsPbcastSTABLE -->
                      <pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false" />


                      <!-- Group membership protocol: http://wiki.jboss.org/wiki/JGroupsPbcastGMS -->
                      <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true" />


                      <!-- Fragmentation protocol: http://wiki.jboss.org/wiki/JGroupsFRAG -->
                      <FRAG frag_size="8192" down_thread="false" up_thread="true" />


                      <!-- State transfer protocol: http://wiki.jboss.org/wiki/JGroupsPbcastSTATE_TRANSFER -->
                      <pbcast.STATE_TRANSFER up_thread="true" down_thread="true" />
      </attribute>
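      One way to narrow this down is to check, outside of JBoss, whether the restarted host still receives traffic on the multicast group used in the configuration above. The following is only a minimal sketch using the plain JDK: the class name `McastProbe` and the helper `isMulticast` are illustrative and not part of JBoss Cache or JGroups, and the group address and port are passed on the command line because the address in the configuration is masked. Note that with ip_ttl="1" multicast packets are not forwarded by routers, so the probe is only meaningful on a host in the same subnet as the other members.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;

public class McastProbe {

    /** Returns true if the literal IP address is a multicast address (224.0.0.0/4 for IPv4). */
    public static boolean isMulticast(String ip) throws Exception {
        return InetAddress.getByName(ip).isMulticastAddress();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java McastProbe <mcast_addr> <mcast_port>");
            return;
        }
        String group = args[0];              // same value as mcast_addr in the config
        int port = Integer.parseInt(args[1]); // same value as mcast_port in the config

        if (!isMulticast(group)) {
            System.err.println(group + " is not a multicast address");
            return;
        }

        try (MulticastSocket socket = new MulticastSocket(port)) {
            // Join on the default network interface (null); the port here is ignored.
            socket.joinGroup(new InetSocketAddress(group, port), null);
            socket.setSoTimeout(10_000); // give up after 10 seconds of silence

            byte[] buffer = new byte[65535];
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            try {
                socket.receive(packet);
                System.out.println("Received " + packet.getLength()
                        + " bytes from " + packet.getAddress());
            } catch (SocketTimeoutException e) {
                System.out.println("No multicast traffic seen within 10 seconds");
            }
        }
    }
}
```

      If the probe sees no packets while the other members are running, the problem is more likely in the network or interface binding than in the protocol stack itself; if it does see packets, the join problem probably lies elsewhere.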



      Below is what I could see in the logs:


      "ScannerThread" daemon prio=1 tid=0x0000002afedfe570 nid=0x48aa in Object.wait() [0x000000004175f000..0x0000000041766b30]
              at java.lang.Object.wait(Native Method)
              - waiting on <0x0000002aa809b708> (a org.jgroups.util.Promise)
              at java.lang.Object.wait(Object.java:474)
              at org.jgroups.util.Promise.doWait(Promise.java:100)
              at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:52)
              at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:28)
              - locked <0x0000002aa809b708> (a org.jgroups.util.Promise)
              at org.jgroups.util.Promise.getResult(Promise.java:77)
              at org.jgroups.JChannel.connect(JChannel.java:420)
              - locked <0x0000002aa6952538> (a org.jgroups.JChannel)
              at org.jboss.cache.TreeCache.startService(TreeCache.java:1482)
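
      The trace shows the thread blocked inside JChannel.connect, called from TreeCache.startService. As a stopgap so that a blocked join does not hang the whole application bootstrap, the blocking step could be run on a helper thread with an upper time limit. This is a rough sketch under assumptions, not a fix for the join failure itself: `BoundedJoin` and `runWithTimeout` are hypothetical names, the `Callable` in `main` merely simulates a blocked call and stands in for whatever code starts the cache, and the lambda syntax needs Java 8 or later (anonymous classes would be required on older JVMs).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedJoin {

    /**
     * Runs a blocking step on a daemon helper thread and waits at most timeoutMillis.
     * Returns true if the step completed in time, false if it was still blocked.
     * An exception thrown by the step itself is propagated as an ExecutionException.
     */
    public static boolean runWithTimeout(Callable<Void> step, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "cache-join");
            thread.setDaemon(true); // do not keep the JVM alive if the step never returns
            return thread;
        });
        Future<Void> future = executor.submit(step);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the helper thread
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the real blocking call that starts the cache.
        Callable<Void> simulatedJoin = () -> {
            Thread.sleep(60_000); // simulate a join that does not return
            return null;
        };

        boolean joined = runWithTimeout(simulatedJoin, 500);
        System.out.println(joined ? "joined the cluster" : "join timed out, continuing without cache");
    }
}
```

      Whether the real call reacts to the interrupt depends on the library, so the helper thread is marked as a daemon to avoid blocking JVM shutdown if it stays stuck.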



      Please let me know if I am doing something problematic.