3 Replies Latest reply on Oct 1, 2010 7:39 AM by rajsjha

    Problem starting JBoss Cache in a cluster with nodes on different machines

    rajsoni

      Hello Everyone!

       

      I am trying to use JBoss Cache outside the JBoss container (as a standalone Java program).

       

      In my Java code, I just create and start the cache like this:

       

      CacheFactory factory = new DefaultCacheFactory();
      Cache cache = factory.createCache("/usr/local/jbosscache/config-samples/test_total-replication.xml", false);

       

      The configuration file test_total-replication.xml is as follows:

       

      ***********************************************************************************************************************

       

      <?xml version="1.0" encoding="UTF-8"?>

       

      <jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.2">

       

         <!-- Configure the TransactionManager -->

         <transaction transactionManagerLookupClass="org.jboss.cache.transaction.GenericTransactionManagerLookup"/>

       

         <clustering mode="replication" clusterName="testcluster">

            <!-- JGroups protocol stack properties. -->

            <jgroupsConfig>

               <UDP discard_incompatible_packets="true" enable_bundling="false" enable_diagnostics="false" ip_ttl="2"

                    loopback="true" max_bundle_size="64000" max_bundle_timeout="30" mcast_addr="228.10.10.10"

                    bind_addr="10.9.10.164" mcast_port="45588" mcast_recv_buf_size="25000000" mcast_send_buf_size="640000"

                    oob_thread_pool.enabled="true" oob_thread_pool.keep_alive_time="10000" oob_thread_pool.max_threads="4"

                    oob_thread_pool.min_threads="1" oob_thread_pool.queue_enabled="true" oob_thread_pool.queue_max_size="10"

                    oob_thread_pool.rejection_policy="Run" thread_naming_pattern="pl" thread_pool.enabled="true"

                    thread_pool.keep_alive_time="30000" thread_pool.max_threads="25" thread_pool.min_threads="1"

                    thread_pool.queue_enabled="true" thread_pool.queue_max_size="10" thread_pool.rejection_policy="Run"

                    tos="8" ucast_recv_buf_size="20000000" ucast_send_buf_size="640000" use_concurrent_stack="true"

                    use_incoming_packet_handler="true"/>

               <PING num_initial_members="3" timeout="2000"/>

               <MERGE2 max_interval="30000" min_interval="10000"/>

               <FD_SOCK/>

               <FD max_tries="5" shun="true" timeout="10000"/>

               <VERIFY_SUSPECT timeout="1500"/>

               <pbcast.NAKACK discard_delivered_msgs="true" gc_lag="0" retransmit_timeout="300,600,1200,2400,4800"

                              use_mcast_xmit="false"/>

               <UNICAST timeout="300,600,1200,2400,3600"/>

               <pbcast.STABLE desired_avg_gossip="50000" max_bytes="400000" stability_delay="1000"/>

               <pbcast.GMS join_timeout="5000" print_local_addr="true" shun="false" view_ack_collection_timeout="5000"

                           view_bundling="true"/>

               <FRAG2 frag_size="60000"/>

               <pbcast.STREAMING_STATE_TRANSFER/>

               <pbcast.FLUSH timeout="0"/>

       

            </jgroupsConfig>

       

            <sync />

            <!-- Alternatively, to use async replication, comment out the element above and uncomment the element below.  -->

            <!-- <async /> -->

       

         </clustering>

      </jbosscache>

       

      ********************************************************************************************************************************************************

      Then, in another Java program (node2), I do the same thing.

       

      If both nodes are started on the same physical machine, they can see each other and join the cluster fine. If a node is started on a different physical machine, it starts fine but cannot see the cluster (even though the cluster name is the same).
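One way to narrow this down is to test plain UDP multicast between the two machines, independent of JBoss Cache and JGroups. The following is only a minimal sketch using the standard java.net classes; the class name McastCheck and its methods are made up for illustration, and the address, port and TTL are copied from the UDP element in the configuration above. If a message sent from one machine never arrives on the other, the problem is likely in the network (switch, firewall, routing, TTL) rather than in the cache configuration.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of JBoss Cache or JGroups.
public class McastCheck {
    // Values copied from the UDP element of the configuration above.
    static final String MCAST_ADDR = "228.10.10.10";
    static final int MCAST_PORT = 45588;
    static final int IP_TTL = 2;

    /** True if the given IP literal lies in the multicast range. */
    static boolean isMulticast(String addr) throws UnknownHostException {
        return InetAddress.getByName(addr).isMulticastAddress();
    }

    /** Names of interfaces that are up, not loopback, and report multicast support. */
    static List<String> multicastInterfaces() throws SocketException {
        List<String> names = new ArrayList<String>();
        Enumeration<NetworkInterface> all = NetworkInterface.getNetworkInterfaces();
        while (all != null && all.hasMoreElements()) {
            NetworkInterface nif = all.nextElement();
            if (nif.isUp() && !nif.isLoopback() && nif.supportsMulticast()) {
                names.add(nif.getName());
            }
        }
        return names;
    }

    /** Sends one datagram to the multicast group with the configured TTL. */
    static void send(String text) throws IOException {
        MulticastSocket socket = new MulticastSocket();
        try {
            socket.setTimeToLive(IP_TTL);
            byte[] data = text.getBytes("UTF-8");
            socket.send(new DatagramPacket(data, data.length,
                    InetAddress.getByName(MCAST_ADDR), MCAST_PORT));
        } finally {
            socket.close();
        }
    }

    /** Joins the group and waits for one datagram; times out if nothing arrives. */
    static String receive(int timeoutMillis) throws IOException {
        MulticastSocket socket = new MulticastSocket(MCAST_PORT);
        try {
            socket.joinGroup(InetAddress.getByName(MCAST_ADDR));
            socket.setSoTimeout(timeoutMillis);
            DatagramPacket packet = new DatagramPacket(new byte[1024], 1024);
            socket.receive(packet);
            return new String(packet.getData(), 0, packet.getLength(), "UTF-8")
                    + " from " + packet.getAddress().getHostAddress();
        } finally {
            socket.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Multicast-capable interfaces: " + multicastInterfaces());
        if (args.length > 0 && args[0].equals("send")) {
            send("hello from " + InetAddress.getLocalHost().getHostName());
        } else if (args.length > 0 && args[0].equals("recv")) {
            try {
                System.out.println("Received: " + receive(30000));
            } catch (SocketTimeoutException e) {
                System.out.println("Nothing received within 30 seconds");
            }
        }
    }
}
```

To try it, run `java McastCheck recv` on one machine and then `java McastCheck send` on the other within 30 seconds, then swap the roles.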

       

      We are using jgroups.jar 2.8.1 GA and JBoss Cache 3.2.5.

       

      If I set the bind_addr attribute in the UDP element above to the respective machine name on each machine, the two nodes on machine1 come up fine, but the third node cannot be started on machine2. In this case, though, machine1 does detect the machine2 node.
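Since bind_addr is given as a machine name here, it may be worth checking what that name resolves to on each machine. The following is a small sketch using only java.net; the class name BindAddrCheck and the method describe are hypothetical names chosen for this example. It reports whether a name or IP literal resolves to a loopback address or to an address that is actually assigned to a local interface. A host name that resolves to 127.0.0.1 (for example through the hosts file) is one possible reason for a node being unreachable from other machines, though it is not necessarily the cause in this case.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

// Hypothetical helper, not part of JBoss Cache or JGroups.
public class BindAddrCheck {

    /** Describes how a bind_addr value (host name or IP literal) resolves on this machine. */
    static String describe(String bindAddr) throws UnknownHostException, SocketException {
        InetAddress addr = InetAddress.getByName(bindAddr);
        String prefix = bindAddr + " -> " + addr.getHostAddress();
        if (addr.isLoopbackAddress()) {
            return prefix + " (loopback; other machines cannot reach it)";
        }
        // Returns null when no local interface carries this address.
        NetworkInterface nif = NetworkInterface.getByInetAddress(addr);
        if (nif == null) {
            return prefix + " (not assigned to any local interface)";
        }
        return prefix + " (local interface " + nif.getName() + ")";
    }

    public static void main(String[] args) throws Exception {
        // With no argument, check the name this machine reports for itself.
        String name = args.length > 0 ? args[0] : InetAddress.getLocalHost().getHostName();
        System.out.println(describe(name));
    }
}
```

Running `java BindAddrCheck <value used for bind_addr>` on each machine should report a local, non-loopback interface for the value used on that machine.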

       

      PS: With an older version of jgroups.jar (2.6), the nodes came up fine on two separate machines (with the bind_addr attribute specified in the UDP configuration) and were able to see each other in the cluster. However, I had to upgrade jgroups.jar because of issues with the third node in that setup; as described in the link below, there was a fix for that in version 2.7 of jgroups.jar:

       

      https://jira.jboss.org/browse/JGRP-845

       

      I have been struggling with this for quite some time now, so please help if you know anything about this problem.

       

      Thanks!