1 Reply Latest reply on Feb 14, 2007 9:50 AM by David Webb

    Trouble getting clustering to work in a certain setup

    Jonas Heineson Newbie

      Hi,

      I have a problem getting clustering to work in a test environment using identical JBoss configurations (4.0.4.GA and JGroups 2.2.9.3) on two different machines (one Windows XP and one Ubuntu Linux, kernel 2.6.15).

      Everything works fine when I start JBoss on Windows first and then JBoss on Linux, but if I start them in the opposite order, the Windows JBoss won't join the cluster and prints these messages:

      2007-02-14 12:25:14,801 INFO [org.jboss.system.server.Server] JBoss (MX MicroKernel) [4.0.4.GA (build: CVSTag=JBoss_4_0_4_GA date=200605151000)] Started in 26s:794ms
      2007-02-14 12:25:17,536 INFO [org.jboss.cache.TreeCache] viewAccepted(): [172.30.153.65:34429|2] [172.30.153.65:34429, 172.30.153.46:1473, 172.30.153.46:1484]
      2007-02-14 12:25:17,536 INFO [org.jboss.cache.TreeCache] received the state (size=1024 bytes)
      2007-02-14 12:25:19,676 ERROR [org.jgroups.protocols.UNICAST] window_size is deprecated and will be ignored
      2007-02-14 12:25:19,676 ERROR [org.jgroups.protocols.UNICAST] min_threshold is deprecated and will be ignored
      2007-02-14 12:25:19,692 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is 172.30.153.46:1490
      -------------------------------------------------------
      2007-02-14 12:25:26,707 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1490) sent to 172.30.153.65:34435 timed out, retrying
      2007-02-14 12:25:27,066 ERROR [org.jgroups.protocols.UNICAST] window_size is deprecated and will be ignored
      2007-02-14 12:25:27,066 ERROR [org.jgroups.protocols.UNICAST] min_threshold is deprecated and will be ignored
      2007-02-14 12:25:27,082 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is 172.30.153.46:1493
      -------------------------------------------------------
      2007-02-14 12:25:30,706 INFO [org.jboss.cache.TreeCache] viewAccepted(): [172.30.153.65:34435|2] [172.30.153.65:34435, 172.30.153.46:1486, 172.30.153.46:1490]
      2007-02-14 12:25:30,706 INFO [org.jboss.cache.TreeCache] received the state (size=1024 bytes)
      2007-02-14 12:25:34,112 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1493) sent to 172.30.153.65:34429 timed out, retrying
      2007-02-14 12:25:40,221 ERROR [org.jgroups.protocols.UNICAST] window_size is deprecated and will be ignored
      2007-02-14 12:25:40,221 ERROR [org.jgroups.protocols.UNICAST] min_threshold is deprecated and will be ignored
      2007-02-14 12:25:40,237 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is 172.30.153.46:1496
      -------------------------------------------------------
      2007-02-14 12:25:43,127 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1493) sent to 172.30.153.65:34429 timed out, retrying
      2007-02-14 12:25:47,267 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1496) sent to 172.30.153.65:34435 timed out, retrying
      2007-02-14 12:25:52,142 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1493) sent to 172.30.153.65:34429 timed out, retrying
      2007-02-14 12:25:56,282 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1496) sent to 172.30.153.65:34435 timed out, retrying
      2007-02-14 12:25:57,032 INFO [org.jboss.cache.TreeCache] viewAccepted(): [172.30.153.65:34429|4] [172.30.153.65:34429, 172.30.153.46:1473, 172.30.153.46:1484, 172.30.153.46:1493]
      2007-02-14 12:25:57,032 INFO [org.jboss.cache.TreeCache] received the state (size=1024 bytes)
      2007-02-14 12:26:05,297 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1496) sent to 172.30.153.65:34435 timed out, retrying
      2007-02-14 12:26:06,546 ERROR [org.jgroups.protocols.UNICAST] window_size is deprecated and will be ignored
      2007-02-14 12:26:06,546 ERROR [org.jgroups.protocols.UNICAST] min_threshold is deprecated and will be ignored
      2007-02-14 12:26:06,562 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is 172.30.153.46:1501
      -------------------------------------------------------
      2007-02-14 12:26:10,171 INFO [org.jboss.cache.TreeCache] viewAccepted(): [172.30.153.65:34435|4] [172.30.153.65:34435, 172.30.153.46:1486, 172.30.153.46:1490, 172.30.153.46:1496]
      2007-02-14 12:26:10,171 INFO [org.jboss.cache.TreeCache] received the state (size=1024 bytes)
      2007-02-14 12:26:13,577 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1501) sent to 172.30.153.65:34429 timed out, retrying
      2007-02-14 12:26:19,686 ERROR [org.jgroups.protocols.UNICAST] window_size is deprecated and will be ignored
      2007-02-14 12:26:19,686 ERROR [org.jgroups.protocols.UNICAST] min_threshold is deprecated and will be ignored
      2007-02-14 12:26:19,701 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is 172.30.153.46:1504
      -------------------------------------------------------
      2007-02-14 12:26:22,592 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1501) sent to 172.30.153.65:34429 timed out, retrying
      2007-02-14 12:26:26,701 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1504) sent to 172.30.153.65:34435 timed out, retrying
      2007-02-14 12:26:31,607 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1501) sent to 172.30.153.65:34429 timed out, retrying
      2007-02-14 12:26:35,716 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1504) sent to 172.30.153.65:34435 timed out, retrying
      2007-02-14 12:26:40,606 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1501) sent to 172.30.153.65:34429 timed out, retrying
      2007-02-14 12:26:44,715 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1504) sent to 172.30.153.65:34435 timed out, retrying
      2007-02-14 12:26:49,636 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1501) sent to 172.30.153.65:34429 timed out, retrying
      2007-02-14 12:26:53,745 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1504) sent to 172.30.153.65:34435 timed out, retrying
      2007-02-14 12:26:58,651 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1501) sent to 172.30.153.65:34429 timed out, retrying
      2007-02-14 12:27:02,760 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1504) sent to 172.30.153.65:34435 timed out, retrying
      2007-02-14 12:27:07,666 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1501) sent to 172.30.153.65:34429 timed out, retrying
      2007-02-14 12:27:11,775 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1504) sent to 172.30.153.65:34435 timed out, retrying
      2007-02-14 12:27:16,696 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1501) sent to 172.30.153.65:34429 timed out, retrying
      2007-02-14 12:27:20,696 INFO [org.jboss.cache.TreeCache] viewAccepted(): [172.30.153.65:34429|8] [172.30.153.65:34429, 172.30.153.46:1473, 172.30.153.46:1484, 172.30.153.46:1493, 172.30.153.46:1501]
      2007-02-14 12:27:20,696 INFO [org.jboss.cache.TreeCache] received the state (size=1024 bytes)
      2007-02-14 12:27:20,774 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1504) sent to 172.30.153.65:34435 timed out, retrying
      2007-02-14 12:27:29,789 WARN [org.jgroups.protocols.pbcast.GMS] join(172.30.153.46:1504) sent to 172.30.153.65:34435 timed out, retrying
      2007-02-14 12:27:30,226 ERROR [org.jgroups.protocols.UNICAST] window_size is deprecated and will be ignored
      2007-02-14 12:27:30,226 ERROR [org.jgroups.protocols.UNICAST] min_threshold is deprecated and will be ignored
      2007-02-14 12:27:30,242 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is 172.30.153.46:1512
      -------------------------------------------------------

      It goes on and on like this, and the Linux JBoss prints a bunch of messages like these:

      2007-02-14 12:23:13,735 WARN [org.jgroups.protocols.pbcast.GMS] failed to collect all ACKs (5) for view [172.30.153.65:34435|8] after 20000ms, missing ACKs from [172.30.153.65:34435, 172.30.153.46:1486, 172.30.153.46:1490, 172.30.153.46:1496, 172.30.153.46:1504] (received=[]), local_addr=172.30.153.65:34435
      2007-02-14 12:23:13,736 WARN [org.jgroups.protocols.pbcast.Digest] entry for 172.30.153.46:1504 was overwritten with low=0, high=0, highest seen=-1

      What could be wrong?
      Also, when we try to connect another machine (Windows XP, identical JBoss config), that fails too: we get viewAccepted messages and can see all nodes listed (on all machines), but the new machine is still not connected to the cluster.

      cluster-service.xml:

      <UDP mcast_addr="${jboss.partition.udpGroup:228.1.2.3}" mcast_port="45588"
      ip_ttl="8" ip_mcast="true"
      mcast_send_buf_size="800000" mcast_recv_buf_size="150000"
      ucast_send_buf_size="800000" ucast_recv_buf_size="150000"
      loopback="true"/>
      <PING timeout="2000" num_initial_members="3"
      up_thread="true" down_thread="true"/>
      <MERGE2 min_interval="10000" max_interval="20000"/>
      <FD shun="true" up_thread="true" down_thread="true"
      timeout="2500" max_tries="5"/>
      <VERIFY_SUSPECT timeout="3000" num_msgs="3"
      up_thread="true" down_thread="true"/>
      <pbcast.NAKACK gc_lag="50" retransmit_timeout="300,600,1200,2400,4800"
      max_xmit_size="8192"
      up_thread="true" down_thread="true"/>
      <UNICAST timeout="300,600,1200,2400,4800" window_size="100" min_threshold="10"
      down_thread="true"/>
      <pbcast.STABLE desired_avg_gossip="20000"
      up_thread="true" down_thread="true"/>
      <FRAG frag_size="8192"
      down_thread="true" up_thread="true"/>
      <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
      shun="true" print_local_addr="true"/>
      <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>


      tc5-cluster.sar/META-INF/jboss-service.xml:

      <UDP mcast_addr="230.1.2.7"
      mcast_port="45599"
      ucast_recv_buf_size="20000000"
      ucast_send_buf_size="640000"
      mcast_recv_buf_size="25000000"
      mcast_send_buf_size="640000"
      loopback="true"
      max_bundle_size="64000"
      max_bundle_timeout="30"
      use_incoming_packet_handler="true"
      use_outgoing_packet_handler="true"
      ip_ttl="2"
      down_thread="false" up_thread="false"
      enable_bundling="true"/>
      <PING timeout="2000"
      down_thread="false" up_thread="false" num_initial_members="3"/>
      <MERGE2 max_interval="100000"
      down_thread="false" up_thread="false" min_interval="20000"/>
      <FD shun="true" up_thread="false" down_thread="false"
      timeout="2500" max_tries="5"/>
      <VERIFY_SUSPECT timeout="1500"
      up_thread="false" down_thread="false"/>
      <pbcast.NAKACK max_xmit_size="60000"
      use_mcast_xmit="false" gc_lag="50"
      retransmit_timeout="100,200,300,600,1200,2400,4800"
      down_thread="false" up_thread="false"
      discard_delivered_msgs="true"/>
      <UNICAST timeout="300,600,1200,2400,3600"
      down_thread="false" up_thread="false"/>
      <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
      down_thread="false" up_thread="false"
      max_bytes="2100000"/>
      <pbcast.GMS print_local_addr="true" join_timeout="3000"
      down_thread="false" up_thread="false"
      join_retry_timeout="2000" shun="true"/>
      <!-- If your CacheMode is set to REPL_SYNC we recommend you
      comment out the FC (flow control) protocol -->
      <FC max_credits="10000000" down_thread="false" up_thread="false"
      min_threshold="0.20"/>
      <FRAG2 frag_size="60000" down_thread="false" up_thread="false"/>
      <pbcast.STATE_TRANSFER down_thread="false" up_thread="false"/>


      JBoss is started with -Djboss.partition.name=MatsProdPartition -Djboss.partition.udpGroup=228.1.2.4


      Regards
      Jonas Heineson

        • 1. Re: Trouble getting clustering to work in a certain setup
          David Webb Newbie

          I had this issue the other day. It has to do with machines that have multiple NICs/IP addresses.

          Edit the following files:
          $JBOSS_HOME/server/yourserver/deploy/cluster-service.xml
          $JBOSS_HOME/server/yourserver/deploy/tc5-cluster.sar/META-INF/jboss-service.xml

          Look for this comment where the multicast settings are located (PartitionConfig and ClusterConfig respectively).

           <!--
           The default UDP stack:
           - If you have a multihomed machine, set the UDP protocol's bind_addr attribute to the
           appropriate NIC IP address, e.g bind_addr="192.168.0.2".
           - On Windows machines, because of the media sense feature being broken with multicast
           (even after disabling media sense) set the UDP protocol's loopback attribute to true
           -->
          


          As suggested by the comment, add the bind_addr attribute to the UDP tag and set it to the one IP address you want to bind to for multicasting.
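
          For example, applied to the UDP element from the cluster-service.xml posted above, the change might look roughly like this. The address 172.30.153.46 is only an assumption taken from the Windows node's log output; use whichever NIC address on each machine sits on the cluster network. The same attribute would be added to the UDP element in tc5-cluster.sar/META-INF/jboss-service.xml.

          ```xml
          <!-- Sketch only: bind_addr value is an assumed example, set it per machine -->
          <UDP mcast_addr="${jboss.partition.udpGroup:228.1.2.3}" mcast_port="45588"
               bind_addr="172.30.153.46"
               ip_ttl="8" ip_mcast="true"
               mcast_send_buf_size="800000" mcast_recv_buf_size="150000"
               ucast_send_buf_size="800000" ucast_recv_buf_size="150000"
               loopback="true"/>
          ```

          Each node needs its own address here, so the value differs between the Windows and Linux machines even though the rest of the configuration stays identical.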