2 Replies Latest reply on Sep 13, 2007 11:02 AM by brucespringfield

    Cluster members don't see each other

    nitesh

      Hello,

      I'm using JBoss AS 4.0.3 SP1 with JDK 1.5.10.

      I've set up a cluster of two JBoss instances with the partition name "DefaultPartition", and I pass the same IP multicast address to both using the '-u' switch. However, the two JBoss instances don't see each other. I haven't made any changes to cluster-service.xml.

      When I start JBoss, I see the following log on each instance (I've blanked out the IP address):

      2007-09-06 15:54:34,370 INFO [org.jgroups.JChannel] JGroups version: 2.4.1
      2007-09-06 15:54:34,634 WARN [org.jgroups.JChannel] option GET_STATE_EVENTS has been deprecated (it is always true now); this option is ignored
      2007-09-06 15:54:37,006 DEBUG [org.jgroups.JChannel] cannot get state from myself (xx.xxx.xx.xxx:32900): probably the first member

      Also, when I look at the DefaultPartition MBean in the JMX console, I see that CurrentView contains the IP address of only one node.

      Could anyone point me to what could be going on?

      Thanks,
      Nitesh

        • 1. Re: Cluster members don't see each other
          nitesh

          I ran the JGroups Probe utility on the cluster and this is what I found. There are two group names, "Tomcat-Cluster" and "DefaultPartition". While "Tomcat-Cluster" sees both JBoss nodes, "DefaultPartition" only sees the local node. Obviously, there is some difference in the partition configuration between Tomcat-Cluster and DefaultPartition.

          However, I'm not sure how to decode the differences, so if someone can help me, it'll be much appreciated.

          Thanks,
          Nitesh

          java -cp jgroups-2.4.1.jar org.jgroups.tests.Probe -addr 239.0.1.113 -query jmx -query props
          ---------------------------------------------------------------------------
          -- send probe on /239.0.1.113:7500

          #1 (2235 bytes): 10.191.10.147:32912 (DefaultPartition)
          local_addr=10.191.10.147:32912
          group_name=DefaultPartition
          version=2.4.1, cvs="$Id: Version.java,v 1.42.2.1 2006/12/04 13:57:06 belaban Exp $"
          view: [10.191.10.147:32912|0] [10.191.10.147:32912]
          group_addr=239.0.1.113:45566
          stats:
          UNICAST={num_bytes_sent=1505, num_xmit_requests_received=0, num_acks_sent=1, num_msgs_sent=1, num_acks_received=1, num_msgs_received=1, num_bytes_received=1505}
          NAKACK={xmit_rsps_received=0, xmit_rsps_sent=0, missing_msgs_received=0, xmit_reqs_sent=0, sent_msgs=[381 - 435] (55), received_msgs=10.191.10.147:32912: received_msgs: [], delivered_msgs: [382 - 435] (size=53)
          , xmit_reqs_received=0}
          UDP={num_bytes_sent=113355, num_msgs_sent=954, num_msgs_received=954, num_bytes_received=1749}
          channel={received_bytes=0, sent_msgs=0, received_msgs=0, sent_bytes=0}

          props:

          <UDP mcast_port="45566"
          mcast_recv_buf_size="150000"
          mcast_send_buf_size="800000"
          mcast_addr="239.0.1.113"
          loopback="false"
          ip_mcast="true"
          ucast_recv_buf_size="150000"
          ip_ttl="8"
          ucast_send_buf_size="800000" />
          <PING num_initial_members="3"
          up_thread="true"
          timeout="2000"
          down_thread="true" />
          <MERGE2 max_interval="20000"
          min_interval="10000" />
          <FD max_tries="5" shun="true"
          up_thread="true"
          timeout="2500"
          down_thread="true" />
          <FD_SOCK up_thread="false"
          down_thread="false" />
          <VERIFY_SUSPECT num_msgs="3"
          up_thread="true"
          timeout="3000"
          down_thread="true" />
          <NAKACK max_xmit_size="8192"
          up_thread="true"
          retransmit_timeout="300,600,1200,2400,4800"
          down_thread="true"
          gc_lag="50" />
          <UNICAST min_threshold="10"
          window_size="100"
          timeout="300,600,1200,2400,4800"
          down_thread="true" />
          <STABLE up_thread="true"
          desired_avg_gossip="20000"
          down_thread="true" />
          <FRAG up_thread="true"
          frag_size="8192"
          down_thread="true" />
          <GMS shun="true"
          print_local_addr="true"
          join_timeout="5000"
          join_retry_timeout="2000" />
          <STATE_TRANSFER
          up_thread="true"
          down_thread="true" />


          #2 (2259 bytes): 10.191.10.147:32921 (Tomcat-Cluster)
          local_addr=10.191.10.147:32921
          group_name=Tomcat-Cluster
          version=2.4.1, cvs="$Id: Version.java,v 1.42.2.1 2006/12/04 13:57:06 belaban Exp $"
          view: MergeView::[10.191.10.147:32921|1] [10.191.10.147:32921, 10.191.10.148:37308], subgroups=[[10.191.10.147:32921|0] [10.191.10.147:32921], [10.191.10.148:37308|0] [10.191.10.148:37308]]
          group_addr=239.0.1.113:45577
          stats:
          UNICAST={num_bytes_sent=0, num_xmit_requests_received=0, num_acks_sent=4, num_msgs_sent=6, num_acks_received=6, num_msgs_received=4, num_bytes_received=0}
          NAKACK={xmit_rsps_received=0, xmit_rsps_sent=0, missing_msgs_received=0, xmit_reqs_sent=0, sent_msgs=[114 - 239] (126), received_msgs=10.191.10.148:37308: received_msgs: [], delivered_msgs: [null - null]
          10.191.10.147:32921: received_msgs: [], delivered_msgs: [115 - 239] (size=124)
          , xmit_reqs_received=0}
          UDP={num_bytes_sent=62081, num_msgs_sent=771, num_msgs_received=1715, num_bytes_received=0}
          channel={received_bytes=0, sent_msgs=0, received_msgs=0, sent_bytes=0}

          props:

          <UDP mcast_port="45577"
          mcast_recv_buf_size="80000"
          mcast_send_buf_size="150000"
          mcast_addr="239.0.1.113"
          loopback="false"
          ip_mcast="true"
          ucast_recv_buf_size="80000"
          ip_ttl="8"
          ucast_send_buf_size="150000" />
          <PING num_initial_members="3"
          up_thread="false"
          timeout="2000"
          down_thread="false" />
          <MERGE2 max_interval="20000"
          min_interval="10000" />
          <FD_SOCK />
          <VERIFY_SUSPECT
          up_thread="false"
          timeout="1500"
          down_thread="false" />
          <NAKACK max_xmit_size="8192"
          up_thread="false"
          retransmit_timeout="600,1200,2400,4800"
          down_thread="false"
          gc_lag="50" />
          <UNICAST min_threshold="10"
          window_size="100"
          timeout="600,1200,2400"
          down_thread="false" />
          <STABLE up_thread="false"
          desired_avg_gossip="20000"
          down_thread="false" />
          <FRAG up_thread="false"
          frag_size="8192"
          down_thread="false" />
          <GMS shun="true"
          print_local_addr="true"
          join_timeout="5000"
          join_retry_timeout="2000" />
          <STATE_TRANSFER
          up_thread="true"
          down_thread="true" />
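
One way to decode the differences between the two "props" dumps above is to parse each protocol stack into a map and print only the attributes that differ. The following is a minimal sketch, not part of JGroups or JBoss: the class name StackDiff and its parse/diff helpers are hypothetical, and the regex parsing assumes the dump is a flat sequence of `<PROTOCOL attr="value" ... />` elements like the ones Probe printed here. The strings in main are shortened excerpts of the two dumps; paste the full props text to compare everything.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a JGroups class): compares two Probe "props" dumps.
class StackDiff {
    // One protocol element, e.g. <UDP mcast_port="45566" ... /> (may span lines)
    private static final Pattern ELEMENT = Pattern.compile("<(\\w+)([^>]*?)/?>");
    // One name="value" attribute inside an element
    private static final Pattern ATTR = Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"");

    /** Parses props text into protocol name -> (attribute name -> value). */
    static Map<String, Map<String, String>> parse(String props) {
        Map<String, Map<String, String>> stack = new LinkedHashMap<>();
        Matcher element = ELEMENT.matcher(props);
        while (element.find()) {
            Map<String, String> attrs = new LinkedHashMap<>();
            Matcher attr = ATTR.matcher(element.group(2));
            while (attr.find()) {
                attrs.put(attr.group(1), attr.group(2));
            }
            stack.put(element.group(1), attrs);
        }
        return stack;
    }

    /** Lists protocols present on one side only and attributes whose values differ. */
    static List<String> diff(String leftName, Map<String, Map<String, String>> left,
                             String rightName, Map<String, Map<String, String>> right) {
        List<String> out = new ArrayList<>();
        TreeSet<String> protocols = new TreeSet<>(left.keySet());
        protocols.addAll(right.keySet());
        for (String protocol : protocols) {
            if (!left.containsKey(protocol) || !right.containsKey(protocol)) {
                String owner = left.containsKey(protocol) ? leftName : rightName;
                out.add(protocol + ": only in " + owner);
                continue;
            }
            Map<String, String> l = left.get(protocol);
            Map<String, String> r = right.get(protocol);
            TreeSet<String> names = new TreeSet<>(l.keySet());
            names.addAll(r.keySet());
            for (String name : names) {
                String lv = l.getOrDefault(name, "(unset)");
                String rv = r.getOrDefault(name, "(unset)");
                if (!lv.equals(rv)) {
                    out.add(protocol + "." + name + ": " + lv + " vs " + rv);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Shortened excerpts of the two dumps above; replace with the full text.
        String defaultPartition =
              "<UDP mcast_port=\"45566\" mcast_addr=\"239.0.1.113\" ip_ttl=\"8\" />"
            + "<FD max_tries=\"5\" shun=\"true\" timeout=\"2500\" />"
            + "<FD_SOCK up_thread=\"false\" down_thread=\"false\" />";
        String tomcatCluster =
              "<UDP mcast_port=\"45577\" mcast_addr=\"239.0.1.113\" ip_ttl=\"8\" />"
            + "<FD_SOCK />";
        for (String line : diff("DefaultPartition", parse(defaultPartition),
                                "Tomcat-Cluster", parse(tomcatCluster))) {
            System.out.println(line);
        }
    }
}
```

Run against the full dumps, a comparison like this would surface, among other things, the different mcast_port values (45566 vs 45577), the smaller UDP buffer sizes in Tomcat-Cluster, and the FD protocol that appears only in the DefaultPartition stack. A difference in the list is not necessarily the cause of the problem; for example, each group is expected to use its own multicast port.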



          • 2. Re: Cluster members don't see each other
            brucespringfield

            "Tomcat-Cluster" sounds like the web session replication service.