2 Replies Latest reply on Nov 6, 2008 12:08 PM by agk

    JBoss 4.2 TCP Clustering problem

    agk

      I am upgrading JBoss from 4.0.5 to 4.2.3 and have been unable to set up clustering on the new version. I have spent quite a bit of time reading the clustering FAQ and Wiki\JbossHA, but could not find a good TCP clustering example for 4.2.x. I have JBoss installed on 2 Windows boxes and start it with -c all. I cannot get the JGroups layer to work; the nodes don't discover each other. The log on each box always shows:

      2008-11-05 15:40:07,263 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Number of cluster members: 1
      2008-11-05 15:40:07,263 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Other members: 0
      2008-11-05 15:40:07,263 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Fetching state (will wait for 30000 milliseconds):
      2008-11-05 15:40:14,326 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] State could not be retrieved (we are the first member in group)

      I am using TCP transport and my config in cluster-service.xml is like this:

      <TCP bind_addr="box1" start_port="7800" loopback="true"
           tcp_nodelay="true"
           recv_buf_size="20000000"
           send_buf_size="640000"
           discard_incompatible_packets="true"
           enable_bundling="false"
           max_bundle_size="64000"
           max_bundle_timeout="30"
           use_incoming_packet_handler="true"
           use_outgoing_packet_handler="false"
           down_thread="false" up_thread="false"
           use_send_queues="false"
           sock_conn_timeout="300"
           skip_suspected_members="true"/>
      <TCPPING initial_hosts="box1[7800],box2[7800]" port_range="3"
               timeout="3000"
               down_thread="true" up_thread="true"
               num_initial_members="3"/>
      <MERGE2 max_interval="100000"
              down_thread="true" up_thread="true" min_interval="50000"/>
      <FD_SOCK down_thread="true" up_thread="true"/>
      <FD timeout="10000" max_tries="5" down_thread="true" up_thread="true" shun="true"/>
      <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
      <pbcast.NAKACK max_xmit_size="60000"
                     use_mcast_xmit="false" gc_lag="100"
                     retransmit_timeout="300,600,1200,2400,4800"
                     down_thread="true" up_thread="true"
                     discard_delivered_msgs="true"/>
      <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                     down_thread="false" up_thread="false"
                     max_bytes="400000"/>
      <pbcast.GMS print_local_addr="true" join_timeout="3000"
                  down_thread="true" up_thread="true"
                  join_retry_timeout="2000" shun="false"
                  view_bundling="true"/>
      <pbcast.STATE_TRANSFER down_thread="true" up_thread="true" use_flush="true"/>


      Please help!
      A working config example for JBoss 4.2 would help a lot.

      Thank you,
      Alex

        • 1. Re: JBoss 4.2 TCP Clustering problem
          oosie

          Try to start each node with "-b 0.0.0.0 -c all".

          • 2. Re: JBoss 4.2 TCP Clustering problem
            agk

            Thank you, oosie, this moved me forward a bit. However, now I am getting other errors that result in the DefaultPartition service failing to start:

            2008-11-06 09:50:41,351 WARN [org.jgroups.blocks.ConnectionTable] packet from /192.168.5.41:3954 has different version (12338) from ours (4353). This may cause problems
            2008-11-06 09:50:41,361 WARN [org.jgroups.blocks.ConnectionTable] exception is java.net.UnknownHostException: addr is of illegal length
            2008-11-06 09:50:41,831 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Number of cluster members: 2
            2008-11-06 09:50:41,841 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Other members: 1
            2008-11-06 09:50:41,841 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Fetching state (will wait for 30000 milliseconds):
            2008-11-06 09:50:45,867 WARN [org.jgroups.protocols.pbcast.STATE_TRANSFER] Initiator of flush and state requesting member 192.168.5.42:7800 timed out waiting for flush responses after 4000 msec
            2008-11-06 09:50:50,914 WARN [org.jgroups.protocols.pbcast.STATE_TRANSFER] Initiator of flush and state requesting member 192.168.5.42:7800 timed out waiting for flush responses after 4000 msec
            2008-11-06 09:50:56,192 WARN [org.jgroups.protocols.pbcast.STATE_TRANSFER] Initiator of flush and state requesting member 192.168.5.42:7800 timed out waiting for flush responses after 4000 msec
            2008-11-06 09:51:03,172 WARN [org.jgroups.protocols.pbcast.STATE_TRANSFER] Initiator of flush and state requesting member 192.168.5.42:7800 timed out waiting for flush responses after 4000 msec
            2008-11-06 09:51:07,308 WARN [org.jgroups.protocols.pbcast.STATE_TRANSFER] Initiator of flush and state requesting member 192.168.5.42:7800 timed out waiting for flush responses after 4000 msec
            2008-11-06 09:51:12,946 WARN [org.jgroups.protocols.pbcast.STATE_TRANSFER] Initiator of flush and state requesting member 192.168.5.42:7800 timed out waiting for flush responses after 4000 msec
            2008-11-06 09:51:14,038 WARN [org.jboss.system.ServiceController] Problem starting service jboss:service=DefaultPartition
            java.lang.IllegalStateException: Initial state transfer failed: Channel.getState() returned false
            .....

            And then:

            2008-11-06 09:52:22,726 ERROR [org.jboss.deployment.scanner.URLDeploymentScanner] Incomplete Deployment listing:

            --- MBeans waiting for other MBeans ---
            ObjectName: jboss:service=DefaultPartition
            State: FAILED
            Reason: java.lang.IllegalStateException: Initial state transfer failed: Channel.getState() returned false
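
            One thing that may be worth checking here, as an assumption rather than a confirmed fix: the stack posted above sets use_flush="true" on pbcast.STATE_TRANSFER but contains no pbcast.FLUSH protocol. If nothing in the stack answers the flush request, that would be consistent with the repeated "timed out waiting for flush responses" warnings. A minimal sketch of the changed line, with every other protocol left as posted:

```xml
<!-- Sketch only, not a verified fix: turn flush off for state transfer,
     on the assumption that the missing FLUSH protocol is why no flush
     responses arrive. All other attributes are unchanged from the post. -->
<pbcast.STATE_TRANSFER down_thread="true" up_thread="true" use_flush="false"/>
```

            This sketch does not touch the "different version" warning from ConnectionTable, which would need to be looked at separately.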