1 reply. Latest reply on Aug 4, 2008, 1:17 PM by manik

    JBoss Cluster with Windows NLB

    ericncc

      Hi,
      I've set up two machines that both run MS NLB for load balancing and JBoss AS 4.2.2.GA for clustering.

      1st machine's IP address is 192.168.0.3
      2nd machine's IP address is 192.168.0.7
      The virtual IP exposed by the NLB for both machines is 192.168.0.225

      I start the JBoss server on each machine with the "all" configuration, as follows:
      run -b 0.0.0.0 -c all

      I've changed the ClusterConfig in both all\deploy\cluster-service.xml and all\deploy\jboss-web-cluster.sar\META-INF\jboss-service.xml to use the TCP stack shown below:
      ===============================================

      <TCP start_port="7810" loopback="true"
           tcp_nodelay="true"
           recv_buf_size="20000000"
           send_buf_size="640000"
           discard_incompatible_packets="true"
           enable_bundling="false"
           max_bundle_size="64000"
           max_bundle_timeout="30"
           use_incoming_packet_handler="true"
           use_outgoing_packet_handler="false"
           down_thread="false" up_thread="false"
           use_send_queues="false"
           sock_conn_timeout="300"
           skip_suspected_members="true"/>
      <TCPPING initial_hosts="192.168.0.3[7810],192.168.0.7[7810]" port_range="3"
               timeout="3000"
               down_thread="false" up_thread="false"
               num_initial_members="3"/>
      <MERGE2 max_interval="100000"
              down_thread="false" up_thread="false" min_interval="20000"/>
      <FD_SOCK down_thread="false" up_thread="false"/>
      <FD timeout="10000" max_tries="5" down_thread="false" up_thread="false" shun="true"/>
      <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
      <pbcast.NAKACK max_xmit_size="60000"
                     use_mcast_xmit="false" gc_lag="0"
                     retransmit_timeout="300,600,1200,2400,4800"
                     down_thread="false" up_thread="false"
                     discard_delivered_msgs="true"/>
      <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                     down_thread="false" up_thread="false"
                     max_bytes="400000"/>
      <pbcast.GMS print_local_addr="true" join_timeout="3000"
                  down_thread="false" up_thread="false"
                  join_retry_timeout="2000" shun="true"
                  view_bundling="true"/>
      <FC max_credits="2000000" down_thread="false" up_thread="false"
          min_threshold="0.10"/>
      <FRAG2 frag_size="60000" down_thread="false" up_thread="false"/>
      <pbcast.STATE_TRANSFER down_thread="false" up_thread="false" use_flush="false"/>

      ===============================================

      Everything seems OK when I start up both servers, and I can see that they detect each other and join the same cluster.
      However, the error messages below keep appearing on both servers' consoles.

      Errors repeatedly shown on machine 192.168.0.7:
      ===============================================
      15:28:50,343 WARN [GMS] failed to collect all ACKs (1) for view [192.168.0.7:2872|1] [192.168.0.7:2872, 192.168.0.3:1199] after 5000ms, missing ACKs from [192.168.0.7:2872] (received=[]), local_addr=192.168.0.7:2872

      15:28:52,312 WARN [Digest] entry for 192.168.0.3:1199 was overwritten with low=0, high=0, highest seen=-1

      ===============================================


      Errors repeatedly shown on machine 192.168.0.3:
      ===============================================
      -------------------------------------------------------
      GMS: address is 192.168.0.3:1219
      -------------------------------------------------------
      00:35:11,924 WARN [GMS] join(192.168.0.3:1219) sent to 192.168.0.7:2872 timed out, retrying
      00:35:15,950 INFO [TreeCache] viewAccepted(): [192.168.0.7:2872|7] [192.168.0.7:2872, 192.168.0.3:1199, 192.168.0.3:1209, 192.168.0.3:1214, 192.168.0.3:1219]
      00:35:16,020 WARN [STATE_TRANSFER] state received from 192.168.0.7:2872 is null, will return null state to application
      00:35:20,958 INFO [TreeCache] viewAccepted(): [192.168.0.7:2872|8] [192.168.0.7:2872, 192.168.0.3:1199, 192.168.0.3:1209, 192.168.0.3:1214, 192.168.0.3:1219]

      ===============================================
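      Both viewAccepted() lines from 192.168.0.3 list five members, four of which are on 192.168.0.3 itself (ports 1199, 1209, 1214 and 1219, the last being the local GMS address). To check that pattern across a longer console log, a small helper along these lines could pull the coordinator, view id and member list out of each line. It is a hypothetical Python sketch, not part of JBoss or JGroups, and it assumes the viewAccepted() format shown in the excerpt above with console line wraps re-joined.

```python
import re
from collections import Counter
from typing import List, Tuple

# Assumed format, taken from the TreeCache log excerpt above:
#   viewAccepted(): [coordinator|view_id] [member1, member2, ...]
VIEW_RE = re.compile(r"viewAccepted\(\): \[([^|\]]+)\|(\d+)\] \[([^\]]*)\]")


def parse_view(line: str) -> Tuple[str, int, List[str]]:
    """Return (coordinator, view_id, members) from one viewAccepted() log line."""
    match = VIEW_RE.search(line)
    if match is None:
        raise ValueError("not a viewAccepted() line")
    coordinator = match.group(1)
    view_id = int(match.group(2))
    members = [part.strip() for part in match.group(3).split(",") if part.strip()]
    return coordinator, view_id, members


def members_per_host(members: List[str]) -> Counter:
    """Count view members per IP address by stripping the ':port' suffix."""
    return Counter(member.rsplit(":", 1)[0] for member in members)


if __name__ == "__main__":
    # Log line from 192.168.0.3, re-joined where the console wrapped it.
    line = ("00:35:15,950 INFO [TreeCache] viewAccepted(): [192.168.0.7:2872|7] "
            "[192.168.0.7:2872, 192.168.0.3:1199, 192.168.0.3:1209, "
            "192.168.0.3:1214, 192.168.0.3:1219]")
    coord, view_id, members = parse_view(line)
    print(coord, view_id, len(members))
    print(members_per_host(members))
```

      Feeding every viewAccepted() line through parse_view() would show whether the member list keeps growing with new ports from the same host between views.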

      It seems that state is not being transferred between the servers (note the STATE_TRANSFER warning that the state received from 192.168.0.7:2872 is null).
      What am I doing wrong in my configuration, and how can I get rid of these errors? Your help is much appreciated. Thank you!

      Regards,
      Eric