1 Reply Latest reply on May 21, 2010 3:21 AM by praveen.kumar

    JBoss Cluster with TCP - Discarded messages

    patrickheinzelmann
      I'm trying to create a JBoss cluster over TCP, using two Windows PCs running JBoss 5.1 GA. I have to use TCP because the hosting provider doesn't support UDP.
      The servers form a cluster with two members, but one node (192.168.0.5) discards the messages of the other node (192.168.0.7).


      The following lines are in the server.log files.

      Logfile of 192.168.0.5:
      2010-02-27 21:37:14,421 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-4,192.168.0.5:7650) sender 192.168.0.7:7650 not found in xmit_table
      2010-02-27 21:37:14,421 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-4,192.168.0.5:7650) range is null
      2010-02-27 21:37:14,437 INFO  [org.jboss.cache.RPCManagerImpl] (Incoming-4,192.168.0.5:7650) Received new cluster view: MergeView::[192.168.0.5:7650|5] [192.168.0.5:7650, 192.168.0.7:7650], subgroups=[[192.168.0.5:7650|0] [192.168.0.5:7650], [192.168.0.7:7650|4] [192.168.0.7:7650]]
      2010-02-27 21:37:15,843 INFO  [org.jboss.web.tomcat.service.deployers.TomcatDeployment] (main) deploy, ctxPath=/
      2010-02-27 21:37:16,000 INFO  [org.jboss.web.tomcat.service.deployers.TomcatDeployment] (main) deploy, ctxPath=/jmx-console
      2010-02-27 21:37:16,375 INFO  [org.apache.coyote.http11.Http11Protocol] (main) Starting Coyote HTTP/1.1 on http-0.0.0.0-8080
      2010-02-27 21:37:16,421 INFO  [org.apache.coyote.ajp.AjpProtocol] (main) Starting Coyote AJP/1.3 on ajp-0.0.0.0-8009
      2010-02-27 21:37:16,437 INFO  [org.jboss.bootstrap.microcontainer.ServerImpl] (main) JBoss (Microcontainer) [5.1.0.GA (build: SVNTag=JBoss_5_1_0_GA date=200905221053)] Started in 1m:52s:266ms
      2010-02-27 21:37:21,296 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-5,192.168.0.5:3434) 192.168.0.5:3434] discarded message from non-member 192.168.0.7:1491, my view is [192.168.0.5:3434|0] [192.168.0.5:3434]
      2010-02-27 21:37:21,828 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-5,192.168.0.5:3434) 192.168.0.5:3434] discarded message from non-member 192.168.0.7:1491, my view is [192.168.0.5:3434|0] [192.168.0.5:3434]
      2010-02-27 21:37:23,218 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-4,192.168.0.5:7650) 192.168.0.5:7650] discarded message from non-member 192.168.0.7:7650, my view is [192.168.0.5:7650|0] [192.168.0.5:7650]
      2010-02-27 21:37:23,859 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-4,192.168.0.5:7650) 192.168.0.5:7650] discarded message from non-member 192.168.0.7:7650, my view is [192.168.0.5:7650|0] [192.168.0.5:7650]
      2010-02-27 21:38:09,437 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-5,192.168.0.5:3434) 192.168.0.5:3434] discarded message from non-member 192.168.0.7:1491, my view is [192.168.0.5:3434|0] [192.168.0.5:3434]
      2010-02-27 21:38:09,781 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-5,192.168.0.5:3434) 192.168.0.5:3434] discarded message from non-member 192.168.0.7:1491, my view is [192.168.0.5:3434|0] [192.168.0.5:3434]
      2010-02-27 21:38:21,937 WARN  [org.jgroups.protocols.pbcast.NAKACK] (Incoming-6,192.168.0.5:7650) 192.168.0.5:7650] discarded message from non-member 192.168.0.7:7650, my view is [192.168.0.5:7650|0] [192.168.0.5:7650]
      2010-02-27 21:38:21,937 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-2,192.168.0.5:7650) sender 192.168.0.7:7650 not found in xmit_table
      2010-02-27 21:38:21,937 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-2,192.168.0.5:7650) range is null
      2010-02-27 21:38:21,953 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.TESTPART] (Incoming-2,192.168.0.5:7650) New cluster view for partition TESTPART (id: 5, delta: 1) : [192.168.0.5:1099, 192.168.0.7:1099]
      2010-02-27 21:38:21,937 WARN  [org.jgroups.protocols.pbcast.NAKACK] (Incoming-4,192.168.0.5:7650) 192.168.0.5:7650] discarded message from non-member 192.168.0.7:7650, my view is [192.168.0.5:7650|0] [192.168.0.5:7650]


      Logfile of 192.168.0.7:
      2010-02-27 21:36:25,147 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-2,192.168.0.7:7650) sender 192.168.0.5:7650 not found in xmit_table
      2010-02-27 21:36:25,147 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-2,192.168.0.7:7650) range is null
      2010-02-27 21:36:25,147 INFO  [org.jboss.cache.RPCManagerImpl] (Incoming-2,192.168.0.7:7650) Received new cluster view: MergeView::[192.168.0.5:7650|5] [192.168.0.5:7650, 192.168.0.7:7650], subgroups=[[192.168.0.5:7650|0] [192.168.0.5:7650], [192.168.0.7:7650|4] [192.168.0.7:7650]]
      2010-02-27 21:36:28,147 WARN  [org.jgroups.protocols.FD_SOCK] (OOB-96,192.168.0.7:7650) I was suspected by 192.168.0.5:7650; ignoring the SUSPECT message
      2010-02-27 21:37:26,585 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-98,192.168.0.7:7650) 192.168.0.7:7650] discarded message from non-member 192.168.0.5:7650, my view is [192.168.0.7:7650|4] [192.168.0.7:7650]
      2010-02-27 21:37:27,288 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-99,192.168.0.7:7650) 192.168.0.7:7650] discarded message from non-member 192.168.0.5:7650, my view is [192.168.0.7:7650|4] [192.168.0.7:7650]
      2010-02-27 21:37:32,569 WARN  [org.jgroups.protocols.pbcast.NAKACK] (Incoming-6,192.168.0.7:7650) 192.168.0.7:7650] discarded message from non-member 192.168.0.5:7650, my view is [192.168.0.7:7650|4] [192.168.0.7:7650]
      2010-02-27 21:37:32,585 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-2,192.168.0.7:7650) sender 192.168.0.5:7650 not found in xmit_table
      2010-02-27 21:37:32,585 ERROR [org.jgroups.protocols.pbcast.NAKACK] (Incoming-2,192.168.0.7:7650) range is null



      For the cluster configuration, I'm using the standard "all" profile. To start the cluster, I'm using the following commands.

      start command of 192.168.0.5:
      run.bat -c all -b 0.0.0.0 -Djboss.partition.name="TESTPART" -Djboss.default.jgroups.stack="tcp-sync" -Djgroups.bind_address="192.168.0.5"

      start command of 192.168.0.7:
      run.bat -c all -b 0.0.0.0 -Djboss.partition.name="TESTPART" -Djboss.default.jgroups.stack="tcp-sync" -Djgroups.bind_address="192.168.0.7"
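With a static TCP setup like this, each node must be able to open a plain TCP connection to the other's bind address on the JGroups port 7650 (a Windows firewall blocking that port would produce exactly this kind of one-sided discard). A minimal sketch of such a connectivity probe — the class and method names are made up for illustration, and the demo connects to a local listener so it runs standalone; on the real machines you would probe the peer instead:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class PortProbe {

    // Attempts a plain TCP connect to host:port within timeoutMs,
    // roughly what TCPPING does during discovery.
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Self-contained demo against a local listener; on the real machines
        // you would probe the peer, e.g. canConnect("192.168.0.7", 7650, 3000).
        try (ServerSocket listener = new ServerSocket(0)) {
            System.out.println("reachable=" + canConnect("127.0.0.1", listener.getLocalPort(), 3000));
        }
    }
}
```

Running this probe from each machine against the other's address and port 7650 quickly rules the firewall in or out as the cause.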


      I made small changes to the tcp-sync configuration in jgroups-channelfactory-stacks.xml: I switched the discovery protocol from MPING to TCPPING and set initial_hosts.

      Part of the jgroups-channelfactory-stacks.xml on 192.168.0.5:


          <stack name="tcp-sync"
                 description="TCP based stack, without flow control and without
                              message bundling. TCP stacks are usually used when IP
                              multicasting cannot be used in a network (e.g.routers
                              discard multicast). This configuration should be used
                              instead of 'tcp' above when (1) synchronous calls are
                              used and (2) the message volume (rate and size) is not
                              that large.">
              <config>
                  <TCP
                       singleton_name="tcp_sync"
                       start_port="${jboss.jgroups.tcp_sync.tcp_port:7650}"
                       tcp_nodelay="true"
                       loopback="false"
                       recv_buf_size="20000000"
                       send_buf_size="640000"
                       discard_incompatible_packets="true"
                       max_bundle_size="64000"
                       max_bundle_timeout="30"
                       use_incoming_packet_handler="true"
                       enable_bundling="false"
                       use_send_queues="false"
                       sock_conn_timeout="300"
                       skip_suspected_members="true"
                       enable_diagnostics="${jboss.jgroups.enable_diagnostics:true}"
                       diagnostics_addr="${jboss.jgroups.diagnostics_addr:224.0.0.75}"
                       diagnostics_port="${jboss.jgroups.diagnostics_port:7500}"
                     
                       use_concurrent_stack="true"
                     
                       thread_pool.enabled="true"
                         thread_pool.min_threads="8"
                         thread_pool.max_threads="200"
                         thread_pool.keep_alive_time="5000"
                         thread_pool.queue_enabled="true"
                         thread_pool.queue_max_size="1000"
                         thread_pool.rejection_policy="discard"
            
                         oob_thread_pool.enabled="true"
                         oob_thread_pool.min_threads="1"
                         oob_thread_pool.max_threads="8"
                         oob_thread_pool.keep_alive_time="5000"
                         oob_thread_pool.queue_enabled="false"
                         oob_thread_pool.queue_max_size="100"
                         oob_thread_pool.rejection_policy="run"/>
                  <!-- Alternative 1: multicast-based automatic discovery. --> 
                  <!--<MPING timeout="3000"
                         num_initial_members="3"
                         mcast_addr="${jboss.partition.udpGroup:231.11.11.11}"
                         mcast_port="${jboss.jgroups.tcp_sync.mping_mcast_port:45701}"
                         ip_ttl="${jgroups.udp.ip_ttl:2}"/>-->         
                  <!-- Alternative 2: non multicast-based replacement for MPING. Requires a static configuration
                       of all possible cluster members.-->
                  <TCPPING timeout="3000"
                           initial_hosts="${jgroups.tcpping.initial_hosts:192.168.0.5[7650],192.168.0.7[7650]}"
                           port_range="1"
                           num_initial_members="2"/>

                  <MERGE2 max_interval="100000" min_interval="20000"/>
                  <FD_SOCK/>
                  <FD timeout="6000" max_tries="5" shun="true"/>
                  <VERIFY_SUSPECT timeout="1500"/>
                  <pbcast.NAKACK use_mcast_xmit="false" gc_lag="0"
                                 retransmit_timeout="300,600,1200,2400,4800"
                                 discard_delivered_msgs="true"/>
                  <UNICAST timeout="300,600,1200,2400,3600"/>
                  <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                                 max_bytes="400000"/>
                  <pbcast.GMS print_local_addr="true" join_timeout="3000"
                              shun="true"
                              view_bundling="true"
                              view_ack_collection_timeout="5000"/>
                  <!-- pbcast.STREAMING_STATE_TRANSFER/ -->
                  <pbcast.STATE_TRANSFER/>
                  <pbcast.FLUSH timeout="0"/>
              </config>
          </stack>


      Part of the jgroups-channelfactory-stacks.xml on 192.168.0.7:


          <stack name="tcp-sync"
                 description="TCP based stack, without flow control and without
                              message bundling. TCP stacks are usually used when IP
                              multicasting cannot be used in a network (e.g.routers
                              discard multicast). This configuration should be used
                              instead of 'tcp' above when (1) synchronous calls are
                              used and (2) the message volume (rate and size) is not
                              that large.">
              <config>
                  <TCP
                       singleton_name="tcp_sync"
                       start_port="${jboss.jgroups.tcp_sync.tcp_port:7650}"
                       tcp_nodelay="true"
                       loopback="false"
                       recv_buf_size="20000000"
                       send_buf_size="640000"
                       discard_incompatible_packets="true"
                       max_bundle_size="64000"
                       max_bundle_timeout="30"
                       use_incoming_packet_handler="true"
                       enable_bundling="false"
                       use_send_queues="false"
                       sock_conn_timeout="300"
                       skip_suspected_members="true"
                       enable_diagnostics="${jboss.jgroups.enable_diagnostics:true}"
                       diagnostics_addr="${jboss.jgroups.diagnostics_addr:224.0.0.75}"
                       diagnostics_port="${jboss.jgroups.diagnostics_port:7500}"
                     
                       use_concurrent_stack="true"
                     
                       thread_pool.enabled="true"
                         thread_pool.min_threads="8"
                         thread_pool.max_threads="200"
                         thread_pool.keep_alive_time="5000"
                         thread_pool.queue_enabled="true"
                         thread_pool.queue_max_size="1000"
                         thread_pool.rejection_policy="discard"
            
                         oob_thread_pool.enabled="true"
                         oob_thread_pool.min_threads="1"
                         oob_thread_pool.max_threads="8"
                         oob_thread_pool.keep_alive_time="5000"
                         oob_thread_pool.queue_enabled="false"
                         oob_thread_pool.queue_max_size="100"
                         oob_thread_pool.rejection_policy="run"/>
                  <!-- Alternative 1: multicast-based automatic discovery. --> 
                  <!--<MPING timeout="3000"
                         num_initial_members="3"
                         mcast_addr="${jboss.partition.udpGroup:231.11.11.11}"
                         mcast_port="${jboss.jgroups.tcp_sync.mping_mcast_port:45701}"
                         ip_ttl="${jgroups.udp.ip_ttl:2}"/>-->         
                  <!-- Alternative 2: non multicast-based replacement for MPING. Requires a static configuration
                       of all possible cluster members.-->
                  <TCPPING timeout="3000"
                           initial_hosts="${jgroups.tcpping.initial_hosts:192.168.0.7[7650],192.168.0.5[7650]}"
                           port_range="1"
                           num_initial_members="2"/>



                  <MERGE2 max_interval="100000" min_interval="20000"/>
                  <FD_SOCK/>
                  <FD timeout="6000" max_tries="5" shun="true"/>
                  <VERIFY_SUSPECT timeout="1500"/>
                  <pbcast.NAKACK use_mcast_xmit="false" gc_lag="0"
                                 retransmit_timeout="300,600,1200,2400,4800"
                                 discard_delivered_msgs="true"/>
                  <UNICAST timeout="300,600,1200,2400,3600"/>
                  <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                                 max_bytes="400000"/>
                  <pbcast.GMS print_local_addr="true" join_timeout="3000"
                              shun="true"
                              view_bundling="true"
                              view_ack_collection_timeout="5000"/>
                  <!-- pbcast.STREAMING_STATE_TRANSFER/ -->
                  <pbcast.STATE_TRANSFER/>
                  <pbcast.FLUSH timeout="0"/>
              </config>
          </stack>
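The two files above differ only in the ordering of initial_hosts, and since that attribute is written as ${jgroups.tcpping.initial_hosts:...}, the value is resolved from a system property before falling back to the default. Both nodes could therefore keep identical stack files and receive the host list on the command line instead — a sketch for 192.168.0.5, simply combining the options already used above:

```
run.bat -c all -b 0.0.0.0 -Djboss.partition.name="TESTPART" -Djboss.default.jgroups.stack="tcp-sync" -Djgroups.bind_address="192.168.0.5" -Djgroups.tcpping.initial_hosts="192.168.0.5[7650],192.168.0.7[7650]"
```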


      Are there any additional steps needed to create a cluster? What am I doing wrong?