1 Reply Latest reply on Aug 4, 2011 3:27 PM by ablevine1

    merge error in AS5.1

    ablevine1

      I am using JBoss 5.1.0 with JGroups 2.6.10GA.

      I have a cluster with 3 members configured to use the TCP stack config shown below. It works fine for a while, but eventually merging fails and I end up with multiple clusters.

       

       

      <stack name="tcp"
             description="TCP based stack, with flow control and message bundling.
                          TCP stacks are usually used when IP multicasting cannot
                          be used in a network, e.g. because it is disabled (e.g.
                          routers discard multicast)">
          <config>
              <TCP
                   singleton_name="tcp"
                   start_port="${jboss.jgroups.tcp.tcp_port:7600}"
                   tcp_nodelay="true"
                   loopback="false"
                   recv_buf_size="20000000"
                   send_buf_size="640000"
                   discard_incompatible_packets="true"
                   max_bundle_size="64000"
                   max_bundle_timeout="30"
                   use_incoming_packet_handler="true"
                   enable_bundling="true"
                   use_send_queues="false"
                   sock_conn_timeout="300"
                   skip_suspected_members="true"
                   timer.num_threads="12"
                   enable_diagnostics="${jboss.jgroups.enable_diagnostics:true}"
                   diagnostics_addr="${jboss.jgroups.diagnostics_addr:224.0.0.75}"
                   diagnostics_port="${jboss.jgroups.diagnostics_port:7500}"

                   use_concurrent_stack="true"

                   thread_pool.enabled="true"
                   thread_pool.min_threads="20"
                   thread_pool.max_threads="200"
                   thread_pool.keep_alive_time="5000"
                   thread_pool.queue_enabled="true"
                   thread_pool.queue_max_size="1000"
                   thread_pool.rejection_policy="discard"

                   oob_thread_pool.enabled="true"
                   oob_thread_pool.min_threads="1"
                   oob_thread_pool.max_threads="20"
                   oob_thread_pool.keep_alive_time="5000"
                   oob_thread_pool.queue_enabled="false"
                   oob_thread_pool.queue_max_size="100"
                   oob_thread_pool.rejection_policy="run"/>
              <!-- Alternative 1: multicast-based automatic discovery. -->
              <!-- Alternative 2: non multicast-based replacement for MPING. Requires a static configuration
                   of *all* possible cluster members.
               -->
              <TCPPING timeout="3000"
                       initial_hosts="${jgroups.tcpping.initial_hosts:jboss-batch-stage-1[7600],jboss-batch-stage-2[7600],jboss-batch-stage-3[7600]}"
                       port_range="1"
                       num_initial_members="2"/>
              <MERGE2 max_interval="100000" min_interval="20000"/>
              <FD_SOCK/>
              <FD timeout="6000" max_tries="5" shun="true"/>
              <VERIFY_SUSPECT timeout="1500"/>
              <pbcast.NAKACK use_mcast_xmit="false" gc_lag="0"
                             retransmit_timeout="300,600,1200,2400,4800"
                             discard_delivered_msgs="true"/>
              <UNICAST timeout="300,600,1200,2400,3600"/>
              <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                             max_bytes="400000"/>
              <pbcast.GMS print_local_addr="true" join_timeout="3000"
                          shun="true"
                          view_bundling="true"
                          view_ack_collection_timeout="5000"/>
              <FC max_credits="2000000" min_threshold="0.10"
                  ignore_synchronous_response="true"/>
              <FRAG2 frag_size="60000"/>
              <!-- pbcast.STREAMING_STATE_TRANSFER/ -->
              <pbcast.STATE_TRANSFER/>
              <pbcast.FLUSH timeout="0"/>
          </config>
      </stack>

       

      I see the following ERROR log statement repeatedly.

       

       

      ERROR [OOB-21202,10.10.67.81:7600][2011-07-31 09:12:22,290][org.jgroups.protocols.pbcast.GMS] CoordGmsImpl.java(217): merge_id ([10.10.67.81:7600|1312128725273]) or this.merge_id (null) is null (sender=10.10.67.81:7600).

       

       

      Note that the sender IP in the message is the IP of the machine on which I see the error.

      I see this on two of the 3 machines in the cluster.
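
      As a reading aid for the ERROR line above, here is a minimal Java sketch that splits the logged merge_id into its address and numeric parts. It assumes the number after the `|` is a millisecond epoch timestamp; nothing in this thread confirms that, so treat the decoded time as a guess. The class and method names (`MergeIdDecoder`, `addressOf`, `millisOf`) are made up for illustration and are not part of JGroups.

```java
import java.time.Instant;

public class MergeIdDecoder {

    /** Returns the address part of an id such as "[10.10.67.81:7600|1312128725273]". */
    public static String addressOf(String mergeId) {
        int start = mergeId.indexOf('[');
        int bar = mergeId.indexOf('|');
        if (start < 0 || bar < start) {
            throw new IllegalArgumentException("unexpected merge id: " + mergeId);
        }
        return mergeId.substring(start + 1, bar);
    }

    /** Returns the numeric part after '|' (ASSUMED to be epoch milliseconds). */
    public static long millisOf(String mergeId) {
        int bar = mergeId.indexOf('|');
        int end = mergeId.indexOf(']', bar);
        if (bar < 0 || end < 0) {
            throw new IllegalArgumentException("unexpected merge id: " + mergeId);
        }
        return Long.parseLong(mergeId.substring(bar + 1, end));
    }

    public static void main(String[] args) {
        // merge_id copied from the ERROR line in this post
        String id = "[10.10.67.81:7600|1312128725273]";
        Instant created = Instant.ofEpochMilli(millisOf(id));
        System.out.println(addressOf(id) + " -> " + created);
    }
}
```

      Under that assumption the id from the log decodes to 2011-07-31T16:12:05.273Z. Comparing it with the log timestamp (09:12:22,290) requires knowing the server's time zone, which is not stated here.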

       

       

       

      Any idea what exactly this means and/or how I can fix it?

        • 1. Re: merge error in AS5.1
          ablevine1

          I also eventually see log statements similar to this after two separate clusters have formed:

          WARN  [Incoming-2,10.10.67.82:7600][2011-08-04 11:00:20,778][org.jgroups.protocols.pbcast.NAKACK] NAKACK.java(841): 10.10.67.82:7600] discarded message from non-member 10.10.67.81:7600, my view is [10.10.67.80:7600|1] [10.10.67.80:7600, 10.10.67.82:7600]
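
          To make the split visible when scanning logs, the WARN line above can be checked mechanically: it names a sender and prints the local view, and the sender is missing from that view. Below is a minimal Java sketch, assuming the message text always has the shape shown in this one line (the regex is derived from that single sample, not from the JGroups source). `NonMemberCheck` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonMemberCheck {

    // group 1: sender, group 2: view creator, group 3: view number, group 4: member list
    private static final Pattern DISCARDED = Pattern.compile(
        "discarded message from non-member ([^,\\s]+), my view is "
        + "\\[([^|\\]]+)\\|(\\d+)\\] \\[([^\\]]*)\\]");

    private static Matcher match(String line) {
        Matcher m = DISCARDED.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a 'discarded message' line");
        }
        return m;
    }

    /** Address of the node whose message was discarded. */
    public static String sender(String line) {
        return match(line).group(1);
    }

    /** Members of the local view as printed in the log line. */
    public static List<String> viewMembers(String line) {
        return Arrays.asList(match(line).group(4).trim().split(",\\s*"));
    }

    public static void main(String[] args) {
        // message part of the WARN line from this reply
        String line = "discarded message from non-member 10.10.67.81:7600, "
            + "my view is [10.10.67.80:7600|1] [10.10.67.80:7600, 10.10.67.82:7600]";
        List<String> view = viewMembers(line);
        System.out.println("sender=" + sender(line)
            + " inView=" + view.contains(sender(line))
            + " view=" + view);
    }
}
```

          For the line above this reports that 10.10.67.81:7600 is not in the view [10.10.67.80:7600, 10.10.67.82:7600], which matches the two separate clusters I described.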