3 Replies Latest reply on Jun 6, 2011 11:08 AM by Mircea Markus

    pbcast.NACKACK   dropped message after reconnect network

    ghostho Newbie



      I have Infinispan running on two servers. I put some entries into the cache and then disconnected the network. After reconnecting, both caches find each other, but I get this message:


      2011-05-27 10:45:27,768 | WARN  | ,PC1-41273 | groups.protocols.pbcast.NAKACK  788 | PC1-41273: dropped message from PC1-25543 (not in table [PC1-41273]), view=[PC1-41273|2] [PC1-41273]


      and this message:


      2011-05-27 10:45:27,783 | INFO  | ,PC1-41273 | n.util.logging.AbstractLogImpl   20 | Received new, MERGED cluster view: MergeView::[PC1-25543|3] [PC1-25543, PC1-41273], subgroups=[[PC1-25543|2] [PC1-25543], [PC1-41273|2]



      The keys are not the same on both nodes. The data is not replicated.



      How can I fix this?



      My config:



      <config xmlns="urn:org:jgroups" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

          xsi:schemaLocation="urn:org:jgroups file:schema/JGroups-2.8.xsd">


          <!--
              TCP based stack, with flush, flow control and message bundling. This
              is usually used when IP multicasting cannot be used in a network, e.g.
              because it is disabled (routers discard multicast). Note that
              TCP.bind_addr and TCPPING.initial_hosts should be set, possibly via
              system properties, e.g. -Djgroups.bind_addr= and
          -->



          <TCP start_port="1164" loopback="true" recv_buf_size="20000000"

              send_buf_size="640000" discard_incompatible_packets="true"

              max_bundle_size="64000" max_bundle_timeout="30"

              use_incoming_packet_handler="true" enable_bundling="true"

              use_send_queues="false" sock_conn_timeout="300"

              skip_suspected_members="true" use_concurrent_stack="true"


              thread_pool.enabled="true" thread_pool.min_threads="1"

              thread_pool.max_threads="25" thread_pool.keep_alive_time="5000"

              thread_pool.queue_enabled="false" thread_pool.queue_max_size="100"

              thread_pool.rejection_policy="run" oob_thread_pool.enabled="true"

              oob_thread_pool.min_threads="1" oob_thread_pool.max_threads="8"

              oob_thread_pool.keep_alive_time="5000" oob_thread_pool.queue_enabled="false"

              oob_thread_pool.queue_max_size="100" oob_thread_pool.rejection_policy="run" />


          <TCPPING timeout="3000"


              port_range="1" num_initial_members="3" />


              <MPING bind_addr="${jgroups.bind_addr:}" break_on_coord_rsp="true"

            mcast_addr="${jgroups.udp.mcast_addr:}" mcast_port="${jgroups.udp.mcast_port:46655}" ip_ttl="${jgroups.udp.ip_ttl:2}"



          <MERGE2 max_interval="100000" min_interval="20000" />

          <FD_SOCK />

          <FD timeout="10000" max_tries="5" shun="true" />

          <VERIFY_SUSPECT timeout="1500" />

          <pbcast.NAKACK max_xmit_size="90000" use_mcast_xmit="false"

              gc_lag="0" retransmit_timeout="300,600,1200,2400,4800"

              discard_delivered_msgs="false" />

           <VIEW_SYNC avg_send_interval="10000"/>

          <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"

              max_bytes="400000" />

          <pbcast.GMS print_local_addr="true" join_timeout="3000"

              join_retry_timeout="2000" shun="false" view_bundling="true" />

          <FC max_credits="2000000" min_threshold="0.10" />

          <FRAG2 frag_size="60000" />

          <pbcast.STREAMING_STATE_TRANSFER />

          <!-- <pbcast.STATE_TRANSFER/> -->

          <pbcast.FLUSH timeout="0" />


        • 1. Re: pbcast.NACKACK   dropped message after reconnect network
          craig bomba Novice

          Sounds like it is actually working as designed. You have a split-brain scenario here. If you think of Infinispan as Infinispan plus JGroups, the messages you are seeing may make more sense. When you disconnected the network, the cluster must have detected this and split into separate partitions. JGroups was later able to heal the cluster (hence the MERGED message), but the Infinispan caches in your 2 JVMs remain isolated. This is by design, and I believe the basic principle here is that either JVM could have made changes to its cache while the cluster was broken. Since it would not be clear whose deltas are correct, the caches remain separated. It takes application knowledge to figure out which JVM may be correct. I believe there are only 2 ways to fix this situation. First, you can restart one of your JVMs so that they sync once again (via StateTransfer). Or, you can reconnect to the cache from your application (effectively re-syncing to your other JVM).
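
To make the "application knowledge" point concrete, here is a minimal sketch of reconciling two diverged copies of the data after a partition heals. It uses only plain `java.util` maps and does not call any Infinispan or JGroups API. The class name `SplitBrainReconciler`, the method `reconcile`, and the "larger value wins" policy are all hypothetical, chosen only for illustration; a real application would supply its own conflict rule.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical helper, not part of Infinispan or JGroups.
class SplitBrainReconciler {

    /**
     * Builds one map from two diverged copies.
     * Keys present on only one side are kept unchanged. For keys present on
     * both sides, the application-supplied resolver decides which value wins.
     * Values must be non-null, because Map.merge rejects null values.
     */
    static <K, V> Map<K, V> reconcile(Map<K, V> sideA, Map<K, V> sideB,
                                      BinaryOperator<V> resolver) {
        Map<K, V> merged = new HashMap<>(sideA);
        for (Map.Entry<K, V> e : sideB.entrySet()) {
            // Map.merge stores the value if the key is absent,
            // otherwise it applies the resolver to (existing, incoming).
            merged.merge(e.getKey(), e.getValue(), resolver);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Contents of each node's cache after the partition (made-up data).
        Map<String, Integer> nodeA = new HashMap<>();
        nodeA.put("a", 1);
        nodeA.put("b", 5);
        Map<String, Integer> nodeB = new HashMap<>();
        nodeB.put("b", 7);
        nodeB.put("c", 3);

        // Assumed policy for this example only: the larger value wins.
        Map<String, Integer> result = reconcile(nodeA, nodeB, Integer::max);
        System.out.println(result);
    }
}
```

The resolver is the only place where the decision about "whose deltas are correct" lives, which is why it cannot be made generically by the cache itself.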

          • 2. Re: pbcast.NACKACK   dropped message after reconnect network
            ghostho Newbie



            How can I reconnect the cache? I get a new instance of the cache in my application. The problem is that when the network reconnects, NAKACK has the wrong destination address. Because it is only and the port. It is not the right address. How can I fix it? Can I reconnect or resync manually?

            • 3. Re: pbcast.NACKACK   dropped message after reconnect network
              Mircea Markus Master

              How can I fix this?

              You can register a @Listener to be notified on view merges.



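                 import org.infinispan.notifications.Listener;
                 import org.infinispan.notifications.cachemanagerlistener.annotation.Merged;
                 import org.infinispan.notifications.cachemanagerlistener.event.MergeEvent;

                 @Listener
                 public static class MergedViewListener {

                    // Set to true once a merged view has been received.
                    public volatile boolean merged;

                    // Invoked by the cache manager when partitions merge.
                    // "log" is assumed to be a static logger of the enclosing class.
                    @Merged
                    public void mergedView(MergeEvent me) {
                       log.infof("View merged received %s", me);
                       merged = true;
                    }
                 }
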

              and then register it:

                    CacheManager cm = getCacheManager();

                    cm.addListener(new MergedViewListener());
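
The listener above only records that a merge happened; the application still has to notice the flag and react. As a rough sketch of that second half, the class below shows one way to consume such a flag so that the resync action runs at most once per recorded merge. It is plain Java with no Infinispan calls. `MergeFlag`, `onMerge` and `resyncIfMerged` are hypothetical names, and the resync action itself is an application-specific placeholder.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not part of Infinispan.
class MergeFlag {

    private final AtomicBoolean merged = new AtomicBoolean(false);

    /** Meant to be called from the merge listener callback. */
    void onMerge() {
        merged.set(true);
    }

    /**
     * Meant to be called from an application thread.
     * Runs the given action only if a merge was recorded since the last call,
     * and clears the flag atomically so the action is not repeated.
     */
    boolean resyncIfMerged(Runnable resync) {
        if (merged.compareAndSet(true, false)) {
            resync.run();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        MergeFlag flag = new MergeFlag();

        // Placeholder action; a real application would restart or reload its cache here.
        Runnable resync = () -> System.out.println("resyncing cache contents");

        flag.resyncIfMerged(resync); // no merge recorded yet, nothing runs
        flag.onMerge();              // simulates the listener being notified
        flag.resyncIfMerged(resync); // runs the action once
        flag.resyncIfMerged(resync); // flag already cleared, nothing runs
    }
}
```

Using `compareAndSet` instead of a plain volatile read followed by a write avoids two threads both seeing the flag as set and both running the resync.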