6 Replies Latest reply on Aug 2, 2012 12:53 PM by zeituni

    Infinispan eviction: cluster gets out of sync

    zeituni

      Hi,

      I have recently upgraded from the old JBoss tree cache to the new Infinispan infrastructure (5.1.5 Final). My system is usually under quite heavy load, and I need the eviction policy to limit and clean the cache once in a while. I have 2 clustered instances, each with the same configuration. The problem is that once the eviction mechanism begins on one instance, the other instance does not get synchronized, and I end up with each instance showing a different number of records. This is the tcp.xml configuration I use:

      <code lang="xml">

      <config xmlns="urn:org:jgroups"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.0.xsd">
          <TCP bind_port="20000"
               bind_addr="${bind.address}"
               loopback="false"
               discard_incompatible_packets="true"
               use_send_queues="true"
               sock_conn_timeout="300"
               conn_expire_time="1800000"

               timer_type="new"
               timer.min_threads="4"
               timer.max_threads="30"
               timer.keep_alive_time="3000"
               timer.queue_max_size="500"

               thread_pool.enabled="true"
               thread_pool.min_threads="1"
               thread_pool.max_threads="30"
               thread_pool.keep_alive_time="5000"
               thread_pool.queue_enabled="false"
               thread_pool.queue_max_size="100"
               thread_pool.rejection_policy="discard"

               oob_thread_pool.enabled="true"
               oob_thread_pool.min_threads="1"
               oob_thread_pool.max_threads="8"
               oob_thread_pool.keep_alive_time="5000"
               oob_thread_pool.queue_enabled="false"
               oob_thread_pool.queue_max_size="100"
               oob_thread_pool.rejection_policy="discard"/>

          <TCPPING timeout="2000"
                   initial_hosts="${active.storage.bind.address}[20000],${passive.storage.bind.address}[20000]"
                   port_range="2"
                   num_initial_members="3"/>
          <MERGE2  min_interval="10000"
                   max_interval="20000"/>
          <FD_SOCK/>
          <FD timeout="3000" max_tries="3" />
          <VERIFY_SUSPECT timeout="1500"  />
          <BARRIER />
          <pbcast.NAKACK use_mcast_xmit="false"
                         exponential_backoff="500"
                         discard_delivered_msgs="true"/>
          <UNICAST />
          <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="4M"/>
          <pbcast.GMS print_local_addr="true" join_timeout="3000" view_bundling="true"/>
          <UFC max_credits="2M"
               min_threshold="0.4"/>
          <MFC max_credits="2M"
               min_threshold="0.4"/>
          <FRAG2 frag_size="8192" />
          <pbcast.STATE_TRANSFER/>
      </config>

       

      </code>

       

      The eviction configuration is done via java code:

       

      <code lang="java">

      // All durations are in milliseconds.
      Configuration configuration = new ConfigurationBuilder()
              .clustering().cacheMode(CacheMode.REPL_ASYNC)
                  .async().replQueueInterval(3000).replQueueMaxElements(50).useReplQueue(false)
                  .stateTransfer().fetchInMemoryState(true).timeout(10000)
              .eviction().strategy(EvictionStrategy.LIRS).maxEntries(1000000)
              .expiration()
                  .lifespan(-1)                // entries never expire by age
                  .maxIdle(604800000)          // 7 days
                  .wakeUpInterval(86400000)    // expiration check every 24 hours
              .build();

      </code>

       

       

      Can anyone tell me what I am missing here?

       

      Thanks!

        • 1. Re: Infinispan eviction: cluster gets out of sync
          vblagojevic

          Just to confirm: do you want to have one million entries in the cache? Do you ever fill it up? And how do you observe that each cluster member has a different number of records?

          • 2. Re: Infinispan eviction: cluster gets out of sync
            zeituni

            Yes, I have even more than one million records in the cache. This is why I want to set the limit to 1 million. And yes - it fills up: sometimes due to bugs / network failures records are not removed from the cache and this causes the cache to have too many entries.

            I observe each instance by invoking cache.size() on it via a JMX operation.

            When I stop the load, I see that only one instance is evicting records while the other one is out of sync. When I start the load again, I get a different number of records in each instance.

            • 3. Re: Infinispan eviction: cluster gets out of sync
              vblagojevic

              If by "out of sync" you mean that cache.size() returns different values on different cache nodes, that is to be expected. Each node does eviction locally, and an eviction on node N is not a global event that gets replicated or reproduced on any other node.
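              To illustrate the point about local eviction, here is a minimal sketch in plain Java. It is a toy model and not the Infinispan API: the `Node` class, `replicatedPut`, and `runEviction` are hypothetical names made up for this example. Writes reach both nodes, but each node trims itself independently, so the sizes differ until the other node runs its own pass.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: this is NOT the Infinispan API.
class LocalEvictionDemo {

    /** One cache node: writes arrive via replication, eviction runs locally. */
    static class Node {
        private final Map<String, String> store = new LinkedHashMap<>();
        private final int maxEntries;

        Node(int maxEntries) { this.maxEntries = maxEntries; }

        void put(String key, String value) { store.put(key, value); }

        /** Local eviction pass: drops the oldest entries, tells no other node. */
        void runEviction() {
            Iterator<String> oldestFirst = store.keySet().iterator();
            while (store.size() > maxEntries && oldestFirst.hasNext()) {
                oldestFirst.next();
                oldestFirst.remove();
            }
        }

        int size() { return store.size(); }
    }

    /** Applies the same write to every node, as a replicated cache would. */
    static void replicatedPut(String key, String value, Node... nodes) {
        for (Node n : nodes) {
            n.put(key, value);
        }
    }

    /** Returns {sizeA, sizeB} after only node A has run its eviction pass. */
    static int[] sizesAfterEvictionOnAOnly() {
        Node a = new Node(3);
        Node b = new Node(3);
        for (int i = 1; i <= 5; i++) {
            replicatedPut("k" + i, "v" + i, a, b);
        }
        a.runEviction(); // B's own pass has not run yet
        return new int[] { a.size(), b.size() };
    }

    public static void main(String[] args) {
        int[] sizes = sizesAfterEvictionOnAOnly();
        System.out.println("A=" + sizes[0] + " B=" + sizes[1]); // prints "A=3 B=5"
    }
}
```

              Once node B runs its own pass, the sizes match again, which is why the difference is normally temporary.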

              • 4. Re: Infinispan eviction: cluster gets out of sync
                zeituni

                Thanks for that information. I might have missed it in the documentation.

                So how would you suggest doing eviction while keeping all nodes synchronized?

                • 5. Re: Infinispan eviction: cluster gets out of sync
                  mircea.markus

                  As you're using an async replication cache I assume you can live with having the nodes *temporarily* out of sync.

                  Eviction might run with different timings on each node, so after a while, for the data you don't use, things should get back in sync. Reducing the wakeup interval would help with that as well.

                  It might also happen that you read data on one node (so that the idle time is reset on that node) and you don't read it on the other - at that point your cluster might get out of sync for longer periods of time.
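                  The idle-time effect can be sketched with a toy model in plain Java. This is not the Infinispan API: `Node`, `expire`, and the explicit `now` timestamps are assumptions made for the example. A read on node A resets the idle timer there, so the same key survives on A but is removed on B.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: this is NOT the Infinispan API. Time is passed in explicitly.
class MaxIdleDemo {

    static class Node {
        private final Map<String, Long> lastUsed = new HashMap<>();
        private final long maxIdleMillis;

        Node(long maxIdleMillis) { this.maxIdleMillis = maxIdleMillis; }

        void put(String key, long now) { lastUsed.put(key, now); }

        /** A read resets the idle timer on this node only. */
        boolean get(String key, long now) {
            if (!lastUsed.containsKey(key)) {
                return false;
            }
            lastUsed.put(key, now);
            return true;
        }

        /** Local expiration pass: removes entries idle for longer than maxIdle. */
        void expire(long now) {
            lastUsed.values().removeIf(t -> now - t > maxIdleMillis);
        }

        boolean contains(String key) { return lastUsed.containsKey(key); }
    }

    /** Returns {presentOnA, presentOnB} for a key that was read on A only. */
    static boolean[] readOnOneNodeOnly() {
        Node a = new Node(100);
        Node b = new Node(100);
        a.put("k", 0);
        b.put("k", 0);
        a.get("k", 80);  // the client reads through node A; B never sees the read
        a.expire(150);   // idle for 70 ms on A: kept
        b.expire(150);   // idle for 150 ms on B: removed
        return new boolean[] { a.contains("k"), b.contains("k") };
    }

    public static void main(String[] args) {
        boolean[] r = readOnOneNodeOnly();
        System.out.println("on A: " + r[0] + ", on B: " + r[1]); // prints "on A: true, on B: false"
    }
}
```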

                  If you want to enforce stricter sync between the nodes (just be sure that this is what you need in the first place, as it comes at some performance cost), you can use the @CacheEntriesEvicted listener and, on each entry eviction, trigger a cluster-wide remove (i.e. cache.remove).
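                  The listener idea can be sketched as a toy model in plain Java. It deliberately avoids the Infinispan API: `Node`, `setEvictionListener`, and `runEviction` are hypothetical stand-ins, with the listener playing the role of a @CacheEntriesEvicted handler and `b::remove` standing in for a cluster-wide cache.remove.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy model only: this is NOT the Infinispan API.
class EvictionPropagationDemo {

    static class Node {
        private final Map<String, String> store = new LinkedHashMap<>();
        private final int maxEntries;
        private Consumer<String> onEvicted = key -> { };

        Node(int maxEntries) { this.maxEntries = maxEntries; }

        void setEvictionListener(Consumer<String> listener) { this.onEvicted = listener; }

        void put(String key, String value) { store.put(key, value); }

        /** Explicit removal; unlike eviction it does not notify the listener. */
        void remove(String key) { store.remove(key); }

        /** Local eviction pass that reports each evicted key to the listener. */
        void runEviction() {
            List<String> evicted = new ArrayList<>();
            Iterator<String> oldestFirst = store.keySet().iterator();
            while (store.size() > maxEntries && oldestFirst.hasNext()) {
                evicted.add(oldestFirst.next());
                oldestFirst.remove();
            }
            evicted.forEach(onEvicted);
        }

        int size() { return store.size(); }
    }

    /** Returns {sizeA, sizeB} when A's evictions are forwarded to B as removes. */
    static int[] sizesWithPropagation() {
        Node a = new Node(3);
        Node b = new Node(3);
        a.setEvictionListener(b::remove); // stands in for a cluster-wide remove
        for (int i = 1; i <= 5; i++) {
            a.put("k" + i, "v" + i);
            b.put("k" + i, "v" + i);
        }
        a.runEviction(); // evicts k1 and k2 on A; the listener removes them on B
        return new int[] { a.size(), b.size() };
    }

    public static void main(String[] args) {
        int[] sizes = sizesWithPropagation();
        System.out.println("A=" + sizes[0] + " B=" + sizes[1]); // prints "A=3 B=3"
    }
}
```

                  In this model every eviction turns into one extra remove on the peer, which is where the cost mentioned above comes from.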

                  • 6. Re: Infinispan eviction: cluster gets out of sync
                    zeituni

                    Thanks Mircea! I will take this into consideration.

                    This information is very helpful.