6 Replies Latest reply on Aug 2, 2012 12:53 PM by zeituni

Infinispan eviction: cluster gets out of sync

zeituni Jul 31, 2012 3:55 PM

Hi,

I recently upgraded from the old JBoss tree cache to the new Infinispan infrastructure (5.1.5 Final). My system is usually under quite heavy load, and I need the eviction policy to limit and clean the cache once in a while. I have 2 clustered instances, each with the same configuration. The problem is that once the eviction mechanism begins on one instance, the other instance does not get synchronized, so each instance shows a different number of records. This is the tcp.xml configuration I use:

<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.0.xsd">
    <TCP bind_port="20000"
         bind_addr="${bind.address}"
         loopback="false"
         discard_incompatible_packets="true"
         use_send_queues="true"
         sock_conn_timeout="300"
         conn_expire_time="1800000"
         timer_type="new"
         timer.min_threads="4"
         timer.max_threads="30"
         timer.keep_alive_time="3000"
         timer.queue_max_size="500"
         thread_pool.enabled="true"
         thread_pool.min_threads="1"
         thread_pool.max_threads="30"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="false"
         thread_pool.queue_max_size="100"
         thread_pool.rejection_policy="discard"
         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="1"
         oob_thread_pool.max_threads="8"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="100"
         oob_thread_pool.rejection_policy="discard"/>
    <TCPPING timeout="2000"
             initial_hosts="${active.storage.bind.address}[20000],${passive.storage.bind.address}[20000]"
             port_range="2"
             num_initial_members="3"/>
    <MERGE2 min_interval="10000"
            max_interval="20000"/>
    <FD_SOCK/>
    <VERIFY_SUSPECT timeout="1500" />
    <pbcast.NAKACK use_mcast_xmit="false"
                   exponential_backoff="500"
                   discard_delivered_msgs="true"/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000" view_bundling="true"/>
    <UFC max_credits="2M"
         min_threshold="0.4"/>
    <MFC max_credits="2M"
         min_threshold="0.4"/>
    <pbcast.STATE_TRANSFER/>
</config>


The eviction configuration is done via java code:

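Configuration configuration = new ConfigurationBuilder()
    // asynchronous replication, replication queue disabled
    .clustering().cacheMode(CacheMode.REPL_ASYNC)
        .async().replQueueInterval(3000).replQueueMaxElements(50).useReplQueue(false)
        .stateTransfer().fetchInMemoryState(true).timeout(10000)
    // evict with LIRS once a node holds more than 1,000,000 entries
    .eviction().strategy(EvictionStrategy.LIRS).maxEntries(1000000)
    // no lifespan limit; expire after 7 days idle; expiration thread wakes up once a day
    .expiration().lifespan(-1).maxIdle(604800000).wakeUpInterval(86400000)
    .build();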
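
For anyone checking the numbers: the maxIdle and wakeUpInterval values above are milliseconds. A minimal sketch of how they can be derived with the JDK's TimeUnit, which makes the intent (7 days idle, one run per day) easier to audit. The class and method names here are hypothetical and not part of Infinispan.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, for illustration only; not part of the Infinispan API.
class ExpirationMillis {

    /** Max idle time used in the configuration above: 7 days in milliseconds. */
    static long maxIdleMillis() {
        return TimeUnit.DAYS.toMillis(7);
    }

    /** Wake-up interval used in the configuration above: 1 day in milliseconds. */
    static long wakeUpIntervalMillis() {
        return TimeUnit.DAYS.toMillis(1);
    }

    public static void main(String[] args) {
        System.out.println(maxIdleMillis());        // prints 604800000
        System.out.println(wakeUpIntervalMillis()); // prints 86400000
    }
}
```

The resulting long values could be passed to maxIdle(...) and wakeUpInterval(...) in place of the literals.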


Can anyone tell me what I am missing here?

Thanks!

1. Re: Infinispan eviction: cluster gets out of sync

vblagojevic Jul 31, 2012 5:49 PM (in response to zeituni)

Just to confirm: do you want to have one million entries in the cache? Do you ever fill it up? How do you observe that each cluster member has a different number of records?
2. Re: Infinispan eviction: cluster gets out of sync

zeituni Aug 1, 2012 1:59 AM (in response to vblagojevic)

Yes, I have even more than one million records in the cache. This is why I want to set the limit to 1 million. And yes - it fills up: sometimes due to bugs / network failures records are not removed from the cache and this causes the cache to have too many entries.
I observe each node by invoking cache.size() on that instance via a JMX operation.
When I stop the load I see that only one node is evicting records while the other one is out of sync. When I start the load again I get a different number of records on each node.
3. Re: Infinispan eviction: cluster gets out of sync

vblagojevic Aug 1, 2012 11:24 AM (in response to zeituni)

If by "out of sync" you mean that cache.size() returns different values on different cache nodes, that is to be expected. Each node performs eviction locally; an eviction on node N is not a global event that gets replicated or reproduced on other nodes.
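
The behaviour described here can be illustrated with a small plain-Java model. This is only a sketch under simplifying assumptions: the Node class and its methods are hypothetical stand-ins, not the Infinispan API, and replication is modelled as a direct synchronous write to the peer. It shows that replicated puts keep both nodes equal, while a local eviction changes only one node's size.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of two replicated nodes; NOT the Infinispan API.
class LocalEvictionDemo {

    /** Toy node: puts and removes reach the peer, evictions do not. */
    static class Node {
        final Map<String, String> store = new HashMap<>();
        Node peer;

        void put(String key, String value) {
            store.put(key, value);
            peer.store.put(key, value);   // replicated write
        }

        void remove(String key) {
            store.remove(key);
            peer.store.remove(key);       // replicated removal
        }

        void evict(String key) {
            store.remove(key);            // local only: the peer keeps its copy
        }

        int size() {
            return store.size();
        }
    }

    /** Returns {size of A, size of B} after three replicated puts and one eviction on A. */
    static int[] sizesAfterLocalEviction() {
        Node a = new Node();
        Node b = new Node();
        a.peer = b;
        b.peer = a;

        a.put("k1", "v1");
        a.put("k2", "v2");
        b.put("k3", "v3");
        a.evict("k1");                    // only A drops k1

        return new int[] { a.size(), b.size() };
    }

    public static void main(String[] args) {
        int[] sizes = sizesAfterLocalEviction();
        System.out.println("A=" + sizes[0] + " B=" + sizes[1]); // prints A=2 B=3
    }
}
```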
4. Re: Infinispan eviction: cluster gets out of sync

zeituni Aug 1, 2012 12:11 PM (in response to vblagojevic)

Thanks for that information. I might have missed it in the documentation.
So how would you suggest doing eviction while keeping all nodes synchronized?
5. Re: Infinispan eviction: cluster gets out of sync

mircea.markus Aug 2, 2012 10:39 AM (in response to zeituni)

As you're using an async replication cache I assume you can live with having the nodes *temporarily* out of sync.
Eviction might run with different timings on each node, so after a while, for the data you don't use, things should get in sync again. Reducing the wakeup interval would help with that as well.
It might also happen that you read data on one node (so that the idle time is reset on that node) and you don't read it on the other; at that point your cluster might stay out of sync for longer periods of time.
If you want to enforce stricter sync between the nodes (just be sure that this is what you need in the first place, as it comes at some performance cost), you can use a @CacheEntriesEvicted listener and, on each entry eviction, trigger a cluster-wide remove (i.e. cache.remove).
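
The listener idea can be sketched with a plain-Java model. Everything here is a hypothetical stand-in rather than the Infinispan API: in Infinispan the callback would be a method annotated with @CacheEntriesEvicted on a @Listener class registered through cache.addListener, and the exact event type and threading behaviour should be checked against the Javadoc for your version. The model only shows the control flow: an eviction on one node fires a callback, and the callback issues a replicated remove so the peer drops the entry too.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Plain-Java model of "on eviction, trigger a cluster-wide remove"; NOT the Infinispan API.
class EvictionPropagationDemo {

    static class Node {
        final Map<String, String> store = new HashMap<>();
        final List<Consumer<String>> evictionListeners = new ArrayList<>();
        Node peer;

        void put(String key, String value) {
            store.put(key, value);
            peer.store.put(key, value);          // replicated write
        }

        void remove(String key) {
            store.remove(key);
            peer.store.remove(key);              // replicated removal, fires no eviction event
        }

        void evict(String key) {
            store.remove(key);                   // local eviction
            for (Consumer<String> listener : evictionListeners) {
                listener.accept(key);            // notify listeners with the evicted key
            }
        }

        int size() {
            return store.size();
        }
    }

    /** Returns {size of A, size of B} after one eviction on A, with or without propagation. */
    static int[] sizesAfterEviction(boolean propagate) {
        Node a = new Node();
        Node b = new Node();
        a.peer = b;
        b.peer = a;
        if (propagate) {
            // Stand-in for the eviction listener: turn each eviction into a replicated remove.
            a.evictionListeners.add(a::remove);
            b.evictionListeners.add(b::remove);
        }

        a.put("k1", "v1");
        a.put("k2", "v2");
        a.put("k3", "v3");
        a.evict("k1");

        return new int[] { a.size(), b.size() };
    }

    public static void main(String[] args) {
        int[] plain = sizesAfterEviction(false);
        int[] synced = sizesAfterEviction(true);
        System.out.println("no listener:   A=" + plain[0] + " B=" + plain[1]);   // A=2 B=3
        System.out.println("with listener: A=" + synced[0] + " B=" + synced[1]); // A=2 B=2
    }
}
```

Each eviction now costs an extra replicated operation, which is the performance trade-off mentioned above.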
6. Re: Infinispan eviction: cluster gets out of sync

zeituni Aug 2, 2012 12:53 PM (in response to mircea.markus)

Thanks Mircea! I will take this into consideration.
This information is very helpful.