2 Replies Latest reply on May 18, 2011 5:19 PM by Shane Johnson

    Under the hood how does infinispan do rebalancing to prevent the domino effect ?

    Shadar Dongol Newbie

      I am assessing Inifinispan in a distributed mode as Im caching large amounts of data that could not stored into a single JVM heap space. There are a number of things are I do not completely understand regarding how does infinispan do things under the hood and hopefully someone will be able to provide me with an explanation.





      Assuming I have three nodes A,B,C each running infinispan with 512MB heap space and I want to cache to 2GB of data. Ideally I should be able to cache about 1.5GB (less the java application heap). If I set the consistent hash number of owners to 2 (<hash  numOwners="2 />) this means that data is constantly being replicated to another node, I could see through the console of node Node that is the cache entries is almost equal to the number of cache entries on node A yet the amount of consumed heap space is significantly less, if the same data is being replicated over to another node then why isnt the heap utilisation the same?


      On the other when I fire up nodes B and C why the number of objects in CacheA  does not decrease indicating that data is equally balanced among all nodes of the cluster?


      Example of this question: I have set the Hash numOwners=2 and I inserted 50K element in Cache on NodeA, then fired NodeB without performing any write operation, when I ran JMX console on NodeB is had 50K objects in the store, which is acceptable as data is saved in two locations. Now when I fired NodeC without performing any insert operation it had 20K objects when it joined the clustered which implied that some data was moved to NodeC, yet I checked the Cache sizes on Nodes A and B they still read 50K! Inifinspan was supposed to make only two copies of the data so if entries were made into NodeC then why didnt the number of cache entries decrease on Nodes A and B"


      If hash number is the number of copies of the data to be made is correct to say that: Maximum Cache Size= Total Available Heap space / No. of numOwners so in my case its 750MB = 1500MB / 2   ?


      Finally I have noticed that if I kill one node in the cluster immediately the number cache entries on another node increases without doing any inserts into the cache this implies that data is moved around. My concern is this could result in a domino effect where data from dead nodes is too large to be accomodated by exisiting nodes and as result the entire cluster gets killed.


      Because the exact behaviour is not clear I find it difficult to configure and design a reliable and scalable solution based on infinispan. Understanding this behaviour is key to define the amount of data to be cached, number of nodes and amount of JVM to be allocated, number of copies to be made. I think an article on this topic would be very useful for all architects and developers who are considering Infinispan.