3 Replies Latest reply on Dec 22, 2009 7:03 AM by manik

    GUI Demo Cache for distributed caching example wrong?

      Hi, I did some tests with Infinispan and to my mind it's a rather cool tool! However, I'm still unsure about its stability. Maybe my understanding is wrong because of missing documentation or hints I haven't found so far, or the strange things I'm seeing are actual misbehaviour.

       

      I started the simple InfinispanDemo class twice and created caches with 100 entries. My understanding of the demo config (distribution, numOwners=2) is that every entry is held twice, once in each of the two VM instances. Starting a third instance then automatically replicated 31 entries to the third node. Okay, I thought, maybe there is a mathematical reason behind this mechanism. Then I stopped the coordinator and was left with two caches, one with 100 entries and the other with 61 entries. To my mind this no longer matches the numOwners=2 policy.

       

      I would really like to use Infinispan in my environment, but at the moment I'm unsure whether the problem is its stability or whether my knowledge is too high-level to understand the basics. Right now I'm afraid of data loss. At the same time, the documentation is still not available, even though docs are important, especially for a release candidate that is meant to be tested and for a project that jumps straight to the rather high version 4.0 as its first release.

       

      When do you expect to have complete documentation finished? The wiki is nice, but to my mind it is a patchwork of information.

       

      Best regards

      Lars.

        • 1. Re: GUI Demo Cache for distributed caching example wrong?
          manik

          Hi Lars

           

          Glad you like what you see so far.

           

          Regarding docs, yes I know this is a shortcoming but we are working on this and I expect for the next few weeks people will have to rely on a combination of the wiki, FAQs and this article I wrote for DZone - http://java.dzone.com/articles/infinispan-data-grid-platform - as well as the sample configuration files, Javadocs, configdocs and jmxdocs we generate with each release (we do make an effort to keep Javadocs, etc. as detailed and up-to-date as possible).  There are also tutorials and demos on the wiki.

           

          At some stage the information represented in the various sources mentioned above will be wrapped up into a coherent User Guide, but this takes time since most of us are busy actually developing stuff, performance and stability improvements, etc.

           

          Regarding your issue with the GUI Demo, can you confirm that you have rehashing enabled?  Also the GUI frame does not auto-refresh (my Swing skills are pathetic!) so you may have to click the "refresh" button to see changes sometimes...

           

          Cheers

          Manik

          • 2. Re: GUI Demo Cache for distributed caching example wrong?

            Regarding the GUI demo behaviour, it remains unpredictable to me. I didn't try to check all entries from the table model shown in the cache view. But in a new test, starting with a first VM A holding 100 entries and then starting a second VM instance B, it synced only 96 of the 100 entries (?). Even after clicking the refresh button several times, and after minutes of runtime, this didn't change. Instance C, also started in this cluster with numOwners=2, then showed 64 entries. Is this okay? To my mind, at least not for two VM instances.

            So the Swing UI refresh is not the problem. Either the number of entries shown is wrong, or the background distribution algorithm behaves somewhat strangely.

             

            I don't know enough about the internals, but each time I start with the default config and play around a little with starting and stopping three VM instances of this demo UI in the same cluster, the results and the number of cached entries change a little. To me this is a bit strange. I don't dare to file a bug for this, since the behaviour (such as the number of cached entries) changes each time and I don't know whether the cause is a misunderstanding on my side.

             

            That's why I currently have a bad feeling, even though I would like to use it in my architecture, i.e. as dynamic storage for binary jar files with a classloader on top of it to exchange application data and code at runtime. However, because of this strange number of entries in the cache view, even after a refresh, I'm unsure.

             

            How do I switch on rehashing? Maybe you are laughing, but I didn't find a checkbox or anything like it, which reveals that I'm an absolute newbie in this area. The GUI showed that the standard config is loaded from gui-demo-cache-config.xml with this content:

            <?xml version="1.0" encoding="UTF-8"?>
            <infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:infinispan:config:4.0">
               <global>
                  <transport clusterName="demoCluster"/>
               </global>

               <default>
                  <clustering mode="distribution">
                     <l1 enabled="true" lifespan="60000"/>
                     <hash numOwners="2" rehashRpcTimeout="120000"/>
                     <sync/>
                  </clustering>
               </default>
            </infinispan>

            • 3. Re: GUI Demo Cache for distributed caching example wrong?
              manik

              lwunderlich wrote:

               

              Regarding the GUI demo behaviour, it remains unpredictable to me. I didn't try to check all entries from the table model shown in the cache view. But in a new test, starting with a first VM A holding 100 entries and then starting a second VM instance B, it synced only 96 of the 100 entries (?). Even after clicking the refresh button several times, and after minutes of runtime, this didn't change. Instance C, also started in this cluster with numOwners=2, then showed 64 entries. Is this okay? To my mind, at least not for two VM instances.

              So the Swing UI refresh is not the problem. Either the number of entries shown is wrong, or the background distribution algorithm behaves somewhat strangely.

              You need to try and understand how distribution works if you want to analyse it at this level. Here are a few things to consider:

               

              Distribution will never guarantee an equal balance among nodes.  This is due to the fixed hash wheel positions of nodes (based on a bit-spread hash code of each node's address) and the (bit-spread) hash code of your key.  So unless you artificially engineer these two elements, you can never have a perfectly balanced cluster.  So don't expect to see equal distribution when you have 3 nodes running; you should, however, expect the total number of entries across all nodes to stay the same.

               

              In your case, with numOwners = 2 and 100 entries put on Node A, you should see sizeof(NodeA) + sizeof(NodeB) + sizeof(NodeC) == 200.  If this is not the case, possible suspects are an eviction thread, or the use of a lifespan/maxAge when you put entries into the cache.  If these still don't 'add up', we can dig further, perhaps wrap this into a unit test - or, as you say, check the Swing code for accurate reporting of size.
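
              The two points above can be illustrated with a minimal, self-contained sketch of a hash wheel. This is not Infinispan's actual hashing code: the class name HashWheelSketch, the helpers spread, owners and countEntries, the node names and the "key-N" keys are all made up for illustration, and the bit-spreading function is simply borrowed from the idea used in older java.util.HashMap versions. It only shows that per-node counts are generally unequal, while the sum over all nodes is numOwners times the number of entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified hash wheel, for illustration only (NOT Infinispan's implementation).
class HashWheelSketch {

    // Assumed bit-spreading step: mixes high bits into low bits of a hash code.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Owners of a key: the first numOwners nodes found when walking the wheel
    // "clockwise" from the key's position, wrapping around at the end.
    static List<String> owners(TreeMap<Integer, String> wheel, Object key, int numOwners) {
        List<String> result = new ArrayList<>();
        int position = spread(key.hashCode());
        for (String node : wheel.tailMap(position, true).values()) {
            if (result.size() < numOwners) result.add(node);
        }
        for (String node : wheel.headMap(position, false).values()) {
            if (result.size() < numOwners) result.add(node);
        }
        return result;
    }

    // Places the nodes on the wheel, assigns each key to its owners and
    // returns how many entries each node ends up holding.
    static Map<String, Integer> countEntries(List<String> nodes, int entries, int numOwners) {
        TreeMap<Integer, String> wheel = new TreeMap<>();
        for (String node : nodes) {
            // A node's position is fixed by the hash of its (hypothetical) address.
            wheel.put(spread(node.hashCode()), node);
        }
        Map<String, Integer> sizes = new LinkedHashMap<>();
        for (String node : nodes) {
            sizes.put(node, 0);
        }
        for (int i = 0; i < entries; i++) {
            for (String owner : owners(wheel, "key-" + i, numOwners)) {
                sizes.merge(owner, 1, Integer::sum);
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        Map<String, Integer> sizes =
                countEntries(Arrays.asList("NodeA", "NodeB", "NodeC"), 100, 2);
        int total = 0;
        for (int size : sizes.values()) {
            total += size;
        }
        // Per-node sizes depend on the hash positions; the total does not.
        System.out.println("per-node sizes: " + sizes);
        System.out.println("total: " + total);
    }
}
```

              In this sketch, how the 100 entries are split between the three nodes depends entirely on where the node and key hashes happen to fall on the wheel, but every key always gets exactly two distinct owners, so the sizes add up to 200.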

               

              lwunderlich wrote:

               

              How do I switch on rehashing? Maybe you are laughing, but I didn't find a checkbox or anything like it, which reveals that I'm an absolute newbie in this area. The GUI showed that the standard config is loaded from gui-demo-cache-config.xml with this content:

              <?xml version="1.0" encoding="UTF-8"?>
              <infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:infinispan:config:4.0">
                 <global>
                    <transport clusterName="demoCluster"/>
                 </global>

                 <default>
                    <clustering mode="distribution">
                       <l1 enabled="true" lifespan="60000"/>
                       <hash numOwners="2" rehashRpcTimeout="120000"/>
                       <sync/>
                    </clustering>
                 </default>
              </infinispan>

              <hash ... rehashEnabled="true" /> - but this is true by default.  I was just wondering if you had it disabled explicitly, which is not the case.

               

              Cheers

              Manik
