9 Replies Latest reply on Jun 7, 2010 3:36 PM by Lin Ye

    A few questions

    Lin Ye Novice

      I asked the following questions a few weeks ago but forgot to mark the post as a question, and they were never answered, so I am posting them here again.

       

      1. When a putAll() to the cache fails partway through while updating an entry and throws an exception, do the entries before the failure stay updated while the entries after it do not? If so, is there a way to tell where it failed? Or is putAll an all-or-nothing update?
      2. When rehash is disabled for a distributed cache with numOwners=2 and an instance leaves the cluster, does that mean only 1 copy is left for certain data? Conversely, if rehash is enabled and an instance leaves, in addition to rebalancing all data, is the data originally on the leaving node guaranteed to be redistributed to 2 remaining instances?
      3. I did some testing with fetchInMemoryState enabled for a replicated cache. For a cache with 1M of the custom objects we typically use, a new instance joining the cluster took nearly 3 minutes to fetch the state. Similarly, for a distributed cache with rehash enabled, joining took about 13 seconds with 1M float values, but with 1M objects the join failed with "Couldn't discover and join the cluster". Can the join time for the replicated cache be optimized? And how can the join failure for a distributed cache with a large amount of data be fixed?
        • 1. Re: A few questions
          Manik Surtani Master

          1.  All or nothing.  The putAll() operation is atomic.

           

          2.  Yes.  Only if rehashing is enabled are additional copies made so that the numOwners rule is adhered to.
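The effect on copy counts can be illustrated with a toy model. This is a sketch only: the `OwnerModel` class and its modulo-based owner selection are hypothetical and are not Infinispan's consistent hash implementation. It only shows why a key can be left with a single copy when nothing is rehashed, and why recomputing owners over the remaining nodes restores numOwners copies.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, NOT Infinispan's algorithm): owners are chosen by
// walking the node list from a start index derived from the key's hash.
final class OwnerModel {
    private OwnerModel() { }

    // Returns up to numOwners distinct nodes that hold a copy of the key.
    static List<String> owners(String key, List<String> nodes, int numOwners) {
        List<String> result = new ArrayList<>();
        int n = nodes.size();
        int start = Math.floorMod(key.hashCode(), n);
        for (int i = 0; i < Math.min(numOwners, n); i++) {
            result.add(nodes.get((start + i) % n));
        }
        return result;
    }

    // Rehash disabled: the leaver's copy is simply lost, nothing is re-copied.
    static int copiesWithoutRehash(String key, List<String> nodes, int numOwners, String leaver) {
        List<String> remainingOwners = new ArrayList<>(owners(key, nodes, numOwners));
        remainingOwners.remove(leaver);
        return remainingOwners.size();
    }

    // Rehash enabled: owners are recomputed over the surviving nodes.
    static int copiesWithRehash(String key, List<String> nodes, int numOwners, String leaver) {
        List<String> remainingNodes = new ArrayList<>(nodes);
        remainingNodes.remove(leaver);
        return owners(key, remainingNodes, numOwners).size();
    }
}
```

In this model, with nodes A, B and C and numOwners=2, any key that A owned drops to one copy when A leaves without rehashing, and is back at two copies once owners are recomputed.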

           

          3.  Re: replicated caches, the bulk of the time would be spent in serializing and streaming the serialized stream.  So you should look at optimising your objects.  Implement Externalizable rather than Serializable, and make sure readExternal()/writeExternal() are as efficient as can be.  Alternatively, set fetchInMemoryState to false, and use a ClusterCacheLoader which will lazily load stuff from neighbour nodes as needed.
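As a minimal sketch of the Externalizable suggestion, the class below writes its fields by hand instead of relying on default Java serialization. The `Quote` type, its fields and the `QuoteCodec` helper are hypothetical stand-ins for whatever custom objects are actually cached; only the `java.io` calls are real API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical cached value type; the fields are illustrative only.
class Quote implements Externalizable {
    private String symbol;
    private float price;
    private long timestamp;

    // Externalizable requires a public no-arg constructor for deserialization.
    public Quote() { }

    Quote(String symbol, float price, long timestamp) {
        this.symbol = symbol;
        this.price = price;
        this.timestamp = timestamp;
    }

    // Write only the raw fields, in a fixed order, with no reflection.
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(symbol);
        out.writeFloat(price);
        out.writeLong(timestamp);
    }

    // Read the fields back in exactly the order they were written.
    @Override
    public void readExternal(ObjectInput in) throws IOException {
        symbol = in.readUTF();
        price = in.readFloat();
        timestamp = in.readLong();
    }

    String getSymbol() { return symbol; }
    float getPrice() { return price; }
    long getTimestamp() { return timestamp; }
}

// Round-trip helper, useful for checking the encoding and measuring its size.
final class QuoteCodec {
    private QuoteCodec() { }

    static byte[] toBytes(Quote quote) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(quote);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    static Quote fromBytes(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Quote) ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Comparing the length of `QuoteCodec.toBytes(...)` for the Serializable and Externalizable versions of a class is a quick way to see how much less data would be streamed per entry.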

           

           

          Re: large state and rehashing, do you have any TRACE level logging when this fails?  Also, have you tried the latest 4.1.0.BETA2, as some work has gone into the rehashing code in this release.

          • 2. Re: A few questions
            Lin Ye Novice

            Manik, thanks for the information.

             

            Re: large state and rehashing, it seems to happen when a new instance joins a small existing cluster (in my case fewer than 5 existing instances, so potentially more cache entries are moved around by rehashing). Attached is the log file with TRACE level logging.

             

            Another thing I noticed is that when an instance left the cluster, sometimes a remaining instance with a need of rehashing got the following exception:

            1234594 [Incoming-2,SSELabClusterMaster-59762] INFO  org.infinispan.distribution.DistributionManagerImpl  - Need to rehash
            1234596 [Rehasher-SSELabClusterMaster-59762] ERROR org.infinispan.distribution.LeaveTask  - Caught exception! Completed successfully? false
            java.lang.UnsupportedOperationException
                    at org.infinispan.util.ImmutableListCopy.removeAll(ImmutableListCopy.java:163)
                    at org.infinispan.distribution.InMemoryStateMap.addState(LeaveTask.java:262)
                    at org.infinispan.distribution.InMemoryStateMap.addState(LeaveTask.java:233)
                    at org.infinispan.distribution.LeaveTask.performRehash(LeaveTask.java:70)
                    at org.infinispan.distribution.RehashTask.call(RehashTask.java:54)
                    at org.infinispan.distribution.RehashTask.call(RehashTask.java:32)
                    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
                    at java.util.concurrent.FutureTask.run(Unknown Source)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                    at java.lang.Thread.run(Unknown Source)

            • 3. Re: A few questions
              Mircea Markus Master

              Manik Surtani wrote: "3.  Re: replicated caches, the bulk of the time would be spent in serializing and streaming the serialized stream.  So you should look at optimising your objects.  Implement Externalizable rather than Serializable, and make sure readExternal()/writeExternal() are as efficient as can be.  Alternatively, set fetchInMemoryState to false, and use a ClusterCacheLoader which will lazily load stuff from neighbour nodes as needed."

               

               

              What about keeping the objects in the cache in serialised form (i.e. as byte arrays)? Serialising a byte array takes almost no time. I know this is not the nicest approach, but it would confirm whether the time is spent in serialisation rather than somewhere else. Manik, this would also be a plus for keeping the objects in the data container as byte[], wdyt?
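One way to run this experiment at the application level is to serialise each value before putting it into the cache and deserialise it after reading. The sketch below assumes the cache can be used as a `ConcurrentMap<String, byte[]>` (Infinispan's `Cache` interface extends `ConcurrentMap`); the `ByteArrayValues` helper and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper: keeps values in the cache as pre-serialised byte arrays.
final class ByteArrayValues {
    private ByteArrayValues() { }

    // Serialise once, on the node that writes the value.
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Deserialise only when the application actually reads the value.
    static <T> T deserialize(byte[] bytes, Class<T> type) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return type.cast(ois.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static void putSerialized(ConcurrentMap<String, byte[]> cache, String key, Serializable value) {
        cache.put(key, serialize(value));
    }

    // Returns null when the key is absent.
    static <T> T getDeserialized(ConcurrentMap<String, byte[]> cache, String key, Class<T> type) {
        byte[] bytes = cache.get(key);
        return bytes == null ? null : deserialize(bytes, type);
    }
}
```

If state transfer is much faster with values stored this way, that points to object serialisation as the bottleneck; if it is not, the time is probably being spent elsewhere.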

              • 4. Re: A few questions
                Manik Surtani Master

                Yes, if this approach works for you, go with it.  Mircea's suggestion relates to setting lazyDeserialization to true.

                • 5. Re: A few questions
                  Manik Surtani Master

                  Can I confirm again what version you are using?  Some of this code changed in 4.1.0.BETA2 thanks to ISPN-420.

                  • 6. Re: A few questions
                    Vladimir Blagojevic Master

                    Hello Lin,

                     

                    Give it a try with 4.1.0.BETA2. We are in a continuous process of simplifying rehashing and, as Manik mentioned, some of this code changed in 4.1.0.BETA2.

                     

                    Regards,

                    Vladimir

                    • 7. Re: A few questions
                      Lin Ye Novice

                      Thank you, guys. 4.1.0.BETA2 did fix the rehashing problem I had seen at node joining/leaving before.

                       

                      Re: replicated caches, to keep the objects in the cache serialized (as byte arrays) per Mircea's suggestion, is setting lazyDeserialization to true the only thing I need to do? Please suggest the detailed steps I need to follow. Also, Manik suggested ClusterCacheLoader earlier; where can I find more detailed information on how to use it?

                      • 8. Re: A few questions
                        Mircea Markus Master

                        lazyDeserialization is described here

                        For cache loaders see the "Cache loaders" section on the documentation main page.

                        • 9. Re: A few questions
                          Lin Ye Novice

                          I tried setting lazyDeserialization to true, and it actually took even more time to fetch state. Any suggestions? Thanks in advance for the response.