9 Replies Latest reply on Jun 7, 2010 3:36 PM by yelin666

A few questions

yelin666 May 27, 2010 10:35 AM

I asked the following questions a few weeks ago but forgot to mark the post as a question, and they were never answered, so I am posting them again here.

1. When a putAll() to the cache fails while updating an entry in the middle and throws an exception, do the earlier entries stay updated while the later ones are not? If so, is there a way to tell where it failed? Or is putAll() all-or-nothing?
2. When rehash is disabled for a distributed cache with numOwners=2 and an instance leaves the cluster, does that mean only 1 copy is left for some of the data? Conversely, if rehash is enabled and an instance leaves, is it guaranteed, in addition to re-balancing all data, that the data originally on the leaving node is re-distributed so that it again lives on 2 remaining instances?
3. I did some testing with fetchInMemoryState enabled for a replicated cache. For a cache with 1M of the custom objects we typically use, a new instance joining the cluster took nearly 3 minutes to fetch the state. Similarly, for a distributed cache with rehash enabled, joining took about 13 seconds with 1M float values, while with 1M objects it failed to join with "Couldn't discover and join the cluster". Can the joining time for the replicated cache be optimized? And how can the joining issue for a distributed cache with a large amount of data be fixed?

1. Re: A few questions

manik Jun 2, 2010 12:42 PM (in response to yelin666)

1. All or nothing. The putAll() operation is atomic.

2. Yes. Only if rehashing is enabled are additional copies made so that the numOwners rule is adhered to.

3. Re: replicated caches, the bulk of the time would be spent in serializing and streaming the serialized stream. So you should look at optimising your objects. Implement Externalizable rather than Serializable, and make sure readExternal()/writeExternal() are as efficient as can be. Alternatively, set fetchInMemoryState to false, and use a ClusterCacheLoader which will lazily load stuff from neighbour nodes as needed.

Re: large state and rehashing, do you have any TRACE level logging when this fails? Also, have you tried the latest 4.1.0.BETA2, as some work has gone into the rehashing code in this release.
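
As a rough illustration of the Externalizable suggestion in point 3, here is a minimal sketch. The Quote class and its three fields are hypothetical (the actual cached objects are not shown in this thread); only the java.io interfaces are standard. The idea is to write each field explicitly instead of relying on the reflective field handling of default serialization. The toBytes()/fromBytes() helpers are only there to check the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical value type, used only for illustration.
final class Quote implements Externalizable {
    private String symbol;
    private float price;
    private long timestamp;

    // Externalizable requires a public no-arg constructor for deserialization.
    public Quote() {
    }

    public Quote(String symbol, float price, long timestamp) {
        this.symbol = symbol;
        this.price = price;
        this.timestamp = timestamp;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Write fields explicitly, in a fixed order.
        out.writeUTF(symbol);
        out.writeFloat(price);
        out.writeLong(timestamp);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // Read fields back in exactly the order they were written.
        symbol = in.readUTF();
        price = in.readFloat();
        timestamp = in.readLong();
    }

    public String getSymbol() { return symbol; }
    public float getPrice() { return price; }
    public long getTimestamp() { return timestamp; }

    // Helper: serialize any object to a byte array with standard Java serialization.
    static byte[] toBytes(Object value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper: read an object back from a byte array.
    static Object fromBytes(byte[] raw) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Whether this helps depends on where the time actually goes, so it is worth comparing the serialized size and the time taken before and after the change.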
2. Re: A few questions

yelin666 Jun 3, 2010 4:21 PM (in response to manik)
Manik, thanks for the information.

Re: large state and rehashing, it seems to happen when a new instance joins a small existing cluster (in my case fewer than 5 existing instances, so potentially more cache entries are moved around by rehashing). Attached is the log file with TRACE level logging.

Another thing I noticed is that when an instance leaves the cluster, a remaining instance that needs to rehash sometimes gets the following exception:
1234594 [Incoming-2,SSELabClusterMaster-59762] INFO org.infinispan.distribution.DistributionManagerImpl - Need to rehash
1234596 [Rehasher-SSELabClusterMaster-59762] ERROR org.infinispan.distribution.LeaveTask - Caught exception! Completed successfully? false
java.lang.UnsupportedOperationException
        at org.infinispan.util.ImmutableListCopy.removeAll(ImmutableListCopy.java:163)
        at org.infinispan.distribution.InMemoryStateMap.addState(LeaveTask.java:262)
        at org.infinispan.distribution.InMemoryStateMap.addState(LeaveTask.java:233)
        at org.infinispan.distribution.LeaveTask.performRehash(LeaveTask.java:70)
        at org.infinispan.distribution.RehashTask.call(RehashTask.java:54)
        at org.infinispan.distribution.RehashTask.call(RehashTask.java:32)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

omtest.log.zip 71.3 KB
3. Re: A few questions

mircea.markus Jun 4, 2010 6:02 AM (in response to manik)

3. Re: replicated caches, the bulk of the time would be spent in serializing and streaming the serialized stream. So you should look at optimising your objects. Implement Externalizable rather than Serializable, and make sure readExternal()/writeExternal() are as efficient as can be. Alternatively, set fetchInMemoryState to false, and use a ClusterCacheLoader which will lazily load stuff from neighbour nodes as needed.

What about keeping the object in the cache serialised (i.e. as a byte array)? Serialising a byte array takes practically no time. I know this is not the nicest way to go, but it would confirm that the time is spent in serialisation rather than somewhere else. Manik, this would also be a plus for keeping the objects in the data container as byte[], wdyt?
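
One way to try this, sketched under assumptions: serialise values yourself and store only byte[] in the cache. ByteArrayStore is a hypothetical helper, not part of Infinispan, and a ConcurrentHashMap would stand in for the cache in a quick test; since Infinispan's Cache extends ConcurrentMap, a Cache<String, byte[]> could be passed in its place.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentMap;

// Hypothetical wrapper: values are serialised once on put() and kept as byte[].
final class ByteArrayStore {
    private final ConcurrentMap<String, byte[]> backing;

    ByteArrayStore(ConcurrentMap<String, byte[]> backing) {
        this.backing = backing;
    }

    // Serialise the value up front so the map only ever holds raw bytes.
    void put(String key, Serializable value) {
        backing.put(key, serialize(value));
    }

    // Deserialise on read; returns null when the key is absent.
    Object get(String key) {
        byte[] raw = backing.get(key);
        return raw == null ? null : deserialize(raw);
    }

    static byte[] serialize(Serializable value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object deserialize(byte[] raw) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If the join time drops noticeably with byte[] values, that points to object serialisation as the bottleneck; if it does not, the time is probably being spent elsewhere.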

4. Re: A few questions

manik Jun 4, 2010 11:29 AM (in response to mircea.markus)

Yes, that would work if it suits you. Mircea's suggestion relates to setting lazyDeserialization to true.
5. Re: A few questions

manik Jun 4, 2010 11:30 AM (in response to yelin666)

Can I confirm again what version you are using? Some of this code changed in 4.1.0.BETA2 thanks to ISPN-420.
6. Re: A few questions

vblagojevic Jun 4, 2010 2:46 PM (in response to manik)

Hello Lin,

Give it a try with 4.1.0.BETA2. We are continuously simplifying rehashing and, as Manik mentioned, some of this code changed in 4.1.0.BETA2.

Regards,
Vladimir
7. Re: A few questions

yelin666 Jun 6, 2010 10:48 PM (in response to vblagojevic)

Thank you, guys. 4.1.0.BETA2 did fix the rehashing problem I had seen earlier when nodes joined or left.

Re: replicated caches, to keep the objects in the cache serialized (as byte arrays) per Mircea's suggestion, is setting lazyDeserialization to true the only thing I need to do? Please describe the detailed steps I need to follow. Also, Manik suggested ClusterCacheLoader earlier; where can I find more detailed information on how to use it?
8. Re: A few questions

mircea.markus Jun 7, 2010 7:38 AM (in response to yelin666)

lazyDeserialization is described here
For cache loaders see the "Cache loaders" section on the documentation main page.
9. Re: A few questions

yelin666 Jun 7, 2010 3:36 PM (in response to mircea.markus)

I tried setting lazyDeserialization to true, and fetching the state actually took even longer. Any suggestions? Thanks in advance for the response.