-
1. Re: Data organization internals in Infinispan's distribution
mircea.markus Apr 25, 2012 5:59 AM (in response to dushyanttiwari)
I think the Infinispan equivalent is called virtualNodes, and it works slightly differently: you configure the number of virtual nodes for each physical node rather than for the entire cluster. [1]
Infinispan has support for distributed transactions; I think that covers your use case. [2]
[1] https://docs.jboss.org/author/display/ISPN/Clustering+modes#Clusteringmodes-DistributionMode
[2] https://docs.jboss.org/author/display/ISPN/Infinispan+transactions
-
2. Re: Data organization internals in Infinispan's distribution
dushyanttiwari Apr 25, 2012 6:22 AM (in response to mircea.markus)
Thanks Markus for the response.
I understand from the article that the hash space is divided evenly if we use virtual nodes. Good to know that the number of virtual nodes is configured on a per-node basis; that makes sense. So the hashing function plays a central role here. I see that it uses MurmurHash3 by default, so the distribution depends on the hash space/range of this function. I think we can plug in a custom hash function with a different range, if required, and control the behaviour that way.
Is my understanding correct?
Also, if I set noOfOwners=2, is this a primary/backup arrangement where all writes go only to the primary and the backup exists only for availability (as in other products), or are both copies equivalent?
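To make the virtual-node and owner-count questions concrete, here is a minimal sketch of a generic consistent-hash ring in Java. It is not Infinispan's implementation: the class `VirtualNodeRing`, the node names, and the use of CRC32 (standing in for MurmurHash3 so the example needs only the JDK) are all assumptions made for illustration. Each physical node is placed on the ring several times, and the owners of a key are the first distinct physical nodes found walking clockwise from the key's hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Illustrative only: a generic consistent-hash ring, NOT Infinispan's code.
class VirtualNodeRing {
    // Ring position -> physical node name (hypothetical names such as "nodeA").
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodesPerNode;

    VirtualNodeRing(int virtualNodesPerNode) {
        this.virtualNodesPerNode = virtualNodesPerNode;
    }

    // Assumption: CRC32 is a stand-in hash; a real grid would use something
    // like MurmurHash3 with a wider range.
    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Place one physical node on the ring at several virtual positions.
    // (A position collision would overwrite an entry; ignored in this sketch.)
    void addNode(String node) {
        for (int i = 0; i < virtualNodesPerNode; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // Walk clockwise from the key's position, collecting distinct physical nodes.
    List<String> owners(String key, int numOwners) {
        List<String> owners = new ArrayList<>();
        long h = hash(key);
        List<String> clockwise = new ArrayList<>(ring.tailMap(h, true).values());
        clockwise.addAll(ring.headMap(h, false).values());
        for (String node : clockwise) {
            if (owners.size() == numOwners) {
                break;
            }
            if (!owners.contains(node)) {
                owners.add(node);
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        VirtualNodeRing ring = new VirtualNodeRing(8);
        ring.addNode("nodeA");
        ring.addNode("nodeB");
        ring.addNode("nodeC");
        // Prints the two physical nodes that would hold copies of this key.
        System.out.println(ring.owners("user:42", 2));
    }
}
```

In this sketch the first entry in the returned list is simply the first node met on the ring; whether that node is treated as a primary or as an equal copy is a policy decision layered on top, which is exactly what the question above asks about.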
Can you suggest a preload strategy for the grid? I can only think of client-based preloading on startup. With other products, we stored the partitionId with the data in the persistence layer (DB). We change the number of partitions very rarely, so each server can easily preload by querying for the data of the partitions it hosts. But when we do change the number of partitions, we need to rehash all the keys.
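The rehash cost described here can be sketched as follows. This is a hypothetical illustration of the "store the partitionId with the row" scheme, not code from any of the products mentioned: the class `PartitionPreload`, the modulo-based `partitionId` function, and the sample keys are assumptions. It shows how a server would select the rows for the partitions it hosts, and how many stored partition IDs go stale when the partition count changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative only: hypothetical modulo partitioning for preload.
class PartitionPreload {
    // Math.floorMod keeps the result in [0, numPartitions) even for
    // negative hash codes.
    static int partitionId(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Keys a server would preload, given the set of partitions it hosts.
    // In a real system this would be a DB query filtering on the stored ID.
    static List<String> keysToPreload(List<String> keys, Set<Integer> hosted, int numPartitions) {
        List<String> result = new ArrayList<>();
        for (String key : keys) {
            if (hosted.contains(partitionId(key, numPartitions))) {
                result.add(key);
            }
        }
        return result;
    }

    // Number of keys whose stored partition ID is wrong after a resize.
    static int countMoved(List<String> keys, int oldCount, int newCount) {
        int moved = 0;
        for (String key : keys) {
            if (partitionId(key, oldCount) != partitionId(key, newCount)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("key" + i);
        }
        System.out.println("preload for partitions {0,1}: "
                + keysToPreload(keys, Set.of(0, 1), 8).size() + " keys");
        System.out.println("stale IDs after 8 -> 12 partitions: "
                + countMoved(keys, 8, 12) + " of " + keys.size());
    }
}
```

With plain modulo partitioning most stored IDs become stale after a resize, which is why the partition count is changed so rarely in this scheme.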
About transactions: if Infinispan supports distributed transactions, it needs to acquire distributed locks on the data. I wonder whether Infinispan optimizes performance by knowing the scope of the operation, that is, whether only local data is involved or remote data is involved as well. Can you comment a little on the internals?