2 Replies Latest reply on Apr 25, 2012 6:22 AM by dushyanttiwari

Data organization internals in Infinispan's distribution

dushyanttiwari Apr 25, 2012 2:36 AM

Hi Users/Mods,

I would like to get started quickly with Infinispan. I am trying to understand how Infinispan organizes data and need some help here.

What I understand (from studying Gemfire, eXtremeScale and Hazelcast) is that one can configure the number of partitions/buckets in these products, and these atomic units are deployed on the servers based on the number of nodes. For example, if we configure 10 partitions and start 5 nodes, there will be 2 partitions per node (assigned dynamically); if we start 2 nodes, there will be 5 per node.

Data is then hashed into these buckets and routed to the node hosting those buckets/partitions. Is the data organization similar in Infinispan's 'distribution' mode? How do we configure the number of partitions?
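The fixed-partition scheme described above can be sketched roughly as follows. This is only an illustration of the general idea, not the API of Gemfire, eXtremeScale, Hazelcast or Infinispan; the class name `FixedPartitionRouter`, the round-robin assignment and the use of `hashCode()` are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of fixed-partition routing; not taken from any product in this thread.
final class FixedPartitionRouter {
    private final int numPartitions;
    private final List<String> nodes;

    FixedPartitionRouter(int numPartitions, List<String> nodes) {
        this.numPartitions = numPartitions;
        this.nodes = new ArrayList<>(nodes);
    }

    // Hash the key into one of the fixed partitions (floorMod keeps the result non-negative).
    int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Assumed placement rule: partition p lives on node (p mod nodeCount).
    String nodeOfPartition(int partition) {
        return nodes.get(partition % nodes.size());
    }

    // Route a key: key -> partition -> hosting node.
    String nodeOf(Object key) {
        return nodeOfPartition(partitionOf(key));
    }

    // Count how many partitions a given node hosts.
    int partitionsOn(String node) {
        int count = 0;
        for (int p = 0; p < numPartitions; p++) {
            if (nodeOfPartition(p).equals(node)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> fiveNodes = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            fiveNodes.add("n" + i);
        }
        FixedPartitionRouter router = new FixedPartitionRouter(10, fiveNodes);
        System.out.println("partitions on n0: " + router.partitionsOn("n0"));
        System.out.println("order-42 -> partition " + router.partitionOf("order-42")
                + " on " + router.nodeOf("order-42"));
    }
}
```

With 10 partitions, this placement rule gives 2 partitions per node on 5 nodes and 5 per node on 2 nodes, matching the example in the question.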

If it works differently, please explain or point me to a resource that describes it.

In these products the backup copies exist only for failure recovery, and writes are done only on the primary copy. I am hoping for similar behavior in Infinispan.

Finally, about transactions: these products (with the exception of Hazelcast) do not support transactions across partitions (eXtremeScale) or nodes (Gemfire). Can Infinispan support such transactions across partitions/nodes? Moreover, can we configure the transactions to use single-phase commit? How?

Thanks,

Dushyant

1. Re: Data organization internals in Infinispan's distribution

mircea.markus Apr 25, 2012 5:59 AM (in response to dushyanttiwari)

I think the Infinispan equivalent is called virtualNodes, and it works slightly differently: you configure the number of virtual nodes per physical node, not for the entire cluster. [1]
Infinispan has support for distributed transactions; I think that includes your use case. [2]
[1] https://docs.jboss.org/author/display/ISPN/Clustering+modes#Clusteringmodes-DistributionMode
[2] https://docs.jboss.org/author/display/ISPN/Infinispan+transactions
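To make the virtual-node idea concrete, here is a minimal sketch of a hash ring in which every physical node contributes several positions. It is not Infinispan code: the class `VirtualNodeRing`, the `node + "#" + i` naming of virtual nodes and the integer mixing step used as a stand-in hash are assumptions chosen for the illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of a hash ring with virtual nodes; not Infinispan's implementation.
final class VirtualNodeRing {
    // Ring position -> physical node that owns that position.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    // Stand-in hash: hashCode() passed through a 32-bit mixing step,
    // then masked to a non-negative ring position.
    static int position(Object o) {
        int h = o.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h & Integer.MAX_VALUE;
    }

    // Each physical node contributes up to `virtualNodes` positions on the ring
    // (a colliding position would simply be overwritten in this sketch).
    void addNode(String node, int virtualNodes) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(position(node + "#" + i), node);
        }
    }

    // The first ring position at or after the key's position owns the key, wrapping around.
    String ownerOf(Object key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Integer, String> entry = ring.ceilingEntry(position(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    int positions() {
        return ring.size();
    }

    public static void main(String[] args) {
        VirtualNodeRing ring = new VirtualNodeRing();
        ring.addNode("A", 8);
        ring.addNode("B", 8);
        // Tally how 1000 sample keys spread across the two physical nodes.
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            counts.merge(ring.ownerOf("key-" + i), 1, Integer::sum);
        }
        System.out.println(counts);
    }
}
```

The point of the sketch is that the number of ring positions is set per physical node, so adding a node adds its own positions rather than redistributing a cluster-wide partition count.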
2. Re: Data organization internals in Infinispan's distribution

dushyanttiwari Apr 25, 2012 6:22 AM (in response to mircea.markus)

Thanks Markus for the response.
I understand from the article that the hash space is divided up well when virtual nodes are used. Good to know that the number of virtual nodes is configured on a per-node basis - it does make sense. The hash function therefore plays a central role here. I see that it uses MurmurHash3 by default, so the distribution depends on the hash space/range of this function. I think we can plug in a custom hash function with a different range, if required, and control the behaviour.
Is my understanding correct?

Also, if I set numOwners=2, is this a primary/backup arrangement where all writes occur only on the primary and the backup is only for availability (as in the other products), or are both copies equivalent?
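For readers unfamiliar with how an owner list is typically derived on a hash ring, the sketch below walks clockwise from a key's position and collects the first distinct physical nodes. It is an assumption-laden illustration (the class `OwnerLocator` and the explicit integer positions are hypothetical) and it says nothing about whether Infinispan treats the first owner differently from the others, which is exactly the open question here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: pick the owners of a key by walking a hash ring clockwise.
final class OwnerLocator {
    // Ring position -> physical node; one node may appear at several positions.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String node, int position) {
        ring.put(position, node);
    }

    // Returns up to numOwners distinct nodes, starting at the first position
    // at or after keyPosition and wrapping around to the start of the ring.
    List<String> owners(int keyPosition, int numOwners) {
        List<String> walk = new ArrayList<>(ring.tailMap(keyPosition, true).values());
        walk.addAll(ring.headMap(keyPosition, false).values());

        List<String> owners = new ArrayList<>();
        for (String node : walk) {
            if (owners.size() == numOwners) {
                break;
            }
            if (!owners.contains(node)) {
                owners.add(node); // skip repeated positions of the same physical node
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        OwnerLocator locator = new OwnerLocator();
        locator.addNode("A", 100);
        locator.addNode("B", 200);
        locator.addNode("C", 300);
        System.out.println(locator.owners(150, 2));
        System.out.println(locator.owners(350, 2));
    }
}
```

In a primary/backup design the first element of the list would take the writes and the rest would hold copies; in a design with equivalent copies the order would matter much less.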

Can you suggest a preload strategy for the grid? I can only think of client-based preloading on startup. With the other products, we used to store the partitionId with the data in the persistence layer (DB). We change the number of partitions very rarely, so each server can easily preload by querying for the data of the partitions it hosts. But when we do change the number of partitions, we need to rehash all the keys.
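The partitionId-based preload described above, and the cost of changing the partition count, can be sketched as follows. The in-memory `table` stands in for the database, `loadFor` stands in for a query filtered on the stored partition id, and all names (`PartitionPreload`, `Row`, `staleRows`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of preloading by a partition id stored next to each row.
final class PartitionPreload {
    // A persisted row: the key plus the partition id computed when it was written.
    static final class Row {
        final int key;
        final int partitionId;

        Row(int key, int partitionId) {
            this.key = key;
            this.partitionId = partitionId;
        }
    }

    static int partitionOf(int key, int numPartitions) {
        return Math.floorMod(key, numPartitions);
    }

    // Write path: store each key together with its partition id.
    static List<Row> persist(List<Integer> keys, int numPartitions) {
        List<Row> table = new ArrayList<>();
        for (int key : keys) {
            table.add(new Row(key, partitionOf(key, numPartitions)));
        }
        return table;
    }

    // Preload path: a server fetches only the rows of the partitions it hosts.
    static List<Row> loadFor(Set<Integer> hostedPartitions, List<Row> table) {
        List<Row> loaded = new ArrayList<>();
        for (Row row : table) {
            if (hostedPartitions.contains(row.partitionId)) {
                loaded.add(row);
            }
        }
        return loaded;
    }

    // After changing the partition count, count the rows whose stored id is now wrong.
    static int staleRows(List<Row> table, int newNumPartitions) {
        int stale = 0;
        for (Row row : table) {
            if (row.partitionId != partitionOf(row.key, newNumPartitions)) {
                stale++;
            }
        }
        return stale;
    }
}
```

For keys 0 to 99 stored with 10 partitions, moving to 12 partitions leaves 80 of the 100 stored ids stale under this modulo scheme, which is why a change in partition count effectively forces a rehash of the persisted data.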

About transactions: if Infinispan supports distributed transactions, it needs to acquire distributed locks on the data. I wonder whether Infinispan optimizes performance based on the scope of the operation, that is, whether only local data is involved or remote data is involved as well. Can you comment a little on the internals?