3 Replies Latest reply on Nov 11, 2009 7:07 AM by manik

Known technical limitations?

sannegrinovero Nov 10, 2009 10:44 AM

Hello again,
I've been asked if Infinispan could be used in some special situations.

1) Has anyone done any testing on geographically distributed nodes?
I guess having a DIST_SYNC cluster from London to India isn't an option (right?); would a clever solution working by ASYNC be viable in your opinion?

2) Does Infinispan take any measures against split-brain situations?

3) At cacheManager initialization I would like to block until I "see" all members, but I don't know the number of nodes beforehand. Is there anything I can do to address this problem? Do I have a good chance of having discovered all nodes if I poll until I see "at least one" other member?

1. Re: Known technical limitations?

manik Nov 10, 2009 1:27 PM (in response to sannegrinovero)

"sannegrinovero" wrote:

1) Has anyone done any testing on geographically distributed nodes?

No. But this is on our roadmap - until we get to it! ISPN-262

"sannegrinovero" wrote:

I guess having a DIST_SYNC cluster from London to India isn't an option (right?); would a clever solution working by ASYNC be viable in your opinion?

Even an Async one would add a lot of latency for GETs, since some GETs may well have to fetch stuff from across continents. Also, the current CH algorithm doesn't guarantee that the 2 nodes it picks are in different data centres (otherwise what's the point!)

"sannegrinovero" wrote:

2) Does Infinispan take any measures against split-brain situations?

JGroups is able to detect this. At the moment it just presents a callback and we don't handle it; in future I plan to handle such events from JGroups. ISPN-263

"sannegrinovero" wrote:

3) At cacheManager initialization I would like to block until I "see" all members, but I don't know the number of nodes beforehand. Is there anything I can do to address this problem? Do I have a good chance of having discovered all nodes if I poll until I see "at least one" other member?

The cache manager *will* block until it has started the channel, which in turn waits until it receives a full cluster view. Are you seeing anything to the contrary?

Cheers
Manik
2. Re: Known technical limitations?

sannegrinovero Nov 10, 2009 7:40 PM (in response to sannegrinovero)
"manik.surtani@jboss.com" wrote:
"sannegrinovero" wrote:

1) Has anyone done any testing on geographically distributed nodes?

No. But this is on our roadmap - until we get to it! ISPN-262 (https://jira.jboss.org/jira/browse/ISPN-262)

Very nice. If needed, I volunteer to be the Italian node when the time comes for this issue.

"manik.surtani@jboss.com" wrote:
"sannegrinovero" wrote:

I guess having a DIST_SYNC cluster from London to India isn't an option (right?); would a clever solution working by ASYNC be viable in your opinion?

Even an Async one would add a lot of latency for GETs, since some GETs may well have to fetch stuff from across continents. Also, the current CH algorithm doesn't guarantee that the 2 nodes it picks are in different data centres (otherwise what's the point!)

Some dev teams are spread around the globe and still want to access Jira locally so that it responds quickly, while having it cluster with other instances that are "local" to the other teams. I doubt this is a good technical reason; it is more of a business one: competitors have such a feature, so we would like it too, even if it is not a priority. Internally, many components could be switched to async, in which case it could be a good technical reason too.

"sannegrinovero" wrote:

2) Does Infinispan take any measures against split-brain situations?

JGroups is able to detect this. At the moment it just presents a callback and we don't handle it; in future I plan to handle such events from JGroups. ISPN-263

"manik.surtani@jboss.com" wrote:
"sannegrinovero" wrote:

3) At cacheManager initialization I would like to block until I "see" all members, but I don't know the number of nodes beforehand. Is there anything I can do to address this problem? Do I have a good chance of having discovered all nodes if I poll until I see "at least one" other member?

The cache manager *will* block until it has started the channel, which in turn waits until it receives a full cluster view. Are you seeing anything to the contrary?

Interesting answer. I didn't experience it directly - my human reflexes are not as sparkling - but then what is the reason for most Infinispan Core tests to do

TestingUtil.blockUntilViewsReceived(10000, caches);

(Like all tests extending org.infinispan.test.MultipleCacheManagersTest)
?
3. Re: Known technical limitations?

manik Nov 11, 2009 7:07 AM (in response to sannegrinovero)
"sannegrinovero" wrote:

"manik.surtani@jboss.com" wrote:

Even an Async one would add a lot of latency for GETs, since some GETs may well have to fetch stuff from across continents. Also, the current CH algorithm doesn't guarantee that the 2 nodes it picks are in different data centres (otherwise what's the point!)

Some dev teams are spread around the globe and still want to access Jira locally so that it responds quickly, while having it cluster with other instances that are "local" to the other teams. I doubt this is a good technical reason; it is more of a business one: competitors have such a feature, so we would like it too, even if it is not a priority. Internally, many components could be switched to async, in which case it could be a good technical reason too.

I agree that it is a useful thing. My comment was that running DIST across data centres (in either sync or async mode) is pointless, since DIST (the way it is right now) cannot deliberately pick nodes in different data centres. The algo may well put all copies of a particular entry in just 1 data centre, defeating the purpose of such a setup. We need to enhance the consistent hash algo to select nodes accordingly.
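To make the idea of a data-centre-aware owner selection concrete, here is a minimal sketch. It is not Infinispan's consistent hash: the SiteAwareOwners class, the Node type, and the "site" label are hypothetical names made up for illustration, and the "ring" is simplified to a plain list indexed by the key's hash. It only shows the selection rule: walk the ring from the key's position and prefer nodes in a data centre that has not been used yet.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SiteAwareOwners {

    /** Hypothetical cluster member tagged with the data centre it runs in. */
    public static final class Node {
        public final String name;
        public final String site;

        public Node(String name, String site) {
            this.name = name;
            this.site = site;
        }
    }

    /**
     * Picks up to numOwners nodes for a key. The walk starts at the key's
     * position on a simplified ring (hash modulo ring size, an assumption)
     * and takes at most one node per site; if there are fewer sites than
     * owners, a second pass fills the remaining slots with any unused node.
     */
    public static List<Node> locate(Object key, List<Node> ring, int numOwners) {
        List<Node> owners = new ArrayList<>();
        int n = ring.size();
        if (n == 0) {
            return owners;
        }
        int want = Math.min(numOwners, n);
        int start = Math.floorMod(key.hashCode(), n);
        Set<String> usedSites = new HashSet<>();

        // First pass: one owner per data centre.
        for (int i = 0; i < n && owners.size() < want; i++) {
            Node candidate = ring.get((start + i) % n);
            if (usedSites.add(candidate.site)) {
                owners.add(candidate);
            }
        }
        // Second pass: not enough distinct sites, so accept any unused node.
        for (int i = 0; i < n && owners.size() < want; i++) {
            Node candidate = ring.get((start + i) % n);
            if (!owners.contains(candidate)) {
                owners.add(candidate);
            }
        }
        return owners;
    }
}
```

With two nodes in London and one in India and two owners per entry, this rule always places one copy in each data centre, whereas a plain "next two nodes on the ring" rule could pick both London nodes.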

"sannegrinovero" wrote:

Interesting answer. I didn't experience it directly - my human reflexes are not as sparkling - but then what is the reason for most Infinispan Core tests to do

TestingUtil.blockUntilViewsReceived(10000, caches);

(Like all tests extending org.infinispan.test.MultipleCacheManagersTest)
?

That is more of a common idiom to ensure, for example in a unit test that adds a node to a running cluster, that the existing nodes have seen the new node join before proceeding. Just because the joiner has seen the existing cluster (and CacheManager.getCache() returns) doesn't mean the rest of the cluster has seen the joiner!
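The idiom can be sketched without any Infinispan dependency. The following is not the implementation of TestingUtil.blockUntilViewsReceived; ViewWaiter and awaitViewSize are hypothetical names, and each IntSupplier is assumed to report the number of members one node currently sees (in a real test it would wrap whatever that node's cache manager exposes as its member list). The point it illustrates is that every node, not just the joiner, is polled until it reports the expected view size or the timeout expires.

```java
import java.util.List;
import java.util.function.IntSupplier;

public class ViewWaiter {

    /**
     * Polls every node's view size until all of them equal expected,
     * or until timeoutMillis has elapsed. Returns true on success and
     * false on timeout or if the waiting thread is interrupted.
     */
    public static boolean awaitViewSize(List<IntSupplier> viewSizes,
                                        int expected,
                                        long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            boolean allSeen = true;
            for (IntSupplier viewSize : viewSizes) {
                if (viewSize.getAsInt() != expected) {
                    allSeen = false;
                    break;
                }
            }
            if (allSeen) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(50); // poll interval, chosen arbitrarily
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test that adds a second node would call this with one supplier per node and expected = 2, and only proceed once it returns true.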
