6 Replies Latest reply on Feb 12, 2007 4:57 AM by chrismeadows

    Persistent configuration in clustered environment.

    victortr

      Hello,

      My requirement is to implement a configuration infrastructure. The configuration should be persistent and should be synchronized across all nodes of the cluster.

      My idea is to use JBoss Cache for cluster synchronization and FileCacheLoader for persistence. I also want to make my own version of FileCacheLoader that serializes/deserializes the configuration object to XML files instead of Java-serialized objects.

      Does this solution make any sense?

      Could you suggest other approaches as well?

      Thanks a lot.
      Victor.

        • 1. Re: Persistent configuration in clustered environment.
          manik

          Yes, this makes a lot of sense. Perhaps you could use something like XStream in your custom cache loader to serialize the objects to XML.
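          As a dependency-free sketch of the XML-serialization idea (XStream would work similarly), the JDK's java.beans.XMLEncoder/XMLDecoder can round-trip a configuration bean to an XML file. The class and file names here are illustrative only, not part of any JBoss Cache API:

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a minimal XML round-trip such as a custom
// FileCacheLoader variant might use in place of binary serialization.
public class XmlConfigStore
{
   // Write any JavaBean-style object to an XML file.
   static void store(File f, Object bean) throws IOException
   {
      try (XMLEncoder enc = new XMLEncoder(new BufferedOutputStream(new FileOutputStream(f))))
      {
         enc.writeObject(bean);
      }
   }

   // Read the object back from the XML file.
   static Object load(File f) throws IOException
   {
      try (XMLDecoder dec = new XMLDecoder(new BufferedInputStream(new FileInputStream(f))))
      {
         return dec.readObject();
      }
   }

   public static void main(String[] args) throws Exception
   {
      Map<String, String> cfg = new HashMap<>();
      cfg.put("timeout", "30");

      File f = File.createTempFile("config", ".xml");
      f.deleteOnExit();
      store(f, cfg);

      Object restored = load(f);
      System.out.println("restored=" + cfg.equals(restored));  // restored=true
   }
}
```

          XMLEncoder produces human-readable XML, which is handy for a configuration store that operators may want to inspect or edit by hand.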

          • 2. Re: Persistent configuration in clustered environment.
            victortr

            Thank you for the answer.

            I have one problematic scenario.
            There are two instances in my clustered environment, "serverA" and "serverB"; at the beginning both of them are alive.
            step 1. shut down "serverA"
            step 2. make a change on "serverB"
            step 3. shut down "serverB"
            step 4. start "serverA"
            step 5. start "serverB"

            The problem is that the change made in step 2 is lost.

            Is there any solution here?

            Victor.

            • 3. Re: Persistent configuration in clustered environment.
              manik

              This is the classic problem of replicated state being lost. By step 3 the entire cluster is unavailable, so all transient state is lost. This is why you have cache loaders: to persist the state somewhere, so that on restart the state is still available to the cluster.
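              For reference, in JBC 1.x a cache loader is configured through the CacheLoaderConfiguration attribute of the TreeCache service XML. The fragment below is a sketch from memory (element names may differ slightly between versions, and the location path is illustrative); fetchPersistentState is what lets a joining node pull persisted state from an existing member:

```xml
<attribute name="CacheLoaderConfiguration">
   <config>
      <passivation>false</passivation>
      <shared>false</shared>
      <cacheloader>
         <class>org.jboss.cache.loader.FileCacheLoader</class>
         <!-- "location" is a FileCacheLoader property; the path is illustrative -->
         <properties>location=/var/jbosscache</properties>
         <async>false</async>
         <fetchPersistentState>true</fetchPersistentState>
      </cacheloader>
   </config>
</attribute>
```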

              • 4. Re: Persistent configuration in clustered environment.
                chrismeadows

                 

                "victor75" wrote:
                Thank you for the answer.

                I have one problematic scenario.
                There're two instances in my clustered environment: "serverA" and "serverB", at the beginning both of them are alive.
                step 1. shutdown the "serverA"
                step 2. make a change in "serverB".
                step 3. shutdown the "serverB"
                step 4. start the "serverA"
                step 5. start the "serverB".

                The problem is that a change in step 2 was lost.

                Is there any solution here?

                Victor.


                I've tried to do this as well with JBC 1.4.1, and I am using a cache loader as per Manik's suggestion, but I still get the same problem as Victor: changes made on serverB are lost.

                So I thought I'd go back, look at the JUnit tests, and run them. The one I am interested in is

                build one.test.aop50 -Dtest=org.jboss.cache.loader.CacheLoaderWithReplicationTest


                I've added a test case like this:

                 public void testPessSyncReplFailureRecovery1() throws Exception
                 {
                    cache1.setCacheMode(TreeCache.REPL_SYNC);
                    cache2.setCacheMode(TreeCache.REPL_SYNC);

                    cache1.startService();
                    cache2.startService();

                    Assert.assertNull(cache1.get(fqn, key));
                    Assert.assertNull(cache2.get(fqn, key));

                    CacheLoader loader1 = cache1.getCacheLoader();
                    CacheLoader loader2 = cache2.getCacheLoader();

                    TransactionManager mgr = cache1.getTransactionManager();
                    mgr.begin();
                    cache1.put(fqn, key, "value");

                    Assert.assertEquals("value", cache1.get(fqn, key));
                    Assert.assertNull(cache2.get(fqn, key));
                    Assert.assertNull(loader1.get(fqn));
                    Assert.assertNull(loader2.get(fqn));
                    mgr.commit();

                    Assert.assertEquals("value", cache1.get(fqn, key));
                    Assert.assertEquals("value", cache2.get(fqn, key));
                    Assert.assertEquals("value", loader1.get(fqn).get(key));
                    Assert.assertEquals("value", loader2.get(fqn).get(key));

                    cache2.stopService();

                    mgr.begin();
                    cache1.put(fqn, key, "value2");

                    Assert.assertEquals("value2", cache1.get(fqn, key));
                    Assert.assertEquals("value", loader1.get(fqn).get(key));

                    mgr.commit();

                    Assert.assertEquals("value2", cache1.get(fqn, key));
                    Assert.assertEquals("value2", loader1.get(fqn).get(key));

                    cache1.stopService();

                    cache1.startService();
                    cache2.startService();

                    Assert.assertEquals("value2", cache1.get(fqn, key));
                    Assert.assertEquals("value2", loader1.get(fqn).get(key));
                    Assert.assertEquals("value2", cache2.get(fqn, key));
                    Assert.assertEquals("value2", loader2.get(fqn).get(key)); // <-- fails here

                    // force clean up
                    tearDown();
                 }


                and it fails on the last assertion, Assert.assertEquals("value2", loader2.get(fqn).get(key)), indicating that the caches are in sync but that the cache loaders are not. Could somebody explain why, please? Shouldn't the cache loaders also be in sync?

                I then swapped the order in which cache1 and cache2 are restarted, as follows:

                 public void testPessSyncReplFailureRecovery2() throws Exception
                 {
                    cache1.setCacheMode(TreeCache.REPL_SYNC);
                    cache2.setCacheMode(TreeCache.REPL_SYNC);

                    cache1.startService();
                    cache2.startService();

                    Assert.assertNull(cache1.get(fqn, key));
                    Assert.assertNull(cache2.get(fqn, key));

                    CacheLoader loader1 = cache1.getCacheLoader();
                    CacheLoader loader2 = cache2.getCacheLoader();

                    TransactionManager mgr = cache1.getTransactionManager();
                    mgr.begin();
                    cache1.put(fqn, key, "value");

                    Assert.assertEquals("value", cache1.get(fqn, key));
                    Assert.assertNull(cache2.get(fqn, key));
                    Assert.assertNull(loader1.get(fqn));
                    Assert.assertNull(loader2.get(fqn));
                    mgr.commit();

                    Assert.assertEquals("value", cache1.get(fqn, key));
                    Assert.assertEquals("value", cache2.get(fqn, key));
                    Assert.assertEquals("value", loader1.get(fqn).get(key));
                    Assert.assertEquals("value", loader2.get(fqn).get(key));

                    cache2.stopService();

                    mgr.begin();
                    cache1.put(fqn, key, "value2");

                    Assert.assertEquals("value2", cache1.get(fqn, key));
                    Assert.assertEquals("value", loader1.get(fqn).get(key));

                    mgr.commit();

                    Assert.assertEquals("value2", cache1.get(fqn, key));
                    Assert.assertEquals("value2", loader1.get(fqn).get(key));

                    cache1.stopService();

                    cache2.startService();
                    cache1.startService();

                    Assert.assertEquals("value2", cache1.get(fqn, key)); // <-- fails here
                    Assert.assertEquals("value2", loader1.get(fqn).get(key));
                    Assert.assertEquals("value2", cache2.get(fqn, key));
                    Assert.assertEquals("value2", loader2.get(fqn).get(key));

                    // force clean up
                    tearDown();
                 }


                and that fails on the first assertion after the restart, Assert.assertEquals("value2", cache1.get(fqn, key)), indicating that the update has been lost completely from cache1. Huh?

                Does anyone have ideas on how this is supposed to work? Have I misunderstood something? If not, it would be nice if these JUnit tests were included in the JBossCache test suite (and passed, obviously).

                Any input appreciated.

                Chris

                • 5. Re: Persistent configuration in clustered environment.
                  manik

                  Will look into adding your tests, but for now:

                  1) This should pass - and it does on Branch_JBossCache_1_4_0 (off which 1.4.1.SP1 was recently released)

                  2) Changing the order in which you restart caches is a problem. There is no way of knowing which cache's persistent state is "correct", so the first instance to come back up is treated as authoritative. When the second cache starts, it looks for existing members in the cluster and requests state from one of them. If that member happens to hold older state (because it was shut down and missed updates), it is still treated as holding the true data, since there is no way to know that it was down for a while, things moved on, and then the rest of the cluster was shut down.

                  One way around this is to use a shared cache loader so that all cache instances use the same persistent store.
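                  As a sketch of that approach (again from memory of the JBC 1.x config format; the JDBC driver, URL, and credentials shown are purely illustrative), marking the loader as shared tells each node that all instances write to the same store, so replicated changes are not persisted redundantly:

```xml
<attribute name="CacheLoaderConfiguration">
   <config>
      <shared>true</shared>
      <cacheloader>
         <class>org.jboss.cache.loader.JDBCCacheLoader</class>
         <!-- all connection details below are placeholders -->
         <properties>
            cache.jdbc.driver=org.h2.Driver
            cache.jdbc.url=jdbc:h2:tcp://confighost/config
            cache.jdbc.user=cache
            cache.jdbc.password=secret
         </properties>
         <async>false</async>
         <fetchPersistentState>false</fetchPersistentState>
      </cacheloader>
   </config>
</attribute>
```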

                  • 6. Re: Persistent configuration in clustered environment.
                    chrismeadows

                    Thanks Manik.

                    My caches are geographically separated (UK, Japan, east and west coast US), so I don't think a shared cache loader is going to work for me unless I do async cache loader updates, and I'm pretty sure I don't want to do that.

                    The configuration I'm trying out has N geographically separate persistent JBC clusters, each cluster having its own local shared cache loader. Local changes in a cluster are propagated transactionally to the cluster members and to the persistent store.

                    One member of each cluster is then also a member of another cluster that handles the async data propagation between the localised clusters. I might instead make the cluster-to-cluster link synchronous and use buddy replication in a cyclic graph.

                    Does this sound like a sensible configuration, or am I going wrong?