-
1. Re: Beginner Problem — Data Loss Starting Two Nodes in Order
alex.heneveld Jul 14, 2011 7:41 PM (in response to alex.heneveld)
Just noticed the attachments all got zipped automatically. Here is the relevant code, and a more convenient single zip attached (hopefully not zipzipped...).
public void bug() throws InterruptedException {
    EmbeddedCacheManager cm1 = newCM();
    Cache c1 = cm1.getCache("x");
    c1.put("key", "1");
    Thread.sleep(1000);
    EmbeddedCacheManager cm2 = newCM();
    Cache c2 = cm2.getCache("x");
    assert c1.get("key") != null : "value at cache 1 was lost";
    cm1.stop();
    cm2.stop();
}

public EmbeddedCacheManager newCM() {
    GlobalConfiguration gc = GlobalConfiguration.getClusteredDefault();
    Configuration c = new Configuration().fluent()
        .mode(Configuration.CacheMode.DIST_SYNC)
        .hash().numOwners(1)
        .clustering().l1().disable()
        .build();
    return new DefaultCacheManager(gc, c);
}
-
LosingDataFiles.zip 3.6 KB
-
-
2. Re: Beginner Problem — Data Loss Starting Two Nodes in Order
sannegrinovero Jul 15, 2011 7:12 AM (in response to alex.heneveld)
Hi Alex,
it's likely that the second cache manager had not yet finished joining the cluster and performing state transfer from the other nodes when you checked.
When you call getCache(), it won't block until all state has been received, because the expected cluster size is unknown. It is therefore good practice either to wait a couple of seconds before testing its contents, or to poll the member count as we do in the testsuite.
Have a look into the testsuite source:
org.infinispan.test.MultipleCacheManagersTest
especially methods createClusteredCaches and waitForClusterToForm.
Also note that we distribute the testing jars too, so that people can reuse these utilities in their own tests.
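As a rough illustration of the "poll the members size" idea (this is not the testsuite's waitForClusterToForm, just a minimal sketch), the helper below waits until a member count reaches an expected value or a timeout expires. The class name ClusterWait, the method waitForMembers, and the fake counter in main are hypothetical names introduced here; the count is passed in as a supplier so the sketch runs without Infinispan on the classpath.

```java
import java.util.function.IntSupplier;

public class ClusterWait {

    /**
     * Polls memberCount until it reports at least `expected` members,
     * or until timeoutMillis has elapsed. Returns true if the expected
     * size was seen in time, false on timeout.
     */
    public static boolean waitForMembers(IntSupplier memberCount, int expected,
                                         long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long start = System.nanoTime();
        long timeoutNanos = timeoutMillis * 1_000_000L;
        while (true) {
            if (memberCount.getAsInt() >= expected) {
                return true;
            }
            if (System.nanoTime() - start >= timeoutNanos) {
                return false;
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a real cluster (assumption for this demo only):
        // reports 1 member for the first two polls, then 2.
        int[] polls = {0};
        IntSupplier fakeCount = () -> ++polls[0] >= 3 ? 2 : 1;

        boolean formed = waitForMembers(fakeCount, 2, 5_000, 10);
        System.out.println("cluster formed: " + formed);
    }
}
```

With a real cache manager, the supplier would presumably wrap something like cm.getMembers(), guarding against a null result before the manager has started. Note that seeing the expected member count only shows that the cluster view has formed; it does not by itself prove that state transfer or rehashing has completed.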
-
3. Re: Beginner Problem — Data Loss Starting Two Nodes in Order
alex.heneveld Jul 15, 2011 8:13 AM (in response to sannegrinovero)
Thanks Sanne, but I don't see how that applies.
In my world the first cache manager (CM1) has no way to know the cluster size or if/when other cache managers are joining. It needs to be able to call cache.get(...) at any point in time and see consistent data. (I could cheat in this example since they are in the same JVM, but that's not going to help IRL.)
What is happening is that CM2 comes into existence, then does cm2.getCache("x"), and that seems to put the cache on CM1 into an inconsistent state, even before anyone touches the cache at CM2.
Is there something else CM2 should do as a good citizen _before_ calling getCache()? I tried getMembers() etc. and getStatus(), but they seem very boring (null, and initializing) until I have tried a getCache(). (Even cm2.start() followed by a 5s sleep doesn't change the status from initialized to running, or populate the members, which was unexpected.)
Or is it possibly (given the 7s join-time interruption and other network warnings) something in my environment that is not compatible with the default Infinispan JGroups config?
Cheers
Alex
-
4. Re: Beginner Problem — Data Loss Starting Two Nodes in Order
alex.heneveld Jul 15, 2011 10:27 AM (in response to alex.heneveld)
Healthier environment now: running with -Djgroups.bind_addr=127.0.0.1 gets rid of the delay on joining and the worrisome warnings, but the problem is still there.
public void bug() throws InterruptedException {
    EmbeddedCacheManager cm1 = newCM();
    Cache c1 = cm1.getCache("x");
    c1.put("key", "value");
    Thread.sleep(3000);
    EmbeddedCacheManager cm2 = newCM();
    System.out.println(c1.get("key")); // always says "value"
    Cache c2 = cm2.getCache("x");
    System.out.println(c1.get("key")); // says null sometimes
    assert c1.get("key") != null : "value at cache 1 was lost";
    cm1.stop();
    cm2.stop();
}

public EmbeddedCacheManager newCM() {
    GlobalConfiguration gc = GlobalConfiguration.getClusteredDefault();
    Configuration cfg = new Configuration().fluent()
        .mode(Configuration.CacheMode.DIST_SYNC)
        .hash().numOwners(1)
        .clustering().l1().disable()
        .build();
    return new DefaultCacheManager(gc, cfg);
}
Log file attached.
-
healthier.log.zip 911 bytes
-
-
5. Re: Beginner Problem — Data Loss Starting Two Nodes in Order
alex.heneveld Jul 15, 2011 11:10 AM (in response to alex.heneveld)
Filed as bug https://issues.jboss.org/browse/ISPN-1244 on Sanne's advice.