5 Replies Latest reply on May 21, 2014 12:08 PM by eshevchenko

    Clustering - sync/async replication

    eshevchenko

      I am trying to figure out the most suitable configuration when clustering is enabled. I noticed that async replication mode is not stable. I have two test cases: one creates documents in a random repository and checks that the changes are replicated to all repositories; the other moves nodes in a random repository and checks that the changes are replicated to all repositories.

      My observations

      • Async replication takes longer (see benchmark report), though not on every launch
      • Exceptions occur during async replication
      • Node.getNodes().getSize() returns the wrong value when async replication is enabled, although all replicated nodes are available by id
      • When the size of a replicated document is above the minimumBinarySizeInBytes parameter, an exception occurs when I try to read it
      • When I move a node with async replication enabled, I get an exception when I try to read the node

       

      {code}

      Caused by: org.modeshape.jcr.cache.NodeNotFoundInParentException: Cannot locate child node: 4c24b267505d642cf62467-dd6e-495d-8024-13a6dfa382e0 within parent: 4c24b267505d64a4e3c13c-df58-4277-9016-6fa9a24cec73

          at org.modeshape.jcr.cache.document.LazyCachedNode.parentReferenceToSelf(LazyCachedNode.java:250)

          at org.modeshape.jcr.cache.document.LazyCachedNode.getSegment(LazyCachedNode.java:287)

          at org.modeshape.jcr.cache.document.LazyCachedNode.getPath(LazyCachedNode.java:296)

          at org.modeshape.jcr.AbstractJcrNode.path(AbstractJcrNode.java:233)

          at org.modeshape.jcr.JcrNode.getPath(JcrNode.java:82)

      {code}

          Where df58-4277-9016-6fa9a24cec73 is part of the id of the source folder

       

      rhauch, could you look at my tests and give your opinion? Do I have a configuration issue, or should I report all of these cases as separate bugs in the ModeShape issue tracker?

       

      Thanks in advance.

      Evgeniy Shevchenko

        • 1. Re: Clustering - sync/async replication
          rhauch

          I do NOT think it is wise to use ModeShape with async replication (e.g., how Infinispan transfers state and coordinates state activities). ModeShape is strongly-consistent, and (unlike a lot of other Infinispan applications) requires consistent views of related entries/nodes. Using async replication means that ISPN is no longer required to provide ModeShape a strongly-consistent view of the cached information. I think the problems you listed above are just the tip of the iceberg of possible side effects.
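One way to picture the loss of a consistent view is a toy model of two cluster members. This is only a sketch under loud assumptions: the class `ReplicationSketch`, its maps, and the idea that adding a child writes two separate entries (the child and the parent's child list) are all illustrative inventions, not Infinispan or ModeShape code, and not a diagnosis of the observations reported above. It only shows how related entries can arrive at different times when replication is asynchronous.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Toy model of two cluster members; not Infinispan or ModeShape code. */
final class ReplicationSketch {
    final Map<String, List<String>> local = new HashMap<>();    // entries on the member that writes
    final Map<String, List<String>> remote = new HashMap<>();   // entries on the other member
    private final Queue<Runnable> pending = new ArrayDeque<>(); // async updates not yet delivered
    private final boolean sync;

    ReplicationSketch(boolean sync) {
        this.sync = sync;
    }

    /** Writes one entry locally, then replicates it immediately (sync) or queues it (async). */
    void put(String key, List<String> value) {
        List<String> copy = new ArrayList<>(value);
        local.put(key, copy);
        Runnable apply = () -> remote.put(key, new ArrayList<>(copy));
        if (sync) {
            apply.run();
        } else {
            pending.add(apply);
        }
    }

    /** Delivers the oldest queued update, if any; returns false when the queue is empty. */
    boolean deliverOne() {
        Runnable next = pending.poll();
        if (next == null) {
            return false;
        }
        next.run();
        return true;
    }

    /** Adding a child touches two related entries: the child itself and the parent's child list. */
    void addChild(String parentKey, String childKey) {
        put(childKey, Collections.<String>emptyList());
        List<String> children =
                new ArrayList<>(local.getOrDefault(parentKey, Collections.<String>emptyList()));
        children.add(childKey);
        put(parentKey, children);
    }

    /** Number of children the remote member currently reports for a parent. */
    int remoteChildCount(String parentKey) {
        return remote.getOrDefault(parentKey, Collections.<String>emptyList()).size();
    }

    public static void main(String[] args) {
        ReplicationSketch async = new ReplicationSketch(false);
        async.addChild("folder", "doc1");
        async.deliverOne(); // only the child entry has arrived so far
        System.out.println("async: child visible by id = " + async.remote.containsKey("doc1")
                + ", child count = " + async.remoteChildCount("folder"));

        ReplicationSketch syncCluster = new ReplicationSketch(true);
        syncCluster.addChild("folder", "doc1");
        System.out.println("sync:  child visible by id = " + syncCluster.remote.containsKey("doc1")
                + ", child count = " + syncCluster.remoteChildCount("folder"));
    }
}
```

In the async run the remote member can already find the child by id while its parent still reports zero children; in the sync run both entries are in place before the write returns.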

           

          But just to clarify, sync/async replication is very different from sync/async cache store writes (e.g., write-through or write-behind). I do think it's possible to use write-behind with ModeShape, as the Infinispan cluster still stays in sync via state transfer at the time of writes, and write-behind just means that the cache store is updated asynchronously.
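The distinction can be sketched with another toy model. Everything here is hypothetical (the class `WriteBehindSketch`, the two member maps, the queue in front of the store); it is not how Infinispan implements write-behind. It only illustrates the point made above: the in-memory copies on all members are updated as part of the write, and only the persistent store lags.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy model: members stay in sync on every write, the shared store is updated later. */
final class WriteBehindSketch {
    final Map<String, String> memberA = new HashMap<>();     // in-memory view on one member
    final Map<String, String> memberB = new HashMap<>();     // in-memory view on the other member
    final Map<String, String> sharedStore = new HashMap<>(); // stand-in for the shared cache store
    private final Queue<String[]> storeQueue = new ArrayDeque<>();

    /** Synchronous replication between members; the store write is only queued. */
    void put(String key, String value) {
        memberA.put(key, value);
        memberB.put(key, value);
        storeQueue.add(new String[] {key, value});
    }

    /** Applies all queued store writes in order and returns how many were written. */
    int flushStore() {
        int written = 0;
        String[] entry;
        while ((entry = storeQueue.poll()) != null) {
            sharedStore.put(entry[0], entry[1]);
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        WriteBehindSketch cluster = new WriteBehindSketch();
        cluster.put("doc1", "v1");
        System.out.println("members agree: " + cluster.memberA.equals(cluster.memberB));
        System.out.println("store has doc1 before flush: " + cluster.sharedStore.containsKey("doc1"));
        cluster.flushStore();
        System.out.println("store has doc1 after flush: " + cluster.sharedStore.containsKey("doc1"));
    }
}
```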

          • 2. Re: Clustering - sync/async replication
            eshevchenko

            Randall, thanks for your answer.

            As for async replication: I have a web services bridge in front of ModeShape and planned to use a load balancer with sticky sessions to route requests to a particular node, so replication would be finished by the time a client is routed to another node. But if you do not recommend using async replication with ModeShape at all, I'll stay with sync mode for replication.

             

            As for sync replication: the "moving nodes" test is unstable with sync replication as well.

            I run this test with 4, 10, 15, and 20 threads and get different results after each launch. Usually one or two tests fail with an exception like this:

            Caused by: org.junit.internal.runners.model.MultipleFailureException: There were 2 errors:

              java.util.concurrent.ExecutionException(javax.jcr.InvalidItemStateException: This session tried to save changes to node with key 'Cannot locate child node: 4c24b267505d646d397f81-a9be-445d-a3be-77ec166f725e within parent: 4c24b267505d644e5bd1c1-6a7d-46fb-a0c1-9ea4955074a9', but it was removed by another session.)

             

            When I run the tests under JProfiler, all tests pass.

             

            I can't figure out whether this is a bug or a configuration issue.

            • 3. Re: Clustering - sync/async replication
              rhauch

              It may be a configuration issue if your repositories are not properly clustered. For example, with 3.x you have to make sure that both ModeShape and Infinispan are clustered properly, and that all have access to the same shared cache store. If so, and you're still seeing a problem with move, then please provide a test case with configuration information that demonstrates it -- we'll be happy to look into it.

               

              Some of it also depends on what your move tests are doing. For example, if your test is trying to concurrently move the same nodes in separate sessions, then you should expect that the first session to complete this will succeed, and that all subsequent sessions will fail with an exception. After all, we only reserve locks for writing nodes during the save operations; locking nodes while the session is making all the individual changes would dramatically reduce performance, and could be disastrous for normal concurrency.
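The "first session to save wins" behaviour can be illustrated with a small in-memory model. This is a sketch under assumptions: `Store`, `Session`, and the per-node version counter are hypothetical names and an illustrative stand-in for conflict detection; the thread does not describe how ModeShape detects the conflict internally. The only property taken from the explanation above is that nothing is locked while a session makes its changes, and the lock is held just for the save.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of "first save wins"; version counters are illustrative, not ModeShape internals. */
final class SaveConflictSketch {

    /** Shared state: each node has a parent and a version that is bumped on every saved change. */
    static final class Store {
        private final Map<String, String> parentOf = new HashMap<>();
        private final Map<String, Integer> versionOf = new HashMap<>();

        synchronized void create(String node, String parent) {
            parentOf.put(node, parent);
            versionOf.put(node, 0);
        }

        synchronized int version(String node) {
            return versionOf.get(node);
        }

        synchronized String parent(String node) {
            return parentOf.get(node);
        }

        /** The lock is held only while saving; a stale version means another session saved first. */
        synchronized void saveMove(String node, int versionSeen, String newParent) {
            if (versionOf.get(node) != versionSeen) {
                throw new IllegalStateException("node '" + node + "' was changed by another session");
            }
            parentOf.put(node, newParent);
            versionOf.put(node, versionSeen + 1);
        }
    }

    /** A session records its pending move without taking any lock. */
    static final class Session {
        private final Store store;
        private String node;
        private String newParent;
        private int versionSeen;

        Session(Store store) {
            this.store = store;
        }

        void move(String node, String newParent) {
            this.node = node;
            this.newParent = newParent;
            this.versionSeen = store.version(node);
        }

        void save() {
            store.saveMove(node, versionSeen, newParent);
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.create("doc1", "source");

        Session first = new Session(store);
        Session second = new Session(store);
        first.move("doc1", "destinationA");  // both sessions prepare a move of the same node
        second.move("doc1", "destinationB");

        first.save();                        // first save succeeds
        try {
            second.save();                   // second save sees a stale version and fails
        } catch (IllegalStateException e) {
            System.out.println("second save failed: " + e.getMessage());
        }
        System.out.println("doc1 parent = " + store.parent("doc1"));
    }
}
```

Holding the lock for the whole lifetime of a session would avoid the failed save, but every other session touching the same nodes would have to wait, which is the performance cost described above.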

              • 4. Re: Clustering - sync/async replication
                eshevchenko

                Thanks, the hint about the shared cache store added more stability. My previous configuration had a separate cache store for each cluster node. The link to the test case is the same as in my previous message - https://github.com/eshevchenko/modeshape/blob/cluster-benchmark/modeshape-jcr/src/test/java/org/modeshape/jcr/benchmark/MoveDocumentsBenchmarkTest.java.

                Main concern: I have an initial content file. It contains a source folder and a destination folder. The test reads the source folder and creates a callable task for each JCR node. Each task moves its node to the destination folder. So the case "if your test is trying to concurrently move the same nodes in separate sessions" should not happen. The test runs with different thread counts:

                moveDocumentsSyncTcp - the number of threads is the maximum number of processors available to the virtual machine

                moveDocumentsSyncTcp10 - the number of threads is 10

                moveDocumentsSyncTcp15 - the number of threads is 15

                moveDocumentsSyncTcp20 - the number of threads is 20

                Each test runs for 3 executions (BenchmarkOptions - benchmarkRounds)

                The number of cluster nodes is 2
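The shape of such a test (one callable task per node, a fixed-size thread pool, failures surfacing as `ExecutionException` from `Future.get()`) can be sketched with the standard library alone. This is not the code of MoveDocumentsBenchmarkTest: the class `MoveHarnessSketch`, the `runAll` helper, and the fake "move" that just returns a path string are assumptions made for illustration, and no JCR session is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Sketch of a "one task per node" harness; the move itself is faked. */
final class MoveHarnessSketch {

    /** Submits one task per node name and returns the causes of all failed tasks. */
    static List<Throwable> runAll(List<String> nodeNames, int threads,
                                  Function<String, Callable<String>> taskFor) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String name : nodeNames) {
                futures.add(pool.submit(taskFor.apply(name)));
            }
            List<Throwable> failures = new ArrayList<>();
            for (Future<String> future : futures) {
                try {
                    future.get();
                } catch (ExecutionException e) {
                    failures.add(e.getCause()); // the exception thrown inside the task
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for tasks", e);
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("doc1", "doc2", "doc3", "doc4");
        // Same thread count as the first test variant: processors available to the JVM.
        int threads = Runtime.getRuntime().availableProcessors();
        List<Throwable> failures = runAll(nodes, threads, name -> () -> {
            if (name.equals("doc3")) {
                throw new IllegalStateException("simulated save failure for " + name);
            }
            return "/destination/" + name; // stand-in for a real move plus save
        });
        System.out.println(failures.size() + " of " + nodes.size() + " tasks failed");
    }
}
```

Because every task works on a different node name, no two tasks in this sketch ever target the same node, which matches the intent described above.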

                After each execution I get different results. Increasing the number of cluster nodes increases the number of errors, and so does increasing benchmarkRounds. The error is still the same:

                 

                java.lang.reflect.InvocationTargetException

                    at com.carrotsearch.junitbenchmarks.BenchmarkStatement$BaseEvaluator.evaluateInternally(BenchmarkStatement.java:90)

                    at com.carrotsearch.junitbenchmarks.BenchmarkStatement$SequentialEvaluator.evaluate(BenchmarkStatement.java:121)

                    at com.carrotsearch.junitbenchmarks.BenchmarkStatement.evaluate(BenchmarkStatement.java:346)

                    at org.modeshape.jcr.benchmark.AbstractBenchmarkTest$1$1.evaluate(AbstractBenchmarkTest.java:132)

                    at org.junit.rules.RunRules.evaluate(RunRules.java:18)

                    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)

                    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)

                    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)

                    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)

                    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)

                    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)

                    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)

                    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)

                    at org.junit.runners.ParentRunner.run(ParentRunner.java:300)

                    at org.junit.runner.JUnitCore.run(JUnitCore.java:157)

                    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:77)

                    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:195)

                    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:63)

                    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

                    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

                    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)

                Caused by: java.util.concurrent.ExecutionException: javax.jcr.InvalidItemStateException: This session tried to save changes to node with key 'Cannot locate child node: 4c24b267505d649a2c9bcd-b65b-494b-a15d-e791d40f6ac4 within parent: 4c24b267505d64df01b090-ef0d-4e4e-b80a-b52a832c988a', but it was removed by another session.

                    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)

                    at java.util.concurrent.FutureTask.get(FutureTask.java:83)

                    at org.modeshape.jcr.benchmark.AbstractBenchmarkTest.executeTest(AbstractBenchmarkTest.java:203)

                    at org.modeshape.jcr.benchmark.MoveDocumentsBenchmarkTest.moveDocumentsSyncTcp20(MoveDocumentsBenchmarkTest.java:116)

                    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

                    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

                    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)

                    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)

                    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)

                    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)

                    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)

                    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)

                    at com.carrotsearch.junitbenchmarks.BenchmarkStatement$BaseEvaluator.evaluateInternally(BenchmarkStatement.java:76)

                    ... 22 more

                Caused by: javax.jcr.InvalidItemStateException: This session tried to save changes to node with key 'Cannot locate child node: 4c24b267505d649a2c9bcd-b65b-494b-a15d-e791d40f6ac4 within parent: 4c24b267505d64df01b090-ef0d-4e4e-b80a-b52a832c988a', but it was removed by another session.

                    at org.modeshape.jcr.JcrSession.save(JcrSession.java:1152)

                    at org.modeshape.jcr.benchmark.MoveDocumentsBenchmarkTest$MoveNodeTask.call(MoveDocumentsBenchmarkTest.java:221)

                    at org.modeshape.jcr.benchmark.MoveDocumentsBenchmarkTest$MoveNodeTask.call(MoveDocumentsBenchmarkTest.java:186)

                    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

                    at java.util.concurrent.FutureTask.run(FutureTask.java:138)

                    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)

                    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

                    at java.util.concurrent.FutureTask.run(FutureTask.java:138)

                    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

                    at java.lang.Thread.run(Thread.java:662)

                Caused by: org.modeshape.jcr.cache.NodeNotFoundInParentException: Cannot locate child node: 4c24b267505d649a2c9bcd-b65b-494b-a15d-e791d40f6ac4 within parent: 4c24b267505d64df01b090-ef0d-4e4e-b80a-b52a832c988a

                    at org.modeshape.jcr.cache.document.LazyCachedNode.parentReferenceToSelf(LazyCachedNode.java:250)

                    at org.modeshape.jcr.cache.document.LazyCachedNode.getSegment(LazyCachedNode.java:287)

                    at org.modeshape.jcr.cache.document.LazyCachedNode.getPath(LazyCachedNode.java:313)

                    at org.modeshape.jcr.cache.PathCache.getPath(PathCache.java:49)

                    at org.modeshape.jcr.cache.document.WritableSessionCache.persistChanges(WritableSessionCache.java:994)

                    at org.modeshape.jcr.cache.document.WritableSessionCache.save(WritableSessionCache.java:621)

                    at org.modeshape.jcr.JcrSession.save(JcrSession.java:1145)

                 

                 

                Let me know if this is a topic for another discussion

                 

                Thanks in advance.

                • 5. Re: Clustering - sync/async replication
                  eshevchenko

                  Randall, I reworked the test case; I think this one should give you a better understanding of what I mean. A corresponding pull request was created - [MODE-2216] Move operation on cluster causes InvalidItemStateException exception - JBoss Issue Tracker

                   

                  Thanks in advance.