1 Reply Latest reply on May 31, 2017 10:53 AM by jkreutzfeld

    Inconsistent version history after concurrent access

    jkreutzfeld

      Hi everyone,

       

      i really don't have clue on this problem and i'm hoping you can shed some light on it. After migrating to Modeshape 5.4.0 we sometimes get the issue that a node has a baseVersion reference to a nonexistant version, so any further access to this node via the VersionManager results in an error. I don't know why this happens, but it must be related to a lot of concurrent actions with that node (creating, reading, copying, update through sequencer). I was unable to create a small, self contained sample test for this. I extracted some data from the corrupted node to make this clearer (using a simple JCR client):

       

      This is the node and its properties (note the "jcr:baseVersion"):

      karaf@root()> jcr:describe home/jkreutzfeld/Crashtest2/basic_male+\%28Kopie+2\%29.png

      home/jkreutzfeld/Crashtest2/basic_male+%28Kopie+2%29.png

      basic_male+%28Kopie+2%29.png (nt:file)

         jcr:uuid = 74b69d27-9e1b-41f3-9dec-ccebf790b0ba

         jcr:uuid = 74b69d27-9e1b-41f3-9dec-ccebf790b0ba

         jcr:baseVersion = a69d00a317f1e783cbbfbb-9568-4d30-8fef-c4c24704553a

         jcr:predecessors =

         jcr:title = basic_male.png

         ds:scanDate = 2017-05-24T10:41:32.822+02:00

         jcr:createdBy = JKREUTZFELD

         ds:checksum = 831dc6d4aa369579142276229e09a82aee11a6dc

         jcr:primaryType = nt:file

         jcr:created = 2017-05-24T10:41:31.816+02:00

         jcr:mixinTypes = [ dsmix:virusChecked, mix:lockable, dsmix:checksum, mix:title, dsmix:thumbnails, mix:versionable ]

         jcr:versionHistory = a69d00a317f1e774b69d27-9e1b-41f3-9dec-ccebf790b0ba

         jcr:isCheckedOut = false

       

      When i try to checkout the node, i get this error:

      karaf@root()> jcr:checkout /home/jkreutzfeld/Crashtest2/basic_male+\%28Kopie+2\%29.png

      RepositoryException: Cannot find the base version with the key 'a69d00a317f1e783cbbfbb-9568-4d30-8fef-c4c24704553a' for the node '/home/jkreutzfeld/Crashtest2/basic_male+%28Kopie+2%29.png'. Make sure that any checkin/checkout operations are atomic for that node.

      javax.jcr.version.VersionException: Cannot find the base version with the key 'a69d00a317f1e783cbbfbb-9568-4d30-8fef-c4c24704553a' for the node '/home/jkreutzfeld/Crashtest2/basic_male+%28Kopie+2%29.png'. Make sure that any checkin/checkout operations are atomic for that node.

              at org.modeshape.jcr.AbstractJcrNode.getBaseVersion(AbstractJcrNode.java:3297)

              at org.modeshape.jcr.JcrVersionManager.checkout(JcrVersionManager.java:557)

              at org.modeshape.jcr.JcrVersionManager.checkout(JcrVersionManager.java:524)

              at de.doubleslash.servicecomponents.tools.contentrepositoryclient.impl.command.CheckoutCommand.executeCheckout(CheckoutCommand.java:97)

              at de.doubleslash.servicecomponents.tools.contentrepositoryclient.impl.command.CheckoutCommand.checkout(CheckoutCommand.java:58)

              at de.doubleslash.servicecomponents.tools.contentrepositoryclient.impl.command.CheckoutCommand.doExecute(CheckoutCommand.java:45)

              at org.apache.karaf.shell.console.AbstractAction.execute(AbstractAction.java:34)

              at org.apache.karaf.shell.console.OsgiCommandSupport.execute(OsgiCommandSupport.java:41)

              at org.apache.karaf.shell.commands.basic.AbstractCommand.execute(AbstractCommand.java:34)

              at org.apache.karaf.shell.compat.CommandTracker$1.execute(CommandTracker.java:112)

              at org.apache.karaf.shell.impl.console.osgi.secured.SecuredCommand.execute(SecuredCommand.java:67)

              at org.apache.karaf.shell.impl.console.osgi.secured.SecuredCommand.execute(SecuredCommand.java:87)

              at org.apache.felix.gogo.runtime.Closure.executeCmd(Closure.java:480)

              at org.apache.felix.gogo.runtime.Closure.executeStatement(Closure.java:406)

              at org.apache.felix.gogo.runtime.Pipe.run(Pipe.java:108)

              at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:182)

              at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:119)

              at org.apache.felix.gogo.runtime.CommandSessionImpl.execute(CommandSessionImpl.java:94)

              at org.apache.karaf.shell.impl.console.ConsoleSessionImpl.run(ConsoleSessionImpl.java:270)

              at java.lang.Thread.run(Thread.java:745)

      Caused by: org.modeshape.jcr.cache.NodeNotFoundInParentException: Cannot locate child node: a69d00a317f1e783cbbfbb-9568-4d30-8fef-c4c24704553a within parent: a69d00a317f1e774b69d27-9e1b-41f3-9dec-ccebf790b0ba

              at org.modeshape.jcr.cache.document.LazyCachedNode.parentReferenceToSelf(LazyCachedNode.java:238)

              at org.modeshape.jcr.cache.document.LazyCachedNode.getSegment(LazyCachedNode.java:282)

              at org.modeshape.jcr.cache.document.LazyCachedNode.getPath(LazyCachedNode.java:291)

              at org.modeshape.jcr.JcrSession.node(JcrSession.java:570)

              at org.modeshape.jcr.JcrSession.node(JcrSession.java:534)

              at org.modeshape.jcr.JcrSession.node(JcrSession.java:500)

              at org.modeshape.jcr.JcrSession.node(JcrSession.java:478)

              at org.modeshape.jcr.AbstractJcrNode.getBaseVersion(AbstractJcrNode.java:3293)

              ... 19 more

       

      Here is the version history for the node and the corresponding versions:

      karaf@root()> jcr:list jcr:system/jcr:versionStorage/08/5f/e3/085fe35d39c56090d913f58baceec8d8862b9fa7

      jcr:system/jcr:versionStorage/08/5f/e3/085fe35d39c56090d913f58baceec8d8862b9fa7

      ├ jcr:versionLabels (nt:versionLabels)

      ├ jcr:rootVersion (nt:version)

      ├ 1.0 (nt:version)

      └ 1.1 (nt:version)

       

       

      karaf@root()> jcr:describe jcr:system/jcr:versionStorage/08/5f/e3/085fe35d39c56090d913f58baceec8d8862b9fa7/jcr:rootVersion

      jcr:system/jcr:versionStorage/08/5f/e3/085fe35d39c56090d913f58baceec8d8862b9fa7/jcr:rootVersion

      jcr:rootVersion (nt:version)

         jcr:uuid = a69d00a317f1e7ee01bf25-8721-4bf6-91ab-ee50735a6115

         jcr:successors = [ a69d00a317f1e795f5f976-3b11-4c0b-b59c-50c39de7a555 ]

         jcr:primaryType = nt:version

         jcr:created = 2017-05-24T10:43:53.151+02:00

        

      karaf@root()> jcr:describe jcr:system/jcr:versionStorage/08/5f/e3/085fe35d39c56090d913f58baceec8d8862b9fa7/1.0

      jcr:system/jcr:versionStorage/08/5f/e3/085fe35d39c56090d913f58baceec8d8862b9fa7/1.0

      1.0 (nt:version)

         jcr:uuid = a69d00a317f1e795f5f976-3b11-4c0b-b59c-50c39de7a555

         jcr:successors = [ a69d00a317f1e7cb40dbc3-e391-42f7-aee2-6e26b74c77ba ]

         jcr:primaryType = nt:version

         jcr:predecessors = [ a69d00a317f1e7ee01bf25-8721-4bf6-91ab-ee50735a6115 ]

         jcr:created = 2017-05-24T10:43:53.648+02:00

        

       

       

      karaf@root()> jcr:describe jcr:system/jcr:versionStorage/08/5f/e3/085fe35d39c56090d913f58baceec8d8862b9fa7/1.1

      jcr:system/jcr:versionStorage/08/5f/e3/085fe35d39c56090d913f58baceec8d8862b9fa7/1.1

      1.1 (nt:version)

         jcr:uuid = a69d00a317f1e7cb40dbc3-e391-42f7-aee2-6e26b74c77ba

         jcr:successors = [ a69d00a317f1e783cbbfbb-9568-4d30-8fef-c4c24704553a ]

         jcr:primaryType = nt:version

         jcr:predecessors = [ a69d00a317f1e795f5f976-3b11-4c0b-b59c-50c39de7a555 ]

         jcr:created = 2017-05-24T10:43:54.619+02:00

       

      As you can see, there actually is no version with an ID that matches the jcr:baseVersion property of the node.

       

      I could not determine how this could even be possible, I hope anyone got an idea on this.

      What are we doing wrong? Could it be possible that a checkin operation fails in a way that the node is updated but the history node was not created correctly?

      I'm sorry that this problem is so vague.

       

      Kind regards

      Jan

        • 1. Re: Inconsistent version history after concurrent access
          jkreutzfeld

          I haven't found the cause of the problem yet, but I want to give a quick update on our latest tests: I updated ModeShape to v5.4.1, which did not fix the issue. I was also unable to reproduce the issue with ModeShape v4.6.0, so it must be related to the newer version (or the new file persistence without infinispan). After that, we switched from the file persistence to the database persistence with a PostgreSQL DB. With this setup, i could no longer reproduce the error. We will run some more tests on the server and if no other problems occur, we will go with the database persistence. Strangely, when importing our repository backup into the database persistence, we need to temporary remove the primary key constraint because there seems to be a duplicate entry that needs to be deleted afterwards. Additionally, we created a database index in PostgreSQL for the modeshape table on the ID column, because without that starting up the repository takes forever. I don't know if that is related to our specific repository backup or a more common problem. I'll update this thread if any new information becomes available.