7 Replies Latest reply on Jan 8, 2016 7:05 PM by ma6rl

    Concurrent modifications to the repository can result in org.modeshape.jcr.cache.NodeNotFoundInParentException

    ma6rl

      I am using the Modeshape 4.5.0.Final Wildfly Kit with the following infinispan configuration:

       

      <?xml version="1.0" encoding="UTF-8"?>
      <infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  xsi:schemaLocation="urn:infinispan:config:7.0 http://www.infinispan.org/schemas/infinispan-config-7.0.xsd
                  urn:infinispan:config:store:jdbc:7.0 http://docs.jboss.org/infinispan/schemas/infinispan-cachestore-jdbc-config-7.0.xsd"
                  xmlns="urn:infinispan:config:7.0">
          <jgroups>
              <stack-file name="tcp" path="${jboss.server.config.dir}/modeshape/jgroups-config.xml"/>
          </jgroups>
          <cache-container default-cache="omakase-repo" statistics="false">
              <transport cluster="modeshape-cluster" stack="tcp"/>
              <jmx duplicate-domains="true"/>
              <replicated-cache name="repo" mode="SYNC">
                  <locking striping="false" isolation="READ_COMMITTED"/>
                  <transaction mode="NON_DURABLE_XA" locking="PESSIMISTIC"/>
                  <eviction max-entries="200000" strategy="LIRS"/>
                  <expiration interval="-1"/>
                  <persistence passivation="false">
                      <string-keyed-jdbc-store xmlns="urn:infinispan:config:store:jdbc:7.0" fetch-state="false" read-only="false" purge="false" shared="true">
                          <data-source jndi-url="java:jboss/datasources/DS"/>
                          <string-keyed-table
                                  prefix="modeshape"
                                  create-on-start="true"
                                  drop-on-exit="false">
                              <id-column name="id" type="VARCHAR(200)"/>
                              <data-column name="datum" type="LONGBLOB"/>
                              <timestamp-column name="version" type="BIGINT"/>
                          </string-keyed-table>
                      </string-keyed-jdbc-store>
                  </persistence>
              </replicated-cache>
          </cache-container>
      </infinispan>
      

       

      - I am using indexes but have been able to recreate the issue without them; it just occurs more frequently with indexes.

      - I do have eviction enabled but have been able to recreate the issue with eviction disabled.

      - I am using container-managed transactions, with each update happening in its own Tx.

       

      The issue occurs when several threads are making concurrent updates to the same repository. Each thread starts and commits its own Tx (via container-managed transactions). The problem seems to appear after the following error occurs during an update:

       

      12:43:18,820 ERROR [org.jboss.as.ejb3] (default task-7) javax.ejb.EJBTransactionRolledbackException: javax.jcr.RepositoryException: org.modeshape.jcr.TimeoutException: Timeout while attempting to lock the keys [f5e46727505d6451e1455a-9fd2-4530-a3f0-7265f7e546c1] after 0 retry attempts.
      
      Caused by: org.modeshape.jcr.TimeoutException: Timeout while attempting to lock the keys [f5e46727505d6451e1455a-9fd2-4530-a3f0-7265f7e546c1] after 0 retry attempts.
        at org.modeshape.jcr.cache.document.WritableSessionCache.save(WritableSessionCache.java:670) [modeshape-jcr-4.5.0.Final.jar:4.5.0.Final]
        at org.modeshape.jcr.JcrSession.save(JcrSession.java:1171) [modeshape-jcr-4.5.0.Final.jar:4.5.0.Final]
        ... 154 more
      Caused by: org.infinispan.util.concurrent.TimeoutException: Timeout while attempting to lock the keys [f5e46727505d6451e1455a-9fd2-4530-a3f0-7265f7e546c1] after 0 retry attempts.
        at org.modeshape.jcr.cache.document.WritableSessionCache.lockNodes(WritableSessionCache.java:1495) [modeshape-jcr-4.5.0.Final.jar:4.5.0.Final]
        at org.modeshape.jcr.cache.document.WritableSessionCache.save(WritableSessionCache.java:638) [modeshape-jcr-4.5.0.Final.jar:4.5.0.Final]
        ... 155 more
      

       

      Once this error occurs, the ModeShape repository can be left in an inconsistent state where a child node exists and references a parent node, but the parent node does not contain a reference to the child node, hence the org.modeshape.jcr.cache.NodeNotFoundInParentException. As you can see from the stack trace above, the transaction is rolled back by the container, but it seems that a child node is not being correctly cleaned up.

       

      I am currently working on a test app that can demonstrate this problem. Once I have that I plan to open a JIRA, but in the meantime, if you have any ideas about how I can work around this, I would appreciate any help you can provide.

        • 1. Re: Concurrent modifications to the repository can result in org.modeshape.jcr.cache.NodeNotFoundInParentException
          hchiorean

          The only thing I can think of to suggest is to try and increase the locking timeout (acquire-timeout on the locking element) so as to avoid raising the TimeoutException.
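
          A minimal sketch of that change, applied to the locking element from the configuration in the original post. The value 60000 (milliseconds) is only an illustrative assumption, not a recommended setting:

          ```xml
          <!-- acquire-timeout is given in milliseconds; 60000 is a placeholder to tune for the workload -->
          <locking striping="false" isolation="READ_COMMITTED" acquire-timeout="60000"/>
          ```

          A longer timeout should only make the TimeoutException less likely under contention; on its own it would not explain why the parent and child references end up inconsistent after a rollback.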

          • 2. Re: Concurrent modifications to the repository can result in org.modeshape.jcr.cache.NodeNotFoundInParentException
            ma6rl

            My initial findings were incorrect: this issue is definitely related to eviction after all. If I disable eviction it never happens; if I enable eviction it happens fairly consistently when performing concurrent modifications. Once the issue occurs, the data in the repository is corrupted and any attempt to access the nodes or query them results in the above exception.

             

            I do have a test suite I can make available via GitHub if this helps. I plan on raising a JIRA for this issue, would you prefer me to re-open MODE-2148 or create a new issue?

             

            This is a major show stopper for us, as we need to support concurrent modification and our data set is too large to keep entirely in the cache, which means we need eviction enabled. If there are any suggestions about how we can deal with this in the short term, I would really appreciate them.

            • 3. Re: Concurrent modifications to the repository can result in org.modeshape.jcr.cache.NodeNotFoundInParentException
              hchiorean

              If this is related to eviction, then this is another bug in Infinispan and ideally should be reported via an Infinispan test-case to the ISPN team. Since it's clustering related, it should be another bug separate from the other eviction issues. Also, since eviction cannot be disabled without running into OOM errors, the only other thing you can try to change (besides the eviction max-entries) is the eviction algorithm. Maybe using another algorithm (besides LIRS) will not cause this issue. But again, if you are certain this is eviction related, you should ideally try to have an Infinispan-only test case, similar to one I produced for [ISPN-4810] Local Transactional Cache loses data when eviction is enabled and there are multiple readers and one writer ….
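
              A sketch of the algorithm swap, assuming the LRU strategy (which the Infinispan 7 schema accepts alongside LIRS) and keeping max-entries unchanged from the configuration in the original post:

              ```xml
              <!-- same max-entries as the original configuration; only the strategy differs (LRU instead of LIRS) -->
              <eviction max-entries="200000" strategy="LRU"/>
              ```

              Whether a different strategy actually avoids the issue has not been verified here; it is only something to try.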

               

              One other thing I can think of from looking at the configuration: have you tried removing the "expiration" entry altogether? I don't know if setting interval to -1 is the same as not having it at all, but expiration should definitely never be configured for ModeShape.

               

              There is another aspect you should consider: if the bug were within ModeShape, we would normally try to reproduce it and fix it, but if it's with ISPN then we have no control over that. This, coupled with the fact that we are moving away from ISPN in ModeShape 5 (see ModeShape 5 and beyond), means that for you to get the fix, the ISPN team would have to backport any fix to ISPN 7, since ModeShape 4.x will not be moving to ISPN 8 or newer. It also means we will no longer be spending significant time investigating ISPN bugs ourselves, as we did for ISPN-4810 (which took me almost a week to track down in the ModeShape codebase and translate into an ISPN test case).

              • 4. Re: Concurrent modifications to the repository can result in org.modeshape.jcr.cache.NodeNotFoundInParentException
                ma6rl

                hchiorean, this may or may not be an issue in Infinispan. At this point all I know is that if I enable eviction the issue appears fairly quickly, whereas I haven't seen the issue with eviction disabled. This issue is not related to clustering and I have recreated using a local cache. What I do know is that if I make concurrent modifications to the same set of nodes in separate threads using User Transactions, I can quickly cause an issue where a child node is committed to the cache with a reference to its parent node, but the parent node does not contain a reference to the child node. Why this only happens when eviction is enabled is a little strange, as it normally occurs well before the eviction threshold is reached. What it does mean is that I cannot safely make concurrent updates to my ModeShape repository without fear of corrupting the underlying node structure, as once it happens there is no way to recover from it.

                 

                I will commit my test project to GitHub shortly and raise a JIRA for this issue. I would really appreciate it if you could take a look to see what is going on once I commit the test application, as this is a show stopper for us at the moment in rolling out our application.

                 

                Horia Chiorean wrote:

                 

                One other thing I can think of from looking at the configuration: have you tried removing the "expiration" entry altogether? I don't know if setting interval to -1 is the same as not having it at all, but expiration should definitely never be configured for ModeShape.

                 

                 

                Removing the expiration configuration does not stop the expiration thread from running; it just never finds anything to expire, but it can cause issues if it runs at the same time as a node is being modified in a different thread. By setting the interval to -1, the thread never runs.

                 

                Horia Chiorean wrote:

                 

                 

                There is another aspect you should consider: if the bug were within ModeShape, we would normally try to reproduce it and fix it, but if it's with ISPN then we have no control over that. This, coupled with the fact that we are moving away from ISPN in ModeShape 5 (see ModeShape 5 and beyond), means that for you to get the fix, the ISPN team would have to backport any fix to ISPN 7, since ModeShape 4.x will not be moving to ISPN 8 or newer. It also means we will no longer be spending significant time investigating ISPN bugs ourselves, as we did for ISPN-4810 (which took me almost a week to track down in the ModeShape codebase and translate into an ISPN test case).

                 

                Will ModeShape 4.5 or 4.6 work with Wildfly 10 or JBoss EAP 7, or will we need to move to ModeShape 5 to be able to use them?

                • 5. Re: Concurrent modifications to the repository can result in org.modeshape.jcr.cache.NodeNotFoundInParentException
                  hchiorean

                  This issue is not related to clustering and I have recreated using a local cache

                  If that is the case, then it should be easier to investigate. The scenario should be easy-enough to reproduce; we already have a dedicated test case for user transactions here: modeshape/TransactionsTest.java at 4.x · ModeShape/modeshape · GitHub

                  The reason I mention the previous test case is that it's far easier to investigate something in a standalone environment, not only non-clustered but also outside of Wildfly.

                  • 6. Re: Concurrent modifications to the repository can result in org.modeshape.jcr.cache.NodeNotFoundInParentException
                    ma6rl

                    I now have a fairly simple test case that reproduces the issue fairly consistently when run inside Wildfly, but does not reproduce it when run in a standalone environment.

                     

                    The only difference other than the environment is how I obtain the UserTransaction: when running inside Wildfly I obtain the Tx via injection using @Resource, whereas when I run the code standalone via the ModeShape test suite I get the TxManager from the JcrRepository.

                     

                    I'm going to push the code to GitHub and create a JIRA for this issue so you can see what is going on.