11 Replies Latest reply on Mar 4, 2008 10:58 AM by brian.stansberry

    NPE in InvalidationInterceptor.getNodeVersion()

    brian.stansberry

      From a Hibernate/JBC integration test case:

      16:15:20,049 WARN InvalidationInterceptor:270 - Unable to broadcast evicts as a part of the prepare phase. Rolling back.
      java.lang.NullPointerException
       at org.jboss.cache.interceptors.InvalidationInterceptor.getNodeVersion(InvalidationInterceptor.java:347)
       at org.jboss.cache.interceptors.InvalidationInterceptor.invalidateAcrossCluster(InvalidationInterceptor.java:336)
       at org.jboss.cache.interceptors.InvalidationInterceptor.invalidateModifications(InvalidationInterceptor.java:372)
       at org.jboss.cache.interceptors.InvalidationInterceptor.broadcastInvalidate(InvalidationInterceptor.java:266)
       at org.jboss.cache.interceptors.InvalidationInterceptor.handleCommitMethod(InvalidationInterceptor.java:208)
       at org.jboss.cache.interceptors.MethodDispacherInterceptor.invoke(MethodDispacherInterceptor.java:117)
       at org.jboss.cache.interceptors.Interceptor.nextInterceptor(Interceptor.java:111)
       at org.jboss.cache.interceptors.NotificationInterceptor.handleCommitMethod(NotificationInterceptor.java:42)
       at org.jboss.cache.interceptors.MethodDispacherInterceptor.invoke(MethodDispacherInterceptor.java:117)
       at org.jboss.cache.interceptors.Interceptor.nextInterceptor(Interceptor.java:111)
       at org.jboss.cache.interceptors.TxInterceptor.handleCommitRollback(TxInterceptor.java:849)
       at org.jboss.cache.interceptors.TxInterceptor.runCommitPhase(TxInterceptor.java:896)
       at org.jboss.cache.interceptors.TxInterceptor$RemoteSynchronizationHandler.afterCompletion(TxInterceptor.java:1240)
       at org.jboss.cache.interceptors.TxInterceptor$LocalSynchronizationHandler.afterCompletion(TxInterceptor.java:1404)
       at org.jboss.cache.interceptors.OrderedSynchronizationHandler.afterCompletion(OrderedSynchronizationHandler.java:95)
       at org.hibernate.test.tm.SimpleJtaTransactionImpl.commit(SimpleJtaTransactionImpl.java:95)
       at org.hibernate.test.tm.SimpleJtaTransactionManagerImpl.commit(SimpleJtaTransactionManagerImpl.java:85)
       at org.hibernate.test.cache.jbc2.functional.bulk.BulkOperationsTest.createContacts(BulkOperationsTest.java:137)
       at org.hibernate.test.cache.jbc2.functional.bulk.BulkOperationsTest.testBulkOperations(BulkOperationsTest.java:87)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:585)
       at junit.framework.TestCase.runTest(TestCase.java:154)
       at org.hibernate.junit.functional.FunctionalTestCase.runTest(FunctionalTestCase.java:125)
       at junit.framework.TestCase.runBare(TestCase.java:127)
       at org.hibernate.junit.UnitTestCase.runBare(UnitTestCase.java:63)
       at junit.framework.TestResult$1.protect(TestResult.java:106)
       at junit.framework.TestResult.runProtected(TestResult.java:124)
       at junit.framework.TestResult.run(TestResult.java:109)
       at junit.framework.TestCase.run(TestCase.java:118)
       at junit.framework.TestSuite.runTest(TestSuite.java:208)
       at junit.framework.TestSuite.run(TestSuite.java:203)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:585)
       at org.apache.maven.surefire.junit.JUnitTestSet.execute(JUnitTestSet.java:213)
       at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:138)
       at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:125)
       at org.apache.maven.surefire.Surefire.run(Surefire.java:132)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:585)
       at org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:290)
       at org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:818)



      This happens during a commit of a tx that did a bunch of put calls, plus some remove calls for nodes that didn't actually exist in the local cache.

      The getNodeVersion() code that fails looks like this:

      protected DataVersion getNodeVersion(TransactionWorkspace w, Fqn f)
      {
       if (w == null) return null;
       WorkspaceNode wn = w.getNode(f);
       DataVersion v = wn.getVersion();
       ...
      


      NPE in last line because wn is null.

      I think the problem is because the fqn points to a node that doesn't exist in the workspace -- i.e. a nodeRemove() call was made for it, so the Fqn got associated with the tx, but no node w/ that fqn existed, so there's no node in the workspace.

      I tried to write a simple unit test to demonstrate that, but it passes. ??? So, instead of a JIRA I'm opening a forum thread.


      OT: the invalidation messages associated with a tx aren't batched. :-(

        • 1. Re: NPE in InvalidationInterceptor.getNodeVersion()
          brian.stansberry

          JIRA for this: http://jira.jboss.com/jira/browse/JBCACHE-1297 Whether I can reproduce it in a unit test or not, it's a bug. :-)

          • 2. Re: NPE in InvalidationInterceptor.getNodeVersion()
            brian.stansberry

            I've fixed this by returning null from getNodeVersion() if there is no WorkspaceNode for the Fqn. The effect of this is 'null' will be propagated to other nodes in the cluster as the version being invalidated. Looking at how they handle the invalidate() call, I see no problems with this -- they treat an invalidate() call with a null version as meaning "invalidate this Fqn no matter what version you have, but don't update the version".

            Effect of this is, if an Fqn is associated with the optimistic tx, but there is no node w/ that Fqn in the workspace, we tell the cluster to invalidate that Fqn w/o updating the version. In a case where a tx calls cache.removeNode(Fqn.fromString("/locally-non-existent")) that seems to be the desired behavior. I don't see any other reason why an Fqn would be associated with a tx but no node would be in the workspace.

            • 3. Re: NPE in InvalidationInterceptor.getNodeVersion()
              brian.stansberry

              This NPE was due to a call to remove a node that doesn't exist locally. My test to show that yesterday had a bug in it; fixed the test and it's apparent. Test is InvalidationInterceptorTest.testDeleteNonExistentOptimistic().

              I'll close the JIRA once I get a hudson run.

              • 4. Re: NPE in InvalidationInterceptor.getNodeVersion()
                brian.stansberry

                Been conversing with myself as I work through this. :-)

                More issues: http://jira.jboss.com/jira/browse/JBCACHE-1298

                My proposed fix to JBCACHE-1298 is as follows:

                1) CacheImpl.invalidate() should not create a tombstone if argument DataVersion versionToInvalidate is null. Just return. The cache is already in a state consistent with the intent of the invalidate() call; there's no need to do anything further.

                2) If a tombstone does need to be created, the versionToInvalidate should be set as an Option when it is created, rather than putting in a possibly incompatible DefaultDataVersion.

                • 5. Re: NPE in InvalidationInterceptor.getNodeVersion()
                  brian.stansberry

                  You know, DefaultDataVersion causes me all sorts of troubles -- always having to worry about obscure cases where it gets assigned to a node and then conflicts with the custom DataVersion types Hibernate uses. Seems like this is something that can be improved. Perhaps make the "default" DataVersion class configurable. I could then set it to use Hibernate's NonLockingDataVersion (always returns "false" to newerThan()) instead of DefaultDataVersion. It's then my responsibility to manage versioning.

                  • 6. Re: NPE in InvalidationInterceptor.getNodeVersion()
                    manik

                     

                    "bstansberry@jboss.com" wrote:
                    You know, DefaultDataVersion causes me all sorts of troubles -- always having to worry about obscure cases where it gets assigned to a node and then conflicts with the custom DataVersion types Hibernate uses. Seems like this is something that can be improved. Perhaps make the "default" DataVersion class configurable. I could then set it to use Hibernate's NonLockingDataVersion (always returns "false" to newerThan()) instead of DefaultDataVersion. It's then my responsibility to manage versioning.


                    That is a good idea, only it's a bit late in the day to introduce something like this now. Let me have a think, if it does make sense then I'll do it, otherwise will work with your proposed fix.

                    Thanks for bringing this up.

                    • 7. Re: NPE in InvalidationInterceptor.getNodeVersion()
                      brian.stansberry

                      I wasn't thinking about the "configurable default DataVersion" so much in terms of 2.1; you're right it's late in the cycle and that's a big change likely to introduce bugs.

                      • 8. Re: NPE in InvalidationInterceptor.getNodeVersion()
                        jason.greene

                        OK here is a stupid question. Why do we need to allow applications to assume control of the Cache Node version? This just doesn't seem like something we should be exposing.

                        • 9. Re: NPE in InvalidationInterceptor.getNodeVersion()
                          brian.stansberry

                          In the Hibernate case, if an entity has a version in the DB, the cache node representing that entity has the same version. As Steve likes to say, the DB is the truth of the system.

                          • 10. Re: NPE in InvalidationInterceptor.getNodeVersion()
                            jason.greene

                            Why can't this be in the data map, like all other information?

                            • 11. Re: NPE in InvalidationInterceptor.getNodeVersion()
                              brian.stansberry

                              It is. But JBC doesn't know anything about that.

                              As I think about the semantics of what Hibernate actually wants, not sure it wants a changing DataVersion at all. It has a DataVersion impl NonLockingDataVersion that never reports a version conflicts; that's what it uses by default. Only if it the underlying entity is versioned does it try to apply a real version to the node; there it wants to use the db version (which makes sense to me). Perhaps that can be dropped, and the non

                              There's 3 kinds of writes to the cache in the Hibernate entity caching case:

                              1) Write to cache as part of tx that creates a DB record.
                              2) Write to cache as part of tx that updates a DB record.
                              3) Insertion into cache of data that's read from DB.

                              #3 is the one that's most likely to result in trying to write stale data to the cache. But we already handle that by calling putForExternalRead, which will silently fail if the target node even exists. Version doesn't matter -- it just fails.

                              Issue then is more about conflicts related to #1/#2; most simply two concurrent #2 cases. I'll have to think more about whether there is any benefit to storing the version in the JBC node there; i.e. whether any conflicts would already be caught at the db level before the cache write is even attempted.