5 Replies Latest reply on Jul 16, 2012 1:01 PM by jonathandfields

    ISPN error due to NPE during UserTransaction.commit() in bulk load test

    jonathandfields

      Config:

       

      • Modeshape 3.0 Alpha 6, JBoss AS 7.1, out of the box AS kit configuration.
      • Added BDBJE cache loader to the AS7 ISPN module, and modified standalone-modeshape.xml to use the BDBJE cache loader (see attached module.xml and standalone-modeshape.xml)
      • -Xms4096m -Xmx16384m -XX:MaxPermSize=2048m for the JVM (64 bit), running  on a 20 GB server.

       

      Test:

      • Attempting to load a  large number of nodes of real test data - around 5.7 M by my estimates (also trying to get an idea of how much memory is required for this for sizing purposes).
      • Loading is performed by an Asynchronous, BMT SLSB, that is batching the load with UserTransaction.begin(), create around 5000 nodes, Session.save(), UserTransaction.commit().

       

      Result:

      • After loading around 1.9M nodes (30% of the entire data set), an exception is thrown in the UserTransaction.commit(). The exception trace is attached.
        • Root cause appears to be NPE in BDBJE loader, so not sure if I should report this here or to ISPN.
        • Modeshape does not seem to be involved at all, but since it depends upon ISPN this crash prevents Modeshape from successfully loading the data.
      • I'm not sure if I'm running out of memory,  but before the exception, the memory is:
        • Runtime.getRuntime().totalMemory() = 16,429,056 K
        • Runtime.getRuntime().maxMemory() = 16,429,056 K
        • Runtime.getRuntime().freeMemory() = 1,381,335
        • There is no OutOfMemory anywhere, but it

       

      Again, I cannot unfortunately share the actual code or data. I'll see if I can construct a test that generates random data in a similar manner, that reproduces the problem, that I can share.

       

      Please let me know if I am doing something wrong, or I should submit a Modeshape or ISPN JIRA.

        • 1. Re: ISPN error due to NPE during UserTransaction.commit() in bulk load test
          rhauch

          Your configuration files aren't too far from the standard configuration included in the AS7.1 kit, so there's nothing there that looks out of ordinary. The attributes on the Infinispan cache "store" element seem appropriate. Interestingly, the NPE in the ConcurrentHashMap matches a line in my source that implies the value being inserted was null. Stepping up in the stack trace, the calling method appears to be line 308 of BdbjeCacheStore.java, which would imply there was no current transaction. Could the transaction have timed out? Surely there'd be something in the log. Perhaps try bumping the default timeout to something quite a bit larger. In your AS7 configuration, this is set with the following:

           

          <subsystem xmlns="urn:jboss:domain:transactions:1.1">
              ...
              <coordinator-environment default-timeout="300"/>
              ...
          
          </subsystem>
          

           

          If changing that doesn't work, perhaps try testing with another cache loader. I understand that wouldn't help your configuration, but it might help diagnose what's going wrong.

          1 of 1 people found this helpful
          • 2. Re: ISPN error due to NPE during UserTransaction.commit() in bulk load test
            jonathandfields

            Thanks I'll give that a try.

            • 3. Re: ISPN error due to NPE during UserTransaction.commit() in bulk load test
              jonathandfields

              I doubled the transaction timeout to 600 seconds, and eventually got the same exception. I have changed my test to a generic program instead of one that uses confidential data. I also set max-entries on the cache to be sure I was not running out of memory.  I'll keep experimenting with it. I'll set the transaction timeout to some really huge value. If I knew that it was just a transaction timeout (where you get the usual timeout message from arjuna) I'd be OK that's the problem, but that's not what I'm seeing. Also, the transactions are all the same - saving around 10K nodes that are almost identical, so why does it take hours of running before hitting the first failure?

               

              This is just a stab in the dark, but is there any background Lucene processing gonig on at any point, like  index optimization, that might be some interfering?

               

              Thanks,

              Jon

              • 4. Re: ISPN error due to NPE during UserTransaction.commit() in bulk load test
                jonathandfields

                Quick update on this. After setting max-entries in the Infinispan cache config to limit Infinispan memory usage, setting the tx timeout to 36000, and regularly calling System.gc(), I was able to add a much larger amount of nodes without this exception - and without the JVM memory growing to the maximum.

                 

                At first, after just setting max-entries, it did not make sense why the JVM memory still just continued to grow and grow. Finally it occurred to me it might not be Infinispan at all, but regular Java objects not being GC'd because they were being created so quickly. After adding the System.gc() call periodically, everything behaved much more predictably.

                 

                So I cannot say this with any facts to back it up, but it might be that the NPE was due to an out of memory being encountered and not handled in a graceful manner, due to a misconfiguration (unlimited cache size) and the GC issue.

                • 5. Re: ISPN error due to NPE during UserTransaction.commit() in bulk load test
                  jonathandfields

                  Correction: tx timeout = 3600