4 Replies Latest reply on Aug 17, 2012 3:29 AM by apatispelikan

    org.infinispan.CacheException: Could not prepare - caused by XAException

    apatispelikan

      Hello,

       

      I am trying to tune JBoss AS 7.1.2 for higher throughput and am getting errors whose cause I cannot find:

       

      09:59:17,123 WARN  [com.arjuna.ats.arjuna] (http-executor-threads - 41) ARJUNA012125: TwoPhaseCoordinator.beforeCompletion - failed for SynchronizationImple< 0:ffffc29e84b8:-4561d3a3:4fe1818d:43524, SynchronizationAdapter{localTransaction=LocalTransaction{remoteLockedNodes=null, isMarkedForRollback=false, transaction=TransactionImple < ac, BasicAction: 0:ffffc29e84b8:-4561d3a3:4fe1818d:4325a status: ActionStatus.ABORT_ONLY >, lockedKeys=null, backupKeyLocks=null, viewId=1} org.infinispan.transaction.synchronization.SyncLocalTransaction@de0b} org.infinispan.transaction.synchronization.SynchronizationAdapter@de2a >: org.infinispan.CacheException: Could not prepare.

              at org.infinispan.transaction.synchronization.SynchronizationAdapter.beforeCompletion(SynchronizationAdapter.java:70) [infinispan-core-5.1.4.FINAL.jar:5.1.4.FINAL]

              at com.arjuna.ats.internal.jta.resources.arjunacore.SynchronizationImple.beforeCompletion(SynchronizationImple.java:76)

              at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.beforeCompletion(TwoPhaseCoordinator.java:273)

              at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:93)

              at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:164)

              at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1165)

              at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:117)

              ...

      Caused by: javax.transaction.xa.XAException

              at org.infinispan.transaction.TransactionCoordinator.prepare(TransactionCoordinator.java:160) [infinispan-core-5.1.4.FINAL.jar:5.1.4.FINAL]

              at org.infinispan.transaction.TransactionCoordinator.prepare(TransactionCoordinator.java:122) [infinispan-core-5.1.4.FINAL.jar:5.1.4.FINAL]

              at org.infinispan.transaction.synchronization.SynchronizationAdapter.beforeCompletion(SynchronizationAdapter.java:68) [infinispan-core-5.1.4.FINAL.jar:5.1.4.FINAL]

              ... 85 more

       

      These exceptions only occur while a test is stressing JBoss. The test sends a configurable number of parallel requests to different web service methods (implemented by a SLSB), but only to one node. This is a two-node cluster in domain mode (please don't ask: yes, many more nodes will follow :-) ).

       

      This is my cache configuration:

                      <cache-container name="hibernate" default-cache="local-query" module="org.jboss.as.jpa.hibernate:4" eviction-executor="infinispan-eviction">

                          <transport lock-timeout="60000"/>

                          <invalidation-cache name="local-query" mode="SYNC">

                              <transaction mode="NONE"/>

                              <eviction strategy="LRU" max-entries="5000"/>

                              <expiration max-idle="660000"/>

                              <locking concurrency-level="100"/>

                          </invalidation-cache>

                          <invalidation-cache name="entity" mode="SYNC">

                              <transaction mode="NON_XA"/>

                              <eviction strategy="LRU" max-entries="100000"/>

                              <expiration max-idle="3600000"/>

                              <locking concurrency-level="100"/>

                          </invalidation-cache>

                          <replicated-cache name="timestamps" mode="ASYNC">

                              <transaction mode="NONE"/>

                              <eviction strategy="NONE"/>

                          </replicated-cache>

                      </cache-container>

       

      Further changes:

      - Apache sits in front of JBoss and is configured to serve 600 requests concurrently

      - I added a separate thread pool for the web module

                      <bounded-queue-thread-pool name="http-executor">

                          <core-threads count="58"/>

                          <queue-length count="148"/>

                          <max-threads count="58"/>

                          <keepalive-time time="10" unit="seconds"/>

                      </bounded-queue-thread-pool>

      - I increased my database-pool (XA-datasource to Oracle)

                              <xa-pool>

                                  <min-pool-size>2</min-pool-size>

                                  <max-pool-size>100</max-pool-size>

                                  <is-same-rm-override>false</is-same-rm-override>

                                  <interleaving>false</interleaving>

                                  <pad-xid>false</pad-xid>

                                  <wrap-xa-resource>false</wrap-xa-resource>

                              </xa-pool>

      - I adapted the Infinispan configuration (cache type, concurrency level)
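
      A dedicated thread pool only takes effect for a connector that references it. If Apache forwards requests over AJP (as the mod_jk workers on port 8009 later in this thread suggest), the AJP connector is the one that matters. The following is a minimal sketch of the web subsystem wiring, assuming the default connector and socket-binding names (`http`, `ajp`) of a stock AS 7.1 HA profile; the wiring actually used in this setup is not shown in the post.

```xml
<!-- Assumption: these connectors live in the web subsystem of the active profile (domain.xml in domain mode). -->
<!-- Assumption: "http" and "ajp" are the default socket-binding names; adjust if yours differ. -->
<connector name="http" protocol="HTTP/1.1" scheme="http" socket-binding="http" executor="http-executor"/>
<!-- Requests coming from Apache via mod_jk arrive here, so this connector needs the executor as well. -->
<connector name="ajp" protocol="AJP/1.3" scheme="http" socket-binding="ajp" executor="http-executor"/>
```

      With max-threads 58 and queue-length 148, the pool can hold at most 58 + 148 = 206 requests at once, well below the 600 concurrent requests Apache is configured for, so the two limits are worth comparing.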

       

      When this exception occurs, JBoss seems to block and no requests are served for approximately 10-12 seconds. Once the requests that hit those exceptions are done, it works for a while until the next 10-second pause arrives.

       

      I made some changes and got different results:

      - Changing the query cache from an invalidation cache to a local cache delays the exceptions considerably, but they still appear after a while.

      - Depending on the concurrency level of my tests, the exceptions arrive earlier or later:

         - invalidation-cache/concurrency=25: after 30sec (throughput: ~300 requests/second)

         - local-cache/concurrency=100: after 2-4 minutes (throughput: ~1500 requests/second)

         - local-cache/concurrency=200: after 1-2 minutes (throughput: ~1000 requests/second)

      - Running the stress test while the second node is offline also delays the exceptions (but does not avoid them).

      - The system load increases during my tests (to 2-4, which is normal), but then it suddenly jumps very high (24-40), and at that moment the 10-second pause and the exceptions occur.

      - The duration of the requests increases steadily, starting at 300ms and ending at 800ms (although the requests do the same work on different database records).

      - One type of request checks for a certain database record: if the record does not exist, it is created. If I force this situation (by preparing the test data accordingly), the exceptions occur earlier.

       

      So my questions:

      - What causes this exception?

      - What kind of logging can I enable to get further information?

      - It seems that something is too slow or sized too small. I tried to increase everything in the chain of resources (HTTP, SLSB pool, DB pool). Did I forget anything?

      - I use UDP for cluster communication. Could this be a bottleneck on the network? (The server has two 1-gigabit network cards!)

      - Can this be a deadlock on a resource? I noticed that Infinispan has deadlock detection, but I cannot find any information on how to configure it in AS7.
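
      On the logging question, one possible starting point is to raise the log level for the two packages that appear in the stack trace. This is only a sketch, assuming the standard AS 7 logging subsystem; the categories are derived from the class names in the trace, not from any documented troubleshooting recipe. The handlers in use must also allow the chosen level, otherwise the extra messages are filtered out. In domain mode the fragment belongs in the logging subsystem of the profile in domain.xml.

```xml
<!-- Assumption: placed inside the logging subsystem of the active profile. -->
<!-- Infinispan transaction handling (source of "Could not prepare") -->
<logger category="org.infinispan.transaction">
    <level name="TRACE"/>
</logger>
<!-- Transaction manager (source of ARJUNA012125) -->
<logger category="com.arjuna.ats">
    <level name="DEBUG"/>
</logger>
```

      TRACE output is very verbose under load, so it is probably best enabled only for a short test run.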

       

      I hope someone can help me. This is the last hurdle in getting my migration from 5.1 to 7.1 done!

       

      Thanks,

      Stephan

        • 1. Re: org.infinispan.CacheException: Could not prepare - caused by XAException
          apatispelikan

          I've done further testing: the 10-second pauses always occur, but the exceptions do not follow every one of them.

           

          If there is no exception, I can see that during a 10-second pause in which no requests are processed, 100% of the connections in my DS pool are available! This is very strange, because if Infinispan were blocking, all connections should be occupied. It would mean that something in front of the bean (which uses the entity manager, which in turn grabs a connection from the pool) causes the 10-second pauses. Am I right?

           

          If there is an exception, connections are in use.

          • 2. Re: org.infinispan.CacheException: Could not prepare - caused by XAException
            apatispelikan

            Next test: accessing JBoss directly (bypassing Apache) eliminates the 10-second pauses! There is also no system load problem any more, and so far I haven't gotten any exceptions. Will go on...

            • 3. Re: org.infinispan.CacheException: Could not prepare - caused by XAException
              apatispelikan

              I found that using XA-Transactions needs some privileges in Oracle:

               

              GRANT SELECT ON sys.dba_pending_transactions TO user;

              GRANT SELECT ON sys.pending_trans$ TO user;

              GRANT SELECT ON sys.dba_2pc_pending TO user;

              GRANT EXECUTE ON sys.dbms_xa TO user;

              GRANT EXECUTE ON sys.dbms_system TO user;

               

              This eliminated the XA exceptions.

              • 4. Re: org.infinispan.CacheException: Could not prepare - caused by XAException
                apatispelikan

                https://community.jboss.org/wiki/OptimalModjk12Configuration helped me to find a good workers.config:

                 

                worker.list=status,app

                worker.maintain=30

                 

                worker.status.type=status

                 

                worker.template.type=ajp13

                worker.template.socket_timeout=0

                worker.template.socket_connect_timeout=30

                worker.template.reply_timeout=660000

                worker.template.ping_timeout=1000

                worker.template.ping_mode=A

                worker.template.socket_timeout=10

                worker.template.connection_pool_timeout=600

                worker.template.connection_pool_size=30

                worker.template.connect_timeout=10000

                worker.template.retries=20

                 

                worker.app.type=lb

                worker.app.method=R

                worker.app.balance_workers=app01,app02

                worker.app.sticky_session=1

                 

                worker.app01.reference=worker.template

                worker.app01.host=server01.provider.at

                worker.app01.port=8009

                worker.app01.lbfactor=1

                worker.app01.distance=0

                 

                worker.app02.reference=worker.template

                worker.app02.host=server02.provider.at

                worker.app02.port=8009

                worker.app02.lbfactor=1

                worker.app02.distance=1
