20 Replies Latest reply on Oct 4, 2011 2:44 PM by clebert.suconic

    Strange problem with Standalone 2.2.5 journal/pages...

    didka

      Hi!

      We have a standalone HornetQ 2.2.5 installation on Ubuntu Server: one active server and one backup, with a static connection between them. The shared store is a SAN mounted on Ubuntu as ocfs2. NIO is used for connections; the journal is AIO.

      Two instances of the same application (servlet + ssb on clustered JBoss via JCA) send 1-2K messages to one paged queue. Both connection pools grow up to 50 connections each.

      The queue has an MDB consumer (10 sessions) and also a non-exclusive divert to another queue (the second queue's paging policy is drop). There are no consumers on the second queue yet.

      The producers send messages at 300-700 msg/sec, and the MDB consumes them. Everything seems fine, but:

       

      In the logs we have a lot of messages like this:

       

      [Thread-3 (group:HornetQ-scheduled-threads-22262475)] 19:21:18,874 WARNING [org.hornetq.core.transaction.impl.ResourceManagerImpl]  transaction with xid XidImpl (1973652184 bq:55.102.48.48.48.48.48.49.58.57.53.100.52.58.52.101.55.98.49.100.101.55.58.52.99.49.49.100.52 formatID:131075 gtxid:49.45.55.102.48.48.48.48.48.49.58.57.53.100.52.58.52.101.55.98.49.100.101.55.58.52.99.49.49.56.55 timed out

       

      Sometimes messages like this:

       

       

      [Thread-3 (group:HornetQ-scheduled-threads-22262475)] 19:21:18,875 SEVERE [org.hornetq.core.transaction.impl.ResourceManagerImpl]  failed to timeout transaction, xid:XidImpl (2021427911 bq:55.102.48.48.48.48.48.49.58.57.53.100.52.58.52.101.55.98.49.100.101.55.58.52.99.49.49.52.97 formatID:131075 gtxid:49.45.55.102.48.48.48.48.48.49.58.57.53.100.52.58.52.101.55.98.49.100.101.55.58.52.99.48.102.50.51

      java.lang.IllegalStateException: Transaction is in invalid state SUSPENDED
                at org.hornetq.core.transaction.impl.TransactionImpl.rollback(TransactionImpl.java:345)
                at org.hornetq.core.transaction.impl.ResourceManagerImpl$TxTimeoutHandler.run(ResourceManagerImpl.java:228)
                at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
                at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
                at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
                at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
                at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
                at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                at java.lang.Thread.run(Thread.java:662)

       

       

      And sometimes this:

       

      [New I/O server worker #1-34] 19:30:53,650 WARNING [org.hornetq.core.protocol.core.impl.HornetQPacketHandler]  Reattach request from /192.168.168.197:53634 failed as there is no confirmationWindowSize configured, which may be ok for your system

       

      But the confirmation window is configured on the ConnectionFactory in hornetq-jms.xml (3 MB).

       

      After several hours, or 1-2 days, the server gets stuck, and all consumers and producers get stuck as well. There is no failover, and a restart doesn't help. But if I stop the server, clear the shared store (remove the journal and pages) and start it again, consumers and producers are able to reconnect and the system starts to work again.

       

      Could you help? What can be the problem?
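The `bq` and `gtxid` fields in the log lines above are printed as dot-separated decimal byte values. A minimal sketch for turning them back into text, assuming each value is one byte and that the transaction manager stored printable ASCII there (the class name `XidDecoder` is made up for this example):

```java
final class XidDecoder {

    /**
     * Decodes a dot-separated list of decimal byte values, as printed in the
     * HornetQ log ("55.102.48..."), into a readable string.
     * Printable ASCII bytes are kept as characters; anything else is shown as \xNN.
     */
    static String decode(String dotted) {
        StringBuilder sb = new StringBuilder();
        for (String part : dotted.trim().split("\\.")) {
            // Bytes may be printed signed, so mask to the 0..255 range.
            int b = Integer.parseInt(part) & 0xFF;
            if (b >= 0x20 && b < 0x7F) {
                sb.append((char) b);
            } else {
                sb.append(String.format("\\x%02x", b));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Values copied from the first "timed out" warning in this post.
        String bq = "55.102.48.48.48.48.48.49.58.57.53.100.52.58.52.101.55.98.49.100.101.55.58.52.99.49.49.100.52";
        String gtxid = "49.45.55.102.48.48.48.48.48.49.58.57.53.100.52.58.52.101.55.98.49.100.101.55.58.52.99.49.49.56.55";
        System.out.println("bq    = " + decode(bq));
        System.out.println("gtxid = " + decode(gtxid));
    }
}
```

For the first warning this gives `7f000001:95d4:4e7b1de7:4c11d4` for `bq` and `1-7f000001:95d4:4e7b1de7:4c1187` for `gtxid`. Having the XIDs in this form may make it easier to match them against what the transaction manager on the JBoss side logs. The leading `7f000001` is hex for 127.0.0.1 if that field is an address, which is a guess about the layout rather than something the log states.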

        • 1. Re: Strange problem with Standalone 2.2.5 journal/pages...
          clebert.suconic

          You should configure recovery to look up XIDs on the remote standalone server.

          You should add the recovery configuration to the JBoss Application Server.

          • 2. Re: Strange problem with Standalone 2.2.5 journal/pages...
            didka

            I added the transaction property to all JBoss instances working with HornetQ, as described in the documentation:

             

            <properties depends="arjuna" name="jta">
                <property name="com.arjuna.ats.jta.recovery.XAResourceRecovery.JBMESSAGING1"
                          value="org.jboss.jms.server.recovery.MessagingXAResourceRecovery;java:/DefaultJMSProvider"/>
                <property name="com.arjuna.ats.jta.supportSubtransactions" value="NO"/>
                <property name="com.arjuna.ats.jta.jtaTMImplementation" value="com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionManagerImple"/>
                <property name="com.arjuna.ats.jta.jtaUTImplementation" value="com.arjuna.ats.internal.jta.transaction.arjunacore.UserTransactionImple"/>
                <property name="com.arjuna.ats.jta.recovery.XAResourceRecovery.HORNETQ1"
                          value="org.hornetq.jms.server.recovery.HornetQXAResourceRecovery;org.hornetq.core.remoting.impl.netty.NettyConnectorFactory,user,xxxx,host=192.168.189.16,port=5445;org.hornetq.core.remoting.impl.netty.NettyConnectorFactory,user,xxxx,host=192.168.189.17,port=5445"/>
            </properties>

             

             

            This warning is still present:

             

            [Thread-4 (group:HornetQ-scheduled-threads-404150953)] 10:41:32,407 WARNING [org.hornetq.core.transaction.impl.ResourceManagerImpl]  transaction with xid XidImpl (538595391 bq:55.102.48.48.48.48.48.49.58.56.55.57.48.58.52.101.55.99.48.56.55.100.58.50.50.98.100.51 formatID:131075 gtxid:49.45.55.102.48.48.48.48.48.49.58.56.55.57.48.58.52.101.55.99.48.56.55.100.58.50.50.98.99.102 timed out

             

             

            And this error also:

             

            [Thread-0 (group:HornetQ-scheduled-threads-404150953)] 10:44:37,402 SEVERE [org.hornetq.core.transaction.impl.ResourceManagerImpl]  failed to timeout transaction, xid:XidImpl (954968496 bq:55.102.48.48.48.48.48.49.58.56.55.57.48.58.52.101.55.99.48.56.55.100.58.52.101.50.49.53 formatID:131075 gtxid:49.45.55.102.48.48.48.48.48.49.58.56.55.57.48.58.52.101.55.99.48.56.55.100.58.52.101.50.49.50

            java.lang.IllegalStateException: Transaction is in invalid state SUSPENDED
                      at org.hornetq.core.transaction.impl.TransactionImpl.rollback(TransactionImpl.java:345)
                      at org.hornetq.core.transaction.impl.ResourceManagerImpl$TxTimeoutHandler.run(ResourceManagerImpl.java:228)
                      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
                      at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
                      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
                      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
                      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
                      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
                      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                      at java.lang.Thread.run(Thread.java:662)

             

            Did I miss something?

            • 3. Re: Strange problem with Standalone 2.2.5 journal/pages...
              clebert.suconic

              You are using the recovery module from JBoss Messaging. You should use the one from HornetQ:

               

              org.hornetq.jms.server.recovery.HornetQXAResourceRecovery

               

               

              I will ask Andy Taylor to give some insight here.

              • 4. Re: Strange problem with Standalone 2.2.5 journal/pages...
                didka

                First of all, thanks for the support!

                Actually, I have properties for both JBoss Messaging and HornetQ in the transaction properties:

                 

                <properties depends="arjuna" name="jta">
                    <property name="com.arjuna.ats.jta.recovery.XAResourceRecovery.JBMESSAGING1"
                              value="org.jboss.jms.server.recovery.MessagingXAResourceRecovery;java:/DefaultJMSProvider"/>
                    <property name="com.arjuna.ats.jta.supportSubtransactions" value="NO"/>
                    <property name="com.arjuna.ats.jta.jtaTMImplementation" value="com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionManagerImple"/>
                    <property name="com.arjuna.ats.jta.jtaUTImplementation" value="com.arjuna.ats.internal.jta.transaction.arjunacore.UserTransactionImple"/>
                    <property name="com.arjuna.ats.jta.recovery.XAResourceRecovery.HORNETQ1"
                              value="org.hornetq.jms.server.recovery.HornetQXAResourceRecovery;org.hornetq.core.remoting.impl.netty.NettyConnectorFactory,user,xxxx,host=192.168.189.16,port=5445;org.hornetq.core.remoting.impl.netty.NettyConnectorFactory,user,xxxx,host=192.168.189.17,port=5445"/>
                </properties>

                 

                This is because another (quite old) application on this JBoss uses JBoss Messaging. I assumed they would not interfere.

                Should I separate them, or is it still possible to use both?

                Anyway, I will wait for Andy Taylor's insights.

                • 5. Re: Strange problem with Standalone 2.2.5 journal/pages...
                  ataylor

                  I don't think it's an issue with recovery. It looks to me as if you have long-running clients using XA; maybe you just need to increase the transaction timeout setting.

                  • 6. Re: Strange problem with Standalone 2.2.5 journal/pages...
                    didka

                    All transactions take less than 2 seconds (90% take even less than 1 second). The timeout is set to the default of 5 min.

                    Is the config I provided above (for JBoss TS) OK? Is it OK to use both the JBM and HornetQ properties?

                    I changed NIO to old IO and adjusted the thread pool. The system became more stable and has been online for about 3 days. But these warnings and errors with XA timeouts still appear in the log.

                    • 7. Re: Strange problem with Standalone 2.2.5 journal/pages...
                      clebert.suconic

                      It's OK if you are using both products (although that is not tested).

                       

                      It seems you are using the wrong module with HornetQ.

                      • 8. Re: Strange problem with Standalone 2.2.5 journal/pages...
                        didka

                        Please advise: how do I configure it to use the proper module?

                        • 9. Re: Strange problem with Standalone 2.2.5 journal/pages...
                          clebert.suconic

                          Never mind: I just realized you have this:

                           

                          <property name="com.arjuna.ats.jta.recovery.XAResourceRecovery.HORNETQ1"
                                    value="org.hornetq.jms.server.recovery.HornetQXAResourceRecovery;org.hornetq.core.remoting.impl.netty.NettyConnectorFactory,user,xxxx,host=192.168.189.16,port=5445;org.hornetq.core.remoting.impl.netty.NettyConnectorFactory,user,xxxx,host=192.168.189.17,port=5445"/>

                           

                           

                           

                          What version of the application server do you have on the client, and what version of the Transaction Manager?

                          There were a few issues fixed in the transaction manager when we were doing EAP.

                          • 10. Re: Strange problem with Standalone 2.2.5 journal/pages...
                            didka

                            We use JBoss 4.3 EAP CP04 and JBoss 5.1.0 GA (JTA version - tag:JBOSSTS_4_6_1_GA).

                            • 11. Re: Strange problem with Standalone 2.2.5 journal/pages...
                              clebert.suconic

                              Since you are on EAP, why don't you move to EAP 5.1?

                              HornetQ is not supported on EAP 4 anyway. There's a tech preview now, and the next upcoming version is planned to have full support.

                              There was a bug fixed in that version of the TM: if the list of XIDs was returned in a different order than the TM expected, recovery would break.

                               

                              Anyway, once you have a transaction in a bad state like this, you will have to use the heuristic methods and commit or roll back those transactions manually on HornetQ and on the TM.
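On the HornetQ side, the prepared and heuristically completed transactions can be inspected over JMX before deciding what to commit or roll back. A rough sketch, assuming remote JMX is enabled on the standalone server and that the server control MBean is registered under the default name `org.hornetq:module=Core,type=Server`. The JMX URL, port 3000, and the class name `PreparedTxLister` are placeholders, not values from this thread:

```java
import java.util.HashMap;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

final class PreparedTxLister {

    // Assumed default object name of the HornetQ server control MBean.
    static final String SERVER_MBEAN = "org.hornetq:module=Core,type=Server";

    static ObjectName serverName() {
        try {
            return new ObjectName(SERVER_MBEAN);
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invokes a no-argument management operation that returns String[].
    static String[] invokeList(MBeanServerConnection mbs, String operation) throws Exception {
        return (String[]) mbs.invoke(serverName(), operation, new Object[0], new String[0]);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL: pass the real one for your server as the first argument.
        String url = args.length > 0 ? args[0]
                : "service:jmx:rmi:///jndi/rmi://localhost:3000/jmxrmi";
        JMXConnector connector =
                JMXConnectorFactory.connect(new JMXServiceURL(url), new HashMap<String, Object>());
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            String[] operations = {
                "listPreparedTransactions",
                "listHeuristicCommittedTransactions",
                "listHeuristicRolledBackTransactions"
            };
            for (String operation : operations) {
                System.out.println(operation + ":");
                for (String tx : invokeList(mbs, operation)) {
                    System.out.println("  " + tx);
                }
            }
        } finally {
            connector.close();
        }
    }
}
```

The same MBean also exposes `commitPreparedTransaction` and `rollbackPreparedTransaction`, which take the Base64 form of the XID. Check the management chapter of the HornetQ manual for your version before calling them, and remember the TM has to be brought into the same state separately.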

                               

                               

                              If this is happening every time in your test, I suggest you upgrade to the latest TM and the latest EAP version.

                              • 12. Re: Strange problem with Standalone 2.2.5 journal/pages...
                                didka

                                Thanks, Clebert!

                                The problem is that a legacy application has to run on EAP 4 (a third-party application that is out of our control), and our HornetQ-related module calls it via EJB. EJB calls between JBoss 4 and 5 do not work in our case due to class loading and serialization problems.

                                I will try to move the HornetQ-related module to JBoss 5.1 and call the application on EAP 4 via WS.

                                • 13. Re: Strange problem with Standalone 2.2.5 journal/pages...
                                  clebert.suconic

                                  There are some issues in that TM. Maybe you could apply the same patches to the Transaction Manager. Since you're dealing with EAP, you would have to talk to support about it.

                                  • 14. Re: Strange problem with Standalone 2.2.5 journal/pages...
                                    clebert.suconic

                                    Clebert Suconic wrote:

                                     

                                    There are some issues in that TM. Maybe you could apply the same patches to the Transaction Manager. Since you're dealing with EAP, you would have to talk to support about it.

                                    I mean, if you want to stay on the same path you were on before.
