7 Replies Latest reply on Mar 6, 2014 4:19 PM by jbertram

    HornetQ timed out rollback leaves messages in DeliveringCount

    gentianhila

      We have an app that does some very dynamic routing of HornetQ messages: it reads from one queue and writes to another within a single transacted session. The idea is that if we cannot move a message to the destination queue, we won't remove it from the original queue. It works fine until we test with network latency, at which point things start to fall apart.

      Because of the latency, the transaction can be neither committed nor rolled back.

       

      We see a couple of thousand messages stuck in DeliveringCount that never clear out, even after 20 minutes. We have the call timeout for the connection factory left at its default, which is 30 seconds. The app has about 10 instances serving 8 different destination queues, so you would think that if delivery were being done sequentially we would not see more than 80 files stuck in DeliveringCount, but we see far more than that.

       

      These files get out of the DeliveringCount only when the app shuts off or if we kill and restart sessions.

       

      It seems that HornetQ is not able to time out the transaction (you would think the call timeout might do the trick) and remove those files from DeliveringCount.

       

      Is there any property that we should look at?

       

      I was looking for a transaction timeout, but it exists only for XA transactions, which we do not use.

       

      By the way, we use a MessageListener to receive the messages.

       

      Here is the error we get on the rollback timeout (there is a similar timeout error on the commit as well):

       

      javax.jms.JMSException: HQ119014: Timed out waiting for response when sending packet 68
          at org.hornetq.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:379)
          at org.hornetq.core.client.impl.ClientSessionImpl.stop(ClientSessionImpl.java:726)
          at org.hornetq.core.client.impl.ClientSessionImpl.stop(ClientSessionImpl.java:712)
          at org.hornetq.core.client.impl.ClientSessionImpl.rollback(ClientSessionImpl.java:617)
          at org.hornetq.core.client.impl.ClientSessionImpl.rollback(ClientSessionImpl.java:597)
          at org.hornetq.core.client.impl.DelegatingSession.rollback(DelegatingSession.java:479)
          at org.hornetq.jms.client.HornetQSession.rollback(HornetQSession.java:250)
          at comms.HQRouter.MessageMover$MessageListenerImpl.onMessage(MessageMover.java:185)
          at org.hornetq.jms.client.JMSMessageListenerWrapper.onMessage(JMSMessageListenerWrapper.java:98)
          at org.hornetq.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:1085)
          at org.hornetq.core.client.impl.ClientConsumerImpl.access$400(ClientConsumerImpl.java:57)
          at org.hornetq.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1220)
          at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:106)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          at java.lang.Thread.run(Thread.java:744)
      Caused by: HornetQException[errorType=CONNECTION_TIMEDOUT message=HQ119014: Timed out waiting for response when sending packet 68]
          ... 16 more

        • 1. Re: HornetQ timed out rollback leaves messages in DeliveringCount
          spolti

          Are you using memcached or another caching system?

          • 2. Re: Re: HornetQ timed out rollback leaves messages in DeliveringCount
            gentianhila

            No caching system at all is being used.

            • 3. Re: HornetQ timed out rollback leaves messages in DeliveringCount
              jbertram

              "These files get out of the deliveringCount only when the app shuts off or if we kill and restart sessions."

              This isn't surprising, since as long as the session is still active you can keep consuming/producing messages as part of the transaction.  If the commit or rollback packet didn't make it from the client to the server and the session is still valid, then the server won't abort the transaction.  The only reason I can think of for the server to abort the transaction is if it detected that the connection was dead, at which point it would issue a WARN in the log and clean up all the server-side resources for the connection (e.g. consumers, sessions, etc.).
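To make the behaviour described above concrete, here is a toy, in-memory model of the bookkeeping. It is not HornetQ code: `ToyQueue`, `ToySession`, and all of their methods are hypothetical names invented for this sketch, and it assumes only what the reply states, namely that delivered messages stay with the transaction until the server sees a commit, a rollback, or the end of the session.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of a queue as the server sees it. Hypothetical; not HornetQ code. */
class ToyQueue {
    private final Deque<String> ready = new ArrayDeque<>();
    private int delivering = 0;

    void add(String msg) { ready.addLast(msg); }

    /** Modelling choice: messages being delivered still count as on the queue. */
    int messageCount() { return ready.size() + delivering; }

    int deliveringCount() { return delivering; }

    /** Hands the next message to a consumer and marks it as being delivered. */
    String deliver() {
        String msg = ready.pollFirst();
        if (msg != null) {
            delivering++;
        }
        return msg;
    }

    /** Commit path: the message leaves the queue for good. */
    void acknowledge() { delivering--; }

    /** Rollback path: the message goes back to the head of the queue. */
    void cancel(String msg) {
        delivering--;
        ready.addFirst(msg);
    }
}

/** Toy transacted session that moves messages from one queue to another. */
class ToySession {
    private final ToyQueue source;
    private final ToyQueue target;
    private final List<String> inFlight = new ArrayList<>();

    ToySession(ToyQueue source, ToyQueue target) {
        this.source = source;
        this.target = target;
    }

    /** Receives one message and holds it in the open transaction. */
    boolean moveOne() {
        String msg = source.deliver();
        if (msg == null) {
            return false;
        }
        inFlight.add(msg);
        return true;
    }

    void commit() {
        for (String msg : inFlight) {
            source.acknowledge();
            target.add(msg);
        }
        inFlight.clear();
    }

    void rollback() {
        // Cancel in reverse so the original order is restored at the head.
        for (int i = inFlight.size() - 1; i >= 0; i--) {
            source.cancel(inFlight.get(i));
        }
        inFlight.clear();
    }

    /** Ending the session releases whatever the open transaction still holds. */
    void close() { rollback(); }
}
```

In this model, a rollback packet that never reaches the server is simply a `rollback()` call that never happens: `deliveringCount()` stays where it was until `close()` is called, which matches the symptom reported in the original post.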

              • 4. Re: Re: HornetQ timed out rollback leaves messages in DeliveringCount
                gentianhila

                Yes, I understand that, but since we are dealing with transactions and the server does not provide a transaction timeout, I would say that is a pretty bad choice if it was made on purpose.

                • 5. Re: Re: HornetQ timed out rollback leaves messages in DeliveringCount
                  jbertram

                  As I understand it, the purpose of a transaction timeout (e.g. as used for XA transactions) is to clean up resources on the server (e.g. free memory, release resource locks, etc.) so that other clients and the server itself can continue working normally.  In your situation the transaction is tied directly to the session itself which is, in turn, tied to a connection.  If the connection becomes inactive (e.g. because of network trouble, client crash, etc.) without being properly closed, then the server will detect that and clean up any resources, including the server-side session resources and any transactions associated with those sessions.  This is functionally equivalent to a transaction timeout.  The server has no reason or ability to abort the transaction as long as it believes the session is active (which yours appears to be).  At this point it seems to me that introducing a transaction timeout here would be redundant and therefore unnecessary.

                   

                  In your situation a javax.jms.JMSException from rollback() or commit() on javax.jms.Session means that an "internal error" has occurred (i.e. the communication failure), which means your client is now in an ambiguous state: did the commit/rollback actually reach the server and the response never reached the client, or did the commit/rollback not reach the server at all?  In any case, it's a good idea to scrap the relevant session/connection and recover in a way appropriate for your application, which will, in turn, free up the resources related to the transaction on the server.

                   

                  Perhaps, though, I am missing something.  Do you have a compelling argument for a transaction timeout in this context given what I just outlined?

                  • 6. Re: Re: HornetQ timed out rollback leaves messages in DeliveringCount
                    gentianhila

                    I would be OK if the session or even the connection timed out. But for some reason, when the network latency is gone, these files hang in the deliveringCount while new files seem to go through just fine. The client has the session free to process other files, so as far as the client is concerned everything is fine. But unless we bounce the session, the server is never going to release them. And it's fine by me to bounce the session if that is the only way we can get past it. We are actually doing that right now.

                     

                     

                    However, having two thousand messages, which are byte messages and can sometimes be big files, could lead to resource exhaustion on the server.

                     

                    So while we can get around the issue, it never seemed to me that this was the best way to deal with it (for the simple reason that it was not the session or connection that timed out), and it felt as if I was missing something. But if that is the way it has to be, I guess we will deal with it at the session level.

                    • 7. Re: Re: HornetQ timed out rollback leaves messages in DeliveringCount
                      jbertram

                      "The client has the session free to process other files so as far as client is concerned everything is fine."

                      Except for the fact that it received a javax.jms.JMSException when it tried to commit/rollback, right?  You can't just catch and ignore the exception.  The proper way to deal with it in this context is to destroy the session (and re-create it if desired).
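As a rough illustration of "destroy the session and re-create it", the sketch below wraps a transacted session and replaces it whenever commit or rollback throws. To keep it self-contained it does not use the JMS API directly: `TxSession` is a hypothetical stand-in for the commit/rollback/close subset of javax.jms.Session, and `RecoveringSession` and the session factory are likewise assumptions made for this example.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the commit/rollback/close subset of javax.jms.Session. */
interface TxSession {
    void commit() throws Exception;
    void rollback() throws Exception;
    void close() throws Exception;
}

/** Replaces the underlying session whenever commit or rollback fails. */
class RecoveringSession {
    // Hypothetical factory: opens a fresh transacted session on demand.
    private final Supplier<TxSession> factory;
    private TxSession current;
    private int replacements = 0;

    RecoveringSession(Supplier<TxSession> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    /**
     * Returns true if the commit was confirmed. Returns false if it failed,
     * in which case the outcome is unknown and the session has been replaced.
     */
    boolean commit() {
        try {
            current.commit();
            return true;
        } catch (Exception e) {
            replace();
            return false;
        }
    }

    /** Same contract as commit(), for rollback. */
    boolean rollback() {
        try {
            current.rollback();
            return true;
        } catch (Exception e) {
            replace();
            return false;
        }
    }

    /** Number of times a compromised session has been discarded. */
    int replacements() { return replacements; }

    private void replace() {
        try {
            current.close();
        } catch (Exception ignored) {
            // The session may already be unusable; it is discarded either way.
        }
        current = factory.get();
        replacements++;
    }
}
```

A `false` return means the caller cannot know whether the server applied the operation, so the messages involved may be delivered again and should be handled accordingly. Note also that the JMS specification cautions against closing a session from inside its own MessageListener, so in a listener-based client like the one in this thread the replacement would probably have to be handed off to another thread; that hand-off is left out of this sketch.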

                       

                      "But unless we bounce the session server is never going to release them."

                      That's essentially what I would expect in this scenario.

                       

                      "However, having two thousand messages - which are byte messages and can be big files sometimes could lead to resource exhaustion on the server."

                      You can tune the server to deal with potential situations where the message count(s) increase beyond normal expectation (e.g. configuring paging, modifying the min-large-message-size, etc.).

                       

                      "So while we can get around the issue, it never seemed to me that was the best way to deal with (for the simple reason that it was not the session or connection that timed out)  and it felt as if I was missing something."

                      I think you may be trying to conceptually divorce the transaction from the session, when the two belong together.  The session is transacted, so if you have a failure when you commit/rollback the session then the session is essentially compromised.  The session itself may not have timed out, but a critical operation related to the semantics of the session did, and so you should take the proper action to deal with such a failure.