4 Replies Latest reply on Oct 28, 2010 2:13 PM by iandavies

    Channel disconnects running HornetQ embedded in Tomcat6 on Windows

    iandavies

      Hi Everyone,

       

      (using version 2.1.1.Final)

       

      I'm having a problem running HornetQ embedded within a Tomcat6 server on Windows.  I actually have a Grails application that, at BootStrap time, loads the HornetQ server as well as two client connections using the InVM connection factory.  I'm using programmatic configuration to create the HornetQ server and clients, and the JMS interface for actually using the queues.  I'm also using Guice and have created a set of classes to allow me to inject the necessary queuing classes (consumer/queue/session) where required.

       

      All I do is start the server and start the clients, and then leave Tomcat running.  Anywhere from 2 to 39 minutes later, the clients and server complain that they have not received data from the other party.  The client complains with:

       

       

      javax.jms.JMSException: HornetQException[errorCode=3 message=Did not receive data from server for org.hornetq.core.remoting.impl.invm.InVMConnection@b689e0]
      at org.hornetq.jms.client.HornetQConnection$JMSFailureListener.connectionFailed(HornetQConnection.java:603)
      at org.hornetq.core.client.impl.FailoverManagerImpl.callFailureListeners(FailoverManagerImpl.java:769)
      at org.hornetq.core.client.impl.FailoverManagerImpl.failoverOrReconnect(FailoverManagerImpl.java:731)
      at org.hornetq.core.client.impl.FailoverManagerImpl.handleConnectionFailure(FailoverManagerImpl.java:581)
      at org.hornetq.core.client.impl.FailoverManagerImpl.access$600(FailoverManagerImpl.java:73)
      at org.hornetq.core.client.impl.FailoverManagerImpl$DelegatingFailureListener.connectionFailed(FailoverManagerImpl.java:1151)
      at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.callFailureListeners(RemotingConnectionImpl.java:482)
      at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:254)
      at org.hornetq.core.client.impl.FailoverManagerImpl$PingRunnable$1.run(FailoverManagerImpl.java:1209)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:619)


       

       

      And the server complains with:

       

       

      2010-10-25 12:42:22,475 [hornetq-failure-check-thread] WARN  impl.RemotingConnectionImpl  - Connection failure has been detected: Did not receive ping from invm:0. It is likely the client has exited or crashed without closing its connection, or the network between the server and client has failed. The connection will now be closed. [code=3]

      2010-10-25 12:42:22,475 [hornetq-failure-check-thread] WARN  impl.ServerSessionImpl  - Client connection failed, clearing up resources for session 0846f53d-e01f-11df-bfb9-080027910d23

      2010-10-25 12:42:23,336 [hornetq-failure-check-thread] WARN  impl.ServerSessionImpl  - Cleared up resources for session 0846f53d-e01f-11df-bfb9-080027910d23

      2010-10-25 12:42:23,336 [hornetq-failure-check-thread] WARN  core.ServerSessionPacketHandler  - Client connection failed, clearing up resources for session 0846f53d-e01f-11df-bfb9-080027910d23

      2010-10-25 12:42:23,336 [hornetq-failure-check-thread] WARN  core.ServerSessionPacketHandler  - Cleared up resources for session 0846f53d-e01f-11df-bfb9-080027910d23

       

       

      For the connection factory I have set the following:

       

       

      connectionFactory.setConnectionTTL(20000);

      connectionFactory.setClientFailureCheckPeriod(1000);

       

       

      So all I'm doing is starting the server, not sending any messages, and after a seemingly random amount of time (2, 3, 11, or even 39 minutes) the server kills the InVM connections (or the other way around - it's hard to tell).

       

      This only seems to happen on Windows; I cannot reproduce the failure when running in Tomcat on Linux.  Any pointers would be greatly appreciated.

       

      Many thanks,

      -ian.

       

      Message was edited by: Ian Davies: Added the fact that I'm using Guice.

        • 1. Re: Channel disconnects running HornetQ embedded in Tomcat6 on Windows
          clebert.suconic

          There's some issue with ClassPaths on your integration.

           

          We didn't provide an official Tomcat configuration yet. You are probably missing some context of the ClassLoaders.

           

          An easy fix now would be to use sockets (even though it's inVM), and look for a proper integration.

          • 2. Re: Channel disconnects running HornetQ embedded in Tomcat6 on Windows
            iandavies

            After much digging and a huge amount of help from Clebert, we think we got to the bottom of this problem.  I was running all of the above code on a VM and stripped everything (Grails, Tomcat, Guice) out until I had just a barebones HornetQ example running on my VM - and the problem still happened.  We think that it happened because the VM was running out of memory, or because something strange in the VM was stalling the thread scheduling within the JVM itself.

             

            If you set connection-ttl on a client connection, it will ping the server every client-failure-check-period milliseconds.  This is kicked off by the Java ScheduledExecutorService and, in my tests, it appeared that it simply stopped scheduling pings after a while (10 minutes to 2 hours).  This only ever happened on Windows.

             

            How did we fix it?  We didn't!  My VM still has problems with not scheduling the ping, and the connection still gets taken down unnecessarily.  I have made two changes to make this bearable.  Firstly, I set connection-ttl to -1 for InVM connections.  Why would my invm-client need to ping the invm-server anyway?  Secondly, for Netty connections, I have configured the HornetQ reconnect and retry parameters so that, when the connection is killed, it immediately gets re-established seamlessly.  These are probably sensible settings anyway.
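
            For anyone wondering how reconnect and retry settings of this kind interact, here is a minimal sketch in plain Java.  It is not HornetQ code and makes no claim about how HornetQ computes its delays internally.  The class RetrySchedule and its fields are hypothetical names, loosely modelled on HornetQ's retry-interval, retry-interval-multiplier and max-retry-interval settings, and the formula (interval times multiplier to the power of the attempt number, capped at a maximum) is an assumption about a typical backoff scheme.

```java
/**
 * Hypothetical illustration only: computes the wait before each reconnect
 * attempt as retryInterval * multiplier^attempt, capped at maxRetryInterval.
 */
class RetrySchedule {
    private final long retryIntervalMillis;
    private final double multiplier;
    private final long maxRetryIntervalMillis;

    RetrySchedule(long retryIntervalMillis, double multiplier, long maxRetryIntervalMillis) {
        this.retryIntervalMillis = retryIntervalMillis;
        this.multiplier = multiplier;
        this.maxRetryIntervalMillis = maxRetryIntervalMillis;
    }

    /** Delay in milliseconds before the given attempt (0-based). */
    long delayForAttempt(int attempt) {
        double delay = retryIntervalMillis * Math.pow(multiplier, attempt);
        // Cap the delay so that a large attempt count cannot grow without bound.
        return (long) Math.min(delay, (double) maxRetryIntervalMillis);
    }

    public static void main(String[] args) {
        // Assumed example values: 500 ms base interval, doubling, capped at 5 s.
        RetrySchedule schedule = new RetrySchedule(500, 2.0, 5000);
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("attempt " + attempt + " waits " + schedule.delayForAttempt(attempt) + " ms");
        }
    }
}
```

            With a multiplier of 1.0 every attempt waits the same base interval, which is the behaviour you want if the goal is simply to get the connection back as quickly as possible.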

             

            Clebert confirmed, through several hours of testing on his Windows box with my exact same code, that these problems do not happen, so either I had a flaky VM or I was somehow running out of memory for the JVM.

             

            I hope that's helpful to someone.

            • 3. Re: Channel disconnects running HornetQ embedded in Tomcat6 on Windows
              timfox

              Your test program uses the following settings:

               

              connectionFactory.setConnectionTTL(2000);
              connectionFactory.setClientFailureCheckPeriod(100);

               

              With these values I would expect it to randomly time out and close connections, due to indeterminacy in GC and thread scheduling (GCs can often take more than 2000 milliseconds).

               

              When you're running on a VM things get even worse since the hypervisor might introduce quite large pauses between scheduling your threads depending on the load on the box.

               

              Try increasing connection ttl to a much larger value, (or just use the defaults).
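
              To make the effect of a stall concrete, here is a rough sketch in plain Java of a TTL check running on a logical clock.  It is not HornetQ's implementation: the class TtlCheck, its methods and the 2500 ms stall are all made up for illustration, and treating a negative TTL as "never expire" is an assumption that mirrors the -1 setting mentioned earlier in this thread.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration only: tracks the last ping and reports whether the TTL has lapsed. */
class TtlCheck {
    private final long ttlMillis;
    private final AtomicLong lastPingMillis;

    TtlCheck(long ttlMillis, long startMillis) {
        this.ttlMillis = ttlMillis;
        this.lastPingMillis = new AtomicLong(startMillis);
    }

    /** Called whenever a ping arrives from the other side. */
    void recordPing(long nowMillis) {
        lastPingMillis.set(nowMillis);
    }

    /** True if more than ttlMillis has passed since the last ping; a negative TTL never expires. */
    boolean isExpired(long nowMillis) {
        if (ttlMillis < 0) {
            return false;
        }
        return nowMillis - lastPingMillis.get() > ttlMillis;
    }

    /**
     * Simulates regular pings on a logical clock, followed by one stall during
     * which no ping is sent, and reports whether the check fires after the stall.
     */
    static boolean expiresAfterStall(long ttlMillis, long periodMillis, int pings, long stallMillis) {
        long now = 0;
        TtlCheck check = new TtlCheck(ttlMillis, now);
        for (int i = 0; i < pings; i++) {
            now += periodMillis;
            check.recordPing(now);
        }
        now += stallMillis; // stands in for a GC pause or the hypervisor descheduling the JVM
        return check.isExpired(now);
    }

    public static void main(String[] args) {
        long stall = 2500; // assumed stall length in milliseconds
        System.out.println("ttl=2000:  " + expiresAfterStall(2000, 100, 10, stall));  // true
        System.out.println("ttl=60000: " + expiresAfterStall(60000, 100, 10, stall)); // false
        System.out.println("ttl=-1:    " + expiresAfterStall(-1, 100, 10, stall));    // false
    }
}
```

              The point of the sketch is that the check period does not matter once the pinging thread itself is stalled: only a TTL longer than the worst pause, or a disabled check, keeps the connection alive.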

              • 4. Re: Channel disconnects running HornetQ embedded in Tomcat6 on Windows
                iandavies

                Thanks Tim. I tried many different variations of TTL and check-period, including the defaults, and always ran into the ping problem.  I was monitoring the GC and it wasn't happening at the time the scheduled PingRunnable was being kicked off.  I'm sure there's something dodgy with my VM that ultimately is causing this.