2 Replies Latest reply on Jan 30, 2012 10:54 AM by Matthew Robson

    HornetQ Live Backup Failover Under Load

    Matthew Robson Newbie

      Hey,

       

      I am using HornetQ 2.2.5 with JBoss AS 6.1 in a Live Backup setup with 2 JMS queues.  I have a cluster of app servers with MDBs consuming messages, and I am using discovery between all the servers.

       

      My app servers connect fine, they process from the queues fine.  Live Backup failover works with no load, everything is pretty seamless.

       

      The scenario I am trying to run is the following:

       

      App servers are down, but the front end is still accepting requests and putting them onto the queue.  I queue up 100,000 messages and then I start my app servers back up.  They immediately start processing the queue as expected.

       

      The first question I have is, when the app servers come back up, I would say each app server takes around 30,000 messages into the delivering state.  Is there any way to limit how many messages each server "grabs" when it starts up so that they can be spread over more servers?

       

      The second piece of the scenario is to do HornetQ failover after 50,000 out of 100,000 messages have been processed (so there are still 50,000 messages in the "delivering state").

       

      I shut down the live server and everything fails over to the backup.  All the connections on my app servers disconnect and then reconnect to the backup server, but I get thousands of errors on the backup server.  Is this a known issue (I couldn't find anything), is it something I can ignore, or is there something I can do to correct it?  From what I can see, even with all the errors, almost all of my requests are still read off the queue and processed.

       

       

      2012-01-28 21:55:41,430 INFO  [org.hornetq.jms.server.impl.JMSServerManagerImpl] (Thread-12) Running cached command for createConnectionFactory for NettyConnectionFactory

      2012-01-28 21:55:41,489 INFO  [org.hornetq.jms.server.impl.JMSServerManagerImpl] (Thread-12) Running cached command for createQueue for RequestQueue

      2012-01-28 21:55:41,489 INFO  [org.hornetq.core.server.impl.HornetQServerImpl] (Thread-12) trying to deploy queue jms.queue.RequestQueue

      2012-01-28 21:55:41,515 INFO  [org.hornetq.jms.server.impl.JMSServerManagerImpl] (Thread-12) Running cached command for createQueue for PendingQueue

      2012-01-28 21:55:41,515 INFO  [org.hornetq.core.server.impl.HornetQServerImpl] (Thread-12) trying to deploy queue jms.queue.PendingQueue

      2012-01-28 21:55:41,519 INFO  [org.hornetq.jms.server.impl.JMSServerManagerImpl] (Thread-12) Running cached command for createQueue for ExpiryQueue

      2012-01-28 21:55:41,519 INFO  [org.hornetq.core.server.impl.HornetQServerImpl] (Thread-12) trying to deploy queue jms.queue.ExpiryQueue

      2012-01-28 21:55:41,523 INFO  [org.hornetq.jms.server.impl.JMSServerManagerImpl] (Thread-12) Running cached command for createQueue for DLQ

      2012-01-28 21:55:41,524 INFO  [org.hornetq.core.server.impl.HornetQServerImpl] (Thread-12) trying to deploy queue jms.queue.DLQ

      2012-01-28 21:55:41,574 INFO  [org.hornetq.core.remoting.impl.netty.NettyAcceptor] (Thread-12) Started Netty Acceptor version 3.2.3.Final-r${buildNumber} 0.0.0.0:5445 for CORE protocol

      2012-01-28 21:55:41,580 WARN  [org.hornetq.core.client.impl.ClientSessionFactoryImpl] (Thread-2 (group:HornetQ-client-global-threads-1657743819)) Failed to connect to server.

      2012-01-28 21:55:41,610 INFO  [org.hornetq.core.server.impl.HornetQServerImpl] (Thread-12) Backup Server is now live

       

       

      2012-01-28 21:55:43,328 ERROR [org.hornetq.core.protocol.core.ServerSessionPacketHandler] (New I/O server worker #1-3) Caught unexpected exception: java.lang.IllegalStateException: 1070171253 Could not find reference on consumerID=0, messageId = 1185162 queue = jms.queue.RequestQueue closed = false

              at org.hornetq.core.server.impl.ServerConsumerImpl.acknowledge(ServerConsumerImpl.java:560) [:6.1.0.Final]

              at org.hornetq.core.server.impl.ServerSessionImpl.acknowledge(ServerSessionImpl.java:574) [:6.1.0.Final]

              at org.hornetq.core.protocol.core.ServerSessionPacketHandler.handlePacket(ServerSessionPacketHandler.java:269) [:6.1.0.Final]

              at org.hornetq.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:474) [:6.1.0.Final]

              at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:496) [:6.1.0.Final]

              at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:457) [:6.1.0.Final]

              at org.hornetq.core.remoting.server.impl.RemotingServiceImpl$DelegatingBufferHandler.bufferReceived(RemotingServiceImpl.java:459) [:6.1.0.Final]

              at org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.messageReceived(HornetQChannelHandler.java:73) [:6.1.0.Final]

              at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:100) [:6.1.0.Final]

              at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362) [:6.1.0.Final]

              at org.jboss.netty.channel.StaticChannelPipeline$StaticChannelHandlerContext.sendUpstream(StaticChannelPipeline.java:514) [:6.1.0.Final]

              at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:287) [:6.1.0.Final]

              at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.decode(HornetQFrameDecoder2.java:169) [:6.1.0.Final]

              at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.messageReceived(HornetQFrameDecoder2.java:134) [:6.1.0.Final]

              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) [:6.1.0.Final]

              at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362) [:6.1.0.Final]

              at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:357) [:6.1.0.Final]

              at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) [:6.1.0.Final]

              at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) [:6.1.0.Final]

              at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349) [:6.1.0.Final]

              at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281) [:6.1.0.Final]

              at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201) [:6.1.0.Final]

              at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [:6.1.0.Final]

              at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46) [:6.1.0.Final]

              at org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:181) [:6.1.0.Final]

              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [:1.6.0_29]

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [:1.6.0_29]

              at java.lang.Thread.run(Thread.java:662) [:1.6.0_29]

       

       

      2012-01-28 21:55:43,942 ERROR [org.hornetq.core.protocol.core.ServerSessionPacketHandler] (New I/O server worker #1-1) Caught unexpected exception: java.lang.IllegalStateException: 825663073 Could not find reference on consumerID=0, messageId = 1543819 queue = jms.queue.PendingQueue closed = false

              at org.hornetq.core.server.impl.ServerConsumerImpl.acknowledge(ServerConsumerImpl.java:560) [:6.1.0.Final]

              at org.hornetq.core.server.impl.ServerSessionImpl.acknowledge(ServerSessionImpl.java:574) [:6.1.0.Final]

              at org.hornetq.core.protocol.core.ServerSessionPacketHandler.handlePacket(ServerSessionPacketHandler.java:269) [:6.1.0.Final]

              at org.hornetq.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:474) [:6.1.0.Final]

              at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:496) [:6.1.0.Final]

              at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:457) [:6.1.0.Final]

              at org.hornetq.core.remoting.server.impl.RemotingServiceImpl$DelegatingBufferHandler.bufferReceived(RemotingServiceImpl.java:459) [:6.1.0.Final]

              at org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.messageReceived(HornetQChannelHandler.java:73) [:6.1.0.Final]

              at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:100) [:6.1.0.Final]

              at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362) [:6.1.0.Final]

              at org.jboss.netty.channel.StaticChannelPipeline$StaticChannelHandlerContext.sendUpstream(StaticChannelPipeline.java:514) [:6.1.0.Final]

              at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:287) [:6.1.0.Final]

              at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.decode(HornetQFrameDecoder2.java:169) [:6.1.0.Final]

              at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.messageReceived(HornetQFrameDecoder2.java:134) [:6.1.0.Final]

              at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) [:6.1.0.Final]

              at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362) [:6.1.0.Final]

              at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:357) [:6.1.0.Final]

              at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) [:6.1.0.Final]

              at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) [:6.1.0.Final]

              at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349) [:6.1.0.Final]

              at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281) [:6.1.0.Final]

              at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201) [:6.1.0.Final]

              at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [:6.1.0.Final]

              at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46) [:6.1.0.Final]

              at org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:181) [:6.1.0.Final]

              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [:1.6.0_29]

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [:1.6.0_29]

              at java.lang.Thread.run(Thread.java:662) [:1.6.0_29]

       

      Thanks

        • 1. Re: HornetQ Live Backup Failover Under Load
          Clebert Suconic Master

          When the client moves to the backup, another consumer (if you have more than one consumer) may take the message before the original consumer does. That can cause invalid ACKs on the target server, which throw an exception, and at that point the exception should cause the operation to be retried.
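To make the race above concrete, here is a rough toy model of it. It is only a sketch: `ToyQueue`, `deliverTo`, `failover` and `acknowledge` are hypothetical names made up for illustration, not HornetQ classes, and the real server logic is certainly more involved. It only shows why an ACK from the original consumer can no longer find its reference once another consumer has picked up the same message after failover.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model only: it mimics the shape of the failure, not HornetQ's real classes.
class FailoverAckSketch {

    /** A queue that tracks which consumer currently holds each delivered message. */
    static class ToyQueue {
        private final Deque<Long> pending = new ArrayDeque<>();
        // messageId -> consumerId of the consumer the message was delivered to
        private final Map<Long, Integer> delivering = new HashMap<>();

        void add(long messageId) {
            pending.addLast(messageId);
        }

        /** Hands the next pending message to the given consumer, or returns -1 if empty. */
        long deliverTo(int consumerId) {
            Long id = pending.pollFirst();
            if (id == null) {
                return -1;
            }
            delivering.put(id, consumerId);
            return id;
        }

        /** Assumption: on failover, unacknowledged deliveries return to the queue. */
        void failover() {
            // Ordering among the returned messages is not modelled here.
            for (Long id : delivering.keySet()) {
                pending.addFirst(id);
            }
            delivering.clear();
        }

        /** Acknowledges a message; fails if this consumer no longer holds the reference. */
        void acknowledge(int consumerId, long messageId) {
            Integer holder = delivering.get(messageId);
            if (holder == null || holder != consumerId) {
                throw new IllegalStateException("Could not find reference on consumerID="
                        + consumerId + ", messageId = " + messageId);
            }
            delivering.remove(messageId);
        }
    }

    /** Runs the scenario and returns true if the late ACK was rejected. */
    static boolean lateAckIsRejected() {
        ToyQueue queue = new ToyQueue();
        queue.add(1185162L);
        long id = queue.deliverTo(0);   // consumer 0 receives the message on the live server
        queue.failover();               // live server goes down before the ACK arrives
        queue.deliverTo(1);             // consumer 1 reconnects first and takes the message
        try {
            queue.acknowledge(0, id);   // consumer 0's ACK refers to a message it no longer holds
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("late ACK rejected: " + lateAckIsRejected());
    }
}
```

In this model the message itself is not lost: it is still held by consumer 1, and only the stale ACK from consumer 0 fails.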

           

           

          There was a bug fixed in 2.2.10, though. If the message wasn't found under these circumstances, it would remove all the messages from the server's side and you would lose messages in that circumstance.

          • 2. Re: HornetQ Live Backup Failover Under Load
            Matthew Robson Newbie

            Yes, I had 3 consumers in this test.

             

            I am seeing these exceptions on the HornetQ server.  I see what you're saying... So basically these exceptions are "expected" as part of the failover and the messages in the queue they're pertaining to are retried.

             

            So you're saying in 2.2.10, if this happens during failover the messages will be lost? It's not 100% clear under what circumstances the messages would be removed from the queue...

             

            Also, for the other question, is there any way to limit the number of messages each consumer picks up from the JMS queue? Right now it seems as if each consumer grabs 30,000 messages off the queue and puts them into the delivering state as soon as it starts up...
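One likely explanation is client-side buffering: HornetQ consumers prefetch messages into a buffer whose size is controlled by the connection factory's `consumer-window-size` setting (documented as 1 MiB by default, with 0 meaning no buffering). The sketch below is only a toy model of that idea, not HornetQ code. `initialDelivery` is a hypothetical helper and the 512-byte message size is a made-up value, so the numbers will not match the 30,000 seen here. How the setting is applied to MDBs depends on the resource adapter configuration.

```java
import java.util.Arrays;

// Toy model only: message size and counts are made-up illustration values.
class ConsumerWindowSketch {

    /**
     * Distributes messageCount messages of messageBytes each to consumers that
     * connect one after another. Each consumer buffers messages until its window
     * (in bytes) is full. A window of 0 is modelled as "one message at a time".
     * Returns how many messages each consumer initially holds in the delivering state.
     */
    static int[] initialDelivery(int messageCount, int messageBytes,
                                 int windowBytes, int consumers) {
        int perConsumer = Math.max(1, windowBytes / messageBytes);
        int[] held = new int[consumers];
        int remaining = messageCount;
        for (int c = 0; c < consumers && remaining > 0; c++) {
            held[c] = Math.min(perConsumer, remaining);
            remaining -= held[c];
        }
        return held;
    }

    public static void main(String[] args) {
        // 1 MiB window, hypothetical 512-byte messages, 3 consumers
        System.out.println(Arrays.toString(initialDelivery(100_000, 512, 1024 * 1024, 3)));
        // Window of 0: nothing is buffered ahead of processing
        System.out.println(Arrays.toString(initialDelivery(100_000, 512, 0, 3)));
    }
}
```

The point of the model is the trade-off: a larger window means fewer round trips per consumer but lets early consumers hold many messages, while a small or zero window leaves more messages on the queue for consumers that join later.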

             

            Thanks again for the assistance!