1 Reply Latest reply on Nov 23, 2009 11:45 AM by Tim Fox

    silent consumer creation failures during failover

    Jeff Mesnil

      One thing I noticed while working on the test for a symmetric cluster with backups: when one node is stopped and restarted, the logs show createConsumer failures:

      GRAVE: Failed to create consumer
      HornetQException[errorCode=100 message=Queue notif.a3603d77-d845-11de-8789-001c42000009 does not exist]
       at org.hornetq.core.server.impl.ServerSessionImpl.handleCreateConsumer(ServerSessionImpl.java:390)
       at org.hornetq.core.server.impl.ServerSessionPacketHandler.handlePacket(ServerSessionPacketHandler.java:111)
       at org.hornetq.core.remoting.impl.ChannelImpl.handlePacket(ChannelImpl.java:460)
       at org.hornetq.core.remoting.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:382)
       at org.hornetq.core.remoting.impl.RemotingConnectionImpl.access$0(RemotingConnectionImpl.java:352)
       at org.hornetq.core.remoting.impl.RemotingConnectionImpl$1.run(RemotingConnectionImpl.java:344)
       at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:96)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
       at java.lang.Thread.run(Thread.java:637)

      This corresponds to the notification consumer created by the core bridges (in ClusterConnection.createNewRecord).
      The notif.XXX queue is non-durable and won't be present when the node is restarted.

      The core bridge will attempt to reconnect to the server once it is restarted.
      Since it cannot reattach to its old session, it recreates the session and the consumer. It does this by sending packets directly on the RemotingConnection.
      However, the notif.XXX queue no longer exists on the restarted node, so the consumer creation fails (hence the exception above). The bridge, however, never receives that error!
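
      To make the failure mode concrete, here is a minimal, self-contained Java model of the reconnect path described above. It is a sketch under stated assumptions, not HornetQ code: FakeServer, sendOneWay and sendBlocking are hypothetical names, and the model assumes the consumer is recreated with a one-way packet whose outcome is only logged on the server, which is how the behaviour above reads. A non-durable queue disappears on restart, the one-way request leaves only a server-side log entry, and a request/response exchange is shown for contrast.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model only: none of these classes are part of HornetQ.
public class SilentConsumerFailure {

    /** Outcome of a create-consumer request as computed by the server. */
    static final class Result {
        final boolean ok;
        final String error;
        Result(boolean ok, String error) { this.ok = ok; this.error = error; }
    }

    /** Fake server: only durable queues survive restart(). */
    static final class FakeServer {
        private final Set<String> queues = new HashSet<>();
        private final Set<String> durableQueues = new HashSet<>();
        final List<String> serverLog = new ArrayList<>();

        void createQueue(String name, boolean durable) {
            queues.add(name);
            if (durable) {
                durableQueues.add(name);
            }
        }

        /** Simulated restart: non-durable queues are lost. */
        void restart() {
            queues.retainAll(durableQueues);
        }

        Result handleCreateConsumer(String queue) {
            if (!queues.contains(queue)) {
                String msg = "Queue " + queue + " does not exist";
                serverLog.add("Failed to create consumer: " + msg); // visible on the server only
                return new Result(false, msg);
            }
            return new Result(true, null);
        }
    }

    /** One-way send: the server's result never travels back to the caller. */
    static void sendOneWay(FakeServer server, String queue) {
        server.handleCreateConsumer(queue); // return value deliberately ignored
    }

    /** Request/response send: the caller receives the server's result. */
    static Result sendBlocking(FakeServer server, String queue) {
        return server.handleCreateConsumer(queue);
    }

    public static void main(String[] args) {
        FakeServer server = new FakeServer();
        String notifQueue = "notif.example";   // hypothetical stand-in for notif.XXX
        server.createQueue(notifQueue, false); // non-durable, like the bridge's notification queue
        server.restart();                      // the queue is gone after the restart

        // The bridge reconnects and "recreates" its consumer with a one-way packet.
        sendOneWay(server, notifQueue);
        System.out.println("server log: " + server.serverLog);
        // Nothing came back, so the bridge still believes its consumer exists.

        // With a request/response exchange, the failure reaches the caller.
        Result r = sendBlocking(server, notifQueue);
        System.out.println("blocking: ok=" + r.ok + ", error=" + r.error);
    }
}
```

      The only difference between the two paths is whether the server's answer is returned to the caller; the server-side check is identical in both.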

      AIUI, this means that the bridge will never receive notifications from the node as there is no queue or consumer corresponding to the bridge.

      I think the issue is more general: e.g. if a client uses a message handler on a non-durable queue and the server is restarted, then when the client session reconnects, the consumer is not recreated (the queue no longer exists), yet no error is reported to the client.
      I'll write a test to check whether this is indeed the case.
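
      The test mentioned above would need to check what the client is told, not just what the server logs. Below is a hypothetical, self-contained model of that check; ClientSession, reconnect() and the error listener are invented names for illustration, not the HornetQ client API. The reportFailures flag switches between the current behaviour (recreation failures are dropped) and a behaviour where the failure is passed to the client.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical model of the client-side scenario; not the HornetQ client API.
public class ReconnectConsumerCheck {

    static class Server {
        final Set<String> queues = new HashSet<>();
        final Set<String> durable = new HashSet<>();

        void createQueue(String q, boolean isDurable) {
            queues.add(q);
            if (isDurable) {
                durable.add(q);
            }
        }

        /** Simulated restart: non-durable queues are lost. */
        void restart() {
            queues.retainAll(durable);
        }

        /** Returns null on success, or an error message. */
        String createConsumer(String q) {
            return queues.contains(q) ? null : "Queue " + q + " does not exist";
        }
    }

    static class ClientSession {
        private final Server server;
        private final boolean reportFailures; // false models the silent behaviour
        private final Consumer<String> errorListener;
        private final List<String> consumerQueues = new ArrayList<>();

        ClientSession(Server server, boolean reportFailures, Consumer<String> errorListener) {
            this.server = server;
            this.reportFailures = reportFailures;
            this.errorListener = errorListener;
        }

        /** Initial creation: failures are reported synchronously. */
        void createConsumer(String q) {
            String err = server.createConsumer(q);
            if (err != null) {
                throw new IllegalStateException(err);
            }
            consumerQueues.add(q);
        }

        /** After the server comes back, re-create every consumer. */
        void reconnect() {
            for (String q : consumerQueues) {
                String err = server.createConsumer(q);
                if (err != null && reportFailures) {
                    errorListener.accept(err);
                }
            }
        }
    }

    /** Runs the scenario and returns the errors the client was told about. */
    static List<String> run(boolean reportFailures) {
        Server server = new Server();
        server.createQueue("tmp.queue", false); // non-durable queue with a consumer on it
        List<String> errors = new ArrayList<>();
        ClientSession session = new ClientSession(server, reportFailures, errors::add);
        session.createConsumer("tmp.queue");
        server.restart();
        session.reconnect();
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("silent:   errors seen by client = " + run(false));
        System.out.println("reported: errors seen by client = " + run(true));
    }
}
```

      In the silent variant the client ends up with an empty error list even though its consumer no longer exists on the server, which is exactly the condition such a test would assert against.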