HQ 2.4.4: Clients silently losing connection to server, lots of HQ224051 errors on server
noky Oct 20, 2014 12:49 PM
We're experiencing problems after upgrading our HornetQ server from 2.2.20 to 2.4.4 (the upgrade happened yesterday). We upgraded to address a problem where the HornetQ server would sometimes deliver a message to the wrong client (JMS message selectors were matching incorrectly). The selector bug seems to be fixed, but today we experienced massive connection problems: JMS messages were silently not being delivered to certain clients. It seemed that the clients had lost contact with the server without detecting it, so no reconnect was attempted. To remedy the problem, we had to restart the client applications, and we have downgraded the HornetQ server back to 2.2.20.
The HornetQ server logs show tons of messages like these:
06:10:59,577 ERROR [org.hornetq.core.server] HQ224051: Failed to call notification listener: java.lang.IllegalStateException: Cannot find queue info for queue 8e600ed4-3e15-4bfa-af29-b580b528f2144c02d449-5763-11e4-bba5-ff6870e1e70e
at org.hornetq.core.postoffice.impl.PostOfficeImpl.onNotification(PostOfficeImpl.java:292) [hornetq-server.jar:]
at org.hornetq.core.server.management.impl.ManagementServiceImpl.sendNotification(ManagementServiceImpl.java:682) [hornetq-server.jar:]
at org.hornetq.core.postoffice.impl.PostOfficeImpl.removeBinding(PostOfficeImpl.java:543) [hornetq-server.jar:]
at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl$MessageFlowRecordImpl.removeBinding(ClusterConnectionImpl.java:1395) [hornetq-server.jar:]
at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl$MessageFlowRecordImpl.doBindingRemoved(ClusterConnectionImpl.java:1383) [hornetq-server.jar:]
at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl$MessageFlowRecordImpl.handleNotificationMessage(ClusterConnectionImpl.java:1157) [hornetq-server.jar:]
at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl$MessageFlowRecordImpl.onMessage(ClusterConnectionImpl.java:1131) [hornetq-server.jar:]
at org.hornetq.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:1116) [hornetq-core-client.jar:]
at org.hornetq.core.client.impl.ClientConsumerImpl.access$500(ClientConsumerImpl.java:56) [hornetq-core-client.jar:]
at org.hornetq.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1251) [hornetq-core-client.jar:]
at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:104) [hornetq-core-client.jar:]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_51]
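To see whether the errors cluster on a few queues or are spread across many, the log can be tallied by queue ID. The following is only a minimal sketch using the plain JDK: the class name `Hq224051Tally` and the log file path argument are hypothetical, and the pattern assumes every such entry has the same shape as the line quoted above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HornetQ: counts HQ224051 errors per queue ID.
public class Hq224051Tally {

    // Assumes the message format seen in the log excerpt above.
    private static final Pattern MISSING_QUEUE = Pattern.compile(
            "HQ224051: .*Cannot find queue info for queue (\\S+)");

    // Returns queue ID -> number of HQ224051 lines naming it, in first-seen order.
    public static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String line : lines) {
            Matcher m = MISSING_QUEUE.matcher(line);
            if (m.find()) {
                String queue = m.group(1);
                Integer seen = counts.get(queue);
                counts.put(queue, seen == null ? 1 : seen + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the server log (an assumption of this sketch).
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (Map.Entry<String, Integer> e : tally(lines).entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

Stack trace lines are ignored because they do not contain the HQ224051 code; only the first line of each error entry is counted.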
Any ideas here? This seems like a serious problem.
NOTE: Our client applications were still using the HornetQ 2.2.20 libraries to connect to the server. We could not update the client applications first because the HornetQ protocol does not seem to be backward compatible. Updating the client software is also a fairly tedious process (literally hundreds of individual applications).
Also, we run the HQ server in stand-alone and clustered mode (a 2-server cluster).
Thanks for your help,
Mike