I am running a single HornetQ 2.2.5.Final message broker instance, with a backup, that services messages with group IDs. Every once in a while in production some network glitch causes the consumers to lose contact with the broker and things fall apart. I would be most grateful for recommendations that increase the robustness of this system. Let me describe what I am experiencing.
Occasionally, due to network flakiness, the message broker logs messages like:
RemotingConnectionImpl: Connection failure has been detected: Did not receive data from (...)
impl.ServerSessionImpl: Client connection failed, clearing up resources for session (...)
impl.ServerSessionImpl: Cleared up resources for session (...)
At this point message consumers stop processing messages, and do not restart even when network connectivity is restored. Normally when message consumers reconnect they start picking up messages right away. The tricky part about grouped messages is that they are pinned to a specific consumer. It's as if the pinned consumer stopped picking up messages but HornetQ does not know it's disconnected, so the queue is plugged up until that consumer is restarted.
What kind of settings would help message broker clients recover from transient network problems? I am using blocking I/O with reconnect attempts set to 10. https://issues.jboss.org/browse/HORNETQ-1061 implies that grouped messages do not support redistribution.
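To put the reconnect setting in perspective, here is a rough sketch of how much outage time 10 attempts might cover. It assumes the wait between attempts starts at a retry interval, grows by a multiplier after each attempt, and is capped at a maximum, which is how I understand the retry-interval, retry-interval-multiplier and max-retry-interval client settings. The class name, method name and the numbers are made up for illustration; this is a simplified model, not HornetQ's actual code.

```java
public class ReconnectWindow {

    /**
     * Total time spent waiting between reconnect attempts, in milliseconds.
     * Assumption: the wait starts at retryIntervalMs, is multiplied by
     * multiplier after every attempt, and is capped at maxRetryIntervalMs.
     */
    static long totalWaitMillis(int attempts, long retryIntervalMs,
                                double multiplier, long maxRetryIntervalMs) {
        long total = 0;
        double interval = retryIntervalMs;
        for (int i = 0; i < attempts; i++) {
            // Never wait longer than the configured cap.
            long wait = Math.min((long) interval, maxRetryIntervalMs);
            total += wait;
            interval *= multiplier;
        }
        return total;
    }

    public static void main(String[] args) {
        // Fixed 2 s interval, 10 attempts (illustrative values).
        System.out.println(totalWaitMillis(10, 2000, 1.0, 2000));   // prints 20000
        // Doubling interval capped at 60 s, 10 attempts (illustrative values).
        System.out.println(totalWaitMillis(10, 2000, 2.0, 60000));  // prints 362000
    }
}
```

If that model is roughly right, 10 attempts at a fixed 2 second interval only ride out about 20 seconds of outage, whereas a backoff multiplier stretches the same 10 attempts to several minutes.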
Thanks for any suggestions.
You should probably try a newer version (2.3.0.CR1), or a checkout of 2.2 from git.