topic subscribers gone silent
noky Oct 11, 2010 2:56 PM
I've been running HornetQ 2.1.1 in a test environment for some time now. Today, I saw a very strange (and troubling) problem: all the subscribers for a particular topic suddenly stopped receiving data. The publishers had no problems publishing. Restarting the subscribers had no effect: they connected to the server and started listening on the topic, but no data came in. The subscribers never reported any exceptions via the JMS ExceptionListener facility. Meanwhile, other topics were working fine and their clients received data normally.
About 10 minutes after the failure, the HornetQ logs showed a lot of the following:
[hornetq-failure-check-thread] 07:33:00,982 WARNING [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] Connection failure has been detected: Did not receive ping from /aaa.bbb.ccc.ddd:59984. It is likely the client has exited or crashed without closing its connection, or the network between the server and client has failed. The connection will now be closed. [code=3]
[hornetq-failure-check-thread] 07:33:00,982 WARNING [org.hornetq.core.server.impl.ServerSessionImpl] Client connection failed, clearing up resources for session db622b6c-bd63-11df-bb01-0030488a33d0
[hornetq-failure-check-thread] 07:33:00,982 WARNING [org.hornetq.core.server.impl.ServerSessionImpl] Cleared up resources for session db622b6c-bd63-11df-bb01-0030488a33d0
The only way I could fix the problem was to restart the HornetQ server. After startup, the logs showed over 49,000 instances of the following message:
[Thread-19 (group:HornetQ-server-threads9420495-29769356)] 11:45:34,110 WARNING [org.hornetq.core.postoffice.impl.PostOfficeImpl] Duplicate message detected - transaction will be rejected
I'm worried this type of problem will happen again once we put HornetQ into production. This seems like a major blocker. I'd like to find out exactly what happened before official deployment time comes.
Any ideas? Is there anything I can do to provide more clues? I have JMX enabled. Are there any properties I can examine if this happens again?
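For capturing clues over JMX the next time this happens, here is a minimal sketch that dumps every readable attribute of every MBean matching a pattern, so the state of the stuck topic can be compared with a healthy one. It uses only the standard javax.management API. The class name JmxDump is made up for illustration, the service URL has to be supplied by you, and the default pattern "org.hornetq:*" assumes the server registers its MBeans under the default org.hornetq JMX domain; adjust it if your configuration differs. No particular HornetQ attribute names are assumed.

```java
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper, not part of HornetQ.
public class JmxDump {

    /** Reads every readable attribute of every MBean whose name matches the pattern. */
    public static Map<String, Map<String, String>> dump(MBeanServerConnection conn, String pattern)
            throws Exception {
        Map<String, Map<String, String>> result = new TreeMap<String, Map<String, String>>();
        for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
            Map<String, String> attrs = new TreeMap<String, String>();
            for (MBeanAttributeInfo info : conn.getMBeanInfo(name).getAttributes()) {
                if (!info.isReadable()) {
                    continue;
                }
                try {
                    Object value = conn.getAttribute(name, info.getName());
                    attrs.put(info.getName(), String.valueOf(value));
                } catch (Exception e) {
                    // Some attributes cannot be read; record that instead of aborting the dump.
                    attrs.put(info.getName(), "<unreadable: " + e.getClass().getSimpleName() + ">");
                }
            }
            result.put(name.getCanonicalName(), attrs);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: JmxDump <jmx-service-url> [object-name-pattern]");
            return;
        }
        // Assumption: HornetQ MBeans live under the default "org.hornetq" domain.
        String pattern = args.length > 1 ? args[1] : "org.hornetq:*";
        JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(args[0]));
        try {
            Map<String, Map<String, String>> beans =
                    dump(connector.getMBeanServerConnection(), pattern);
            for (Map.Entry<String, Map<String, String>> bean : beans.entrySet()) {
                System.out.println(bean.getKey());
                for (Map.Entry<String, String> attr : bean.getValue().entrySet()) {
                    System.out.println("  " + attr.getKey() + " = " + attr.getValue());
                }
            }
        } finally {
            connector.close();
        }
    }
}
```

Running it once while everything is healthy and again during a failure, then diffing the two outputs, should show which counters on the affected topic stopped moving.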