We discussed this issue on IRC this morning, but as you suggested, here is a forum post with all the details I could gather. Hopefully this is all due to the ping issue that has now been fixed in trunk, but it might be useful to review and double-check.
In a nutshell: we have 16 client applications that publish messages to and receive messages from a single topic (over JMS). The jbm-jms.xml config file is here:
(these forums really could do with an "attach file" feature).
The other configuration files aren't really interesting: no persistence, no security, some JMX and the standard netty transport. We are running JBM 2.0.0.beta3. At some point early this morning one of the client machines crashed. We can't tell how, but the machine required a restart, so I think we can assume the JVM shut down abruptly. I don't have logs for the exact time at which this happened, as they rolled over, but from the earliest point I can find, the JBM log file was flooded with these log messages (10 a second or more):
09:07:56,796 WARNING [org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl] Connection failure has been detected Did not receive ping from client. It is likely a client has exited or crashed without closing its connection, or the network between the server and client has failed. The connection will now be closed.:3
This went on for hours, and these messages were interspersed with two exception stack traces. This one occurred most often:
09:07:57,110 SEVERE [org.jboss.messaging.core.server.impl.ServerSessionPacketHandler] Caught unexpected exception
java.lang.NullPointerException
    at org.jboss.messaging.core.server.impl.ServerSessionImpl.handleCloseConsumer(ServerSessionImpl.java:921)
and there were a few of these:
09:07:59,789 SEVERE [org.jboss.messaging.core.server.impl.MessagingServerPacketHandler] Failed to reattach session
java.lang.IllegalStateException: 1571589352 Can't find packet to clear: last received command id 321535 first stored command id 320494
    at org.jboss.messaging.core.remoting.impl.ChannelImpl.clearUpTo(ChannelImpl.java:741)
    at org.jboss.messaging.core.remoting.impl.ChannelImpl.replayCommands(ChannelImpl.java:503)
I have attached some fragments of the log file with the full stack traces here:
I restarted the JBM server and expected the clients' onException() handler to be triggered so that they would reconnect. This is what usually happens when I restart the JBM server: after around 30s all the clients are reconnected and sending/receiving messages as normal. This time, however, the clients did not seem to pick up the fact that the server was back. I left one of the clients for around 5 hours and it still hadn't reconnected. I then tried to shut down the clients and they refused to exit gracefully. I ran jstack on the clients and it reported a deadlock, which I think was preventing the shutdown. I have put the jstack output from two of our clients, which shows the deadlock occurring in the JBM client code, here:
The two deadlocks look similar but are not identical. Maybe this part of our experience indicates a new bug?
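For anyone who wants to check for the same condition without attaching jstack, here is a minimal sketch of how a client JVM could look for deadlocked threads itself, using the standard java.lang.management API. The class name DeadlockCheck and the findDeadlocks() helper are just illustrative names of mine, not part of JBM, and this is not what produced the jstack output above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of JBM): reports deadlocked threads in the current JVM.
public class DeadlockCheck {

    /** Returns one description per deadlocked thread, or an empty list if there are none. */
    public static List<String> findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Covers deadlocks on object monitors and on java.util.concurrent locks.
        long[] ids = mx.findDeadlockedThreads();
        List<String> report = new ArrayList<String>();
        if (ids == null) {
            return report; // null means no deadlock was found
        }
        // Ask for locked monitors and synchronizers so the report shows who holds what.
        for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
            if (info != null) { // a thread may have terminated in the meantime
                report.add(info.toString());
            }
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> report = findDeadlocks();
        if (report.isEmpty()) {
            System.out.println("No deadlocked threads found");
        } else {
            for (String entry : report) {
                System.out.println(entry);
            }
        }
    }
}
```

Calling findDeadlocks() periodically from a watchdog thread and logging any non-empty result would at least record when a client gets stuck. ThreadInfo.toString() only prints the first few stack frames, so jstack is still the better tool for a full trace.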
Hope this is useful. If you need any more information just let me know.