Recently I have been testing the Stomp support in hornetq-2.1.2.final and have run into a severe message loss problem (50-80% of messages). The test environment is very basic and uses the out-of-the-box standalone HQ configuration with some minor changes.
- Server and client machines are running Windows XP SP3
- Installed JDK on the server is 1.6.0_21
- Latest stable HQ release (hornetq-2.1.2.final)
- Habari HornetQ Client (Delphi / Stomp)
- Running the "Standalone" HQ demo server with the following changes:
a) Added new Stomp Acceptor bound to all IPs and the default Stomp port
<param key="protocol" value="stomp" />
<param key="host" value="0.0.0.0" />
<param key="port" value="61613" />
b) Added createDurableQueue and deleteDurableQueue permissions for guest
The Habari HornetQ client comes with a demo application that writes messages into a queue and another that consumes the messages. Using these demo applications for testing, I have found that under certain conditions the HQ server fails to deliver messages to the target queue, resulting in apparent message loss as viewed from the consuming application.
After some experimentation and debugging in Eclipse I have found that the message loss seems to be very timing sensitive and only occurs when the client producing the messages is executed on the same host as the HQ server or is connected by a fast network (e.g. gigabit LAN). When I run the client over a slower link, such as a WAN link, I do not see any lost messages.
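For anyone who wants to try reproducing the timing dependence without the Delphi client, something along these lines might work. It is only a minimal sketch in Python and not what the Habari demo does: it speaks plain Stomp 1.0 over a socket and sends a burst of messages with an optional pause between them. The function names, the queue name `jms.queue.testQueue`, and the guest/guest credentials are assumptions for illustration and need to be adjusted to the actual setup.

```python
import socket
import time

NULL = b"\x00"  # every Stomp frame ends with a NUL byte


def build_frame(command, headers=None, body=""):
    """Serialize one Stomp 1.0 frame: command, header lines, blank line, body, NUL."""
    lines = [command]
    for key, value in (headers or {}).items():
        lines.append(f"{key}:{value}")
    head = "\n".join(lines) + "\n\n"
    return head.encode("utf-8") + body.encode("utf-8") + NULL


def send_burst(host, port, destination, count, delay_s=0.0,
               login="guest", passcode="guest"):
    """Connect, send `count` numbered text messages, then disconnect.

    delay_s > 0 throttles the producer, roughly imitating a slower link.
    The login/passcode defaults are assumptions, not values from the post.
    """
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_frame("CONNECT", {"login": login, "passcode": passcode}))

        # Read until the first complete frame (terminated by NUL) has arrived.
        reply = b""
        while NULL not in reply:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("server closed the connection before CONNECTED")
            reply += chunk
        if not reply.lstrip(b"\r\n").startswith(b"CONNECTED"):
            raise ConnectionError(f"unexpected reply: {reply[:80]!r}")

        for i in range(count):
            sock.sendall(build_frame("SEND", {"destination": destination},
                                     f"message {i}"))
            if delay_s > 0:
                time.sleep(delay_s)

        sock.sendall(build_frame("DISCONNECT"))
```

Running `send_burst("localhost", 61613, "jms.queue.testQueue", 1000)` and then the same call with `delay_s=0.01`, and comparing how many messages the consumer receives in each case, would show whether the loss follows the send rate. Because the message bodies are numbered, any gaps on the consumer side are easy to spot.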
I have ruled out the Habari Stomp client and demo applications as the source of the problem by using Wireshark to trace the network traffic in and out of the HQ server. All inbound messages are properly formatted on the wire and delivered to the NIC on the HQ server. The consumer application also seems to be working correctly and is simply not receiving outbound messages from HQ.
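To put numbers on a comparison like this, the raw bytes of each direction of a captured TCP stream can be split into Stomp frames and counted. The following Python snippet is a rough sketch of that idea, not the procedure I actually used. The helper names are made up for illustration, and it assumes text bodies that contain no NUL bytes (it ignores any `content-length` header).

```python
from collections import Counter


def split_frames(data):
    """Split raw Stomp bytes into (command, headers, body) tuples.

    Assumes bodies contain no NUL bytes, so NUL can be used as the
    frame delimiter; content-length is not honoured.
    """
    frames = []
    for raw in data.split(b"\x00"):
        raw = raw.lstrip(b"\r\n")  # servers may emit newlines between frames
        if not raw:
            continue
        head, _, body = raw.partition(b"\n\n")
        lines = head.decode("utf-8").split("\n")
        headers = {}
        for line in lines[1:]:
            key, _, value = line.partition(":")
            headers.setdefault(key, value)  # first occurrence wins
        frames.append((lines[0], headers, body))
    return frames


def count_commands(data):
    """Return a dict mapping each frame command to its number of occurrences."""
    return dict(Counter(command for command, _, _ in split_frames(data)))
```

Feeding the producer-to-server bytes into `count_commands` gives the number of `SEND` frames, and the server-to-consumer bytes give the number of `MESSAGE` frames; a difference between the two is the number of messages dropped inside the broker.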
My suspicion is that there is some sort of race condition or similar flaw in HQ that is causing sent messages to be dropped. The only workaround I have found thus far is to set <param key="direct-deliver" value="false"/> on the Stomp acceptor. Changing this setting seems to eliminate the message loss problem in all of the scenarios we have tested.
Has anyone seen this behavior before? I would be interested in hearing other people's experience using Stomp as well as ideas on how to debug the message flow through HQ. I have been experimenting with enabling debug logging, but have yet to determine which loggers should be enabled for debugging Stomp issues.
Thanks in advance for your help!