
    Severe message loss using Stomp with "direct-deliver" enabled

    david.taylor

      Hi,

       

      Recently I have been doing some testing of Stomp support in hornetq-2.1.2.final and have run into a severe message loss problem (50 - 80%). The test environment is very basic and uses the out-of-the-box standalone HQ configuration with some minor changes.

       

      Test Environment

       

      - Server and client machines are running Windows XP SP3
      - Installed JDK on the server is 1.6.0_21
      - Latest stable HQ release (hornetq-2.1.2.final)
      - Habari HornetQ Client (Delphi / Stomp)
      - Running the "Standalone" HQ demo server with the following changes:

       

      a) Added new Stomp Acceptor bound to all IPs and the default Stomp port

         <acceptor name="stomp">
             <factory-class>org.hornetq.core.remoting.impl.netty.NettyAcceptorFactory</factory-class>
             <param key="protocol" value="stomp" />
             <param key="host" value="0.0.0.0" />
             <param key="port" value="61613" />
          </acceptor>

       

      b) Added createDurableQueue and deleteDurableQueue permissions for guest
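
      For reference, the resulting security block in hornetq-configuration.xml looks roughly like this (the send/consume and non-durable permissions are the stock standalone defaults; only the two durable-queue permissions were added - treat this as a sketch rather than the exact file):

         <security-settings>
            <security-setting match="#">
               <permission type="createNonDurableQueue" roles="guest"/>
               <permission type="deleteNonDurableQueue" roles="guest"/>
               <permission type="createDurableQueue" roles="guest"/>
               <permission type="deleteDurableQueue" roles="guest"/>
               <permission type="consume" roles="guest"/>
               <permission type="send" roles="guest"/>
            </security-setting>
         </security-settings>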

       

      The Habari HornetQ client comes with a demo application that writes messages into a queue and another that consumes them. Using these demo applications for testing, I have found that the HQ server under certain conditions fails to deliver messages to the target queue, resulting in apparent message loss as viewed from the consuming application.

       

      After some experimentation and debugging in Eclipse, I have found that the message loss seems to be very timing sensitive and only occurs when the client producing the messages is executed on the same host as the HQ server or is connected over a fast network (e.g. gigabit LAN). Running the client over a slower link, such as a WAN, I do not see any lost messages.

       

      I have ruled out the Habari Stomp client and demo applications as the source of the problem by using Wireshark to trace the network traffic in and out of the HQ server. All inbound messages are properly formatted on the wire and delivered to the NIC on the HQ server. The consumer application also seems to be working correctly and is simply not receiving outbound messages from HQ.

       

      My suspicion is that there is some sort of race condition or similar flaw in HQ that is causing sent messages to be dropped. The only workaround I have found thus far is to set <param key="direct-deliver" value="false"/> on the Stomp acceptor. Changing this setting seems to eliminate the message loss problem under all of our test scenarios thus far.
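
      For reference, the full acceptor with the workaround applied (this is just the acceptor shown above plus the one extra param):

         <acceptor name="stomp">
             <factory-class>org.hornetq.core.remoting.impl.netty.NettyAcceptorFactory</factory-class>
             <param key="protocol" value="stomp" />
             <param key="host" value="0.0.0.0" />
             <param key="port" value="61613" />
             <param key="direct-deliver" value="false" />
          </acceptor>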

       

      Has anyone seen this behavior before? I would be interested in hearing other people's experience using Stomp, as well as ideas on how to debug the message flow through HQ. I have been experimenting with enabling debug logging, but have yet to determine which loggers should be enabled for debugging Stomp issues.

       

      Thanks in advance for your help!

       

      David

        • 1. Re: Severe message loss using Stomp with "direct-deliver" enabled
          timfox

          Have you tried with TRUNK?

          • 2. Re: Severe message loss using Stomp with "direct-deliver" enabled
            david.taylor

            Yes, I just tested with a build using the latest TRUNK, to no effect. I also discovered that disabling direct-deliver does not always prevent message loss. After installing the new jars I did not observe any problems during my initial tests. I then deleted the journal and binding files and immediately was able to reproduce the message loss with direct-deliver enabled or disabled. Adding a 500ms delay between message sends also has no apparent effect on the problem.

             

            What I can say in general at this point is that the issue seems to be very timing sensitive and dependent on network transport speed. Sends performed over slower connections, such as a 6 MBps VPN, seem to work fine. I am also finding that deleting the journal and binding files tends to elicit good behavior for a period of time. Perhaps the time spent initializing the files affects the timing.

            • 3. Re: Severe message loss using Stomp with "direct-deliver" enabled
              timfox

              Can you provide a simple test program that demonstrates the issue? I am curious how you are measuring message "loss".

              • 4. Re: Severe message loss using Stomp with "direct-deliver" enabled
                david.taylor

                Not sure if this is related, but another problem just popped up while doing some more testing. Using the same setup as before I sent a large volume of messages into the queue with the consumer running at the same time. After a few minutes a shower of repeating errors showed up on the console, but nothing was recorded in the HQ log. I did not catch the initial part of the trace, so I was only able to capture the sequence of messages repeated again and again. Here is a snippet to show the pattern:

                 

                        at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
                        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                        at java.lang.Thread.run(Unknown Source)java.lang.NullPointerException
                        at org.hornetq.core.protocol.stomp.StompSession.sendMessage(StompSession.java:89)
                        at org.hornetq.core.server.impl.ServerConsumerImpl.deliverStandardMessage(ServerConsumerImpl.java:644)
                        at org.hornetq.core.server.impl.ServerConsumerImpl.handle(ServerConsumerImpl.java:253)
                        at org.hornetq.core.server.impl.QueueImpl.handle(QueueImpl.java:1451)
                        at org.hornetq.core.server.impl.QueueImpl.deliver(QueueImpl.java:1142)
                        at org.hornetq.core.server.impl.QueueImpl.access$800(QueueImpl.java:69)
                        at org.hornetq.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:1667)
                        at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
                        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                        at java.lang.Thread.run(Unknown Source)java.lang.NullPointerException
                        at org.hornetq.core.protocol.stomp.StompSession.sendMessage(StompSession.java:89)
                        at org.hornetq.core.server.impl.ServerConsumerImpl.deliverStandardMessage(ServerConsumerImpl.java:644)
                        at org.hornetq.core.server.impl.ServerConsumerImpl.handle(ServerConsumerImpl.java:253)
                        at org.hornetq.core.server.impl.QueueImpl.handle(QueueImpl.java:1451)
                        at org.hornetq.core.server.impl.QueueImpl.deliver(QueueImpl.java:1142)
                        at org.hornetq.core.server.impl.QueueImpl.access$800(QueueImpl.java:69)
                        at org.hornetq.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:1667)
                        at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
                        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                        at java.lang.Thread.run(Unknown Source)java.lang.NullPointerException
                        at org.hornetq.core.protocol.stomp.StompSession.sendMessage(StompSession.java:89)
                        at org.hornetq.core.server.impl.ServerConsumerImpl.deliverStandardMessage(ServerConsumerImpl.java:644)
                        at org.hornetq.core.server.impl.ServerConsumerImpl.handle(ServerConsumerImpl.java:253)
                        at org.hornetq.core.server.impl.QueueImpl.handle(QueueImpl.java:1451)
                        at org.hornetq.core.server.impl.QueueImpl.deliver(QueueImpl.java:1142)
                        at org.hornetq.core.server.impl.QueueImpl.access$800(QueueImpl.java:69)
                        at org.hornetq.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:1667)
                        at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)

                       ....

                • 5. Re: Severe message loss using Stomp with "direct-deliver" enabled
                  david.taylor

                  My definition of message loss is straightforward:

                   

                  1) Start a "consumer" to listen on a given queue

                  2) Start a "producer" that sends a batch of messages to the queue

                  3) Observe that not all of the messages sent are received by the consumer

                   

                  The observation in 3) is based on watching the consumer console output and also watching data move over the wire in Wireshark packet traces. The same observations apply to the producer application.
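
                  For illustration only, a minimal raw-socket STOMP producer along the lines of this test might look like the following in Java (the host, credentials and destination name are assumptions; the actual tests were run with the compiled Habari Delphi tools):

                    import java.io.InputStream;
                    import java.io.OutputStream;
                    import java.net.Socket;

                    // Minimal STOMP 1.0 producer sketch: CONNECT, send a numbered batch, DISCONNECT.
                    public class StompProducerSketch {
                        public static void main(String[] args) throws Exception {
                            Socket socket = new Socket("192.168.1.123", 61613);
                            OutputStream out = socket.getOutputStream();
                            InputStream in = socket.getInputStream();

                            // CONNECT frame; STOMP frames are terminated by a NUL byte
                            out.write("CONNECT\nlogin:guest\npasscode:guest\n\n\u0000".getBytes("UTF-8"));
                            out.flush();

                            // Skip the CONNECTED reply up to its NUL terminator
                            int b;
                            while ((b = in.read()) != 0 && b != -1) { /* skip */ }

                            // Send a numbered batch so gaps are easy to spot on the consumer side
                            for (int i = 1; i <= 100; i++) {
                                String frame = "SEND\ndestination:jms.queue.exampleQueue\n\nMessage: " + i + "\u0000";
                                out.write(frame.getBytes("UTF-8"));
                            }
                            out.flush();

                            // Clean shutdown: DISCONNECT, then close the socket
                            out.write("DISCONNECT\n\n\u0000".getBytes("UTF-8"));
                            out.flush();
                            socket.close();
                        }
                    }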

                   

                  Regarding a test application, the Stomp "ProducerTool" and "ConsumerTool" applications I am using are small demo applications included with the Habari HornetQ Client. You can download the demo zip archive that includes compiled versions of the demo applications (see: /demo/producertool and /demo/consumertool folders).

                   

                  http://www.habarisoft.net/download/HabariHornetQ-demo.zip

                   

                  The applications take a variety of command line parameters, but the problem appears with default settings. The only change I made was to use an alternate server URL:

                   

                  e.g. ProducerTool.exe --URL=stomp://192.168.1.123:61613

                   

                  Please let me know if you require any additional information.

                   

                  Regards,

                  David

                  • 6. Re: Severe message loss using Stomp with "direct-deliver" enabled
                    timfox

                    Unfortunately I do not have access to a Windows machine in order to run the Habari client.

                     

                    If this is really a problem with the server, it should be replicable using one of the other STOMP clients that can be run on Windows. If you can provide some source code for one of those that replicates this issue, I'll take a look.

                    • 7. Re: Severe message loss using Stomp with "direct-deliver" enabled
                      timfox

                      David Taylor wrote:


                      My suspicion is that there is some sort of race condition or similar flaw in HQ that is causing sent messages to be dropped. The only workaround I have found thus far is to set <param key="direct-deliver" value="false"/> on the Stomp acceptor. Changing this setting seems to eliminate the message loss problem under all of our test scenarios thus far.

                      That's very odd.

                       

                      When using STOMP, this parameter is actually ignored, so it's not possible that it could make a difference.

                       

                      Direct deliver is hardcoded to true when using STOMP, see StompProtocolManager.onSend():

                       

                      stompSession.getSession().send(message, true);

                      • 8. Re: Severe message loss using Stomp with "direct-deliver" enabled
                        mjustin

                        Testing with 2.1.2.Final on a Core 2 Duo with Vista, I receive this stack trace when sending and receiving messages using the Delphi Stomp clients - I will do further tests and come back later.

                         

                        Update: it looks like this exception occurs when the consumer disconnects before the server has sent all pending messages.

                         

                        ***********************************************************************************
                        "java  -XX:+UseParallelGC  -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -Xms512M -Xmx1024M -Dhornetq.config.dir=..\config\stand-alone\non-clustered  -Djava.util.logging.config.file=..\config\stand-alone\non-clustered\logging.properties -Djava.library.path=. -classpath ..\config\stand-alone\non-clustered;..\schemas\;C:\Java\hornetq-2.1.2.Final\lib\hornetq-bootstrap.jar;C:\Java\hornetq-2.1.2.Final\lib\hornetq-core-client-java5.jar;C:\Java\hornetq-2.1.2.Final\lib\hornetq-core-client.jar;C:\Java\hornetq-2.1.2.Final\lib\hornetq-core.jar;C:\Java\hornetq-2.1.2.Final\lib\hornetq-jboss-as-integration.jar;C:\Java\hornetq-2.1.2.Final\lib\hornetq-jms-client-java5.jar;C:\Java\hornetq-2.1.2.Final\lib\hornetq-jms-client.jar;C:\Java\hornetq-2.1.2.Final\lib\hornetq-jms.jar;C:\Java\hornetq-2.1.2.Final\lib\hornetq-logging.jar;C:\Java\hornetq-2.1.2.Final\lib\hornetq-twitter-integration.jar;C:\Java\hornetq-2.1.2.Final\lib\jboss-jms-api.jar;C:\Java\hornetq-2.1.2.Final\lib\jboss-mc.jar;C:\Java\hornetq-2.1.2.Final\lib\jnp-client.jar;C:\Java\hornetq-2.1.2.Final\lib\jnpserver.jar;C:\Java\hornetq-2.1.2.Final\lib\netty.jar;C:\Java\hornetq-2.1.2.Final\lib\twitter4j-core.jar org.hornetq.integration.bootstrap.HornetQBootstrapServer hornetq-beans.xml"
                        ***********************************************************************************
                        [main] 10:42:38,398 INFO [org.hornetq.integration.bootstrap.HornetQBootstrapServer]  Starting HornetQ Server
                        [main] 10:42:39,515 WARNING [org.hornetq.core.deployers.impl.FileConfigurationParser]  AIO wasn't located on this platform, it will fall back to using pure Java NIO. If your platform is Linux, install LibAIO to enable the AIO journal
                        [main] 10:42:39,572 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  live server is starting..
                        [main] 10:42:39,612 INFO [org.hornetq.core.persistence.impl.journal.JournalStorageManager]  Using NIO Journal
                        [main] 10:42:39,631 WARNING [org.hornetq.core.server.impl.HornetQServerImpl]  Security risk! It has been detected that the cluster admin user and password have not been changed from the installation default. Please see the HornetQ user guide, cluster chapter, for instructions on how to do this.
                        [main] 10:42:43,206 INFO [org.hornetq.core.remoting.impl.netty.NettyAcceptor]  Started Netty Acceptor version 3.2.1.Final-r2319 localhost:5455 for CORE protocol
                        [main] 10:42:43,210 INFO [org.hornetq.core.remoting.impl.netty.NettyAcceptor]  Started Netty Acceptor version 3.2.1.Final-r2319 localhost:61613 for STOMP protocol
                        [main] 10:42:43,214 INFO [org.hornetq.core.remoting.impl.netty.NettyAcceptor]  Started Netty Acceptor version 3.2.1.Final-r2319 localhost:5445 for CORE protocol
                        [main] 10:42:43,217 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  HornetQ Server version 2.1.2.Final (Colmeia, 120) started
                        java.lang.NullPointerException
                                at org.hornetq.core.protocol.stomp.StompSession.sendMessage(StompSession.java:89)
                                at org.hornetq.core.server.impl.ServerConsumerImpl.deliverStandardMessage(ServerConsumerImpl.java:644)
                                at org.hornetq.core.server.impl.ServerConsumerImpl.handle(ServerConsumerImpl.java:253)
                                at org.hornetq.core.server.impl.QueueImpl.handle(QueueImpl.java:1384)
                                at org.hornetq.core.server.impl.QueueImpl.deliverDirect(QueueImpl.java:1294)
                                at org.hornetq.core.server.impl.QueueImpl.add(QueueImpl.java:1347)
                                at org.hornetq.core.server.impl.QueueImpl.addLast(QueueImpl.java:247)
                                at org.hornetq.core.postoffice.impl.PostOfficeImpl.addReferences(PostOfficeImpl.java:971)
                                at org.hornetq.core.postoffice.impl.PostOfficeImpl.access$200(PostOfficeImpl.java:75)
                                at org.hornetq.core.postoffice.impl.PostOfficeImpl$1.done(PostOfficeImpl.java:958)
                                at org.hornetq.core.persistence.impl.journal.OperationContextImpl.executeOnCompletion(OperationContextImpl.java:158)
                                at org.hornetq.core.persistence.impl.journal.JournalStorageManager.afterCompleteOperations(JournalStorageManager.java:394)
                                at org.hornetq.core.postoffice.impl.PostOfficeImpl.processRoute(PostOfficeImpl.java:947)
                                at org.hornetq.core.postoffice.impl.PostOfficeImpl.route(PostOfficeImpl.java:668)
                                at org.hornetq.core.server.impl.ServerSessionImpl.doSend(ServerSessionImpl.java:1175)
                                at org.hornetq.core.server.impl.ServerSessionImpl.send(ServerSessionImpl.java:1000)
                                at org.hornetq.core.protocol.stomp.StompProtocolManager.onSend(StompProtocolManager.java:545)
                                at org.hornetq.core.protocol.stomp.StompProtocolManager.doHandleBuffer(StompProtocolManager.java:178)
                                at org.hornetq.core.protocol.stomp.StompProtocolManager.handleBuffer(StompProtocolManager.java:145)
                                at org.hornetq.core.protocol.stomp.StompConnection.bufferReceived(StompConnection.java:152)
                                at org.hornetq.core.remoting.server.impl.RemotingServiceImpl$DelegatingBufferHandler.bufferReceived(RemotingServiceImpl.java:459)
                                at org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.messageReceived(HornetQChannelHandler.java:67)
                                at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:100)
                                at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362)
                                at org.jboss.netty.channel.StaticChannelPipeline$StaticChannelHandlerContext.sendUpstream(StaticChannelPipeline.java:514)
                                at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
                                at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:317)
                                at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:299)
                                at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:214)
                                at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
                                at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362)
                                at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:357)
                                at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
                                at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
                                at org.jboss.netty.channel.socket.oio.OioWorker.run(OioWorker.java:90)
                                at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
                                at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
                                at org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:181)
                                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                                at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                                at java.lang.Thread.run(Unknown Source)

                        • 9. Re: Severe message loss using Stomp with "direct-deliver" enabled
                          mjustin

                          My first reproducible test case:

                           

                          * start ConsumerTool so that it will wait "forever" (or for a high value like 1000 messages)

                          * start ProducerTool and send a batch of messages (for example 100), then disconnect normally

                          * kill the ConsumerTool process (Ctrl+C) - which means that the Stomp DISCONNECT frame will not be sent

                          * start ConsumerTool again and send messages with ProducerTool again

                           

                          ConsumerTool output will be:

                           

                          Connecting to URL: stomp://localhost:61613
                          Consuming queue: ExampleQueue
                          Using a non-durable subscription
                          We are about to wait until we consume: 100 message(s) then we will shutdown
                          Received: Message: 1 sent at: 21.09.2010 11:05:11           ...
                          Received: Message: 3 sent at: 21.09.2010 11:05:11           ...
                          Received: Message: 5 sent at: 21.09.2010 11:05:11           ...
                          Received: Message: 7 sent at: 21.09.2010 11:05:11           ...
                          Received: Message: 9 sent at: 21.09.2010 11:05:11           ...
                          Received: Message: 11 sent at: 21.09.2010 11:05:11          ...
                          Received: Message: 13 sent at: 21.09.2010 11:05:11          ...
                          Received: Message: 15 sent at: 21.09.2010 11:05:11          ...
                          Received: Message: 17 sent at: 21.09.2010 11:05:11          ...
                          Received: Message: 19 sent at: 21.09.2010 11:05:11          ...

                           

                          Conclusion: it seems that, because of the missing DISCONNECT frame from the first ConsumerTool run, there is still a "consumer" registered on the broker side which swallows every second message sent to the queue.

                           

                          Possible fix on the client side: ensure that a DISCONNECT frame is always sent, e.g. via a Ctrl+C shutdown hook (a sketch follows below)

                          Possible fix on the server side: detect irregular disconnects and terminate the associated consumers
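
                          A minimal sketch of the client-side idea, in Java rather than Delphi and purely for illustration (host, credentials and the bare CONNECT handling are assumptions, not the Habari API):

                            import java.io.OutputStream;
                            import java.net.Socket;

                            // Register a JVM shutdown hook that sends a STOMP DISCONNECT frame before
                            // the process exits, so a Ctrl+C still results in a clean logical disconnect.
                            public class DisconnectOnShutdownSketch {
                                public static void main(String[] args) throws Exception {
                                    final Socket socket = new Socket("localhost", 61613);
                                    OutputStream out = socket.getOutputStream();
                                    out.write("CONNECT\nlogin:guest\npasscode:guest\n\n\u0000".getBytes("UTF-8"));
                                    out.flush();

                                    Runtime.getRuntime().addShutdownHook(new Thread() {
                                        @Override
                                        public void run() {
                                            try {
                                                OutputStream o = socket.getOutputStream();
                                                o.write("DISCONNECT\n\n\u0000".getBytes("UTF-8"));
                                                o.flush();
                                                socket.close();
                                            } catch (Exception ignored) {
                                                // best effort only; the JVM is going down anyway
                                            }
                                        }
                                    });

                                    // ... SUBSCRIBE and the normal message handling would go here ...
                                }
                            }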

                          • 10. Re: Severe message loss using Stomp with "direct-deliver" enabled
                            timfox

                            Yes, on failure, server-side consumers are not removed until connection-ttl expires. This is normal behaviour and is discussed in the user manual.

                            • 11. Re: Severe message loss using Stomp with "direct-deliver" enabled
                              david.taylor

                              Tim,

                               

                              Can you please explain the semantics of connection-ttl? The part that makes zero sense to me is that a failed consumer connection affects the delivery of messages sent to a given queue. If a failed consumer is no longer actively participating in message dequeuing, how is it that messages are apparently being consumed? The socket-level connection presumably no longer exists after the TCP session fails and certainly cannot pass any data to the consumer.

                               

                              We have worked with many commercial queuing environments including Tibco, MSMQ and MQ Series and have never seen this sort of behavior. Hopefully we have missed something since this issue is a complete show stopper that renders HQ unusable in a mission critical environment. Our primary interest in HornetQ, beyond its excellent performance, is reliable delivery of messages.

                               

                              Regards,

                              David Taylor

                              • 12. Re: Severe message loss using Stomp with "direct-deliver" enabled
                                timfox

                                David Taylor wrote:

                                 

                                Tim,

                                 

                                Can you please explain the semantics of connection-ttl?

                                This is explained in detail in the user manual. http://hornetq.sourceforge.net/docs/hornetq-2.1.2.Final/user-manual/en/html/connection-ttl.html

                                 

                                I've also explained it several times on this forum in other threads, one just a few days ago IIRC. I should probably add something to the FAQ.

                                 

                                So, here goes again:

                                 

                                How does a server know that a client has crashed or disappeared? Naively (and this is a very common misconception), you might think that if a client has crashed, or the network has failed (e.g. someone cut the ethernet cable), you would get an exception on the server.

                                 

                                Wrong! TCP is a "reliable" protocol. In this context, "reliable" means that TCP will cope with temporary loss of packets and will retransmit them when it detects they have not been acked.

                                 

                                If you pull out the ethernet cable, or your client machine blows up, or your router explodes, you will *not* get an exception on the server. This is because TCP has no way of distinguishing temporary packet loss from something more cataclysmic (until some very long timeout, anyway).

                                 

                                In the eyes of TCP and the OS for that matter, the connection is still alive, it's just that we haven't received any packets on it for a while.

                                 

                                So, how does networking software detect client failure? This is done by "pinging" (a.k.a. heartbeats). The idea is the client periodically sends pings to the server. If the server does not receive a "ping" within a certain period, it will assume the client has crashed (or the network crashed), and it will close the connection from the server side and release any sessions/consumers that might be consuming from queues.

                                 

                                That period is called connection-ttl in HornetQ.

                                 

                                This is no different from how any other messaging software works; it is a result of the way the TCP protocol works and its reliable nature. Nothing we can do about that.

                                 

                                As explained in the user manual, you can always set connection-ttl to a lower value if you want.
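
                                For STOMP clients (which cannot negotiate this themselves), one server-side way to do it is the global override in hornetq-configuration.xml. A sketch, with 30000 ms as an arbitrary figure (the default of -1 leaves the override disabled):

                                  <!-- hornetq-configuration.xml: apply a fixed TTL, in milliseconds, to all connections -->
                                  <connection-ttl-override>30000</connection-ttl-override>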

                                • 13. Re: Severe message loss using Stomp with "direct-deliver" enabled
                                  david.taylor

                                  Tim,

                                   

                                  Thank you for your response.

                                   

                                  I have a detailed understanding of the semantics of the TCP protocol and the way it handles unreliable/lost connections. The issue I have is not with the TCP timeout behavior or the way resource cleanup is handled on the server, but rather the fact that messages are apparently permanently lost while the HornetQ broker is deciding if the client is still alive. Where are these messages going? Why are undeliverable messages not rolled back into the queue? The expectation is that the queuing mechanism is supposed to be reliable even in the face of unexpected failures.

                                   

                                  Looking at this another way, when a consumer fails, the TCP session stays open for a period of time until TCP decides the connection is dead. During this time, the flow of data effectively comes to a halt rather quickly, since the TCP windowing mechanism by design allows only a finite number of unacknowledged packets to exist on the wire. Given this fact, how is it possible that HornetQ continues to deliver messages to the failed consumer? The actual consumer at this point is long gone and is clearly not receiving messages. It would be understandable if a message or perhaps two were temporarily held in limbo when the client fails, but we are seeing literally hundreds of messages disappearing into the ether.

                                   

                                  At this juncture, barring any substantive changes, I am left with the conclusion that HornetQ simply fails to meet basic reliability expectations for a messaging broker. Mandating that a consumer always disconnect cleanly to avoid message loss is a non-starter.

                                   

                                  Regards,

                                  David

                                  • 14. Re: Severe message loss using Stomp with "direct-deliver" enabled
                                    timfox

                                    David Taylor wrote:

                                     


                                     

                                    Looking at this another way, when a consumer fails, the TCP session stays open for a period of time until TCP decides the connection is dead. During this time, the flow of data effectively comes to a halt rather quickly, since the TCP windowing mechanism by design allows only a finite number of unacknowledged packets to exist on the wire. Given this fact, how is it possible that HornetQ continues to deliver messages to the failed consumer? The actual consumer at this point is long gone and is clearly not receiving messages. It would be understandable if a message or perhaps two were temporarily held in limbo when the client fails, but we are seeing literally hundreds of messages disappearing into the ether.

                                     

                                    Sigh. Firstly, no messages are "lost". They are in the process of delivery, and will happily get requeued if they are not acked and the session closes.

                                     

                                    If you read the chapter on flow control, you will see that each consumer maintains a window. This is the total size of messages that can be sent to the consumer without the consumer requesting more credits. It's completely configurable and its default value is 1 MiB. This determines how many messages can be queued to be sent to a consumer.

                                     

                                    Depending on the size of your messages, 1 MiB of messages may well correspond to thousands of messages.

                                     

                                    Secondly, there is the TCP send buffer size. The default value for this is 32 KiB. On faster networks it is recommended to set this to a higher figure, e.g. 1 MiB.

                                     

                                    If you really understand TCP as well as you say you do, you will understand that, in the absence of consumer flow control, it's TCP flow control that determines how many messages are "lost in the ether". With a 1 MiB TCP send buffer that's more than "one or two messages", unless of course your messages are very large.

                                     

                                    Again, this is fully configurable.
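
                                    As a rough illustration of where these knobs live (the values are arbitrary examples, not recommendations; consumer-window-size is shown in its JMS connection factory form):

                                      <!-- hornetq-configuration.xml, on the Netty acceptor/connector: TCP buffer sizes in bytes -->
                                      <param key="tcp-send-buffer-size" value="1048576"/>
                                      <param key="tcp-receive-buffer-size" value="1048576"/>

                                      <!-- hornetq-jms.xml, on a connection factory: per-consumer window in bytes -->
                                      <consumer-window-size>1048576</consumer-window-size>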

                                     

                                    Before you start making ranting claims about HornetQ reliability I suggest you fully read the relevant chapters in the user manual, and preferably a book on TCP.

                                     

                                    As I said before, HornetQ behaviour is completely configurable in this regard, and any buffering due to the way TCP works is unavoidable and would be the same with any other messaging system.
