
    "Jammed" messages on queues with paging, non-zero redelivery-delay and sequential consumers

    stevehunt

      Hi,

       

      We're seeing a small fraction of messages get "jammed" on a queue in testing. The problem seems to depend on redelivery-delay being set to a non-zero value (10 and 1000 tested), paging being enabled, and multiple consumers reading from the single queue.

      My environment is

       

          HornetQ 2.2.5-FINAL (and HornetQ 2.2.7-FINAL)

          Ubuntu 10.04 LTS

          Java(TM) SE Runtime Environment (build 1.6.0_22-b04)

       

      The server config is just the standalone/non-clustered version with the following set in the address settings:

       

         <address-settings>

            <address-setting match="#">

               <dead-letter-address>jms.queue.DLQ</dead-letter-address>

               <expiry-address>jms.queue.ExpiryQueue</expiry-address>

               <redelivery-delay>10</redelivery-delay>

               <page-size-bytes>10485760</page-size-bytes>      

               <max-size-bytes>104857600</max-size-bytes>      

               <message-counter-history-day-limit>10</message-counter-history-day-limit>

               <address-full-policy>PAGE</address-full-policy>

            </address-setting>

         </address-settings>

       

      30,000 messages are written onto exampleQueue, and three processes then each read 10,000 messages off the queue, one after the other (just chained together on the command line); a rough sketch of one reader process is shown below.
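 

      For context, each reader is essentially just a plain JMS consumer that receives a fixed batch and exits. The real test code is a bit shonkier than this; the following is only a minimal sketch of the idea, assuming the stock standalone JNDI bindings (the ReadBatch class name and the /queue/exampleQueue lookup are placeholders, not the actual code):

         import java.util.Properties;

         import javax.jms.Connection;
         import javax.jms.ConnectionFactory;
         import javax.jms.Message;
         import javax.jms.MessageConsumer;
         import javax.jms.Queue;
         import javax.jms.Session;
         import javax.naming.InitialContext;

         // Sketch of one reader process: receives up to 10,000 messages and exits.
         // Three of these are chained on the command line to drain the 30,000 messages.
         public class ReadBatch
         {
            public static void main(final String[] args) throws Exception
            {
               Properties env = new Properties();
               env.put("java.naming.factory.initial", "org.jnp.interfaces.NamingContextFactory");
               env.put("java.naming.provider.url", "jnp://localhost:1099");
               env.put("java.naming.factory.url.pkgs", "org.jboss.naming:org.jnp.interfaces");
               InitialContext ic = new InitialContext(env);

               ConnectionFactory cf = (ConnectionFactory) ic.lookup("/ConnectionFactory");
               Queue queue = (Queue) ic.lookup("/queue/exampleQueue"); // placeholder JNDI name

               Connection connection = cf.createConnection();
               try
               {
                  Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                  MessageConsumer consumer = session.createConsumer(queue);
                  connection.start();

                  int received = 0;
                  while (received < 10000)
                  {
                     Message m = consumer.receive(5000); // give up if nothing arrives for 5 seconds
                     if (m == null)
                     {
                        break;
                     }
                     received++;
                  }
                  System.out.println("Received " + received + " messages");
               }
               finally
               {
                  connection.close();
               }
            }
         }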

       

      When running this, the 30,000 messages are all written to the queue successfully, but when reading from the queue we first get a sequence of the following on the server console

       

      [Thread-6 (group:HornetQ-server-threads1589377628-418652512)] 11:44:39,011 WARNING [org.hornetq.core.server.impl.QueueImpl]  Error on checkDLQ

      java.lang.IllegalStateException: Cannot find add info 10542

                at org.hornetq.core.journal.impl.JournalImpl.appendUpdateRecord(JournalImpl.java:910)

                at org.hornetq.core.persistence.impl.journal.JournalStorageManager.updateScheduledDeliveryTime(JournalStorageManager.java:546)

                at org.hornetq.core.server.impl.QueueImpl.checkRedelivery(QueueImpl.java:1788)

                at org.hornetq.core.server.impl.QueueImpl$RefsOperation.afterRollback(QueueImpl.java:2172)

                at org.hornetq.core.transaction.impl.TransactionImpl.afterRollback(TransactionImpl.java:509)

                at org.hornetq.core.transaction.impl.TransactionImpl$4.done(TransactionImpl.java:375)

                at org.hornetq.core.persistence.impl.journal.OperationContextImpl$1.run(OperationContextImpl.java:239)

                at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)

                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

                at java.lang.Thread.run(Thread.java:662)

       

      (All of these refer to Thread-6 in this case, with the address value increasing by 2 each time, from 10450 to 10542.) These are then followed by lots of the following:

       

      [Old I/O server worker (parentId: 1099518668, [id: 0x418952cc, localhost/127.0.0.1:5445])] 11:44:40,460 WARNING [org.hornetq.core.server.impl.QueueImpl]  Error on checkDLQ

      java.lang.IllegalStateException: Cannot find add info 30564

                at org.hornetq.core.journal.impl.JournalImpl.appendUpdateRecord(JournalImpl.java:910)

                at org.hornetq.core.persistence.impl.journal.JournalStorageManager.updateScheduledDeliveryTime(JournalStorageManager.java:546)

                at org.hornetq.core.server.impl.QueueImpl.checkRedelivery(QueueImpl.java:1788)

                at org.hornetq.core.server.impl.QueueImpl$RefsOperation.afterRollback(QueueImpl.java:2172)

                at org.hornetq.core.transaction.impl.TransactionImpl.afterRollback(TransactionImpl.java:509)

                at org.hornetq.core.transaction.impl.TransactionImpl$4.done(TransactionImpl.java:375)

                at org.hornetq.core.persistence.impl.journal.OperationContextImpl.executeOnCompletion(OperationContextImpl.java:188)

                at org.hornetq.core.persistence.impl.journal.JournalStorageManager.afterCompleteOperations(JournalStorageManager.java:424)

                at org.hornetq.core.transaction.impl.TransactionImpl.rollback(TransactionImpl.java:363)

                at org.hornetq.core.server.impl.ServerConsumerImpl.close(ServerConsumerImpl.java:340)

                at org.hornetq.core.server.impl.ServerSessionImpl.closeConsumer(ServerSessionImpl.java:1038)

                at org.hornetq.core.protocol.core.ServerSessionPacketHandler.handlePacket(ServerSessionPacketHandler.java:426)

                at org.hornetq.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:474)

                at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:496)

                at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:457)

                at org.hornetq.core.remoting.server.impl.RemotingServiceImpl$DelegatingBufferHandler.bufferReceived(RemotingServiceImpl.java:459)

                at org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.messageReceived(HornetQChannelHandler.java:73)

                at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:100)

                at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362)

                at org.jboss.netty.channel.StaticChannelPipeline$StaticChannelHandlerContext.sendUpstream(StaticChannelPipeline.java:514)

                at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:287)

                at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.decode(HornetQFrameDecoder2.java:169)

                at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.messageReceived(HornetQFrameDecoder2.java:134)

                at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)

                at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362)

                at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:357)

                at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)

                at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)

                at org.jboss.netty.channel.socket.oio.OioWorker.run(OioWorker.java:90)

                at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)

                at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)

                at org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:181)

                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

                at java.lang.Thread.run(Thread.java:662)

       

      (Again the sequence increases by 2 each time, from 30564 to 30954.)

       

      The order of those two blocks (the server-thread errors and the old I/O server worker errors) can also be the other way around (I/O errors first with the 10xxx addresses, thread errors with the 30xxx addresses after).

       

      Also, I don't think the thread errors appear in 2.2.7 (only the old server I/O messages do).

       

      The read processes never consume the full complement of messages from the queue (though they always get over 29,000), and with message counters enabled there are, at the end, messages that have been added to the queue (the count increases) but still remain on it (the depth stays non-zero).

       

      These "jammed" messages on the queue can't be consumed from the queue however, even after the server has cleaned up stale connections.

       

      We don't see this issue if any of the following holds:

        * redelivery-delay is set to zero

        * we use BLOCK instead of PAGE (with max-size-bytes increased appropriately)

        * all 30,000 messages are read off by a single consumer

       

      Anyhow, setting redelivery-delay to zero may be a good enough workaround for us, but I thought I'd log this in case anyone else stumbles onto it (and apologies in advance for the shonkiness of the test code!).
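 

      For reference, the workaround just means flipping the redelivery-delay in the address-setting shown above, i.e. something like:

         <address-setting match="#">
            <!-- other settings as above -->
            <redelivery-delay>0</redelivery-delay>
         </address-setting>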

       

      Cheers!
      Steve