7 Replies Latest reply on Jul 6, 2011 11:55 AM by clebert.suconic

    Paging related OutOfMemoryError

    carl.heymann

      Hi

       

      I've been running load tests on our system, feeding messages into an initial "distribution" queue at about 400/second. Messages get consumed from there by 100 "distribution consumers", and then sent to other queues in JMS local transactions. This runs happily for 25 minutes, after which the consumers of some queues slow down (which is normal in this case). At this point, the initial "distribution" queue suddenly starts to grow in size, with the distribution consumers processing only 200 messages/sec.

       

      After another hour of running, with a queue size of about 500k messages, HornetQ starts throwing this:

       

      [Old I/O server worker (parentId: 448511246, [id: 0x1abbbd0e, /10.0.4.92:5445])] 18:54:27,142 SEVERE [org.hornetq.core.protocol.core.ServerSessionPacketHandler]  Caught unexpected exception
      java.lang.OutOfMemoryError
                at sun.misc.Unsafe.allocateMemory(Native Method)
                at java.nio.DirectByteBuffer.<init>(Unknown Source)
                at java.nio.ByteBuffer.allocateDirect(Unknown Source)
                at org.hornetq.core.paging.impl.PageImpl.read(PageImpl.java:119)
                at org.hornetq.core.paging.cursor.impl.PageCursorProviderImpl.getPageCache(PageCursorProviderImpl.java:184)
                at org.hornetq.core.paging.cursor.impl.PageCursorProviderImpl.getPageCache(PageCursorProviderImpl.java:136)
                at org.hornetq.core.paging.cursor.impl.PageCursorProviderImpl.getMessage(PageCursorProviderImpl.java:113)
                at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl.queryMessage(PageSubscriptionImpl.java:527)
                at org.hornetq.core.paging.cursor.PagedReferenceImpl.getPagedMessage(PagedReferenceImpl.java:71)
                at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl.getPageTransaction(PageSubscriptionImpl.java:812)
                at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl.ackTx(PageSubscriptionImpl.java:440)
                at org.hornetq.core.server.impl.QueueImpl.acknowledge(QueueImpl.java:794)
                at org.hornetq.core.server.impl.ServerConsumerImpl.acknowledge(ServerConsumerImpl.java:576)
                at org.hornetq.core.server.impl.ServerSessionImpl.acknowledge(ServerSessionImpl.java:574)
                at org.hornetq.core.protocol.core.ServerSessionPacketHandler.handlePacket(ServerSessionPacketHandler.java:269)
                at org.hornetq.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:474)
                at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:496)
                at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:457)
                at org.hornetq.core.remoting.server.impl.RemotingServiceImpl$DelegatingBufferHandler.bufferReceived(RemotingServiceImpl.java:459)
                at org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.messageReceived(HornetQChannelHandler.java:73)
                at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:100)
                at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362)
                at org.jboss.netty.channel.StaticChannelPipeline$StaticChannelHandlerContext.sendUpstream(StaticChannelPipeline.java:514)
                at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:287)
                at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.decode(HornetQFrameDecoder2.java:169)
                at org.hornetq.core.remoting.impl.netty.HornetQFrameDecoder2.messageReceived(HornetQFrameDecoder2.java:134)
                at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
                at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:362)
                at org.jboss.netty.channel.StaticChannelPipeline.sendUpstream(StaticChannelPipeline.java:357)
                at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
                at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
                at org.jboss.netty.channel.socket.oio.OioWorker.run(OioWorker.java:90)
                at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
                at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
                at org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:181)
                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                at java.lang.Thread.run(Unknown Source)

       

      Changed hornetq configuration:


         <address-settings>
            <!--default for catch all-->
            <address-setting match="#">
               <dead-letter-address>jms.queue.DLQ</dead-letter-address>
               <expiry-address>jms.queue.ExpiryQueue</expiry-address>
               <redelivery-delay>0</redelivery-delay>
               <max-size-bytes>104857600</max-size-bytes>
               <page-size-bytes>1048576</page-size-bytes>
               <address-full-policy>PAGE</address-full-policy>
            </address-setting>
         </address-settings>

         <jmx-management-enabled>true</jmx-management-enabled>
         <message-counter-enabled>true</message-counter-enabled>
         <message-counter-sample-period>60000</message-counter-sample-period>

       

      We're using blocking network IO, but would like to use NIO. If we configure NIO, hornetq breaks down quickly, but I think that's another issue.

       

      My questions:

      1. What would cause the out-of-memory errors, and can we do anything to prevent it, except sending in fewer messages?
      2. Is there a ByteBuffer.allocateDirect(..) invocation for every page read? Doesn't this go against the recommendations at http://download.oracle.com/javase/6/docs/api/java/nio/ByteBuffer.html?
      3. Is there any recommended NIO-enabled netty configuration? E.g. a ratio between thread-pool-max-size and nio-remoting-threads to keep in mind?
      4. Would smaller page sizes be better? Currently using 1MiB page sizes, and 100MiB memory per queue.
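
      To make question 2 concrete: the ByteBuffer javadoc recommends direct buffers mainly for large, long-lived buffers, which suggests reusing them rather than allocating one per read. Below is a minimal sketch of that idea in plain Java. PageBufferPool and its methods are hypothetical names made up for illustration; this is not HornetQ code and not a claim about how PageImpl.read works internally.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Hypothetical helper: hands out direct buffers of one fixed size and
// takes them back for reuse, so repeated page reads do not each trigger
// a fresh ByteBuffer.allocateDirect call.
final class PageBufferPool {
    private final int pageSizeBytes;
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<ByteBuffer>();

    PageBufferPool(int pageSizeBytes) {
        this.pageSizeBytes = pageSizeBytes;
    }

    // Returns a cleared buffer, reusing a released one when available.
    synchronized ByteBuffer acquire() {
        ByteBuffer buffer = free.pollFirst();
        if (buffer == null) {
            buffer = ByteBuffer.allocateDirect(pageSizeBytes);
        }
        buffer.clear();
        return buffer;
    }

    // Accepts only direct buffers of the pool's size; anything else is ignored.
    synchronized void release(ByteBuffer buffer) {
        if (buffer.isDirect() && buffer.capacity() == pageSizeBytes) {
            free.addFirst(buffer);
        }
    }

    public static void main(String[] args) {
        PageBufferPool pool = new PageBufferPool(1048576); // page-size-bytes from the config above
        ByteBuffer first = pool.acquire();
        pool.release(first);
        ByteBuffer second = pool.acquire();
        System.out.println("buffer reused: " + (first == second));
    }
}
```

      The trade-off is that pooled buffers stay allocated for the life of the pool, so the pool size would need its own bound in a real server.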

       

      Thanks

      Carl

        • 1. Re: Paging related OutOfMemoryError
          clebert.suconic

          - What version of HQ?

          - How many queues do you have?

            (understand that the max-size is per queue, if you have 10 queues at 100MiB, you will have 1GiB in memory before you start paging).

          • 2. Re: Paging related OutOfMemoryError
            carl.heymann

            Version: 2.2.5.Final

             

            HornetQ runs with -Xms4G -Xmx10G, on a machine with 16GiB of memory and 8 cores.

             

            There are about 50 queues, and 100MiB memory per queue * 50 = 5GiB, so I figured that if half the JVM memory is allocated to queues, then the other half is available for general work. The VM did grow its heap to close to 10GiB before failing.
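
            The budgeting arithmetic above can be written out as a small sketch. PagingBudget and its method names are hypothetical, and the figures are the ones quoted in this thread (50 destinations, max-size-bytes of 104857600, -Xmx10G). It counts only the heap that messages may occupy before paging starts; direct buffers such as the one in the stack trace are allocated outside the heap and are not covered by it.

```java
// Hypothetical helper for the worst-case estimate discussed in this post.
final class PagingBudget {
    // Bytes that can sit in memory if every destination fills up to its limit.
    static long worstCaseInMemoryBytes(int destinationCount, long maxSizeBytes) {
        return destinationCount * maxSizeBytes;
    }

    // Share of the maximum heap that the worst case would occupy.
    static double fractionOfHeap(long bytes, long maxHeapBytes) {
        return (double) bytes / maxHeapBytes;
    }

    public static void main(String[] args) {
        long maxSizeBytes = 104857600L;               // max-size-bytes from the config
        long maxHeapBytes = 10L * 1024 * 1024 * 1024; // -Xmx10G
        long worstCase = worstCaseInMemoryBytes(50, maxSizeBytes);
        System.out.println("worst case MiB: " + worstCase / (1024 * 1024));
        System.out.println("fraction of heap: " + fractionOfHeap(worstCase, maxHeapBytes));
    }
}
```

            With these inputs the worst case is 5000MiB, a little under half of the 10GiB heap, which matches the estimate in the post.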

             

            Each queue has on average 1 consumer, but about 5 queues are very busy, growing to a max of 100 consumers.

             

            Only one queue grows to a large size; all the others remain around zero. One or two queues grow to a few thousand messages, then drop back to zero. The messages are relatively small, each between 10 and 50KiB.

             

            All messages are persistent.

            • 3. Re: Paging related OutOfMemoryError
              carl.heymann

              Clebert Suconic wrote:

               

              - What version of HQ?

              - How many queues do you have?

                (understand that the max-size is per queue, if you have 10 queues at 100MiB, you will have 1GiB in memory before you start paging).

              I thought that the per-queue limit meant that each queue goes into paging mode independently when it hits the limit. That would mean that paging could start even with just 100MiB of memory used, if only one of the 10 queues grows to that size, right?

              • 4. Re: Paging related OutOfMemoryError
                clebert.suconic

                We have a JIRA for total size of the server.

                 

                You should plan ahead for resources. It wasn't intended to have the entire system in paging mode. If you do however, you should plan for proper resources and proper max-sizes.

                • 5. Re: Paging related OutOfMemoryError
                  carl.heymann

                  That's fine, I'll happily plan ahead for resources, I'm just trying to understand what the limits are to plan against. I have only one queue that goes into paging mode. The page files are only 500MiB on disk. I gave hornetq 10GiB of memory, with a 100MiB limit per queue. If I restart hornetq, the system continues again without problems.

                   

                  Could the problem be with the large number of consumers (100)? There are over 54000 messages in "delivering" state.

                   

                  Regarding the byte buffers: there are open bugs on garbage collection of large byte buffers (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6296278 and related). If each page read causes a buffer to be allocated, then I suspect it would be better to use smaller page sizes.
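
                  One way to check whether direct buffers are piling up is to read the JVM's "direct" buffer pool statistics. The sketch below assumes Java 7 or later, where java.lang.management.BufferPoolMXBean is available; on a Java 6 VM this API does not exist. DirectMemoryProbe is a hypothetical name, and the class would have to run inside the broker's JVM (or the same bean be read over JMX) to say anything about HornetQ itself.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical diagnostic: reports how many bytes of direct buffer memory
// the current JVM accounts for in its "direct" buffer pool.
final class DirectMemoryProbe {
    // Returns the bytes in use, or -1 if the JVM exposes no "direct" pool.
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directBytesUsed();
        ByteBuffer page = ByteBuffer.allocateDirect(1048576); // one 1MiB page
        long after = directBytesUsed();
        System.out.println("direct bytes before: " + before + ", after: " + after);
        page.put(0, (byte) 1); // keep the buffer reachable until after the measurement
    }
}
```

                  If that number keeps climbing during the load test while the heap stays flat, it would point at direct buffers not being released quickly enough rather than at the queue memory limits.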

                   

                  Another strange thing: after restarting the test, the "MessagesAdded" count is fluctuating, increasing and decreasing almost in sine wave fashion. Why would "MessagesAdded" ever decrease? When I query message counts, it looks like all timestamps are at the unix epoch (1970/01/01) so I assume something is zero. Also, the message count decreased to negative after all messages were consumed.

                  • 6. Re: Paging related OutOfMemoryError
                    clebert.suconic

                    The messagesAdded is changed as we depage from disk into memory. That's why.

                     

                     

                    Try using consumerWindowSize=0 on your test.

                    • 7. Re: Paging related OutOfMemoryError
                      clebert.suconic

                      BTW: I made a typo in one of my messages. The maxSize is per address (not per queue).

                       

                       

                      So, if the address is a JMS topic, the maxSize is the maximum size you could have on that topic.