29 Replies. Latest reply on Feb 27, 2017 5:55 PM by spatwary04

    Memory problems in paging mode makes server unresponsive

    mlange

  We are currently testing the paging mode (HornetQ 2.2.23 as part of EAP 6.0.1). In our use cases it can happen that consumers become temporarily unavailable. For queues with high throughput this can end up with millions of messages paged to disk. I have not been able to configure the system to stay stable in such a scenario.

       

  We have a 3-node clustered setup with clients being load-balanced across all nodes using plain JMS. NIO is used for both the journal and Netty. The servers have a max heap of 1.5 GB each. The nodes are configured to page when 500MB of memory is consumed:

       

      <clustered>true</clustered>

      <persistence-enabled>true</persistence-enabled>

      <jmx-management-enabled>true</jmx-management-enabled>

      <persist-id-cache>false</persist-id-cache>

      <failover-on-shutdown>true</failover-on-shutdown>

      <shared-store>true</shared-store>

      <journal-type>NIO</journal-type>

      <journal-buffer-timeout>16666666</journal-buffer-timeout>

      <journal-buffer-size>5242880</journal-buffer-size>

      <journal-file-size>10485760</journal-file-size>

      <journal-min-files>10</journal-min-files>

       

      <netty-acceptor name="netty" socket-binding="messaging">

        <param key="use-nio"  value="true"/>

        <param key="batch-delay" value="50"/>

        <param key="direct-deliver" value="false"/>

      </netty-acceptor>

       

      <!-- 500MB memory limit per address -->

      <max-size-bytes>524288000</max-size-bytes>

      <page-size-bytes>52428800</page-size-bytes>

      <address-full-policy>PAGE</address-full-policy>
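
      For reference, these three settings live inside an <address-setting> element of the EAP messaging subsystem (524288000 bytes = 500 MB, 52428800 bytes = 50 MB). A minimal sketch of the surrounding element; the match pattern is an assumption and should be adjusted to the address under test:

```xml
<address-settings>
    <!-- match pattern is an assumption; adjust to the tested address -->
    <address-setting match="jms.queue.#">
        <max-size-bytes>524288000</max-size-bytes>   <!-- 500 MB per matching address -->
        <page-size-bytes>52428800</page-size-bytes>  <!-- 50 MB per page file -->
        <address-full-policy>PAGE</address-full-policy>
    </address-setting>
</address-settings>
```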

       

      In the test, only one address with 30 queues bound to it is used.

       

      Producing and consuming with a lot of clients works like a charm. Producing messages without consuming them causes big problems: the old generation space is exhausted so quickly that the server does nothing but garbage collection. The server logs reveal these timeouts on the QueueImpl:

      20:27:37,216 WARN  [org.hornetq.core.server.impl.QueueImpl] (New I/O server worker #2-22) Timed out on waiting for MessageCount: java.lang.IllegalStateException: Timed out on waiting for MessageCount

       

      The clients get timeouts:

      javax.jms.JMSException: Timed out waiting for response when sending packet 43

              at org.hornetq.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:302)

       

      It seems that the problems start once the messages are paged to disk (paging directory has a size of ~300MB). At that point it is almost impossible to start a new consumer on the queues. Message producing works only partly and very slowly compared to the rates when messages are consumed in parallel. Message consuming is considerably slow for the consumers still available.

       

      Thread dump reveals some of these:

      "New I/O server worker #2-1" prio=10 tid=0x000000000144a800 nid=0x1dea waiting on condition [0x00007fb68de08000]

        java.lang.Thread.State: WAITING (parking)

           at sun.misc.Unsafe.park(Native Method)

           - parking to wait for  <0x00000000b0ab59f8> (a java.util.concurrent.Semaphore$NonfairSync)

           at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)

           at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)

           at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)

           at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)

           at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)

           at org.hornetq.core.persistence.impl.journal.JournalStorageManager.beforePageRead(JournalStorageManager.java:1689)

       

      "New I/O server worker #2-13" prio=10 tid=0x0000000001f30800 nid=0x1e3e runnable [0x00007fb69972b000]

         java.lang.Thread.State: RUNNABLE

          at sun.misc.Unsafe.setMemory(Native Method)

          at sun.misc.Unsafe.setMemory(Unsafe.java:529)

          at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:132)

          at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)

          at org.hornetq.core.journal.impl.NIOSequentialFileFactory.allocateDirectBuffer(NIOSequentialFileFactory.java:108)

          at org.hornetq.core.persistence.impl.journal.JournalStorageManager.allocateDirectBuffer(JournalStorageManager.java:1709)

          at org.hornetq.core.paging.impl.PageImpl.read(PageImpl.java:119)

          at org.hornetq.core.paging.cursor.impl.PageCursorProviderImpl.getPageCache(PageCursorProviderImpl.java:190)

          at org.hornetq.core.paging.cursor.impl.PageCursorProviderImpl.getMessage(PageCursorProviderImpl.java:126)

          at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl.queryMessage(PageSubscriptionImpl.java:607)

          at org.hornetq.core.paging.cursor.PagedReferenceImpl.getPagedMessage(PagedReferenceImpl.java:73)

          - locked <0x00000000d588d2d0> (a org.hornetq.core.paging.cursor.PagedReferenceImpl)

       

      Is there anything that can be done so that memory is not exhausted so quickly? Am I missing something important to configure for this paging use case?

       

      Thanks!

       

      Marek

       

      Message was edited by Marek Neumann: added HornetQ version.

        • 1. Re: Memory problems in paging mode makes server unresponsive
          jbertram

          Have you tried any of the following:

          • Increasing max heap size
          • Decreasing <max-size-bytes>
          • Decreasing <page-size-bytes>

           

          If so, what were the results?  If not, can you?

           

          Aside from that, can you elaborate on your test a bit more?

          • 2. Re: Memory problems in paging mode makes server unresponsive
            clebert.suconic

            500M is pretty high I think..

             

            You have a single address? Notice that max-size-bytes is per address.. not per server.

            • 3. Re: Memory problems in paging mode makes server unresponsive
              mlange

              Hi Justin,

               

              Increasing the max heap only defers the problem. Decreasing max-size-bytes only makes paging start earlier. What I am asking myself is: why is the heap consumed so quickly when messages are not consumed? I tested by sending 180,000 messages of 1 KB each, and the old generation filled within 30 seconds (leading to immediate garbage collections which cause the clients to fail with timeouts).
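
              To put rough numbers on this (a back-of-the-envelope sketch; the per-message overhead figure is an assumption, not a measured HornetQ value): each undelivered message costs its payload plus broker-side bookkeeping (message headers, paged references, cursor entries), so 180,000 small messages can occupy far more heap than the raw payload suggests:

```java
public class HeapEstimate {
    public static void main(String[] args) {
        long messages = 180_000;
        long payloadBytes = 1_024;   // 1 KB payload per message
        long overheadBytes = 1_024;  // assumed per-message bookkeeping overhead
        long payloadMb = messages * payloadBytes / (1024 * 1024);
        long totalMb = messages * (payloadBytes + overheadBytes) / (1024 * 1024);
        System.out.println("payload only: " + payloadMb + " MB, with overhead: " + totalMb + " MB");
        // prints: payload only: 175 MB, with overhead: 351 MB
    }
}
```

              Under these assumptions the heap cost is already roughly double the payload; larger per-message overhead (page transactions, cursor caches) only grows the multiplier, which would match the observed old-gen growth.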

               

              @clebert.suconic: yes, I have one address which is used by 30 queues. I have read that max-size-bytes is per address. Is there no overall limit across all addresses? This makes provisioning quite hard, since the exact number of addresses created at runtime is not known in advance.

               

              What I am doing in the test (30 queues, 1 address, message size 1k):

               

              1) Starting consumers

              2) Sending messages with 100 client threads (messages are consumed in time so the queues remain almost empty)

              3) Stopping consumers (heap is filled very quickly, paging starts ~1min later)

              4) Starting consumers (does not work well, timeouts on clients, timeouts on servers probably due to extensive GC)

               

              Is there any way to prevent the servers from filling up so much memory? Are messages always kept in memory even when paging occurs? To me this looks like unwanted behavior; with it, stable operation cannot be guaranteed once paging occurs (and this can always happen in case of temporary consumer failures).

               

              Marek

              • 4. Re: Memory problems in paging mode makes server unresponsive
                mlange

                I have retested with a reduced max-size-bytes (100MB) and this did not make any difference.

                 

                As long as the consumers are available and the queues remain empty, memory looks really good:

                 

                memory-with-active-consumers.png

                 

                After some minutes the consumers are stopped for some time:

                memory-with-stopped-consumers.png

                The heap is completely occupied with objects originating from paging implementation:

                memory-with-stopped-consumers-paging-classes.png

                 

                Is this the intended paging behavior? Does it make sense to hold all paged messages in memory? When a max-size-bytes of 100MB is configured, why are 700MB of heap used for the messages? Due to the heap exhaustion the whole system is unresponsive, and it does not recover from this state.

                 

                Thanks for your valuable input!

                Marek

                • 5. Re: Memory problems in paging mode makes server unresponsive
                  mlange

                  I have reduced <page-size-bytes> to 1MB per file. This reduced memory consumption a bit. However, at some point the heap is completely filled with PageSubscriptionImpl instances and consuming is very slow.

                   

                  oldgen-full-with-page-objects.png

                  oldgen-jconsole.png

                   

                  I have no idea how to prevent the system from getting into this state (apart from always having consumers available, which cannot be guaranteed).

                  • 6. Re: Memory problems in paging mode makes server unresponsive
                    mlange

                    One important thing I can add is that producer and consumer performance is extremely degraded when paging occurs. Even when no messages are being produced anymore, the consumers act very slowly, and this happens although the heap is not completely filled. What can be done so that this does not happen? I am afraid that once the server is in paging mode it will never recover from it due to the degraded delivery.

                    • 7. Re: Memory problems in paging mode makes server unresponsive
                      clebert.suconic

                      You're using EAP.. are you using the latest patches available?

                      • 8. Re: Memory problems in paging mode makes server unresponsive
                        clebert.suconic

                        Paging is not meant to be exercised like a database.. it is a palliative for when consumers eventually fall out of sync. So everything goes through disk and stays only temporarily in memory.

                         

                         

                        We have plans to make it better.. but it's been designed around the premise that messaging use cases are still MOM, where you have consumers for your queues, and not DB-like.

                        • 9. Re: Memory problems in paging mode makes server unresponsive
                          jbertram

                          One way to avoid this is to use BLOCK or FAIL mode so that producers are unable to flood the server with messages while the consumers recover.

                           

                          Another way to mitigate the problem is to set a producer-max-rate so that producers can't send messages so quickly; that way, if the consumers drop off, the server doesn't get overwhelmed as fast.
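
                          The two mitigations above map onto configuration roughly like this (a sketch; the match pattern, factory name, and rate value are assumptions, and element placement follows the EAP messaging subsystem):

```xml
<!-- Option 1: block producers once the address limit is reached -->
<address-setting match="jms.queue.#">
    <max-size-bytes>524288000</max-size-bytes>
    <address-full-policy>BLOCK</address-full-policy>
</address-setting>

<!-- Option 2: cap each producer at N messages/second (500 here is an assumed value) -->
<connection-factory name="RemoteConnectionFactory">
    <producer-max-rate>500</producer-max-rate>
</connection-factory>
```

                          Note that with BLOCK, blocked senders appear to hang until space frees up, so client send timeouts need to accommodate that.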

                          • 10. Re: Memory problems in paging mode makes server unresponsive
                            clebert.suconic

                            One thing just came back to me... How are you sending these messages? Are you using transactions on sending?

                             

                            How you're consuming?

                             

                             

                            Are you sure you're committing the acks? (aren't you or whoever wrote the code forgetting to ack by accident?)

                             

                            Transactions on sending force us to use pageTransactions to hold their state until the message is consumed; if you use transactions, paging will always use some memory.

                            • 11. Re: Memory problems in paging mode makes server unresponsive
                              mlange

                              >You're using EAP.. are you using the latest patches available?

                              No, we can only use 6.0.1 with 2.2.x due to the introduction of JBoss Logging in 2.3 (org.jboss.logging.Logger is duplicated in JBoss 4.3 and we still have to support HornetQ clients running in JBoss 4.3). Are you thinking of specific bugs fixed in later versions which might prevent this problem?

                               

                              >We have plans to make it better.. but it's been designed around the premise that messaging use cases are still MOM where you have consumers for your queues, and not DB-like.


                              I am aware of that. I don't aim at using it like a DB. Just testing the worst case scenario where messages arrive and consumers are unavailable due to whatever reason.


                              >One way to avoid this is to use BLOCK or FAIL mode so that producers are unable to flood the server with messages while the consumers recover.

                              >Another way to mitigate the problem is to set a producer-max-rate so that the producers can't send messages so quickly, so that if the consumers drop off the server doesn't get overwhelmed so quickly.

                               

                              Sounds like workarounds. Throttling the producers would mean degrading overall performance just to prevent that failure scenario. This would also limit the scalability of the whole system. BLOCK might be an option in case paging does not work at all.

                               

                              >One thing just hit me on my memory... How are you sending these messages? Are you using transaction on sending?

                               

                              Yes, messages are sent transactionally (JMS-style).

                               

                              >How you're consuming?

                              >Are you sure you're committing the acks? (aren't you or whoever wrote the code forgetting to ack by accident?)

                               

                              Commits are done on the transacted JMS session, but it uses auto-acknowledge. Is it required to acknowledge on the client?

                               

                              >Transactions on sending will force us to use pageTransactions to hold on their state until the message is consumed.. if you use transactions it will always use memory.

                               

                              Not sure if I understand this completely. The transactional state is held in memory (is it the "consumedMessages" tree map?) as long as the message is not consumed. Would it help to decrease heap usage if we sent and consumed non-transactionally? However, transactions are a requirement to be able to roll back sending and consuming in case of business logic errors.

                               

                              Thanks Marek

                              • 12. Re: Memory problems in paging mode makes server unresponsive
                                jbertram

                                >No, we can only use 6.0.1 with 2.2.x due to the introduction of JBoss Logging in 2.3 (org.jboss.logging.Logger is duplicated in JBoss 4.3 and we still have to support HornetQ clients running in JBoss 4.3).

                                Older clients should work with newer servers.  Have you tried using your existing 2.2.x clients on JBoss 4.3 with, for example, JBoss EAP 6.1 or 6.2?

                                 

                                >Sounds like workarounds.

                                Using BLOCK or FAIL could potentially be a work-around for the issue you're experiencing with PAGE.  I'm not familiar enough with your use-case to say what would be best.  I'm just trying to give you options.

                                 

                                >Throttling the producers would mean to degrade overall performance just for preventing that failure scenario. This would also limit the scalability of the whole system. BLOCK might be an option in case paging does not work at all.

                                With the current design you'll need to pick which is the lesser evil, so to speak.

                                 

                                However, it's worth noting that setting a producer rate limit wouldn't necessarily "degrade overall performance."  If by "performance" you simply mean "message throughput" then I would tend to agree, but if by "performance" you meant "message latency" (or some mix of latency and throughput) then throttling producers could provide a performance gain.

                                 

                                >Yes, messages are sent transactionally (JMS-style).

                                Can you elaborate on the use-case here?  Are you sending multiple messages in your transaction?  By "JMS-style" do you mean you aren't using a JTA transaction on the sender?

                                 

                                >Commits are done on the transacted JMS session. But it uses auto-acknowledge. Is it required to acknowledge on the client?

                                When a JMS client uses a transacted session the acknowledgeMode is ignored.  The messages are only acknowledged when the session is committed.

                                • 13. Re: Memory problems in paging mode makes server unresponsive
                                  mlange
                                  >Older clients should work with newer servers.  Have you tried using your existing 2.2.x clients on JBoss 4.3 with, for example, JBoss EAP 6.1 or 6.2?

                                  That also came to my mind yesterday. It would definitely be an option if the old clients are compatible with HornetQ 2.3 or 2.4.

                                  >Can you elaborate on the use-case here?  Are you sending multiple messages in your transaction?  By "JMS-style" do you mean you aren't using a JTA transaction on the sender?

                                  The clients look up a connection factory from the resource adapter:

                                   

                                  <tx-connection-factory>

                                      <jndi-name>HornetqProducerPooledConnectionFactory</jndi-name>

                                      <xa-transaction/>

                                      <rar-name>hornetq-ra.rar</rar-name>

                                      <connection-definition>org.hornetq.ra.HornetQRAConnectionFactory</connection-definition>

                                      <config-property name="SessionDefaultType" type="java.lang.String">javax.jms.Topic</config-property>

                                      <config-property name="ConnectionParameters" type="java.lang.String">host=event-svc-01-test;port=11830</config-property>

                                      <config-property name="ConnectorClassName" type="java.lang.String">org.hornetq.core.remoting.impl.netty.NettyConnectorFactory</config-property>
                                  </tx-connection-factory>

                                   

                                  This should use JTA transactions by default, right? I was wrong - JMS transactions are not used on the sender but only on the consumer, to provide rollback semantics for the user.

                                  • 14. Re: Memory problems in paging mode makes server unresponsive
                                    jbertram

                                    >This should use JTA transactions by default right? I was wrong - JMS transactions are not used on the sender but only on the consumer to provide rollback semantics for the user.

                                    If the code getting the connection from this <tx-connection-factory> is running in an active JTA transaction (e.g. in an EJB method with REQUIRED or REQUIRES_NEW, etc.) then any operation done with that connection will be part of that JTA transaction.
