9 Replies Latest reply on Sep 24, 2010 10:22 AM by Aspi Engineer

    JMS messages are getting trapped inside of the HQ system, when paging.

    Aspi Engineer Newbie

      JMS messages are getting trapped inside of the HQ system. That is, a publisher can still publish messages, but consumers are unable to receive those messages until certain specific events happen.


      Consider the following scenario:


      Scenario A:
      1) Create 2 durable subscriptions called S1 and S2 on a single topic destination D1.
      2) Shut down the durable consumers.
      3) Publish 500 messages on destination D1. The messages should be of sufficient size to ensure that some of them get paged. In my specific case, 63 messages were in memory, and the remaining 437 got paged.
      Each message was about 50K, and the settings were:


      4)Create a non-durable consumer 'C1' on the same destination D1.


      5) Restart durable consumer S1. You may need to let some time elapse between steps (2) and (5); otherwise you may get:
      javax.jms.IllegalStateException: Cannot create a subscriber on the durable subscription since it already has subscriber(s)


      Observation: Durable S1 only receives the 63 messages that are in memory. It should instead have received all 500.


      6) Publish a single message to destination D1. At this time the total number of messages published is 501.


      Observation: Neither durable S1 nor non-durable C1 receives this last message.


      7) Restart durable consumer S2.


      Observation: Durable S2 receives the first 63 messages (those that were in memory). And after that, both S1 and S2 start receiving the remaining (501-63) messages. Also, non-durable C1 receives 438 messages (the 437 that had been paged plus the one that was published in step 6).


      Two of the consumers did not work as expected:
      i) Durable consumer S1 should have received all 500 messages when it was started up in step 5, and it should have received the single message that was published in step 6. Instead it only received those messages that were in memory.
      ii) Non-durable consumer C1 should have received only those messages that were published after it was created. In this case that would be the single message published in step 6. Instead it receives all paged messages plus the message published after it was created.


      I suspect that this behavior is related to the way the paging system works. Messages get paged and are not really associated with any JMS destination until they come back into memory. But this design has some unintended consequences. For example, we have already seen that the message count reported by the Queue mbean is incorrect (https://jira.jboss.org/browse/HORNETQ-31). Additionally, other operations such as the "RemoveMessages" operation on the Queue mbean do not work as expected. You would expect all messages to be removed, but in reality only those messages that were in memory get removed. You have to keep repeating the operation until you get back a result of 0 messages removed.
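
      The counts in Scenario A are consistent with a model in which messages are paged before they are routed to subscriptions. The sketch below is a toy Python model, not HornetQ code. The class `ToyAddress`, its method names, the limit of 63 in-memory messages, and the rule that depaging waits until memory held by every subscription drops below the limit are all assumptions made for illustration.

```python
from collections import deque


class ToyAddress:
    """Toy model of 'page before routing' (hypothetical, not HornetQ's code)."""

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory  # stand-in for max-size-bytes
        self.queues = {}                    # subscription name -> pending messages
        self.page_store = deque()           # paged messages, not yet routed
        self.paging = False

    def bind(self, name):
        # A new subscription starts with an empty queue.
        self.queues[name] = deque()

    def _in_memory(self):
        # A message counts against the address while any queue still holds it.
        held = set()
        for q in self.queues.values():
            held.update(q)
        return len(held)

    def _route(self, msg):
        # Routing copies the message reference to every current binding.
        for q in self.queues.values():
            q.append(msg)

    def publish(self, msg):
        if self.paging or self._in_memory() >= self.max_in_memory:
            self.paging = True
            self.page_store.append(msg)     # paged before routing
        else:
            self._route(msg)

    def run(self, active):
        """Deliver to the active consumers until nothing more can move."""
        received = {name: 0 for name in active}
        progress = True
        while progress:
            progress = False
            for name in active:
                q = self.queues[name]
                if q:
                    received[name] += len(q)
                    q.clear()
                    progress = True
            # Depage only while the address is below its memory limit.
            while self.page_store and self._in_memory() < self.max_in_memory:
                self._route(self.page_store.popleft())
                progress = True
            if not self.page_store:
                self.paging = False
        return received


addr = ToyAddress(max_in_memory=63)
addr.bind("S1")
addr.bind("S2")
for i in range(500):                          # step 3
    addr.publish(i)
addr.bind("C1")                               # step 4
step5 = addr.run(["S1"])                      # step 5: only S1 is running
addr.publish(500)                             # step 6
step6 = addr.run(["S1", "C1"])                # nothing moves: S2 still holds 63
step7 = addr.run(["S1", "S2", "C1"])          # step 7: S2 restarts
print(step5, step6, step7)
```

      In this model S1 receives 63 messages in step 5, nothing is delivered in step 6, and in step 7 S1 and C1 each receive 438 messages while S2 receives 501, which matches the observations above.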


      Aspi Engineer
      Putnam Investments

        • 2. Re: JMS messages are getting trapped inside of the HQ system, when paging.
          Aspi Engineer Newbie

          Hi Clebert,


          Thank you for your response.


          I suspect that this issue will be taken up at a higher level between Red Hat/JBoss and my company.


          However, the fact that this behavior is documented and expected does not make it optimal. It's my opinion that the current paging design has some consequences that need to be revisited.


          Each of items 24.6, 24.7, 24.8 and 24.9 (http://hornetq.sourceforge.net/docs/hornetq-2.1.0.Final/user-manual/en/html/paging.html#d0e4784) is a limitation that makes it very difficult to use HQ in an enterprise environment. Add to that the inability to accurately report the number of messages on a single JMS queue or topic (durable subscription level count), and some minor issues such as not being able to purge all messages from a JMS destination in a single operation, and the system is less than ideal for an enterprise deployment.



          Aspi Engineer

          Putnam Investments

          • 3. Re: JMS messages are getting trapped inside of the HQ system, when paging.
            Tim Fox Master

            Paging works as advertised.


            As already discussed, paging occurs *before* messages are routed to destinations. The messages haven't reached the destinations yet, so they are not reflected in any queue count. That is correct since the messages are not in the queue.


            Purging also works correctly since the messages are not in the queue. It only deletes the messages in the queue.


            If you don't like paging you can use producer blocking instead.


            Please note that not all "enterprise" messaging systems include something similar to paging, so saying that paging makes HornetQ difficult to use in an enterprise environment is somewhat strong.


            This is a high end feature, use it like it is meant to be used and it will work well for you.

            • 4. Re: JMS messages are getting trapped inside of the HQ system, when paging.
              Tim Fox Master

              AIUI Sun MQ does not have an equivalent of paging. They have two options when a destination is full: drop messages or block producers.


              HornetQ also supports those two options (as well as paging). If you want the same behaviour as Sun MQ you can have it.

              • 5. Re: JMS messages are getting trapped inside of the HQ system, when paging.
                Aspi Engineer Newbie

                Before I comment, can you please help me understand what would happen if say I had:

                - a heap of 2G

                - <max-size-bytes> set to 1G

                - 10 destinations that each published messages equal to 1/2 G.

                Would the JMS server run out of memory (10 * 1/2G = 5G > 2G)?


                Or would HQ automatically start paging messages to disk on an as-needed basis?


                - Aspi

                • 6. Re: JMS messages are getting trapped inside of the HQ system, when paging.
                  Tim Fox Master

                  As Clebert explained in an earlier post, and as is also explained in the paging chapter of the user manual, paging parameters are *per address*.


                  http://hornetq.sourceforge.net/docs/hornetq-2.1.1.Final/user-manual/en/html/paging.html (chapter 24.3)


                  In the case of JMS queues that means per queue, since each JMS queue has a unique address.


                  So if you set the max size of each queue to Y bytes and you have Z queues, then clearly you need to ensure your server has *at least* Y x Z bytes of RAM available for the queues.
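
                  As a rough worked example of the Y x Z rule, here is a small Python sketch using the numbers from the question above (2G heap, <max-size-bytes> of 1G, 10 destinations each holding 1/2 G). The function names are hypothetical, and the sketch assumes one address per destination, no consumption, and no other use of the heap.

```python
GIB = 1024 ** 3


def worst_case_in_memory(max_size_bytes, address_count):
    # Y x Z: every address filled up to its own max-size-bytes.
    return max_size_bytes * address_count


def expected_in_memory(max_size_bytes, published_per_address, address_count):
    # Each address keeps messages in memory only up to its own limit;
    # anything beyond that would be paged (or blocked/dropped).
    return min(max_size_bytes, published_per_address) * address_count


heap = 2 * GIB
expected = expected_in_memory(1 * GIB, GIB // 2, 10)
worst = worst_case_in_memory(1 * GIB, 10)

print(expected / GIB, "GiB held in memory vs", heap / GIB, "GiB heap")
print("fits in heap:", expected <= heap)
print("worst case:", worst / GIB, "GiB")
```

                  Under these assumptions no address reaches its 1G limit, so nothing is paged and the 5G of messages would not fit in a 2G heap.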

                  • 7. Re: JMS messages are getting trapped inside of the HQ system, when paging.
                    Aspi Engineer Newbie

                    I would say that there are two separate features that we are talking about.


                    a) How do you set a max. capacity for a single JMS destination?

                    b) How do you ensure that the JMS server, which is running with a finite heap size, does not run out of memory?


                    A JMS server with a finite heap should be able to manage a very large number of messages by transparently paging messages out to disk. At no point should messages get stuck, nor should message counts be inaccurate. If you wish to compare with Sun Message Queue, then that is exactly what it does. I can publish many tens of thousands of messages and the JMS server will transparently page messages to disk. It will always report back the correct number of messages for each destination, and at no point have we seen messages getting stuck. Paging is done purely to manage finite memory resources and is completely transparent to the end user. It's not even user-configurable. The JMS server just does it on an as-needed basis.


                    You are correct that Sun MQ has no concept of "paging per destination". That is, there is no way to define a destination max capacity and configure the destination to page any messages in excess of the max. But from a memory management perspective, that feature is not needed. I could simply configure the destination to have no max capacity, and the server will automatically page messages to disk on an as-needed basis. Setting max-capacity for a JMS destination is meant to prevent situations where a producer goes into some kind of loop and publishes many millions of messages before anyone realizes that we have a problem. It's not about memory management.


                    Given that memory is finite, when using HQ I have to configure the addresses for paging. If I configure <max-size-bytes> too low, paging occurs and I have the challenges detailed in sec 24 of the user guide.


                    If I configure <max-size-bytes> too high, and if I have many busy producers and many dead/slow consumers, then I face the risk of my server running out of memory.


                    Consider these numbers:
                    We typically run our production JMS server with a 4G heap and we have about 200 destinations. If I were to assume an even distribution of data, I essentially have 4G/200 = 20 Meg for each destination's messages. I could configure the address to have a <max-size-bytes> setting of 20 Meg. But for a message size of 50K, 20 Meg only equates to about 410 messages. Beyond that I either need to page or block the producer, and neither option is ideal.
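
                    The arithmetic above can be checked with a short Python sketch. The function names are hypothetical, binary units (1 Meg = 1024 * 1024 bytes, 1K = 1024 bytes) are an assumption, and the sketch treats the whole heap as available for messages.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def budget_per_destination(heap_bytes, destinations):
    # Even split of the heap across all destinations.
    return heap_bytes / destinations


def messages_that_fit(budget_bytes, message_bytes):
    # How many messages of a given size fit in one destination's budget.
    return budget_bytes / message_bytes


budget = budget_per_destination(4 * GIB, 200)
print("budget per destination (Meg):", budget / MIB)

fit = messages_that_fit(20 * MIB, 50 * KIB)
print("50K messages in 20 Meg:", fit)
```

                    With these units the per-destination budget is slightly over 20 Meg, and a 20 Meg limit holds roughly 410 messages of 50K each.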


                    Aspi Engineer

                    • 8. Re: JMS messages are getting trapped inside of the HQ system, when paging.
                      Tim Fox Master

                      I understand your concerns, but I think you're expecting paging to be something that it was never designed to be. Don't confuse it with Sun's feature that drops message bodies from memory when memory is low.


                      Paging is something that happens *before* messages are routed to queues. Consequently, message counts of queues are not affected. This is quite correct behaviour: messages are not in the queues, so they should not be reflected in the message count.


                      Advantages of paging before routing are:


                      1) We only need keep one copy of the message on disk, not one copy per durable subscription - this can make a huge difference in performance and disk usage when there are many durable subscriptions on a topic


                      2) HornetQ paging allows paging of virtually *unlimited* numbers of messages. Since nothing is stored in RAM - not even a stub - the only limit is available disk space. Contrast this with SunMQ, where a "stub" remains in memory and the body is paged to disk. This means that when the body of the message is very small this feature is next to useless, and in all cases there will be some limit to the number of messages in the queue since the stub and properties will take up RAM too.


                      For many of our users, paging at the address level has a lot of advantages over the kind of paging you expect. I don't think they would be happy if we changed it to work the way you expect (loss of performance etc.). I don't think there is any product on the market that can maintain message throughput with truly unlimited numbers of messages the way HornetQ can.


                      Having said that, your requirements are valid, even if not representative of the majority of users, so we should consider them as a feature request, and users could then choose via configuration which behaviour they require.


                      Implementing your requirements would be a non-trivial task though.

                      • 9. Re: JMS messages are getting trapped inside of the HQ system, when paging.
                        Aspi Engineer Newbie

                        In order to help with any design enhancements that the HQ team may be considering, I would like to propose how we believe message paging should work. This is based on our experiences supporting JMS clients within our enterprise.


                        For a discussion on what changes are being proposed by the HQ team, please refer to "http://community.jboss.org/thread/156674"


                        For reference, our JMS environment consists of (all numbers are approximate and expected to grow):
                        - 300+ individual JMS clients
                        - 300+ individual JMS connections
                        - 400+ individual JMS producers
                        - 400+ individual JMS consumers
                        - 250+ destinations
                            - Mostly topics, but some queues also
                        - Almost exclusive use of JMS durable subscribers
                            - Number of durable subscribers per topic ranges from 1 - 15
                        - Each producer/consumer/subscriber has its own up and down time. So there is no saying when a client may come up or shut down
                        - It's perfectly possible for any client to be down for an extended period of time. For example, durable A may be down for 24 hours, but during that same time period, other durables on the same JMS topic would be up and running.


                        Max Bytes in Memory Versus Max. Queue/Topic Depth:
                        There needs to be a distinction between "max-size-bytes" (the max number of bytes that HQ will keep in memory before paging occurs), and maximum limits on a queue/topic depth.


                        All queues/topics should have a user-configurable max limit, in terms of number of messages and/or bytes. Once this limit is reached, HQ can apply its limit policy ("address full policy"). One primary reason a limit is required is to prevent a producer from looping and publishing many, many messages before the problem is detected. You could run out of disk space and the whole JMS server would effectively crash. Typically this limit will be quite high, to support extended time periods when we have a busy producer and slow or dead consumers.


                        On the other hand, "max-size-bytes" needs to be low to prevent the server from running out of memory. If it's too high, and we have many busy producers and many dead/slow consumers, then the server could run out of memory.
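
                        The distinction between the two limits can be sketched in Python as follows. This illustrates the proposal only, not existing HQ behaviour: `DestinationLimits`, `max_depth_bytes` and `admit` are hypothetical names, and only `max_size_bytes` corresponds to a setting mentioned in this thread.

```python
from dataclasses import dataclass


@dataclass
class DestinationLimits:
    max_size_bytes: int   # in-memory budget; beyond this, messages are paged
    max_depth_bytes: int  # hypothetical cap on memory + disk for the destination


def admit(limits, bytes_in_memory, bytes_on_disk, message_bytes):
    """Decide what to do with a newly published message."""
    total = bytes_in_memory + bytes_on_disk
    if total + message_bytes > limits.max_depth_bytes:
        # Destination is truly full: drop, block, etc.
        return "apply-address-full-policy"
    if bytes_in_memory + message_bytes > limits.max_size_bytes:
        # Memory budget exhausted, but the destination still has room on disk.
        return "page-to-disk"
    return "keep-in-memory"


MIB = 1024 ** 2
limits = DestinationLimits(max_size_bytes=20 * MIB, max_depth_bytes=2048 * MIB)

print(admit(limits, 0, 0, 50 * 1024))
print(admit(limits, 20 * MIB, 0, 50 * 1024))
print(admit(limits, 20 * MIB, 2028 * MIB, 50 * 1024))
```

                        The low limit only decides where a message is stored, while the high limit decides whether it is accepted at all.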


                        Feature Request:
                        a) Paging should not skew message counts:
                        At any time, it should be possible to determine the accurate total number of messages on a single JMS queue/topic/durable subscription, irrespective of whether messages have been paged.


                        b) Paging should not hold up message delivery:
                        At no point should message delivery be suspended to an active durable subscriber just because there is another durable subscriber with a large number of pending messages.


                        c) Paging should (preferably) be transparent:
                        The user should not have to configure, either on a per-address or a per-queue/topic level, how many bytes the server should keep in memory. The server should page messages to disk on an as-needed basis. If memory is running low, the server can page the message body and keep a handle to the message in memory. If memory is available, the server can keep the whole message in memory.
                        With the current design of configuring "max-size-bytes" at a per-address level, you could very well end up with a situation where an address is considered "full" even though the server as a whole has free memory available.


                        It's very likely that to support some of these requests, the HQ server will need to keep in memory a small handle to each and every unconsumed message, possibly duplicated for each and every durable subscriber to a JMS topic. So if a single topic has 10 subscribers, each with 1 pending message, then maybe the HQ server needs to maintain 10 message handles in memory. This may seem like a cause for concern, but the numbers indicate otherwise. Consider an HQ server with a 2G heap. Let's assume that 1/2 G is reserved for proper server operation, leaving 1.5 G for message management. If you consider a message handle to be 128 bytes, then the server can support 1.5 G/128 bytes = roughly 12.5 million unconsumed messages. That is a lot. If you add more memory, or reduce the handle size, you have much more capacity.
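
                        The handle estimate can be reproduced with a short Python sketch. The function name is hypothetical, the 128-byte handle size is the assumption stated above, and binary units (1 G = 1024^3 bytes) are assumed; with decimal units the figure would be about 11.7 million.

```python
GIB = 1024 ** 3


def max_handles(heap_bytes, reserved_bytes, handle_bytes):
    # Number of fixed-size handles that fit in the heap left over
    # after reserving memory for normal server operation.
    return (heap_bytes - reserved_bytes) // handle_bytes


handles = max_handles(2 * GIB, GIB // 2, 128)
print("handles that fit:", handles)

# If each message needs one handle per durable subscriber,
# a topic with 10 subscribers divides that capacity by 10.
distinct_messages = handles // 10
print("distinct messages with 10 subscribers each:", distinct_messages)
```

                        With these assumptions about 12.6 million handles fit, or about 1.26 million distinct messages if every message is pending for 10 subscribers.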


                        What this means is that for a given heap size, HQ has a theoretical upper limit to the number of messages the system can hold at any point in time. Beyond these theoretical limits, it's possible that the HQ server will run out of memory. But that is fine: all resources have limits, memory as well as disk space.