23 Replies Latest reply on Nov 30, 2005 4:59 PM by timfox

    JbossMessaging message store - spilling messages to disk.

    timfox

      I've been thinking about how best to implement the spillover of messages onto disk in JBoss Messaging. My initial thoughts are pretty unstructured and go along the following lines:

      AFAICT the thing that we really need to spill over onto disk when memory gets tight is the "payload" of the message (the body of the message in JMS speak). The headers and properties are useful to the message reference (the actual thing we pass around in the server). In fact it seems to me this payload can be thought of as completely opaque from the moment it leaves the producer until it reaches the consumer.

      If we can make that assumption, this might make our life a lot easier. This means we can convert (serialize) the payload into a byte array before we send it from the producer, and it can stay as a byte[] until the moment (if ever) it actually gets delivered to the consumer.

      If the message cache is just storing byte arrays, then it is a trivial matter to calculate how much memory is being used. We don't have to worry about doing some kind of sizeof, or mess around with soft references and all the pain that brings in order to manage our memory.
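      A rough sketch of why byte[] payloads make the accounting easy: the cache only has to add and subtract array lengths. PayloadCache and its methods are made-up names for illustration, not JBoss Messaging classes, and the total counts payload bytes only (no array headers or map overhead).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache for illustration; not a JBoss Messaging class.
class PayloadCache {
    private final Map<Long, byte[]> payloads = new HashMap<>();
    private long usedBytes; // sum of the lengths of all payloads currently held

    // Stores a payload and updates the running byte total.
    synchronized void put(long messageId, byte[] payload) {
        byte[] previous = payloads.put(messageId, payload);
        if (previous != null) {
            usedBytes -= previous.length; // replaced an existing payload
        }
        usedBytes += payload.length;
    }

    synchronized byte[] get(long messageId) {
        return payloads.get(messageId);
    }

    // Removes a payload and gives its bytes back to the total.
    synchronized byte[] remove(long messageId) {
        byte[] removed = payloads.remove(messageId);
        if (removed != null) {
            usedBytes -= removed.length;
        }
        return removed;
    }

    synchronized long usedBytes() {
        return usedBytes;
    }
}
```

      Comparing usedBytes() against a configured limit is then a single long comparison, with no sizeof estimate involved.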

      So, in effect, the message cache becomes one big byte[] which we can allocate at start up, and then store each message payload byte[] in that at a particular offset.

      When used memory passes some kind of high water mark, we can page a chunk of the array onto disk, perhaps using memory mapped files (java.nio.channels.FileChannel) for high performance.
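      For reference, a minimal sketch of writing a payload to disk through a memory-mapped region and reading it back. SpillFile, its methods and the offset scheme are assumptions made for illustration; only the FileChannel and MappedByteBuffer calls are standard JDK API.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of JBoss Messaging.
class SpillFile {
    // Writes the payload at the given file offset through a memory-mapped region.
    // Mapping READ_WRITE past the current end of the file extends the file.
    static void write(Path file, long offset, byte[] payload) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, offset, payload.length);
            buf.put(payload);
            buf.force(); // push the mapped changes out to the storage device
        }
    }

    // Reads 'length' bytes starting at the given file offset.
    static byte[] read(Path file, long offset, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] out = new byte[length];
            buf.get(out);
            return out;
        }
    }
}
```

      A mapping stays valid until the buffer is garbage collected, even after the channel is closed, so a store that maps many regions holds on to address space and file handles for longer than the code suggests.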

      Another advantage: when messages are stored in the db, or when sent to the consumer, having the payload as a byte[] should also lessen the serialization hit.

        • 1. Re: JbossMessaging message store - spilling messages to disk
          genman


          A couple thoughts...

          You don't want to try to reinvent "malloc" and OS-style paging by having a single array for all message payloads.

          You also lose some possible optimizations passing things around in the same JVM as a byte[] when you don't need to make a deep copy, e.g. TextMessage, since the body is an immutable object.

          I also think you need to eventually design for paging out entire bodies, since you should be able to lazy-load extremely large queues at start-up (think 2 million messages.)

          • 2. Re: JbossMessaging message store - spilling messages to disk

             

            "timfox" wrote:
            We don't have to worry about doing some kind of sizeof, or mess around with


            http://java.sun.com/j2se/1.5.0/docs/api/java/lang/instrument/Instrumentation.html#getObjectSize(java.lang.Object)
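            A minimal sketch of how that call is usually reached: an Instrumentation instance is only handed to a Java agent, so a small premain class has to capture it. SizeAgent and sizeOf are hypothetical names. When packaged as a real agent the class should be declared public in its own file, and the jar manifest needs a Premain-Class entry naming it.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent class for illustration. Declare it public in its own
// file when packaging it in a jar started with -javaagent:<jar>.
class SizeAgent {
    private static volatile Instrumentation instrumentation;

    // Invoked by the JVM before main() when the agent jar is on the command line.
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    // Returns the JVM's implementation-specific size estimate for this one
    // object. It is a shallow size: referenced objects are not included.
    static long sizeOf(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException("agent not loaded; start the JVM with -javaagent");
        }
        return inst.getObjectSize(o);
    }
}
```

            Because the result is shallow, sizing a whole message this way would still mean walking its fields and summing the estimates.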

            • 3. Re: JbossMessaging message store - spilling messages to disk

               

              "genman" wrote:

              You don't want to try to reinvent "malloc" and OS-style paging by having a single array for all message payloads.


              Agreed. There is zero chance you can do better than the GC using java code.
              You don't have the information.

              • 4. Re: JbossMessaging message store - spilling messages to disk

                 

                "timfox" wrote:
                Perhaps using memory mapped files (java.nio.channels.FileChannel) for high performance.


                And high resource usage!

                Forget "performance". This is messaging.

                These should be your priorities:

                1) Stability - does it stay up rather than crash?
                2) Reliability - is there a single point of failure, does it lose messages?
                3) Scalability - how many messages, senders, receivers can you cope with before
                you have to introduce new boxes, how much cpu/memory/disk does it use?
                4) Throughput - what primitives can you put in place to keep the receivers busy and not block senders too much
                5) Performance - how long does it take to send an individual message from end to end

                There are always tradeoffs between these issues and some people will
                want configuration to change the default priorities I have given above :-)

                • 5. Re: JbossMessaging message store - spilling messages to disk

                   

                  "genman" wrote:

                  I also think you need to eventually design for paging out entire bodies, since you should be able to lazy-load extremely large queues at start-up (think 2 million messages.)


                  I prefer to think of "paging queues".
                  My solution would be to use a marker (dummy message reference)
                  in the queue to represent where it should
                  go to the database to get the next messages.

                  With the number of real messages in the queue configurable and dependent
                  upon whether there is activity and what throughput there is.

                  Queue:
                  1) Message 45
                  2) Message 46
                  ...
                  100) Message 144
                  101) Marker to get starting from 145 from the persistent store
                  


                  Idle (but not empty) Queue:
                  1) Marker to get starting from 212 from the persistent store
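
                  The marker idea above could be sketched roughly like this. RefStore, PagingQueue and the use of plain message ids are assumptions made for illustration; none of them are JBoss Messaging APIs, and the page size is fixed here rather than driven by activity or throughput.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the persistent store; not a JBoss Messaging API.
interface RefStore {
    // Returns up to 'max' message ids that are >= fromId, in ascending order.
    List<Long> loadFrom(long fromId, int max);
}

// Hypothetical queue that keeps a bounded number of references in memory
// and a marker recording where to resume reading from the store.
class PagingQueue {
    private final Deque<Long> inMemory = new ArrayDeque<>();
    private final RefStore store;
    private final int pageSize;
    private long markerFromId;  // first id that is still only in the store
    private boolean hasMarker;  // false once the store has nothing more to give

    PagingQueue(RefStore store, int pageSize, long firstIdInStore) {
        this.store = store;
        this.pageSize = pageSize;
        this.markerFromId = firstIdInStore;
        this.hasMarker = true;
    }

    // Returns the next message id, or null when the queue is exhausted.
    Long poll() {
        if (inMemory.isEmpty() && hasMarker) {
            // The consumer has reached the marker: fetch the next page.
            List<Long> page = store.loadFrom(markerFromId, pageSize);
            if (page.isEmpty()) {
                hasMarker = false;
            } else {
                inMemory.addAll(page);
                markerFromId = page.get(page.size() - 1) + 1;
            }
        }
        return inMemory.poll();
    }
}
```

                  An idle queue in this sketch is simply one whose in-memory deque is empty and whose marker is still set.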
                  


                  • 6. Re: JbossMessaging message store - spilling messages to disk

                     

                    "timfox" wrote:

                    Another advantage: when messages are stored in the db, or when sent to the consumer, having the payload as a byte[] should also lessen the serialization hit.


                    If you keep the payload as a byte array, there is no serialization hit on the server.
                    serialization = object <-> byte[]
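
                    To make the object <-> byte[] point concrete, here is a small sketch using standard Java serialization. PayloadCodec is a hypothetical name, and this is not necessarily the encoding JBoss Messaging uses. In the byte[] design only the producer would run toBytes and only the consumer would run fromBytes; the server would pass the array through untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper for illustration; not a JBoss Messaging class.
class PayloadCodec {
    // object -> byte[]
    static byte[] toBytes(Serializable body) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(body);
        }
        return bytes.toByteArray();
    }

    // byte[] -> object
    static Object fromBytes(byte[] payload) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
            return in.readObject();
        }
    }
}
```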


                    • 7. Re: JbossMessaging message store - spilling messages to disk
                      timfox

                       

                      "adrian@jboss.org" wrote:
                      "genman" wrote:

                      You don't want to try to reinvent "malloc" and OS-style paging by having a single array for all message payloads.


                      Agreed. There is zero chance you can do better than the GC using java code.
                      You don't have the information.


                      Don't worry, I'm certainly not intending to "reinvent malloc" or do my own gc. I agree that would be stupid.

                      • 8. Re: JbossMessaging message store - spilling messages to disk
                        timfox

                         

                        "adrian@jboss.org" wrote:
                        "timfox" wrote:

                        Another advantage: when messages are stored in the db, or when sent to the consumer, having the payload as a byte[] should also lessen the serialization hit.


                        If you keep the payload as a byte array, there is no serialization hit on the server.
                        serialization = object <-> byte[]


                        Exactly my point :)

                        • 9. Re: JbossMessaging message store - spilling messages to disk
                          timfox

                           

                          "adrian@jboss.org" wrote:
                          "genman" wrote:

                          I also think you need to eventually design for paging out entire bodies, since you should be able to lazy-load extremely large queues at start-up (think 2 million messages.)


                          I prefer to think of "paging queues".
                          My solution would be to use a marker (dummy message reference)
                          in the queue to represent where it should
                          go to the database to get the next messages.

                          With the number of real messages in the queue configurable and dependent
                          upon whether there is activity and what throughput there is.

                          Queue:
                          1) Message 45
                          2) Message 46
                          ...
                          100) Message 144
                          101) Marker to get starting from 145 from the persistent store
                          


                          Idle (but not empty) Queue:
                          1) Marker to get starting from 212 from the persistent store
                          


                          In JBossMessaging we have a single message store and potentially multiple queues/durable subscriptions which contain message references that reference the messages in the message store.

                          What I'm talking about here is purely the loading/paging of messages from the message store to disk (or wherever), I'm not talking about the lazy loading/paging of message references which I agree we also have to tackle at some point if we need to support queues/subs with millions of messages.

                          Right now these are treated as separate tasks in JIRA.

                          • 10. Re: JbossMessaging message store - spilling messages to disk
                            alexfu.novell

                            Correct me if I'm wrong.

                            The InMemoryMessageStore serves as:
                            (1) Cache of PersistentMessageStore (for fast access);
                            (2) non-reliable message store.

                            Can we use JBoss TreeCache to achieve the above functionalities? Then we don't need to worry about memory management.
                            I think TreeCache has file system based backup storage.

                            • 11. Re: JbossMessaging message store - spilling messages to disk
                              timfox

                              InMemoryMessageStore is a MessageStore implementation that only caches messages in memory.

                              PersistentMessageStore is a MessageStore implementation that also writes the messages to (and retrieves them from) the db.

                              I'm not sure how TreeCache would help us. Our data is not tree-like in structure.

                              One of the key things that needs to be done is the spillover of messages onto disk when memory gets tight. This is the hard bit.

                              I don't believe TreeCache will help us in this area.

                              • 12. Re: JbossMessaging message store - spilling messages to disk
                                alexfu.novell

                                TreeCache has FileCacheLoader which can be configured to do passivation so evicted messages will go to file system. They can be reloaded into memory when accessed later.

                                And I don't think we need to worry about the "tree" structure.

                                • 13. Re: JbossMessaging message store - spilling messages to disk

                                  TreeCache is a possible way to implement the MessageStore.

                                  The whole point of having an interface is that you can implement it using
                                  whatever technology you like.
                                  If you write the implementation, you must follow the contract of the interface.

                                  So

                                  > /dev/null
                                  would not be appropriate. :-)

                                  • 14. Re: JbossMessaging message store - spilling messages to disk
                                    timfox

                                    The code that stores and retrieves the messages is already in place. (JDBCPersistenceManager)

                                    The bit that isn't done is the code that works out if memory is low/high and triggers the load/store code.

                                    The questions we need to ask are:

                                    Do we use SoftReferences a la JBossMQ, do we use the new Java 5 instrumentation stuff that Adrian pointed out, or do we cache byte[] so it is trivial to work out used memory?

                                    Also we need to determine how to prioritise messages for eviction - do we use LRU or some other policy?
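
                                    As one possible shape for the byte[] plus LRU combination, here is a sketch that evicts the least recently used payloads once a byte budget is exceeded. LruPayloadCache and maxBytes are hypothetical names, and where the comment says a real store would write to disk, this sketch just drops the payload.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical cache for illustration; not a JBoss Messaging class.
class LruPayloadCache {
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<Long, byte[]> payloads =
            new LinkedHashMap<>(16, 0.75f, true);
    private final long maxBytes; // byte budget acting as the high water mark
    private long usedBytes;

    LruPayloadCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    void put(long id, byte[] payload) {
        byte[] previous = payloads.put(id, payload);
        if (previous != null) {
            usedBytes -= previous.length;
        }
        usedBytes += payload.length;
        evictIfNeeded();
    }

    // A hit moves the entry to the most recently used position.
    byte[] get(long id) {
        return payloads.get(id);
    }

    // containsKey does not count as an access in LinkedHashMap.
    boolean contains(long id) {
        return payloads.containsKey(id);
    }

    long usedBytes() {
        return usedBytes;
    }

    private void evictIfNeeded() {
        Iterator<Map.Entry<Long, byte[]>> it = payloads.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            Map.Entry<Long, byte[]> eldest = it.next();
            usedBytes -= eldest.getValue().length;
            it.remove(); // a real store would write the payload to disk here
        }
    }
}
```

                                    Note that a single payload larger than the budget evicts everything, itself included, so a real policy would need a rule for oversized messages.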




