9 Replies Latest reply on Aug 27, 2009 6:06 PM by clebert.suconic

    Fast persistence, EIP and other internals

      Hi,
      HornetQ sounds very interesting; I plan on testing it in the next day or two. I have several questions about the internals of HornetQ:

      1. The fast persistence mechanism seems to be fairly well integrated with the HornetQ product. It sounds like it would be a great stand-alone product. Any plans to let us build a jar of just the persistence code? I have a great need for very fast persistence in my own code.
      2. Is there anything similar to Apache Camel (http://camel.apache.org/)? From what I understand, Apache Camel is not very fast (not ideal for low-latency work). Are there any plans to integrate such Enterprise Integration Patterns?
      3. Endpoints. It looks like STOMP, JMS and AMQP are either supported or in the pipeline. What if I wanted to use my own protocol? Say I wanted to leverage the robustness and speed of HornetQ as a layer below a FIX protocol implementation (a protocol used in stock-trading software).

      Ideally, I'd like to be able to write some custom protocol logic in Netty, easily embed message transformers, filters, routers, etc., in the server, and rely on HornetQ's ability to persist data very quickly (and to recover from crashes)... you get the idea.

      If there are architecture or design documents (for developers who wish to grok the source code), they would be very helpful.

      Thanks

        • 1. Re: Fast persistence, EIP and other internals
          clebert.suconic

           

          It sounds like it would be a great stand-alone product.


          We have thought about splitting the Journal out into a separate project.


          There is an example on how to use the Journal:

          http://anonsvn.jboss.org/repos/hornetq/trunk/tests/src/org/hornetq/tests/util/JournalExample.java

          The journal only has methods to recover the data during what we call loading time. It's not a database where you can recover data using primary keys or anything like that.

          For HornetQ, we need to guarantee persistence, and we need to be able to reload the data in case of a restart. So it's a little bit different from a database point of view.
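
          In rough terms, using the journal looks like the sketch below. This is a simplified outline, not the exact API: it assumes a Journal already built the way JournalExample.java builds one (JournalImpl plus a SequentialFileFactory pointing at a data directory), and constructor arguments and method signatures can differ between revisions, so check the linked example for the real code.

          import java.util.ArrayList;
          import java.util.List;

          import org.hornetq.core.journal.Journal;
          import org.hornetq.core.journal.PreparedTransactionInfo;
          import org.hornetq.core.journal.RecordInfo;

          public class JournalSketch
          {
             // "journal" is assumed to have been created and configured as in JournalExample.java.
             public static void useJournal(final Journal journal) throws Exception
             {
                journal.start();

                // Loading time: replay whatever survived the previous run.
                List<RecordInfo> committed = new ArrayList<RecordInfo>();
                List<PreparedTransactionInfo> prepared = new ArrayList<PreparedTransactionInfo>();
                journal.load(committed, prepared, null);
                System.out.println("Records recovered after restart: " + committed.size());

                // Appends: the caller chooses the record id, and that id is the only handle
                // for later updates and deletes -- there are no lookups by key.
                long id = 1;
                byte recordType = (byte)1;
                journal.appendAddRecord(id, recordType, "some data".getBytes(), true);
                journal.appendUpdateRecord(id, recordType, "updated data".getBytes(), true);
                journal.appendDeleteRecord(id, true);

                journal.stop();
             }
          }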

          • 2. Re: Fast persistence, EIP and other internals

            >>The journal only has methods to recover the data during what we call loading time. It's not a database where you can recover data using primary keys or anything like that.

            That is basically the essence of my requirement. I think there are many projects that could benefit from this.

            • 3. Re: Fast persistence, EIP and other internals
              lorban

              Have you checked http://howl.ow2.org/, which is the same kind of journal you have in HornetQ?

              • 4. Re: Fast persistence, EIP and other internals
                timfox

                Funnily enough, Jeff Mesnil, one of our core developers, used to be the lead of JOTM, IIRC.

                Anyway, we're confident our journal is the best possible, and our usage of AIO is unique.

                • 5. Re: Fast persistence, EIP and other internals
                  clebert.suconic

                  I just made a blog post explaining some of the internals:

                  http://hornetq.blogspot.com/2009/08/persistence-on-hornetq.html

                  • 6. Re: Fast persistence, EIP and other internals

                    lorban,
                    http://howl.ow2.org/ looks interesting. There is a lot of talk of objects, the web and logging, which is probably why I would not have expected it to do what I need.

                    So far I have been looking at ways of writing and updating data very fast (BDB, Tokyo Cabinet, etc.), but I didn't realize that what I may actually need is what is apparently called a journaling system. The problem I am currently trying to solve is recovering state after a crash; sharing/distributing data is really a different problem.
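
                    To make that concrete, the write-then-replay pattern I have in mind looks roughly like the toy sketch below. It is plain Java and has nothing to do with HornetQ's journal (all the names are made up); it just shows syncing each change to disk before acknowledging it, and rebuilding the in-memory state by replaying the log on startup.

                    import java.io.*;
                    import java.util.HashMap;
                    import java.util.Map;

                    // Toy append-only log: every change is appended and synced before it is
                    // acknowledged, and the in-memory state is rebuilt by replaying the log.
                    public class CrashRecoverableMap
                    {
                       private final File logFile;
                       private final Map<String, String> state = new HashMap<String, String>();
                       private final DataOutputStream out;
                       private final FileDescriptor fd;

                       public CrashRecoverableMap(final File logFile) throws IOException
                       {
                          this.logFile = logFile;
                          replay();
                          FileOutputStream fos = new FileOutputStream(logFile, true); // append mode
                          this.fd = fos.getFD();
                          this.out = new DataOutputStream(new BufferedOutputStream(fos));
                       }

                       public void put(final String key, final String value) throws IOException
                       {
                          out.writeUTF(key);
                          out.writeUTF(value);
                          out.flush();
                          fd.sync(); // don't acknowledge until the bytes are on disk
                          state.put(key, value);
                       }

                       public String get(final String key)
                       {
                          return state.get(key);
                       }

                       // Rebuild the map from the log after a crash or a clean restart.
                       private void replay() throws IOException
                       {
                          if (!logFile.exists())
                          {
                             return;
                          }
                          DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(logFile)));
                          try
                          {
                             while (true)
                             {
                                state.put(in.readUTF(), in.readUTF());
                             }
                          }
                          catch (EOFException endOfLog)
                          {
                             // end of log; a half-written final record would also end up here
                          }
                          finally
                          {
                             in.close();
                          }
                       }
                    }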

                    Clebert,
                    Interesting blog post, can't say I understand all of it. Are there any performance numbers available? If I'm writing a transaction engine, I'd like to get a ballpark figure for how many messages I can "crash-proof" per second, given that databases generally hover between 100 and 400 inserts/second (I realize these figures are not at all precise and depend on many factors).

                    Secondly, I understand why avoiding random access into a file will improve performance, but what about when several processes are writing to the same disk? For example, process A writes lots of logs to a file and may move the disk head all over the place. At the same time, process B is only doing sequential writes and avoiding random access. Doesn't this scenario mean that process B can't assume that the disk head is where it last left it?

                    • 7. Re: Fast persistence, EIP and other internals
                      clebert.suconic

                      Clebert,

                      Interesting blog post, can't say I understand all of it. Are there any performance numbers available? If I'm writing a transaction engine, I'd like to get a ballpark figure for how many messages I can "crash-proof" per second, given that databases generally hover between 100 and 400 inserts/second (I realize these figures are not at all precise and depend on many factors).



                      I can do about 30K inserts (1024-byte records) per second on my laptop, about 80K/second on my SCSI disk, and about 100K records/second in our perf lab.

                      Secondly, I understand why avoiding random access into a file will improve performance, but what about when several processes are writing to the same disk? For example, process A writes lots of logs to a file and may move the disk head all over the place. At the same time, process B is only doing sequential writes and avoiding random access. Doesn't this scenario mean that process B can't assume that the disk head is where it last left it?


                      If you share your disk with other processes, then sure, the resource is shared. But users will have the choice to give us dedicated resources (just like some DBAs will give dedicated disks to databases).


                      The data structure allows us to write data with minimal head movement. If other processes are using the disk for other reasons, fine, but at least our process is optimized to keep head movement to a minimum.

                      • 8. Re: Fast persistence, EIP and other internals
                        clebert.suconic

                         

                        I can do about 30K inserts (1024-byte records) per second on my laptop, about 80K/second on my SCSI disk, and about 100K records/second in our perf lab.


                        This is not an official benchmark, and those numbers refer to a test case writing to the journal directly.


                        I have been able to saturate the disk's write capacity when using the journal directly.
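
                        If you want a ballpark for your own hardware, a crude sequential-append test along these lines is usually enough. It is plain NIO, not the journal (no AIO, no callbacks), and the record size and sync batching below are arbitrary choices, so treat the output as a rough sanity check rather than a benchmark.

                        import java.io.File;
                        import java.io.RandomAccessFile;
                        import java.nio.ByteBuffer;
                        import java.nio.channels.FileChannel;

                        // Appends fixed-size records sequentially, forcing them to the physical
                        // disk in small batches, and reports the sustained rate.
                        public class SequentialAppendBench
                        {
                           public static void main(final String[] args) throws Exception
                           {
                              final int records = 100000;
                              final int recordSize = 1024;
                              final int recordsPerSync = 100;

                              File file = new File("bench.dat");
                              RandomAccessFile raf = new RandomAccessFile(file, "rw");
                              FileChannel channel = raf.getChannel();
                              ByteBuffer buffer = ByteBuffer.allocateDirect(recordSize);

                              long start = System.currentTimeMillis();
                              for (int i = 1; i <= records; i++)
                              {
                                 buffer.clear();
                                 channel.write(buffer);
                                 if (i % recordsPerSync == 0)
                                 {
                                    channel.force(false);
                                 }
                              }
                              channel.force(false);
                              long elapsed = Math.max(1, System.currentTimeMillis() - start);

                              System.out.println(records * 1000L / elapsed + " records/sec");
                              channel.close();
                              raf.close();
                              file.delete();
                           }
                        }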

                        • 9. Re: Fast persistence, EIP and other internals
                          clebert.suconic

                           

                          Clebert,
                          Interesting blog post, can't say I understand all of it.


                          BTW: why don't you join us on IRC @ irc://freenode.net:6667/hornetq

                          My timezone is US Central (Texas).