10 Replies Latest reply on Mar 15, 2011 12:46 PM by joe.developer

    Hornetq Evaluation

    joe.developer

      Hi

       

      I'm busy evaluating HornetQ for use as part of a high-throughput publish-subscribe system and have a few questions:

       

      1. (Most important issue for our requirements) If I use HornetQ for publish-subscribe, how are slow readers/consumers handled? Section 19.1.1 of your user guide states that slow readers can hold up a queue. I'm developing a system with high throughput, approximately 10k messages per second. If one of the subscribers is reading messages slowly, I don't want the rest of the subscribers to be impacted. To be clear, subscribers do not "remove" items from the queue; each message is published to all interested subscribers. If a subscriber reads too slowly, the requirement is that the subscriber gets booted.

       

      2. What algorithm is used for matching messages to subscriptions? (A link to the source would be great.)

      The following may be of interest:

      http://www.rabbitmq.com/blog/2010/09/14/very-fast-and-scalable-topic-routing-part-1/
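      For question 1, a minimal sketch of the stated requirement (not of how HornetQ handles it): each subscriber gets its own bounded buffer, the publisher never blocks, and a subscriber whose buffer is already full when a message arrives is treated as too slow and dropped. FanOutPublisher, subscribe, publish and isSubscribed are hypothetical names, and the buffer capacity stands in for whatever slow-consumer threshold is actually wanted.

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical fan-out publisher (not HornetQ API): one bounded buffer per subscriber.
class FanOutPublisher<M> {
    private final int capacity;
    private final Map<String, BlockingQueue<M>> buffers = new ConcurrentHashMap<>();

    FanOutPublisher(int capacity) {
        this.capacity = capacity;
    }

    // Registers a subscriber and returns the buffer it should consume from.
    BlockingQueue<M> subscribe(String subscriberId) {
        BlockingQueue<M> buffer = new ArrayBlockingQueue<>(capacity);
        buffers.put(subscriberId, buffer);
        return buffer;
    }

    boolean isSubscribed(String subscriberId) {
        return buffers.containsKey(subscriberId);
    }

    // Offers the message to every subscriber without blocking; a subscriber whose
    // buffer is already full is evicted. Returns the number of evicted subscribers.
    int publish(M message) {
        int evicted = 0;
        for (Map.Entry<String, BlockingQueue<M>> entry : buffers.entrySet()) {
            // offer() returns false immediately instead of waiting for space.
            if (!entry.getValue().offer(message)) {
                buffers.remove(entry.getKey(), entry.getValue());
                evicted++;
            }
        }
        return evicted;
    }
}
```

      With a capacity of 2, a subscriber that stops polling is evicted on the third publish, while a subscriber that keeps up continues to receive every message. In a real broker, eviction would presumably also need to close the subscriber's connection.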

       

      Thanks

        • 1. Hornetq Evaluation
          clebert.suconic

          1. That's fixed in the next version.

           

           

          2. We distribute messages on topics through PostOfficeImpl::route.

          • 2. Hornetq Evaluation
            joe.developer

            Thanks for the reply Clebert!

             

            I have a few more questions:

             

            a) When do you expect issue 1 to be fixed (which version number, and what is the intended release date)?

             

            b) I notice your matcher uses Java regex and that your matching code is in Match.java. It would be interesting to see how the performance of this regex approach compares to:

            http://www.rabbitmq.com/blog/2010/09/14/very-fast-and-scalable-topic-routing-part-1/ (they seem to have some good ideas, but I would prefer a Java solution, as I can embed it).

             

            c) If I understand your documentation correctly, a subscription is represented by a queue, and only one pattern can be associated with a single queue. Perhaps I misread it, but I need each subscriber to be represented by a queue that can match multiple patterns, e.g. a.*.b AND f.c.*d, because each subscriber needs to subscribe to multiple data items.

             

            d) Does HornetQ support parallelism in cases where reordering doesn't matter? For example, if I have data.x.abc and data.y.abc, I would like all data containing x to remain ordered with respect to other tuples containing x (or any specific ID, really), whereas I don't care if data items identified by 'y' are reordered with respect to those identified by 'x'.
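            For b) and c), a sketch of the trie-based technique described in the RabbitMQ post linked above, assuming AMQP-style semantics where "*" matches exactly one word and "#" matches zero or more words (HornetQ's own "#" semantics may differ). Each pattern is stored one word per trie level, so routing only walks branches that can still match the address instead of testing every pattern. Binding one subscriber under several patterns gives the union behaviour asked about in c): results are collected in a set, so a message is delivered once even if several of that subscriber's patterns match. TopicTrie, bind and match are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical trie-based topic matcher with AMQP-style wildcards:
// "*" matches exactly one word, "#" matches zero or more words.
class TopicTrie {
    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
        final Set<String> subscribers = new HashSet<>();
    }

    private final Node root = new Node();

    // Stores the pattern one dot-separated word per trie level.
    void bind(String pattern, String subscriber) {
        Node node = root;
        for (String word : pattern.split("\\.")) {
            node = node.children.computeIfAbsent(word, w -> new Node());
        }
        node.subscribers.add(subscriber);
    }

    // Returns every subscriber with at least one pattern matching the topic.
    Set<String> match(String topic) {
        Set<String> result = new HashSet<>();
        walk(root, topic.split("\\."), 0, result);
        return result;
    }

    private void walk(Node node, String[] words, int i, Set<String> out) {
        Node hash = node.children.get("#");
        if (hash != null) {
            // "#" may swallow any number (including zero) of the remaining words.
            for (int j = i; j <= words.length; j++) {
                walk(hash, words, j, out);
            }
        }
        if (i == words.length) {
            out.addAll(node.subscribers);
            return;
        }
        Node exact = node.children.get(words[i]);
        if (exact != null) {
            walk(exact, words, i + 1, out);
        }
        Node star = node.children.get("*");
        if (star != null) {
            walk(star, words, i + 1, out);
        }
    }
}
```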

             

            Thanks

            • 3. Hornetq Evaluation
              clebert.suconic

              a) We are closing the remaining issues before we can put 2.2 out.

               

               

              c) Look at chapters 12 and 13. The wildcards are on the producer's side.

               

              d) The ordering is defined by the producer: we preserve the ordering of messages from a single producer. If you have multiple producers, their messages will go to the queue in parallel. Is that what you're asking?

              • 4. Hornetq Evaluation
                timfox

                HornetQ *does not* use regexp for matching messages to subscriptions.

                • 5. Hornetq Evaluation
                  joe.developer

                  Hi Tim

                   

                  Perhaps I'm looking at the wrong class, but Match.java appears to do regex matching for the a.*.c.d.# syntax used for subscriptions (see below). If I'm wrong, could you please point me to the class and line number where the matching algorithm is actually implemented? I would appreciate it.

                   

                  package org.hornetq.core.settings.impl;

                  import java.util.regex.Pattern;

                  /**
                   * a Match is the holder for the match string and the object to hold against it.
                   */
                  public class Match<T>
                  {
                     public static String WORD_WILDCARD = "*";
                     private static String WORD_WILDCARD_REPLACEMENT = "[^.]+";
                     public static String WILDCARD = "#";
                     private static String WILDCARD_REPLACEMENT = ".+";
                     private static final String DOT = ".";
                     private static final String DOT_REPLACEMENT = "\\.";

                     // ... rest of the class omitted
                  }
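                  A sketch of how constants like the ones quoted above can turn a wildcard address into a java.util.regex.Pattern. The constant values are copied from the excerpt; the order of the replacements (dots first, then "#", then "*") is an assumption of this sketch, chosen so that the dots introduced by ".+" and "[^.]+" are not escaped a second time. WildcardToRegex and compile are hypothetical names, and other regex metacharacters in an address are not escaped here.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not HornetQ's Match class): wildcard address -> compiled regex.
class WildcardToRegex {
    private static final String WORD_WILDCARD = "*";
    private static final String WORD_WILDCARD_REPLACEMENT = "[^.]+"; // one word, no dots
    private static final String WILDCARD = "#";
    private static final String WILDCARD_REPLACEMENT = ".+";         // anything, at least one char
    private static final String DOT = ".";
    private static final String DOT_REPLACEMENT = "\\.";             // literal dot

    static Pattern compile(String wildcardAddress) {
        // Escape literal dots first; String.replace does plain (non-regex) replacement.
        String regex = wildcardAddress.replace(DOT, DOT_REPLACEMENT);
        regex = regex.replace(WILDCARD, WILDCARD_REPLACEMENT);
        regex = regex.replace(WORD_WILDCARD, WORD_WILDCARD_REPLACEMENT);
        return Pattern.compile(regex);
    }
}
```

                  With this translation "a.*.c" matches "a.b.c" but not "a.b.x.c", because "[^.]+" cannot cross a dot, and "a.#" matches "a.b.c" but not "a", because ".+" needs at least one character after the dot.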

                  • 6. Hornetq Evaluation
                    timfox

                    Yes, but this is only evaluated once for a particular address, not every time a message is routed.

                     

                    At routing time the hit is basically just a String lookup in a map, IIRC.

                     

                    One of the HQ team should be able to explain in more detail.
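                    A minimal sketch of the caching described above, assuming a design where every binding's regex is evaluated only the first time an address is seen and the result is kept in a map, so later messages to the same address cost a single map lookup. CachingRouter and its methods are hypothetical, it takes already-translated regexes, it is not thread-safe, and it is not HornetQ's BindingsImpl. Adding a binding clears the whole cache; the regexEvaluations counter only exists to make the effect visible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical router, single-threaded sketch.
class CachingRouter {
    private final Map<String, Pattern> bindings = new LinkedHashMap<>(); // queue name -> pattern
    private final Map<String, List<String>> cache = new HashMap<>();     // address -> matching queues
    int regexEvaluations; // only here to make the cache effect observable

    void bind(String queue, String regex) {
        bindings.put(queue, Pattern.compile(regex));
        cache.clear(); // any cached address might now route differently
    }

    // The first lookup of an address runs every pattern; later lookups are one map get.
    List<String> route(String address) {
        return cache.computeIfAbsent(address, this::matchAll);
    }

    private List<String> matchAll(String address) {
        List<String> queues = new ArrayList<>();
        for (Map.Entry<String, Pattern> binding : bindings.entrySet()) {
            regexEvaluations++;
            if (binding.getValue().matcher(address).matches()) {
                queues.add(binding.getKey());
            }
        }
        return queues;
    }
}
```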

                    • 7. Hornetq Evaluation
                      clebert.suconic

                      +1

                       

                       

                      Look at BindingsImpl::route

                      • 8. Re: Hornetq Evaluation
                        joe.developer

                        Thanks for the replies, guys. I was able to identify BindingsImpl as the code that uses Match.java.

                         

                        I like the overall idea of using regex and caching the matches so that each message doesn't require a regex match. My only concern is for use cases like ours, where subscribers come and go and change their subscriptions fairly often (a subscription maps onto a queue in HornetQ terminology). Whenever there is a new subscription, or a subscription changes, BindingsImpl clears the cache, so if subscriptions change fairly frequently, lookups will repeatedly have to go through all the regexes to check for matches (which isn't crazy expensive). This is particularly painful if messages are posted to many different/unique topics and there are dozens of subscriptions. It would be great if you had a mitigation strategy to reduce this effect.
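                        One possible mitigation, sketched under the assumption that bindings are plain regexes: instead of clearing the cache when a subscription is added, test only the new pattern against the addresses already in the cache, and on removal simply delete the queue from each cached entry. A change then costs one regex evaluation per cached address, rather than re-running every pattern on the next lookup of every address. IncrementalRouter is hypothetical and not thread-safe; a trie-based matcher, as in the RabbitMQ post, is another option.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical router that updates its cache incrementally; single-threaded sketch.
class IncrementalRouter {
    private final Map<String, Pattern> bindings = new LinkedHashMap<>(); // queue name -> pattern
    private final Map<String, List<String>> cache = new HashMap<>();     // address -> matching queues
    int regexEvaluations; // only here to make the cost observable

    void bind(String queue, String regex) {
        if (bindings.containsKey(queue)) {
            unbind(queue); // a changed subscription is treated as remove + add
        }
        Pattern pattern = Pattern.compile(regex);
        bindings.put(queue, pattern);
        // Test only the new pattern against the addresses that are already cached.
        for (Map.Entry<String, List<String>> entry : cache.entrySet()) {
            regexEvaluations++;
            if (pattern.matcher(entry.getKey()).matches()) {
                entry.getValue().add(queue);
            }
        }
    }

    void unbind(String queue) {
        bindings.remove(queue);
        for (List<String> queues : cache.values()) {
            queues.remove(queue); // no regex work needed on removal
        }
    }

    List<String> route(String address) {
        List<String> queues = cache.computeIfAbsent(address, a -> {
            List<String> matched = new ArrayList<>();
            for (Map.Entry<String, Pattern> binding : bindings.entrySet()) {
                regexEvaluations++;
                if (binding.getValue().matcher(a).matches()) {
                    matched.add(binding.getKey());
                }
            }
            return matched;
        });
        return new ArrayList<>(queues); // copy, so later bind/unbind calls don't mutate it
    }
}
```

                        The trade-off is that the work is done eagerly for every cached address, including ones that may never be routed to again, so a long-lived cache would probably also need some form of eviction.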

                         

                        In terms of my question regarding parallel processing, I have the following...

                        A single producer is producing a stream of events, and certain events can be reordered based on an ID associated with each event. For example, a1, a2 and a3 must remain in order with respect to each other, and similarly b1, b2 and b3 must remain in order, but it doesn't matter whether events from the "a" set arrive before events from the "b" set or vice versa. This can be exploited to publish these messages to subscribers in parallel using multiple threads, preserving ordering only where it matters, which leads to higher throughput.
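                        A common way to get this kind of partial ordering is key striping: hash each event's ordering key to one of N single-threaded lanes. Events with the same key always land in the same lane and are handled in order, while events with different keys can be handled in parallel. This is a generic sketch, not a HornetQ feature; KeyOrderedDispatcher and dispatch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical key-striped dispatcher: per-key FIFO order, parallelism across keys.
class KeyOrderedDispatcher<E> implements AutoCloseable {
    private final List<ExecutorService> lanes = new ArrayList<>();

    KeyOrderedDispatcher(int laneCount) {
        for (int i = 0; i < laneCount; i++) {
            // A single-threaded executor runs its tasks one at a time, in submission order.
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    void dispatch(String key, E event, Consumer<E> handler) {
        // floorMod keeps the lane index non-negative even when hashCode() is negative.
        int lane = Math.floorMod(key.hashCode(), lanes.size());
        lanes.get(lane).execute(() -> handler.accept(event));
    }

    // Stops accepting work and waits (bounded) for queued events to finish.
    @Override
    public void close() {
        lanes.forEach(ExecutorService::shutdown);
        try {
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

                        Two keys may hash to the same lane, which only serializes them more than strictly necessary; it never breaks the order within a key.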

                         

                        The ability to boot slow readers is a core requirement for the throughput of our application, so we won't be able to consider HQ until at least the next release. Having said that, I do believe that HornetQ is pretty much the queue to beat at the moment. All indications are that RabbitMQ is unsuitable for production use due to instability, particularly under memory pressure (see the reddit discussions about problems using RabbitMQ, among other sources). One suggestion I would make is that you include more figures in your documentation and on your website showing the different messaging models supported; for example, see the numbered diagrams with examples on the left of this page:

                        http://www.rabbitmq.com/tutorials/tutorial-one-python.html

                         

                        Thanks again

                        • 9. Re: Hornetq Evaluation
                          ataylor

                          joe.developer wrote:

                          > In terms of my question regarding parallel processing I have the following...
                          > A single producer is producing a stream of events, certain events can be reordered based on an id associated with each event. For example a1, a2 and a3 must remain in order with respect to each other, similarly b1, b2, b3 must remain in order, but it doesn't matter if events from the "a" set arrive before events from the "b" set or vice versa, the ordering between a and b doesn't matter. This can be exploited to provide publication of these messages in parallel to subscribers using multiple threads, only preserving ordering where it matters, leading to higher throughput.

                          I'm not sure I understand this properly. When a producer sends a message, it writes it straight to the channel, so I'm not sure where any reordering could take place. Could you be more explicit, please?

                          > One suggestion I would make is that you include more figures in your documentation and on your website regarding the different messaging models supported, for example see the numbered diagrams with examples on the left of this page:
                          > http://www.rabbitmq.com/tutorials/tutorial-one-python.html

                          If I ever get time, I may try to add some.

                          • 10. Re: Hornetq Evaluation
                            joe.developer

                            Andy Taylor wrote:

                            > I'm not sure I understand this properly. When a producer sends a message, it writes it straight to the channel, so I'm not sure where any reordering could take place. Could you be more explicit, please?

                            I haven't had a chance to check the exact flow HornetQ uses to process messages, so I'm basing this on a few assumptions. Basically, a producer writes messages to a queue. When delivering those messages to subscribers, a pool of threads could feed off this queue in such a way that messages which may be reordered are pulled off the queue in parallel and posted to the (remote) subscribing application. Doing this in parallel may also help where regex matching is required, if matching can be done in parallel for events whose ordering rules are relaxed.