8 Replies Latest reply on May 21, 2015 5:41 PM by tony36

    hornetq cluster - how to get server side message load balancing metrics?

    tony36

      HornetQ collects metrics in JMX MBeans for messages that arrive at and leave queues.

       

      E.g., MessagesAdded increments every time a message arrives, and MessageCount holds the current number of messages in the queue.

       

      But what is it *really* counting when we have a cluster? Each cluster member has its own MBean, so presumably each member tracks its own data. E.g., MessagesAdded could count one of:

       

      - A message that directly arrives at that queue from a client.

      - A message that directly arrives at that queue from a client plus any messages that arrive at that queue from other members of the cluster (due to load-balancing).

       

      What I'm really interested in is the difference between the two, i.e., I'd like to look at counters such as:

       

      - How many messages arrived at queue X on instance I1 directly from clients?

      - How many messages arrived at queue X on instance I1 from other members of the cluster?

       

      I.e., I want to understand how much load balancing is going on versus messages arriving directly.
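For readers who want to poll the counters mentioned above, here is a minimal sketch of reading a numeric MBean attribute with the standard `javax.management` API. It is not HornetQ-specific: `FakeQueue`, `FakeQueueMBean`, and the `example.standin:type=Queue,name=X` object name are hypothetical stand-ins so the example runs on its own. Against a real broker you would open a remote JMX connection and use the queue's actual object name, which is best looked up in JConsole or the HornetQ management documentation.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

final class QueueCounterSketch {

    /** Hypothetical stand-in exposing the two attributes named in the post. */
    public interface FakeQueueMBean {
        long getMessagesAdded();
        long getMessageCount();
    }

    /** Hypothetical stand-in implementation with fixed values. */
    public static final class FakeQueue implements FakeQueueMBean {
        private final long added;
        private final long depth;

        public FakeQueue(long added, long depth) {
            this.added = added;
            this.depth = depth;
        }

        @Override public long getMessagesAdded() { return added; }
        @Override public long getMessageCount() { return depth; }
    }

    /** Reads one numeric attribute from an MBean over any JMX connection. */
    public static long readCounter(MBeanServerConnection conn, ObjectName name, String attribute)
            throws Exception {
        return ((Number) conn.getAttribute(name, attribute)).longValue();
    }

    public static void main(String[] args) throws Exception {
        // The local platform MBean server plays the role of one cluster member here.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("example.standin:type=Queue,name=X"); // assumed name
        server.registerMBean(new FakeQueue(100, 7), name);
        System.out.println("MessagesAdded=" + readCounter(server, name, "MessagesAdded"));
        System.out.println("MessageCount=" + readCounter(server, name, "MessageCount"));
        server.unregisterMBean(name);
    }
}
```

Calling `readCounter` once per cluster member gives the per-instance values discussed in this thread; as the replies note, these values alone do not say where a message came from.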

        • 1. Re: hornetq cluster - how to get server side message load balancing metrics?
          jbertram

          I don't believe any such statistics are available.

          • 2. Re: hornetq cluster - how to get server side message load balancing metrics?
            tony36

            Ok, so the stats I'm looking for can't be broken down, but as far as the current JMX MBean values go, what do they mean? E.g., MessagesAdded could mean:

             

            - A message that directly arrives at that queue from a client.

            OR

            - A message that directly arrives at that queue from a client plus any messages that arrive at that queue from other members of the cluster (due to load-balancing).

             

             

            Note that if it's the latter, and I were to sum MessagesAdded cluster-wide, messages would be double counted whenever they were load-balanced.

            • 3. Re: hornetq cluster - how to get server side message load balancing metrics?
              jbertram

              As I understand it, MessagesAdded simply counts messages added to the queue no matter where they come from.  It shouldn't matter if the message came directly from a client or indirectly from a client through another node in the cluster.

               

              However, if you were to add up all the MessagesAdded cluster-wide you would not get a double count.  That's because the messages are load-balanced before they actually land on the queue on the server doing the load-balancing.
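The counting behaviour described in this reply can be sketched as a toy model. This is only an illustration of the explanation as stated ("as I understand it"), not HornetQ code: the class and method names are hypothetical, and the model simply assumes a message is routed to its target node before it is added to any queue, so only the target's counter moves.

```java
import java.util.Arrays;

final class ClusterCountSketch {

    /**
     * Toy model: message i lands on queue node target[i]; only that node's
     * MessagesAdded-style counter is incremented, wherever the message entered.
     */
    public static long[] messagesAddedPerNode(int nodes, int[] target) {
        long[] added = new long[nodes];
        for (int t : target) {
            added[t]++;
        }
        return added;
    }

    /** Cluster-wide sum of the per-node counters. */
    public static long clusterTotal(long[] perNode) {
        return Arrays.stream(perNode).sum();
    }

    /** Messages whose entry node differs from the node whose queue received them. */
    public static long forwarded(int[] entry, int[] target) {
        long count = 0;
        for (int i = 0; i < entry.length; i++) {
            if (entry[i] != target[i]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] entry  = {0, 1, 2, 3, 0, 1, 2, 3}; // producers round-robin over 4 nodes
        int[] target = {2, 2, 2, 2, 2, 2, 2, 2}; // every message ends up on node 2
        long[] added = messagesAddedPerNode(4, target);
        System.out.println(Arrays.toString(added));   // prints [0, 0, 8, 0]
        System.out.println(clusterTotal(added));      // prints 8
        System.out.println(forwarded(entry, target)); // prints 6
    }
}
```

In this model the cluster-wide total equals the number of messages sent (8), so nothing is double counted. The `forwarded` figure is exactly the breakdown asked for at the top of the thread, and the model can compute it only because it knows each message's entry node, which the MBean counters do not record.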

              • 4. Re: hornetq cluster - how to get server side message load balancing metrics?
                tony36

                Ok, that makes sense.

                 

                Here's the gist of the problem I'm trying to figure out. I have a cluster of 4 instances with a few queues. I have clients that write to the queues, and clients that read from the queues. All producers explicitly round-robin across all 4 instances, and consumers explicitly listen to all 4 instances.

                 

                For all of the queues except one, I see even counts for MessagesAdded. E.g., if I see 100 on instance I1, I see 100 on instance I2, 100 on instance I3, etc.

                 

                However, for one queue, I see MessagesAdded=0 for all of the instances except one. I.e., it looks like a single instance is receiving all of the messages for one particular queue.

                 

                I verified the following using tcpdump:

                 

                - The producers write to all 4 instances.

                - The consumers receive data from the problem queue only from the single instance that is receiving all of the messages.

                - Looking at the traffic between cluster members (via tcpdump), I do see all members send these messages to the member in question. I.e., it looks like "server load balancing" is kicking in.

                 

                So the fact that we see only one instance with MessagesAdded=N, and all others with MessagesAdded=0, jibes with the tcpdump output.

                 

                But of course, it's odd because things aren't really balanced; the opposite in fact, one instance is handling all of the messages for one particular queue.
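A check like the one described here (even counts on most queues, everything on one instance for the problem queue) can be automated once the per-instance MessagesAdded values have been collected. The following is a small sketch under that assumption; the class name, method names, and the tolerance value are hypothetical choices for illustration.

```java
final class BalanceCheckSketch {

    /** Largest fraction of the cluster-wide total held by one node; 0.0 if there are no messages. */
    public static double maxShare(long[] perNode) {
        long total = 0;
        long max = 0;
        for (long count : perNode) {
            total += count;
            max = Math.max(max, count);
        }
        return total == 0 ? 0.0 : (double) max / total;
    }

    /** True if one node holds more than its fair share (1/n) plus the given tolerance. */
    public static boolean isSkewed(long[] perNode, double tolerance) {
        if (perNode.length == 0) {
            return false;
        }
        double fairShare = 1.0 / perNode.length;
        return maxShare(perNode) > fairShare + tolerance;
    }

    public static void main(String[] args) {
        long[] healthyQueue = {100, 100, 100, 100};
        long[] problemQueue = {0, 0, 400, 0};
        System.out.println(isSkewed(healthyQueue, 0.10)); // prints false
        System.out.println(isSkewed(problemQueue, 0.10)); // prints true
    }
}
```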

                 

                The only thing different about this queue is that the producer uses STOMP to publish the message AND persistent=true. We do use STOMP for another queue, with persistent=false, and that queue is fine (evenly distributed messages).

                 

                So what could explain this behavior?

                • 5. Re: hornetq cluster - how to get server side message load balancing metrics?
                  jbertram

                  One possible explanation would be that your cluster connection's "address" value doesn't match the address for which you see no load-balancing.  I can't think of any other reasons off the top of my head.

                   

                  Can you narrow this down to a simple use-case (e.g. 2-node cluster with one address that doesn't load-balance)?

                  • 6. Re: hornetq cluster - how to get server side message load balancing metrics?
                    tony36

                    The other thing to note is that the single member that receives all of the messages changes from run to run, so it's not a config thing.

                     

                    We run a pretty massive load test where we observe this behavior, and everything gets restarted at the beginning. I will try to repro in a much simpler way.

                     

                    Thanks.

                    • 7. Re: hornetq cluster - how to get server side message load balancing metrics?
                      jbertram

                      "The other thing to note is that the single member that receives all of the messages changes from run to run, so it's not a config thing."

                      I don't think that's necessarily true.

                       

                      "I will try to repro in a much simpler way."

                      Ok.

                      • 8. Re: hornetq cluster - how to get server side message load balancing metrics?
                        tony36

                        Ok, I finally got to the bottom of things (sort of).

                         

                        The issue is that on the consuming side, I create a HornetQConnectionFactory *per* server instance in the cluster. E.g., with the 4 instances, I'd have 4 HornetQConnectionFactory instances, each configured to point to a different HornetQ server.

                         

                        What I was doing was sharing the List<ConnectionFactory> across queues. The consumer "initializes" the queues in order, where "initialize" basically means establishing the connections to all of the servers. The first queue was fine; the consumer had one connection to each of the 4 instances.

                         

                        However, when init got to the 2nd queue, all of the connections established were to a single server instance.

                         

                        What's clearly happening is that the ConnectionFactory is finding out that there are 4 members of the cluster, and somehow it's deciding to pick one of the cluster members rather than the one I specified in the config. Since there are 4 different ConnectionFactory instances, presumably the same algorithm kicks in, and that's why the same server instance gets picked.

                         

                        I can fix this by using a different List<ConnectionFactory> per queue, but now I'm worried about future disconnects/reconnects causing an imbalance. The whole point of having one ConnectionFactory per server instance is to be in control.

                         

                        So, is there any way to instantiate the HornetQConnectionFactory so that it honors my config all the time, and does not use the cluster info that it "finds out"?

                         

                        Thx.