
    Load balancing - communicating vessels policy

    lowang

      I have 3 nodes clustered; each node has its own message processors, and the standard round-robin insertion works fine most of the time. But I have a problem when one node lags behind

      E.g.:

       

      node1: 500k messages

      node2: 0 messages

      node3: 0 messages

       

      In such a scenario nodes 2 and 3 won't help process the messages piled up on node1, because consumers are still connected there (so redistribution never kicks in). Even worse, when new messages hit nodes 2 and 3 they can be load-balanced over to node1, further enlarging the pile.

      Is there any mechanism built into HornetQ to force redistribution of messages, something like a "communicating vessels" policy, so that balancing always tries to keep an equal number of messages in the queue on each node?

        • 1. Re: Load balancing - communicating vessels policy
          jbertram

          The first thing to do is make sure your <redistribution-delay> is small enough (e.g. 0) so that messages are redistributed quickly.  Then make sure any consumers which are routinely slower than the others use a smaller consumer-window-size so they buffer fewer messages.
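
          For reference, here is a rough sketch of where those two settings live, assuming the standard standalone hornetq-configuration.xml and hornetq-jms.xml files; the queue, connector, and connection-factory names are just placeholders:

          <!-- hornetq-configuration.xml: redistribute as soon as a queue has no local consumers -->
          <address-settings>
             <address-setting match="jms.queue.exampleQueue">
                <redistribution-delay>0</redistribution-delay>
             </address-setting>
          </address-settings>

          <!-- hornetq-jms.xml: a zero consumer window so slow consumers don't buffer messages client-side -->
          <connection-factory name="ConnectionFactory">
             <connectors>
                <connector-ref connector-name="netty"/>
             </connectors>
             <entries>
                <entry name="/ConnectionFactory"/>
             </entries>
             <consumer-window-size>0</consumer-window-size>
          </connection-factory>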

          • 2. Re: Load balancing - communicating vessels policy
            lowang

            I already have those settings, but they don't help balance the load after the messages have been loaded into the queues.

            AFAIK redistribution-delay only helps when no consumers are connected; in my case all consumers are present all the time, they're just slow.

            I can think of a simple cron script that would redistribute messages across all nodes to keep the queues at roughly the same size until they're empty, but doing that manually seems counterintuitive. I would expect such behaviour to already be built into HornetQ; if it's not, I can go for a cron job without any further hesitation.

            • 3. Re: Load balancing - communicating vessels policy
              jbertram

              Assuming your slow consumers are arbitrary, they should be randomly distributed across the cluster (assuming you're using the default load-balancing policy).  If they are all clumped up on a single server then this problem could definitely occur.  At this point I see 2 options:

               

              1. Make sure your slow consumers are distributed across the cluster and not clumped up on one node.  Like I said, I would expect that to happen anyway, but the way you are connecting may prevent that.
              2. If your consumers are really slow then you could close the consumer while the message is being processed.  That would allow redistribution to happen; a rough sketch of that pattern is below.
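
              For illustration, here is a minimal sketch of option 2 using the plain JMS API. The class name, queue, and receive timeout are placeholders, and AUTO_ACKNOWLEDGE is assumed to keep it simple (the message is acknowledged on receive, so a failure during processing would lose it):

              import javax.jms.Connection;
              import javax.jms.ConnectionFactory;
              import javax.jms.Message;
              import javax.jms.MessageConsumer;
              import javax.jms.Queue;
              import javax.jms.Session;

              // Hypothetical worker loop for a slow consumer: receive one message, close the
              // consumer so this node no longer has a consumer on the queue (letting a small
              // redistribution-delay move the backlog elsewhere), do the slow work, then
              // re-create the consumer for the next message.
              public class SlowWorker {

                  public static void run(ConnectionFactory factory, Queue queue) throws Exception {
                      Connection connection = factory.createConnection();
                      connection.start();
                      // AUTO_ACKNOWLEDGE: the message is acknowledged when receive() returns,
                      // so it will not be redelivered if processing fails afterwards.
                      Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                      try {
                          while (true) {
                              MessageConsumer consumer = session.createConsumer(queue);
                              Message message = consumer.receive(5000);
                              // Close before the long-running work so redistribution can kick in
                              // while this worker is busy.
                              consumer.close();
                              if (message != null) {
                                  process(message);
                              }
                          }
                      } finally {
                          connection.close();
                      }
                  }

                  private static void process(Message message) {
                      // application-specific (slow) processing goes here
                  }
              }

              The trade-off is the overhead of creating and closing a consumer for every message, which is usually negligible compared to genuinely slow processing.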