5 Replies Latest reply on Jun 19, 2012 1:27 PM by craigm123

    Advantages and Disadvantages of a Clustered Topology

    craigm123

      I'm debating between a clustered hornetq topology and writing some client software that can attach to many standalone, non-clustered hornetq instances.  I'd like to hear whether others have had the same debate and what conclusion they came to.

       

      To start, here are some requirements I have:

      - Many producers being able to load balance messages across many hornetq nodes

      - Many consumers being able to consume messages off of many hornetq nodes

      - The ability to manage dynamically which hornetq servers a producer or consumer is connected to (i.e. the ability to take hornetq servers in and out of the load, much like a load balancer)

      - Queues and topics that can page to disk when messages cannot be consumed fast enough, but cannot page themselves to death (i.e. run out of disk space or memory)

      - Works on inexpensive disks (no SAN or SSD disks)

      - Handles 20k+ 1KB messages per second

       

      I've been using a clustered topology for about 2 months now and have had mixed results.  The hornetq cluster consists of two nodes that have static connectors to each other.  Each node has a 10GB ramdisk provisioned on it for journals and a SAS drive for paging (journalling on disk was not fast enough).  Each of the client applications uses Spring's JmsTemplate and DefaultMessageListenerContainer, along with a CachingConnectionFactory to get around some of the anti-patterns introduced by JmsTemplate.  Because we use the CachingConnectionFactory, each client only makes one connection to the hornetq cluster (i.e. to one of the two nodes, but not both).  This means that traffic from a particular client is not load balanced across available nodes, but instead sent entirely to one node. 

       

      We push 20k messages per second through the cluster and it seems to be handling it quite well, but about once a week we see an OOM exception, which seems to be related to paging and dead client connections (see https://community.jboss.org/thread/200960 for more details).  I've been able to artificially reproduce the problem in our test environment by killing the second cluster member while a large amount of paging is going on, which leads me to believe that the combination of paging and the core bridge between the hornetq nodes is causing the issue (please feel free to debate this with me on the other thread).  I was not, however, able to reproduce the problem under similar load on just one standalone hornetq node.

       

      Has anyone else debated this?  What do you think of using and managing a set of standalone message brokers vs. using a clustered topology?

       

      Thanks,

      Craig

        • 1. Re: Advantages and Disadvantages of a Clustered Topology
          gaohoward

          Re: This means that all traffic from a particular client is not load balanced across available nodes, but instead sent entirely to one node.

           

          If your two nodes are clustered, the messages will be redistributed between the 2 nodes even if you send all messages to one node. There is a 'clustered-queue' example that demonstrates exactly this.

          • 2. Re: Advantages and Disadvantages of a Clustered Topology
            ataylor

            If you do this yourself you won't get

             

            1) message distribution, and redistribution when there are no consumers on a node

            2) session/producer/consumer recreation or reattach

            3) connection load balancing

            4) Failover (if you decide to have backup servers)

             

            amongst other things. I'm not sure there are any advantages to doing it yourself; you would just be implementing the same thing we have given you for free.

            • 3. Re: Advantages and Disadvantages of a Clustered Topology
              craigm123

              Thanks for the responses.

               

              Yes, I agree message redistribution is not possible without a clustered setup.  This is very useful if you have clients that are connected only to one node in a cluster.  However, if each listening client is connected to every standalone node and your sending client is able to load balance across each standalone node, redistribution isn't really necessary.

               

              I'm not sure I quite understand why session/producer/consumer recreation on a clustered setup is any different from a standalone.  Client reattachment should happen on both a clustered and standalone setup, right?

               

              I've read about the client-side load balancing and I understand it does round robin or random connections.  Is there a way to guarantee that the connections created are evenly distributed over all available nodes?  For example, if I use the round robin policy, have two hornetq nodes, and create two connections, they will be evenly distributed.  However, if I take one node away and still have two connections, both connections will be forced onto one node.  If I add that node back to the cluster without forcing a reconnect on the clients, will the connections automatically get redistributed?

               

              I'm not really considering failover part of a clustered topology since it's an active/passive setup and doesn't distribute load.

               

              Some advantages I see of having the client application hold many connections to several standalone nodes are:

              - Guarantee that all connections are evenly distributed across all nodes (assuming I have some logic for looking up what nodes are available)

              - Gets me around the OOM issue I'm experiencing (again, if you see I'm doing something incorrect with that setup, please respond to that thread)

              - Ability to control draining, then shutting down a node without affecting the rest of the cluster
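A minimal sketch of the kind of client-side logic the list above assumes, in plain Java. `StandaloneNodePool`, `addNode`, `drainNode` and `nextNode` are hypothetical names invented for illustration, not HornetQ or JMS APIs, and the node strings are placeholders. The actual JMS connection per node is left out; the sketch only shows how a producer might pick a node round robin while nodes are taken in and out of the load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical client-side pool of standalone broker nodes (not a HornetQ API). */
final class StandaloneNodePool {
    // Nodes that should currently receive new messages.
    private final CopyOnWriteArrayList<String> active = new CopyOnWriteArrayList<>();
    private final AtomicInteger counter = new AtomicInteger();

    /** Put a node into the load; adding the same node twice has no effect. */
    void addNode(String node) {
        active.addIfAbsent(node);
    }

    /** Take a node out of the load for producers; consumers may keep draining it. */
    void drainNode(String node) {
        active.remove(node);
    }

    /** Pick the node for the next send, round robin over the active nodes. */
    String nextNode() {
        // Work on a snapshot so a concurrent add or drain cannot change the size mid-pick.
        List<String> snapshot = new ArrayList<>(active);
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("no active nodes");
        }
        int index = Math.floorMod(counter.getAndIncrement(), snapshot.size());
        return snapshot.get(index);
    }
}
```

A producer would call `nextNode()` before each send and use whatever connection it holds for that node. Draining a node only stops new sends to it; when it is safe to shut the node down would still have to be decided elsewhere, for example once its queues are empty.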

               

              Craig

              • 4. Re: Advantages and Disadvantages of a Clustered Topology
                ataylor

                Yes, I agree message redistribution is not possible without a clustered setup.  This is very useful if you have clients that are connected only to one node in a cluster.  However, if each listening client is connected to every standalone node and your sending client is able to load balance across each standalone node, redistribution isn't really necessary.

                Yes, but then you have to manage that: if you have n nodes then you need to make sure you have n clients. You also need to know where the nodes are. Why implement this when we give it to you for free?

                 

                I'm not sure I quite understand why session/producer/consumer recreation on a clustered setup is any different from a standalone.  Client reattachment should happen on both a clustered and standalone setup, right?

                Reattachment will work but, as the name suggests, only on the same node. Failing over to a new node and recreating the session state will only happen in a clustered environment where you have backup servers configured.

                I've read about the client-side load balancing and I understand it does round robin or random connections.  Is there a way to guarantee that the connections created are evenly distributed over all available nodes?  For example, if I use the round robin policy, have two hornetq nodes, and create two connections, they will be evenly distributed.  However, if I take one node away and still have two connections, both connections will be forced onto one node.  If I add that node back to the cluster without forcing a reconnect on the clients, will the connections automatically get redistributed?

                Once a connection is created you cannot change it unless you close it.
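To make the consequence concrete, here is a toy model in plain Java of the two-node scenario from the question. It is not HornetQ code: `StickyConnectionModel` and its methods are hypothetical, and it assumes the client re-creates a connection through the same round robin pick when its node goes away. A connection stays pinned to the node it was created on until that node disappears.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of connect-time round robin with sticky connections (not HornetQ code). */
final class StickyConnectionModel {
    private final List<String> nodes = new ArrayList<>();
    // Entry i is the node that connection i is pinned to.
    private final List<String> connections = new ArrayList<>();
    private int next = 0;

    /** A node joins (or rejoins); existing connections are left where they are. */
    void nodeUp(String node) {
        if (!nodes.contains(node)) nodes.add(node);
    }

    /** A node leaves; only the connections pinned to it are re-created elsewhere. */
    void nodeDown(String node) {
        nodes.remove(node);
        for (int i = 0; i < connections.size(); i++) {
            if (connections.get(i).equals(node)) connections.set(i, pick());
        }
    }

    /** Open a new connection on the next node in round robin order. */
    void connect() {
        connections.add(pick());
    }

    private String pick() {
        if (nodes.isEmpty()) throw new IllegalStateException("no nodes");
        return nodes.get(next++ % nodes.size());
    }

    /** Number of connections currently pinned to each known node. */
    Map<String, Integer> connectionsPerNode() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String n : nodes) counts.put(n, 0);
        for (String c : connections) counts.merge(c, 1, Integer::sum);
        return counts;
    }
}
```

With nodes A and B up and two calls to `connect()`, each node holds one connection. After `nodeDown("B")` followed by `nodeUp("B")`, the model reports two connections on A and none on B, and nothing moves back until a connection is closed and opened again.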

                I'm not really considering failover part of a clustered topology since it's an active/passive setup and doesn't distribute load.

                OK, so you don't want any HA. In that case be aware that you won't have any failover capability; remember, connections can't migrate from one live node to another.

                - Guarantee that all connections are evenly distributed across all nodes (assuming I have some logic for looking up what nodes are available)

                They will be round robined as you mentioned earlier.

                - Gets me around the OOM issue I'm experiencing (again, if you see I'm doing something incorrect with that setup, please respond to that thread)

                Probably the wrong reason to change your topology; however, I will take a look at your other post.

                - Ability to control draining, then shutting down a node without affecting the rest of the cluster

                Why don't you just configure backup servers? Then you won't need to drain them.

                 

                Can I ask what your use case is? It seems to me like you want to be bringing nodes up and down all the time. From experience, the more static and stable a cluster is, the better. Obviously you want to be able to bring nodes down for maintenance or add nodes to deal with extra load, but this shouldn't happen too often.

                • 5. Re: Advantages and Disadvantages of a Clustered Topology
                  craigm123

                  Thanks again for the response.

                   

                  If I'm reading your answer correctly, the client load balancing policy does not guarantee that all connections are evenly spread after a node is removed, then added back.  It's for this reason that I would want application logic there to guarantee this. 
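A rough sketch of what such application logic might look like, in plain Java. `ConnectionRebalancer` and `plan` are hypothetical names, and the connection ids and node names are placeholders. Given which node each connection is on and which nodes are currently in the load, it works out which connections would have to be closed and reopened elsewhere so that no node holds more than one connection above any other. It only computes the plan; closing and reopening the JMS connections is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that plans connection moves to even out load (not a HornetQ API). */
final class ConnectionRebalancer {

    /**
     * @param assignment connection id -> node it is currently connected to
     * @param nodes      nodes currently in the load
     * @return connection id -> node it should be closed and reopened against
     */
    static Map<String, String> plan(Map<String, String> assignment, List<String> nodes) {
        if (nodes.isEmpty()) throw new IllegalArgumentException("no nodes in the load");

        Map<String, Deque<String>> byNode = new LinkedHashMap<>();
        for (String n : nodes) byNode.put(n, new ArrayDeque<>());

        // Connections whose node has left the load must move no matter what.
        Deque<String> toMove = new ArrayDeque<>();
        for (Map.Entry<String, String> e : assignment.entrySet()) {
            Deque<String> onNode = byNode.get(e.getValue());
            if (onNode == null) toMove.add(e.getKey());
            else onNode.add(e.getKey());
        }

        int n = byNode.size();
        int total = assignment.size();
        int floor = total / n;           // every node should end with at least this many
        int ceil = (total + n - 1) / n;  // and with at most this many

        // Step 1: shed connections from nodes holding more than the ceiling.
        for (Deque<String> onNode : byNode.values()) {
            while (onNode.size() > ceil) toMove.add(onNode.removeLast());
        }

        Map<String, String> moves = new LinkedHashMap<>();

        // Step 2: bring every node up to the floor, taking from the fullest node if needed.
        for (Map.Entry<String, Deque<String>> e : byNode.entrySet()) {
            Deque<String> onNode = e.getValue();
            while (onNode.size() < floor) {
                String conn = toMove.isEmpty() ? fullest(byNode).removeLast() : toMove.poll();
                onNode.add(conn);
                moves.put(conn, e.getKey());
            }
        }

        // Step 3: place whatever is left on nodes that are still below the ceiling.
        for (Map.Entry<String, Deque<String>> e : byNode.entrySet()) {
            Deque<String> onNode = e.getValue();
            while (!toMove.isEmpty() && onNode.size() < ceil) {
                String conn = toMove.poll();
                onNode.add(conn);
                moves.put(conn, e.getKey());
            }
        }
        return moves;
    }

    private static Deque<String> fullest(Map<String, Deque<String>> byNode) {
        Deque<String> best = null;
        for (Deque<String> d : byNode.values()) {
            if (best == null || d.size() > best.size()) best = d;
        }
        return best;
    }
}
```

For the two-node case discussed earlier, two connections both on node A with nodes A and B in the load produce a plan that moves one connection to B. When the connections are already evenly spread, the plan is empty.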

                   

                  I'm hesitant to use backup servers because they currently require shared storage, like a SAN.  I've tried out DRBD, but it doesn't seem to have adequate speed to handle the traffic I'm throwing at the hornetq journals.  In fact, I'm keeping all of the journals on a ramdisk so I can get the throughput I need.  Then again, maybe I'm trying to push too much through one individual node and the real solution is to load balance my traffic amongst more hornetq nodes, in which case I could potentially use DRBD or something similar.

                   

                  My use case involves handling a large number of asynchronous messages from our mobile clients, a number that continues to grow as our business grows.  We use JMS to route and buffer traffic as it's being processed.  Removing servers from the load may not happen all that often, but we may choose to do maintenance on servers like you mentioned.  We certainly would like to add more hornetq nodes as load increases.