Advantages and Disadvantages of a Clustered Topology
craigm123 Jun 15, 2012 9:29 PM

I'm debating between a clustered hornetq topology and writing some client software that can attach to many standalone, non-clustered hornetq instances. I'd like to see if others have had the same debate and what conclusion they came to.
To start, here are some requirements I have:
- Many producers being able to load balance messages across many hornetq nodes
- Many consumers being able to consume messages off of many hornetq nodes
- The ability to dynamically manage which hornetq servers a producer or consumer is connected to (i.e. the ability to take hornetq servers in and out of the load, much like a load balancer)
- Queues and topics that can page to disk when messages cannot be consumed fast enough, but will not page themselves to death (i.e. run out of disk space or memory)
- Works on inexpensive disks (no SAN or SSD disks)
- Handles 20k+ 1KB messages per second
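To make the paging requirement concrete, the behavior I'm after would look roughly like the following address-settings fragment in hornetq-configuration.xml (the address match and the size limits here are placeholders, not values we've tuned):

```xml
<address-settings>
   <address-setting match="jms.queue.#">
      <!-- start paging to disk once an address holds ~100MB of messages in memory -->
      <max-size-bytes>104857600</max-size-bytes>
      <!-- write page files in ~10MB chunks -->
      <page-size-bytes>10485760</page-size-bytes>
      <!-- PAGE rather than BLOCK producers or DROP messages when the limit is hit -->
      <address-full-policy>PAGE</address-full-policy>
   </address-setting>
</address-settings>
```

The missing piece is the "cannot page itself to death" half: PAGE bounds memory, but nothing here bounds total disk usage for page files.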
I've been using a clustered topology for about two months now and have had mixed results. The hornetq cluster consists of two nodes with static connectors to each other. Each node has a 10GB ramdisk provisioned for the journals and a SAS drive for paging (journalling on disk was not fast enough). Each client application uses Spring's JmsTemplate and DefaultMessageListenerContainer, along with a CachingConnectionFactory to work around some of the anti-patterns introduced by JmsTemplate. Because we use the CachingConnectionFactory, each client makes only one connection to the hornetq cluster (i.e. to one of the two nodes, but not both). This means that traffic from a particular client is not load balanced across the available nodes, but is instead sent entirely to one node.
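For context, our client wiring looks roughly like the following Spring XML. Bean names, the JNDI location, and the queue name are simplified stand-ins, not our actual configuration; the point is that a single CachingConnectionFactory sits in front of both JmsTemplate and the listener container, which is why each client ends up pinned to one node:

```xml
<!-- HornetQ JMS ConnectionFactory looked up from JNDI (stand-in location) -->
<jee:jndi-lookup id="hornetqConnectionFactory" jndi-name="java:/ConnectionFactory"/>

<!-- Caches the single underlying connection plus sessions/producers,
     avoiding JmsTemplate's connection-per-send anti-pattern -->
<bean id="cachingConnectionFactory"
      class="org.springframework.jms.connection.CachingConnectionFactory">
   <property name="targetConnectionFactory" ref="hornetqConnectionFactory"/>
   <property name="sessionCacheSize" value="10"/>
</bean>

<bean id="jmsTemplate" class="org.springframework.jms.core.JmsTemplate">
   <property name="connectionFactory" ref="cachingConnectionFactory"/>
</bean>

<bean id="listenerContainer"
      class="org.springframework.jms.listener.DefaultMessageListenerContainer">
   <property name="connectionFactory" ref="cachingConnectionFactory"/>
   <property name="destinationName" value="exampleQueue"/>
   <property name="messageListener" ref="myListener"/>
</bean>
```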
We push 20k messages per second through the cluster and it seems to handle the load quite well, but about once a week we see an OOM exception that appears to be related to paging and dead client connections (see https://community.jboss.org/thread/200960 for more details). I've been able to reproduce the problem artificially in our test environment by killing the second cluster member while a large amount of paging is going on, which leads me to believe the combination of paging and the core bridge between the hornetq nodes is causing the issue (please feel free to debate this with me on the other thread). I was not, however, able to reproduce the problem under similar load on a single standalone hornetq node.
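For reference, the core bridge I'm referring to is the one created by a cluster-connection with static connectors, roughly like this fragment of hornetq-configuration.xml (host names and connector names are placeholders, shown from node 1's point of view; node 2 has the mirror image):

```xml
<connectors>
   <connector name="local-connector">
      <factory-class>org.hornetq.core.remoting.impl.netty.NettyConnectorFactory</factory-class>
      <param key="host" value="node1.example.com"/>
      <param key="port" value="5445"/>
   </connector>
   <connector name="node2-connector">
      <factory-class>org.hornetq.core.remoting.impl.netty.NettyConnectorFactory</factory-class>
      <param key="host" value="node2.example.com"/>
      <param key="port" value="5445"/>
   </connector>
</connectors>

<cluster-connections>
   <cluster-connection name="my-cluster">
      <address>jms</address>
      <!-- the connector this node advertises to the rest of the cluster -->
      <connector-ref>local-connector</connector-ref>
      <!-- static list instead of UDP discovery; the resulting core bridge
           is what moves messages between the two nodes -->
      <static-connectors>
         <connector-ref>node2-connector</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>
```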
Has anyone else weighed this trade-off? What do you think of using and managing a set of standalone message brokers vs. a clustered topology?
Thanks,
Craig