JBoss Messaging Clustering 1.4.x Overview
This page is an overview for the Clustering model underneath JBossMessaging 1.4.x. This includes 1.4.0 sp3, which is included in EAP 4.3.
Server side setup
Say we have a queue called queueA and it's clustered. Each peer in the cluster will have a partial queue. This means that each peer will have a piece of queueA(some of the messages).
Client side Connection Load Balancing
When a client looks up the connection factory, if they look up /ClusteredConnectionFactory, they will get a smart proxy that contains all the addresses of all of the JBM servers in the cluster. Each time you ask the clustered connection factory that you looked up for a connection, it will round robin the connections to each server in the cluster. Each time you ask for another connection, you will get a connection to the next server. So you may have 3 servers and 3 client connections(each to a different server). The client starts to send messages. What ever connection it uses, the messages will go to that server only. On a connection, if the server stops reponding(stops the ping), the connection will automatically fail over to another one of the servers in the cluster. No load balancing happens from the client side, unless you count the round robin of servers in the connection factory.
Server Side Failover - It's good to have a buddy watch your back
Every peer(node) has a buddy that will pick up all of it's messages if it fails. If a server fails, the buddy will look in the database and recover all of it's buddies messages. This only works for persistent messages. From the docs - "If the node you are connected to fails, you will automatically fail over to another node and will not lose any persistent messages. You can carry on with your session seamlessly where you left off. Once and only once delivery of persistent messages is respected at all times."
There are two drawbacks to the buddy system.
1. Messages will be stranded if two buddies go down at the same time.
If two nodes go down that are buddies at the same time, one of the buddies messages will not be picked up. For instance, if you have nodes 1, 2, and 3. If 3 is a failover node for 2 and 1 is a failovernode for 3, if 2 and 3 goe down, 2's messages will be lost, becuase 3 is not there to load 2's messages. The messages in Queue 2 will be stranded.
2. Clustered queues will only process messages from running nodes.
When running multiple nodes, the nodes need to be up at the same time to register with one another in the cluster. If you bring the nodes up and down at different times, the messages will not roll over from node to node. For example. Bring node 1 up and process messages. Bring node 1 down. Bring node 2 up and process messages. You will notice that no messages have been pulled from node 1, until that node is brought up. Even though the nodes hit the same database, they will only pull the messages for their own peer id, unless they are registered as a failover node for another peer and failover occurs.
Server Side Loadbalancing
JBM is set up with a peer system. There is no central coordinator. Remember our example, where we have queueA and there are pieces(messages) of queueA on each Peer(node). Each one of these partial queues establishes a sucker connection to it's buddy. If you have a consumer locally, that local consumer will always get the message, if it's not busy (i.e. if it's buffer is full). If all the local consumers are busy, JBM will consider sending the message to a remote partial queue on a different server. This gives an optimal use of network resources, since it avoids unnecessary network traffic.
Another way to explain this is as follows:
JBM contains message redistribution with clustered queues.
This means, if you have a clustered queue over several nodes in a cluster, and consumers on one node are faster than consumers on another node, then messages will be pulled from one node to another transparently to satisfy demand.
The idea here is it should give optimal use of server resources for processing messages.
This behaviour can sometimes cause confusion. The classic case is a user sets up a cluster of N nodes with a clustered queue, then sends messages to a single node and wonder why all their consumers on different nodes aren't processing the messages. This is because the local node still has spare cpu cycles so there is no point in allowing other nodes to consume, since that would involve unnecessary network round tripping. So it's actually the optimal use of resources.
Do not use /ClusteredConnectionFactory inside the application server
The clustered connection factory is to be used by external clients that need failover. If you are inside of an application server and the peer goes down, most likely the entire node is going down. So you don't really need connection failover inside of your own application server. You only need it for external clients. Please, use the java:/JmsXA for you connection factory inside of the application server. The JmsXA is the JMS adapter that wraps a local connection going to your local peer.
Note: You cannot see all the messages in the queue if it's clusterd
Many times users will use the QueueBrowser to browse messages or get queue counts to see how many messages are in the queue. Using a peer model vs a coordinator model, it not practical to browse the entire queue or even get the count of messages in the total queue. This is one of the limitations. You can browse each partial queue on each node, but you will only be able to see messages in the local partial queue. It may be possible to wire up something that will go to each peer and pull all the messages in order to facilitate a total queue browse, but it's not a feature that has been in high demand.
Note: You cannot use selectors for the entire queue if it's clustered
For the reason give above, each peer can only see its local messages, it's near impossible and not practical to implement message selectors for this model. You can implement a selector, but it will only work for messages inside of that particular peer/partial queue. It will not work across the cluster/entire queue.