NewClusteringDesign

Version 8

Created by timfox on Jul 18, 2006 9:57 AM. Last modified by timfox on Oct 11, 2006 2:32 PM.

Please note that parts of this page are out of date, particularly with regard to the intelligent balancer.

The current implementation of balancing does not require an HA singleton, and balancing functionality is smoothly spread across the cluster.

I will update the documentation shortly - Tim

Introduction

This page describes the high-level design of clustering for JBoss Messaging, currently scheduled to be released with JBoss Messaging 1.2.

This design only considers operation of the cluster in a single Local Area Network (LAN). Wide Area Network (WAN) support will be added in later versions of JBoss Messaging.

The document is intended as a starting point for development. It is not within the scope of this (or any other) page to specify the design down to the level of code.

Persistent Storage

A distributed destination stores its persistent data in a persistent store.

A persistent store is not owned by any particular node. At any one time a shared persistent store can be used by multiple distributed channels to store their reliable data.

All persistent stores are accessible by any node in the LAN.

Each destination can be configured to use a specific persistent store.

An example of a shared persistent store would be an RDBMS.

JBoss Messaging does not currently support unshared persistent stores, e.g. local file based stores.

This is due to the extra overhead that would be incurred in a distributed topic where each node is using a local file based store and there are durable subscriptions on each node. This would require each persistent message sent to be written into each file based store using a 2PC protocol.

The restriction could potentially be dropped for queues.

Distributed Queue

A distributed queue is a logical queue that is distributed across the cluster. This allows processing load to be smoothly balanced across the cluster.

From the perspective of the application developer it is used in exactly the same way and has the same semantics as a non-distributed queue.

Receivers can consume messages from the queue while attached to any of the nodes on which the queue is deployed.

Similarly producers of messages can send messages to the queue while attached to any of the nodes on which the queue is deployed.

The ordering semantics of a JBoss Messaging queue are somewhat looser than the ordering semantics of a standard First In First Out (FIFO) queue, and are modelled around the ordering semantics of a JMS queue.

With a JBoss Messaging queue, we only guarantee the order of delivery of messages from a particular JMS session (JMS 1.1 specification, section 4.4.10.2), not a total ordering of messages.

This gives us the freedom to implement the distributed queue as a set of partial queues, one on each node of the cluster.

Each partial queue maintains its own set of messages.

As messages are sent to the distributed queue via a particular node, they arrive at the partial queue corresponding to that distributed destination for that node.

A queue can have multiple competing consumers on each node.

Any one of the consumers of the queue on that same node may then consume the message.

In the simplest configuration, the partial queues can be completely independent of each other. Messages arrive on a partial queue and can only be consumed by consumers of the queue on that same node.

This solution can be highly performant and works well if you know that your sending load can be distributed smoothly across all nodes, and you also have consumers distributed smoothly across all nodes, consuming at approximately the same rate.

This is likely to be true for many MDB style deployments where the same MDB is deployed on each node and consumes messages from the queue, and on the front end load balancing is used to distribute many JMS connections across the nodes of the cluster.

However, completely independent queues are not appropriate in all situations and suffer from two problems:

Starvation of receivers. If messages are not arriving smoothly at partial queues across the cluster, some partial queues can have few or no messages while their receivers sit idle, wasting clock cycles that could be used for processing.

Stranded messages. If messages are arriving at a partial queue that has no receivers attached, they will go unconsumed until a receiver attaches.
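
The independent configuration and its two failure modes can be illustrated with a minimal Java sketch. All class and method names here are hypothetical and are not JBoss Messaging APIs; messages are modelled as plain strings and nodes as string ids.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Illustrative model only: one independent partial queue per node.
class DistributedQueueSketch {
    private final Map<String, Queue<String>> partialQueues = new HashMap<>();

    void addNode(String nodeId) {
        partialQueues.put(nodeId, new ArrayDeque<>());
    }

    // A producer attached to nodeId deposits into that node's partial queue.
    void send(String nodeId, String message) {
        partialQueueFor(nodeId).add(message);
    }

    // A consumer attached to nodeId only sees that node's partial queue.
    // Returns null when the local partial queue is empty.
    String receive(String nodeId) {
        return partialQueueFor(nodeId).poll();
    }

    // Number of messages currently held by the node's partial queue.
    int depth(String nodeId) {
        return partialQueueFor(nodeId).size();
    }

    private Queue<String> partialQueueFor(String nodeId) {
        Queue<String> queue = partialQueues.get(nodeId);
        if (queue == null) {
            throw new IllegalArgumentException("Unknown node: " + nodeId);
        }
        return queue;
    }
}
```

If all sends go through one node and all receivers sit on another, receive on the receiving node keeps returning null (starvation) while depth on the sending node keeps growing (stranded messages).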

To solve these issues, JBoss Messaging introduces the notion of an Intelligent Balancer (IB).

The IB continually collects statistics for each distributed queue on each node, and based on those statistics may decide to instruct a partial queue to move some of its messages to another node, in order to balance the load across the cluster better.

The IB resides on only one node at any one time. If that node fails, another node picks up the responsibility of housing the IB.

Statistics may include number of messages, size of messages, consumption rate, deposition rate etc.

The particular policy the IB uses is pluggable, and fully configurable.

The interval at which the IB collects statistics from each node is also configurable.
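
A pluggable policy could be expressed as a small interface that maps collected statistics to move instructions. The following is only a sketch: the type names (NodeStats, MoveInstruction, BalancingPolicy, DrainStrandedPolicy) and the two statistics used are assumptions made for illustration, not the interfaces JBoss Messaging defines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class BalancerPolicySketch {
    // Statistics for one partial queue (hypothetical subset of what the IB might collect).
    static final class NodeStats {
        final int messageCount;
        final int receiverCount;
        NodeStats(int messageCount, int receiverCount) {
            this.messageCount = messageCount;
            this.receiverCount = receiverCount;
        }
    }

    // An instruction to move `count` messages from one node to another.
    static final class MoveInstruction {
        final String fromNode;
        final String toNode;
        final int count;
        MoveInstruction(String fromNode, String toNode, int count) {
            this.fromNode = fromNode;
            this.toNode = toNode;
            this.count = count;
        }
    }

    // The pluggable part: any strategy that turns statistics into instructions.
    interface BalancingPolicy {
        List<MoveInstruction> decide(Map<String, NodeStats> statsByNode);
    }

    // Example policy: drain every node that has messages but no receivers
    // into the node that has receivers and currently holds the fewest messages.
    static final class DrainStrandedPolicy implements BalancingPolicy {
        @Override
        public List<MoveInstruction> decide(Map<String, NodeStats> statsByNode) {
            List<MoveInstruction> moves = new ArrayList<>();
            for (Map.Entry<String, NodeStats> source : statsByNode.entrySet()) {
                NodeStats s = source.getValue();
                if (s.receiverCount > 0 || s.messageCount == 0) {
                    continue; // nothing stranded on this node
                }
                String target = null;
                int fewest = Integer.MAX_VALUE;
                for (Map.Entry<String, NodeStats> candidate : statsByNode.entrySet()) {
                    NodeStats c = candidate.getValue();
                    if (c.receiverCount > 0 && c.messageCount < fewest) {
                        fewest = c.messageCount;
                        target = candidate.getKey();
                    }
                }
                if (target != null) {
                    moves.add(new MoveInstruction(source.getKey(), target, s.messageCount));
                }
            }
            return moves;
        }
    }
}
```

A different policy (for example one driven by consumption rate) would implement the same interface without changing the code that collects statistics or executes the moves.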

Any persistent messages moved from one node to another after receiving instructions from the IB must be moved reliably.

This is accomplished by moving the messages transactionally using a Two Phase commit protocol (2PC) and using a transaction manager that fully supports recovery.

Without 2PC, messages could be added to the receiving node and then not removed from the sending node because the sending node crashed, which would break the once and only once delivery guarantee for reliable messages.

It is not necessary to use 2PC for non persistent messages.

JBoss Messaging internally will use JBoss Transactions, a world class transaction manager, to co-ordinate the 2PC.
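
The shape of such a transactional move can be sketched in plain Java. This is a simplified illustration of the two phase idea only: the Participant interface and PartialQueue class are hypothetical, there is no durable transaction log or recovery, and it does not use the JBoss Transactions API.

```java
import java.util.ArrayList;
import java.util.List;

class TwoPhaseMoveSketch {
    // One side of the move: votes in phase 1, applies or discards in phase 2.
    interface Participant {
        boolean prepare();
        void commit();
        void rollback();
    }

    static class PartialQueue {
        final List<String> messages = new ArrayList<>();
        boolean failOnPrepare = false; // test hook simulating a node that cannot prepare

        // Removal of the batch from the sending node.
        Participant removal(List<String> batch) {
            return new Participant() {
                public boolean prepare() { return !failOnPrepare && messages.containsAll(batch); }
                public void commit() { messages.removeAll(batch); }
                public void rollback() { /* nothing was changed before commit */ }
            };
        }

        // Addition of the batch on the receiving node.
        Participant addition(List<String> batch) {
            return new Participant() {
                public boolean prepare() { return !failOnPrepare; }
                public void commit() { messages.addAll(batch); }
                public void rollback() { /* nothing was changed before commit */ }
            };
        }
    }

    // Moves the batch only if both sides vote yes; otherwise neither side changes.
    static boolean move(PartialQueue from, PartialQueue to, List<String> batch) {
        List<Participant> participants = List.of(from.removal(batch), to.addition(batch));
        for (Participant p : participants) {          // phase 1: collect votes
            if (!p.prepare()) {
                for (Participant q : participants) {
                    q.rollback();
                }
                return false;
            }
        }
        for (Participant p : participants) {          // phase 2: apply on both sides
            p.commit();
        }
        return true;
    }
}
```

A real transaction manager additionally logs the outcome durably so that a crash between the two phases can be recovered, which is the part this sketch leaves out.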

Figure 1 shows a schematic of a JBoss Messaging distributed queue.

The figure shows a queue distributed across three nodes, node A, node B and node C.

Node A has two receivers and one sender. Node B has three receivers and no senders. Node C has no receivers and three senders.

In the absence of balancing, the receivers on node B would starve, and the messages on node C would be stranded.

In this case, the IB has instructed node C to send a block of messages to node B to correct the situation.

It has made this decision on the basis of statistics gathered from each node. Statistics are gathered (pulled) from the nodes by the IB on a continual (periodic) basis.
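
The pull model with a configurable interval might look roughly like the following. This is a sketch under assumptions: the StatsCollectorSketch class is hypothetical, only queue depth is collected, and each node is represented by a local IntSupplier rather than a remote call.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

class StatsCollectorSketch {
    // Node id -> a way to ask that node for its current partial queue depth.
    // All nodes are expected to be registered before start() is called.
    private final Map<String, IntSupplier> depthSources = new TreeMap<>();
    private volatile Map<String, Integer> latest = new TreeMap<>();

    void register(String nodeId, IntSupplier queueDepth) {
        depthSources.put(nodeId, queueDepth);
    }

    // One collection round: pull a value from every registered node.
    Map<String, Integer> pullOnce() {
        Map<String, Integer> snapshot = new TreeMap<>();
        for (Map.Entry<String, IntSupplier> e : depthSources.entrySet()) {
            snapshot.put(e.getKey(), e.getValue().getAsInt());
        }
        latest = snapshot;
        return snapshot;
    }

    // Most recent snapshot, as seen by whatever applies the balancing policy.
    Map<String, Integer> latest() {
        return latest;
    }

    // Repeats pullOnce() at the configured interval; the caller shuts the executor down.
    ScheduledExecutorService start(long intervalMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(this::pullOnce, 0, intervalMillis, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```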

Distributed Topic

A distributed topic is a logical topic that is distributed across the cluster.

Like a queue, from the perspective of the application developer it is used in exactly the same way and has the same semantics as a non-distributed topic.

Clients can create non durable subscriptions while attached to any of the nodes on which the topic is deployed.

A non durable subscription exists on only one node of the cluster, and is only accessible from that node.

Each non durable subscription will receive all messages sent to the topic from any node while the subscription is in existence.

Clients can also create durable subscriptions while attached to any of the nodes on which the topic is deployed.

The same durable subscription can be consumed from any node in the cluster on which the topic is deployed.

This feature is similar to having multiple consumers on a single queue distributed across the cluster, and allows the processing load from a single durable subscription to be distributed.

In many ways a durable subscription is treated the same as a distributed queue.

Each node contains a partial durable subscription (like a partial queue), and the IB is used to move messages between partial durable subscriptions according to statistics from each node.

Figure 2 shows a schematic of a distributed topic.

Allowing both queues and durable subscriptions to be truly balanced across the cluster, with multiple receivers per node for each queue and durable subscription, lets processing load be spread very smoothly across the cluster.

Producers of messages can send messages to the topic while attached to any of the nodes on which the topic is deployed.

When a message (or messages in the case of a transaction containing multiple messages to send) is sent to the topic by a sender attached to a particular node, it is reliably multicast using JGroups across the LAN and is picked up by each subscription, both durable and non-durable.

The durable case adds some complexity since the durable subscription exists on every node of the cluster (unlike a non-durable subscription), but only one of the partial durable subscriptions should accept the message.

If the partial durable subscription on a particular node has no receivers it should always reject the message. If it has at least one receiver then it will accept or reject the message according to an incrementing value which will be present in the packet of data.
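
This page does not specify how the incrementing value is mapped to a node. One possible rule, shown here purely as an assumption, is a round robin over the nodes that currently have receivers: every node evaluates the same function on the same counter, so exactly one of them accepts. The class name, the method name and the idea that all nodes share an agreed, ordered list of nodes with receivers are all hypothetical.

```java
import java.util.List;

class DurableAcceptSketch {
    // nodesWithReceivers: nodes whose partial durable subscription has at least
    // one receiver, in an order that all nodes agree on (an assumption of this sketch).
    // counter: the incrementing value carried in the multicast packet.
    static boolean accepts(String localNode, List<String> nodesWithReceivers, long counter) {
        int localIndex = nodesWithReceivers.indexOf(localNode);
        if (localIndex < 0) {
            return false; // no local receivers: always reject
        }
        // Round robin: the counter selects exactly one eligible node.
        long chosen = Math.floorMod(counter, (long) nodesWithReceivers.size());
        return chosen == localIndex;
    }
}
```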

The sender will not multicast the send if no subscriptions will accept the message due to it not matching any filters, thus preventing unnecessary network traffic.

In the case of persistent messages, these are persisted to any durable subscriptions on the sending node, before the data is broadcast across the LAN.

Transactional Operation

Transactional operations (sending or acknowledging messages) can be performed on multiple distributed destinations in the same transaction.

Each partial queue or partial durable subscription can potentially use its own persistent store.

To preserve ACID semantics, each distributed destination provides an XAResource implementation which can be enlisted in a global transaction coordinated by the sending node's transaction manager.

The JBoss Transactions JTA implementation is used as the transaction manager, giving an excellent reliability guarantee since it implements a full recovery protocol.
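
To make the XAResource idea concrete, here is a simplified sketch of a destination that stages sends per transaction and only makes them visible on commit. It uses the standard javax.transaction.xa interfaces, but everything else is an assumption for illustration: XaDestinationSketch, SketchXid and XaSendSketch are hypothetical names, the coordinator is a hand written stand-in for a real transaction manager, both resources share one Xid instance, and nothing is persisted or recoverable.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical destination: sends are staged per transaction until commit.
class XaDestinationSketch implements XAResource {
    final List<String> committed = new ArrayList<>();
    private final Map<Xid, List<String>> staged = new HashMap<>();
    private Xid active;

    void send(String message) {
        if (active == null) throw new IllegalStateException("not enlisted in a transaction");
        staged.get(active).add(message);
    }

    @Override public void start(Xid xid, int flags) {
        staged.putIfAbsent(xid, new ArrayList<>());
        active = xid;
    }
    @Override public void end(Xid xid, int flags) { active = null; }
    @Override public int prepare(Xid xid) throws XAException {
        if (!staged.containsKey(xid)) throw new XAException(XAException.XAER_NOTA);
        return XA_OK;
    }
    @Override public void commit(Xid xid, boolean onePhase) throws XAException {
        List<String> messages = staged.remove(xid);
        if (messages == null) throw new XAException(XAException.XAER_NOTA);
        committed.addAll(messages);
    }
    @Override public void rollback(Xid xid) { staged.remove(xid); }
    @Override public void forget(Xid xid) { staged.remove(xid); }
    @Override public Xid[] recover(int flag) { return staged.keySet().toArray(new Xid[0]); }
    @Override public boolean isSameRM(XAResource other) { return other == this; }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }
}

// Minimal Xid; identity-based equality suffices because one instance is reused.
class SketchXid implements Xid {
    private final byte[] globalId;
    SketchXid(String id) { globalId = id.getBytes(StandardCharsets.UTF_8); }
    @Override public int getFormatId() { return 1; }
    @Override public byte[] getGlobalTransactionId() { return globalId.clone(); }
    @Override public byte[] getBranchQualifier() { return new byte[0]; }
}

// Stand-in for the transaction manager: sends to both destinations atomically.
class XaSendSketch {
    static boolean sendToBoth(XaDestinationSketch first, XaDestinationSketch second, String message) {
        Xid xid = new SketchXid("tx-" + message);
        XAResource[] resources = { first, second };
        try {
            for (XaDestinationSketch d : new XaDestinationSketch[] { first, second }) {
                d.start(xid, XAResource.TMNOFLAGS);   // enlist
                d.send(message);
                d.end(xid, XAResource.TMSUCCESS);     // delist
            }
            for (XAResource r : resources) {          // phase 1
                if (r.prepare(xid) != XAResource.XA_OK) {
                    for (XAResource x : resources) x.rollback(xid);
                    return false;
                }
            }
            for (XAResource r : resources) r.commit(xid, false); // phase 2
            return true;
        } catch (XAException e) {
            return false;
        }
    }
}
```

In a real deployment the application would not drive prepare and commit itself; the transaction manager enlists each destination's XAResource and coordinates the outcome, including recovery after a crash.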

Failover and High Availability

In the event of failure of a particular node, any non persistent messages in a partial queue or subscription on that node will be lost. This is conformant with the JMS 1.1 specification.

We considered providing a better degree of reliability for non persistent messages by replicating them in memory onto other nodes. We decided not to implement this, for the moment at least, because of the extra overhead of replication; in most cases where non persistent messages are used, performance is paramount.

However, for persistent messages, once and only once guaranteed delivery is implemented.

When a particular node fails, the responsibility for hosting any partial queues or partial durable subscriptions on the failed node fails over to another node in the cluster.

The persistent state of those partial queues or partial durable subscriptions is then reloaded from persistent storage onto the new node.
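
The failover behaviour described above can be modelled in a few lines. This is a toy sketch with hypothetical names (FailoverSketch and its methods); the shared persistent store is represented by an in-process map, whereas the design assumes a store such as an RDBMS reachable from every node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FailoverSketch {
    // Shared persistent store: partial queue id -> persistent messages.
    private final Map<String, List<String>> sharedStore = new HashMap<>();
    // In-memory contents of each partial queue (persistent and non persistent).
    private final Map<String, List<String>> inMemory = new HashMap<>();
    // Which node currently hosts each partial queue.
    private final Map<String, String> hosts = new HashMap<>();

    void deploy(String partialQueueId, String node) {
        hosts.put(partialQueueId, node);
        inMemory.put(partialQueueId, new ArrayList<>());
        sharedStore.put(partialQueueId, new ArrayList<>());
    }

    void send(String partialQueueId, String message, boolean persistent) {
        inMemory.get(partialQueueId).add(message);
        if (persistent) {
            sharedStore.get(partialQueueId).add(message);
        }
    }

    // Every partial queue hosted on the failed node moves to newNode. Its in-memory
    // state is lost and rebuilt from the shared store, so only persistent messages survive.
    void failOver(String failedNode, String newNode) {
        for (Map.Entry<String, String> e : hosts.entrySet()) {
            if (e.getValue().equals(failedNode)) {
                e.setValue(newNode);
                inMemory.put(e.getKey(), new ArrayList<>(sharedStore.get(e.getKey())));
            }
        }
    }

    String host(String partialQueueId) {
        return hosts.get(partialQueueId);
    }

    List<String> contents(String partialQueueId) {
        return inMemory.get(partialQueueId);
    }
}
```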

Any JMS connections that were using the failed node will also be failed over transparently to the new node. Users will be able to go on using their JMS sessions as if nothing had happened.

distributed_queue.jpg 71.7 KB
distributed_topic.jpg 90.7 KB
