Buddy Replication - the clustering concept where state is replicated in-memory to one specific other server in a cluster rather than to the entire cluster of servers - has always been seen as important for any application server to have in its clustering arsenal. High scalability combined with transparent failover is the usual benefit, but the biggest drawback has always been the added complexity in load balancers.

When a server crashes, it is the load balancer's job to locate the failed server's buddy and redirect requests meant for the original server to that buddy. The industry has employed several solutions, including encoding the address of a buddy in each request and maintaining a static lookup table of servers and their buddies. These approaches have been considered unnecessarily complex or inflexible, but have always been seen as a necessary evil in enabling buddy replication.

JBoss Cache 1.4.0 "Jalapeno", due for release in May 2006, will include an implementation of buddy replication that uses a concept called Data Gravitation to deal with the problem of recovery after failure. JBoss Cache is the technology behind JBoss Application Server's HTTP and EJB session replication.

With Data Gravitation, after a server in the cluster fails, the load balancer does not care where the buddy is; it simply uses its load balancing algorithm to direct the request to the cluster as though it were a brand new request. Consider the following setup:

[Diagram: a cluster of five nodes - A, B, C, D and E - arranged in a ring, each node's session data backed up on the next node]

Let's assume each node serves HTTP requests and needs to maintain HTTP session data. Node A's session data is backed up on Node B, Node B's session data is backed up on Node C, and so on. Now let's say Node A crashes.
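
The buddy assignment in this example forms a simple ring, where each node backs up its data on the next node, wrapping around at the end. A minimal sketch of that ring (class and method names here are illustrative, not the actual JBoss Cache API):

```java
import java.util.List;

public class BuddyRing {
    // The buddy of a node is simply the next node in the ring,
    // wrapping around from the last node back to the first.
    static String buddyOf(List<String> nodes, String node) {
        int i = nodes.indexOf(node);
        return nodes.get((i + 1) % nodes.size());
    }

    public static void main(String[] args) {
        List<String> cluster = List.of("A", "B", "C", "D", "E");
        System.out.println(buddyOf(cluster, "A")); // A's data is backed up on B
        System.out.println(buddyOf(cluster, "E")); // E wraps around: backed up on A
    }
}
```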

Node B takes over Node A's data and treats it as its own. Now, when a request comes in for a session originally on Node A, the load balancer picks a node to redirect the request to - let's say this is Node D. Node D, realising that it does not own the requested session, locates the new data owner - in this case Node B - and transfers the session data from Node B to itself. Node D becomes the new owner of the session, Node B removes it from its in-memory state, and Node D handles all future requests involving this state. Data is also always backed up: when Node B takes ownership of failed Node A's sessions, it starts backing them up (onto Node C) along with its own sessions. Once Node D takes ownership of a session, that session is treated like the rest of Node D's sessions and backed up (onto Node E).
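
The change of ownership described above can be sketched with a simplified in-memory model. This is not JBoss Cache's actual API - the class, fields and lookup loop are hypothetical - and in the real implementation the owner lookup is a cluster-wide call and the gravitated session is re-replicated to the new owner's buddy; the sketch only captures the one-time transfer of ownership:

```java
import java.util.HashMap;
import java.util.Map;

public class GravitatingNode {
    final String name;
    final Map<String, String> sessions = new HashMap<>(); // sessions this node owns
    Map<String, GravitatingNode> cluster;                 // name -> node, set after startup

    GravitatingNode(String name) { this.name = name; }

    // Handle a request for a session: serve it locally if owned, otherwise
    // gravitate the session from whichever peer currently owns it.
    String handle(String sessionId) {
        String data = sessions.get(sessionId);
        if (data == null) {
            for (GravitatingNode peer : cluster.values()) {
                if (peer != this && peer.sessions.containsKey(sessionId)) {
                    data = peer.sessions.remove(sessionId); // old owner gives the session up
                    sessions.put(sessionId, data);          // this node becomes the new owner
                    break;
                }
            }
        }
        return data;
    }

    public static void main(String[] args) {
        GravitatingNode b = new GravitatingNode("B");
        GravitatingNode d = new GravitatingNode("D");
        Map<String, GravitatingNode> cluster = Map.of("B", b, "D", d);
        b.cluster = cluster;
        d.cluster = cluster;

        b.sessions.put("user42", "state for user42"); // B holds failed Node A's session
        System.out.println(d.handle("user42"));       // D gravitates it from B
        System.out.println(b.sessions.containsKey("user42")); // B no longer owns it
    }
}
```

Because ownership moves on the first request, every subsequent request for the same session is served locally - which is exactly why gravitation happens at most once per session.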

It must be noted that this gravitation only occurs if and when needed (i.e., lazy gravitation), and even then only once per session, since the requesting node takes ownership of the session and its data.

This peer-based mechanism allows load balancers to work with no extra configuration or setup, as data always gravitates to the server that needs it. The other benefit is that when a server crashes, its data and load are evenly distributed across the remaining servers, because the load balancer directs requests according to its configured load balancing policy for new requests. In the other setups mentioned above, the buddy server would suddenly have its load doubled when a server failure occurs.

The only requirement placed on such a setup is that sticky sessions are used, to prevent data from repeatedly changing owners as requests for a single session are directed to different servers in the cluster. Given that sticky sessions are a very common architectural pattern anyway, and most application servers use them by default, this is a relatively painless requirement.
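
For example, if Apache httpd with mod_jk fronts the cluster, sticky sessions can be enabled in workers.properties - a sketch only, with placeholder worker and host names:

```
worker.list=lb
worker.nodeA.type=ajp13
worker.nodeA.host=hosta.example.com
worker.nodeA.port=8009
worker.nodeB.type=ajp13
worker.nodeB.host=hostb.example.com
worker.nodeB.port=8009
worker.lb.type=lb
worker.lb.balance_workers=nodeA,nodeB
worker.lb.sticky_session=1
```

With sticky_session enabled, requests for an established session keep going to the same worker, so session data settles on one owner instead of gravitating back and forth.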

The design of Buddy Replication in JBoss Cache 1.4.0 "Jalapeno" is available on this wiki page, and can be discussed on this forum thread.

--
Manik Surtani
Lead, JBoss Cache