Controlled cluster shutdown with data restore from persistent storage

Version 4

Created by mircea.markus on Jul 16, 2013 10:05 AM. Last modified by mircea.markus on Jul 22, 2013 12:55 PM.

This is tracked by ISPN-3351.

Purpose

Users increasingly ask to be able to:

- shut down a cluster in a controlled way and flush data to persistent storage

- restart the whole cluster and preload the data from the storage (no data loss)

Design

Here's a suggested design to achieve this functionality:

Controlled shutdown

- when a *ClusterShutdown* operation (JMX) is triggered:

- the coordinator doesn't initiate any rebalance when nodes leave

- each node:

- stops accepting client requests (in the future we might get smarter and allow a timeout for ongoing transactions)

- flushes all in-memory state to the cache store: e.g. if an async store is used, it waits for the store to stop; if the SingleFileCacheStore is used directly, it forces a sync

- writes a "successfulShutdown" flag to the *LocalRegistry*

- process exits
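The per-node part of the sequence above can be sketched as follows. This is only an illustration under stated assumptions: `ControlledShutdownSketch`, its `Store` interface and the map standing in for the LocalRegistry are hypothetical names, not existing Infinispan API. The point it shows is the ordering: the "successfulShutdown" flag is written only after the flush has returned, so a failed flush leaves no flag behind.

```java
import java.util.Map;

/** Hypothetical per-node shutdown sequence; none of these types are Infinispan API. */
class ControlledShutdownSketch {

    /** Stand-in for the cache store: flush() must return only once data is durable. */
    interface Store {
        void flush();
    }

    private final Store store;
    private final Map<String, String> localRegistry; // stand-in for the LocalRegistry
    private volatile boolean acceptingRequests = true;

    ControlledShutdownSketch(Store store, Map<String, String> localRegistry) {
        this.store = store;
        this.localRegistry = localRegistry;
    }

    boolean isAcceptingRequests() {
        return acceptingRequests;
    }

    /** Runs the steps in the order listed in the design. */
    void clusterShutdown() {
        // 1. Stop accepting client requests.
        acceptingRequests = false;
        // 2. Flush in-memory state; an exception here aborts the sequence.
        store.flush();
        // 3. Only now mark the shutdown as successful.
        localRegistry.put("successfulShutdown", "true");
        // 4. A real node would exit the process here; omitted so the sketch stays testable.
    }
}
```

If `flush()` throws, the flag is never written, which is what lets the restart logic treat a store without the flag as stale.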

Restart

- each individual node is restarted with a "ClusterRestart" option indicating that it wants to restore the cluster state

- each node:

- reads the "successfulShutdown" and "shutDownView" information from the *LocalRegistry* and sends it to the coordinator

- if the node doesn't have the "successfulShutdown" flag present, it wipes out the cache store, assuming the data is stale

- the coordinator doesn't start a rebalance on nodes joining, but:

- the coordinator waits for all nodes from the previously stored shutDownView to join again, ignoring new nodes

- once all of the expected nodes have rejoined, the coordinator tells the cluster to preload

- after every node has finished preloading, the coordinator installs the "shutDownView" on all the nodes in the cluster (no state transfer yet, even if there are new nodes as well)

- new nodes are then allowed to join (normal state transfer is enabled)

- in order to support the case in which not all nodes are restarted, the following JMX operations are available:

- restartStatus - provides the following information

- list of the expected cluster nodes: "shutDownView"

- current cluster state: how many nodes have joined up to this moment, and which ones we are still waiting on

- noDataLost: a boolean indicating whether enough nodes have joined so that no state is lost (e.g. with numOwners = 3, the flag is true once shutDownView.size - 2 nodes have joined; this flag could become smarter in the future by looking at the actual segment distribution)

- forceRestart - allows an explicit restart even if not all nodes from the "shutDownView" are present

- the nodes that have currently joined preload the state from cache store

- after all of them have preloaded, a rebalance is triggered in which the old topology == "shutDownView" and the new topology == the current topology based on the available nodes
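The bookkeeping behind restartStatus could look roughly like the sketch below. `RestartStatusSketch` and its method names are hypothetical, chosen only for illustration. It assumes every entry has exactly numOwners copies on distinct nodes, so the conservative rule is that no data is lost as long as fewer than numOwners nodes of the "shutDownView" are still missing.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical coordinator-side view of a cluster restart; not Infinispan API. */
class RestartStatusSketch {
    private final Set<String> shutDownView;
    private final Set<String> joined = new LinkedHashSet<>();
    private final int numOwners;

    RestartStatusSketch(Collection<String> shutDownView, int numOwners) {
        this.shutDownView = new LinkedHashSet<>(shutDownView);
        this.numOwners = numOwners;
    }

    /** Records a join; nodes outside the shutDownView are ignored during restart. */
    boolean nodeJoined(String node) {
        return shutDownView.contains(node) && joined.add(node);
    }

    /** Members of the shutDownView that have not rejoined yet. */
    Set<String> stillWaitingOn() {
        Set<String> missing = new LinkedHashSet<>(shutDownView);
        missing.removeAll(joined);
        return missing;
    }

    /** True once preloading can start without a forceRestart. */
    boolean allExpectedJoined() {
        return stillWaitingOn().isEmpty();
    }

    /** Conservative check: at least one owner of every entry is guaranteed to be back. */
    boolean noDataLost() {
        return stillWaitingOn().size() < numOwners;
    }
}
```

With a five-node "shutDownView" and numOwners = 3, `noDataLost()` turns true when the third node rejoins, matching the shutDownView.size - 2 example. An operator could consult it before invoking forceRestart.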

LocalRegistry

Implemented either as a file on the local disk (backed by the SingleFileCacheStore):

  <global>
     <localRegistry store="file:/path/to/my/local/registry"/>
  </global>

Or, in environments where a local store is not available (e.g. clouds), backed by the cache store of a named cache:

<global>
   <localRegistry store="cache:/myCache/myCacheStore"/>
</global>
...
<namedCache name="myCache">
  <loaders>
    <loader name="myCacheStore" class="...">
    ...
    </loader>
  </loaders>
</namedCache>
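To make the file-based variant more concrete, here is a minimal sketch of a registry persisted on local disk. It is an assumption-heavy illustration: the design backs the registry with the SingleFileCacheStore, while this sketch swaps in a plain `java.util.Properties` file to stay self-contained, and `FileLocalRegistrySketch` with its methods is a hypothetical name. Recording the "shutDownView" at shutdown time is also an assumption; the design only states that it is read on restart.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

/** Hypothetical file-backed LocalRegistry using a properties file; not Infinispan API. */
class FileLocalRegistrySketch {
    private final Path file;

    FileLocalRegistrySketch(Path file) {
        this.file = file;
    }

    /** Persists the flag and the view (node names must not contain commas). */
    void recordShutdown(List<String> shutDownView) {
        Properties props = new Properties();
        props.setProperty("successfulShutdown", "true");
        props.setProperty("shutDownView", String.join(",", shutDownView));
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "LocalRegistry sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** False when the file or the flag is missing, i.e. the cache store is stale. */
    boolean wasSuccessfulShutdown() {
        return Boolean.parseBoolean(load().getProperty("successfulShutdown", "false"));
    }

    List<String> shutDownView() {
        String view = load().getProperty("shutDownView", "");
        return view.isEmpty() ? Collections.<String>emptyList() : Arrays.asList(view.split(","));
    }

    private Properties load() {
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return props;
    }
}
```

On restart, a node would call `wasSuccessfulShutdown()` first and wipe its cache store if it returns false, then send `shutDownView()` to the coordinator.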
