JBoss Atomic Farm Deployment
smarlow May 2, 2005 11:15 AM
I'm trying to help with Farm deployment and would like to start a forum discussion on issue JBCLUSTER-26. Ben Wang and I discussed the need for atomic deployment support last week; the information below is partly based on our conversation and partly on my understanding of the task. These are my words and not his (okay, disclosure is complete :-)
Deployment should be atomic, or at least atomic-like. A deployment should only complete on any machine if the application can be copied to all nodes and deployed on all nodes. If the deployment fails on any one machine, it should roll back on every machine to the previous state. If a user farm deploys a newer version of an application that is already farm deployed, the new application replaces the old one, unless we have to roll back (in which case we stick with the old app version). The user that initiates the deployment should be considered the administrator for the deployment operation; results from the cluster nodes should be delivered to the administrator machine in some form (something like "CrimePortalBeans.jar successfully copied to node1, CrimePortalBeans.jar deployed on node1, CrimePortalBeans.jar failed to be copied to node2, rollback..."). The results for each node should also be logged locally on that node to help with troubleshooting.
Does this sound right? Should we go with a two-phase approach backed by a transaction log, or take a lighter approach and just tighten up the current support?
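To make the two-phase option more concrete, here is a minimal sketch of how it could look. Everything in it is hypothetical: the FarmNode interface, the InMemoryNode stand-in and the deploy method are names I made up for illustration, not existing JBoss classes, and there is no transaction log or real file copy here. It only shows the idea that every node stages the archive first, and the new version becomes current only if all nodes staged it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AtomicFarmDeploySketch {

    /** Hypothetical view of one cluster node; not an existing JBoss interface. */
    interface FarmNode {
        String name();
        /** Phase 1: copy and stage the archive without replacing the current version. */
        boolean prepare(String archive, String version);
        /** Phase 2, success: make the staged version the current one. */
        void commit(String archive);
        /** Phase 2, failure: discard the staged version and keep the previous one. */
        void rollback(String archive);
    }

    /** In-memory stand-in for a node so the sketch runs without a cluster. */
    static class InMemoryNode implements FarmNode {
        private final String name;
        private final boolean failOnPrepare;
        final Map<String, String> current = new HashMap<String, String>();
        private final Map<String, String> staged = new HashMap<String, String>();

        InMemoryNode(String name, boolean failOnPrepare) {
            this.name = name;
            this.failOnPrepare = failOnPrepare;
        }

        public String name() { return name; }

        public boolean prepare(String archive, String version) {
            if (failOnPrepare) {
                return false; // simulates a failed copy or failed trial deployment
            }
            staged.put(archive, version);
            return true;
        }

        public void commit(String archive) {
            String version = staged.remove(archive);
            if (version != null) {
                current.put(archive, version);
            }
        }

        public void rollback(String archive) {
            staged.remove(archive); // the previous version in 'current' is untouched
        }
    }

    /** Outcome plus the per-node messages to deliver to the administrator. */
    static class Result {
        final boolean committed;
        final List<String> report;

        Result(boolean committed, List<String> report) {
            this.committed = committed;
            this.report = report;
        }
    }

    static Result deploy(List<? extends FarmNode> nodes, String archive, String version) {
        List<String> report = new ArrayList<String>();
        List<FarmNode> prepared = new ArrayList<FarmNode>();

        // Phase 1: every node must stage the archive successfully.
        for (FarmNode node : nodes) {
            if (node.prepare(archive, version)) {
                prepared.add(node);
                report.add(archive + " staged on " + node.name());
            } else {
                report.add(archive + " failed on " + node.name() + ", rollback...");
                for (FarmNode p : prepared) {
                    p.rollback(archive);
                    report.add(archive + " rolled back on " + p.name());
                }
                return new Result(false, report);
            }
        }

        // Phase 2: all nodes staged the archive, so switch every node over.
        for (FarmNode node : nodes) {
            node.commit(archive);
            report.add(archive + " deployed on " + node.name());
        }
        return new Result(true, report);
    }

    public static void main(String[] args) {
        InMemoryNode node1 = new InMemoryNode("node1", false);
        InMemoryNode node2 = new InMemoryNode("node2", true); // will fail in phase 1
        Result result = deploy(Arrays.asList(node1, node2), "CrimePortalBeans.jar", "2.0");
        for (String line : result.report) {
            System.out.println(line);
        }
    }
}
```

Note that the sketch assumes commit and rollback themselves never fail, and that no node reboots between the two phases. Those are exactly the places where a transaction log (or something like it) would be needed.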
Some interesting cases might be:
1. Reboot a server during farm deploy (after the file is copied into the farm folder). Rollback should occur after reboot. Do the same for all nodes in the cluster; rollback should again occur after reboot.
2. Same as #1, but a previous copy of the application already exists and needs to be restored to the farm folder without appearing as a new update.
3. Read-only farm folders (http url) may present some challenges; I'm not sure how they fit into the atomic model.
4. Start cluster node1 with an app already in its farm folder, then start node2, and make sure that application changes deploy in the right direction. This is difficult because the available information is ambiguous (should the app be removed from node1 or added to node2?). I propose that the decision should always be to add the app to the cluster rather than remove it from the cluster. If you want to remove an application from a cluster, you will have to remove it from a "live" cluster node (a node that is currently part of the cluster).
5. If we use a transaction log (which would contain changes to the cluster), how does the user manage it?
6. The rollback operation may itself fail. How do we handle rollback failures?
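The "always add" rule proposed in case 4 can be sketched as a simple set union. This is only an illustration under my own assumptions: the FarmMergeSketch class and archivesToAdd method are hypothetical names, farm folders are modeled as plain sets of archive names, and version conflicts between nodes are not handled at all.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class FarmMergeSketch {

    /**
     * Given each node's farm folder contents (node name to archive names),
     * returns for each node the archives it is missing, so that every node
     * ends up holding the union. Nothing is ever removed.
     */
    static Map<String, Set<String>> archivesToAdd(Map<String, Set<String>> farmFolders) {
        // The union of all farm folders is what the cluster should be running.
        Set<String> union = new TreeSet<String>();
        for (Set<String> archives : farmFolders.values()) {
            union.addAll(archives);
        }

        // Each node receives whatever is in the union but not in its own folder.
        Map<String, Set<String>> toAdd = new TreeMap<String, Set<String>>();
        for (Map.Entry<String, Set<String>> entry : farmFolders.entrySet()) {
            Set<String> missing = new TreeSet<String>(union);
            missing.removeAll(entry.getValue());
            toAdd.put(entry.getKey(), missing);
        }
        return toAdd;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> folders = new TreeMap<String, Set<String>>();
        Set<String> node1 = new TreeSet<String>();
        node1.add("CrimePortalBeans.jar");
        folders.put("node1", node1);
        folders.put("node2", new TreeSet<String>()); // node2 starts with an empty farm folder
        System.out.println(archivesToAdd(folders));
    }
}
```

With this rule, starting node2 with an empty farm folder never removes anything from node1. Removal would have to be a separate, explicit operation performed on a live node.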
A third approach might be to support farm deployment from a source control system. This might be nice, as you would know exactly what is in use on the cluster and have a history of deployed archives. It solves different problems than atomic deployment, but I wanted to mention it in case others think it would help. The objective would be to maintain a "single truth" as to what should be running on the cluster.