18 Replies Latest reply on Apr 18, 2006 8:19 PM by brian.stansberry

    JBoss Atomic Farm Deployment

    smarlow

      I'm trying to help with Farm deployment and would like to start a forum discussion on issue JBCLUSTER-26. Ben Wang and I discussed the need for atomic deployment support last week; the information below is partly based on our conversation and my understanding of the task. These are my words and not his (okay, disclosure is complete :-)

      Deployment should be atomic or atomic-like. A deployment should complete on any machine only if the application can be copied to all nodes and deployed on all nodes. If the deployment fails on any one machine, it should roll back on every machine to its previous state. If a user farm deploys a newer version of an application that is already farm deployed, the new application replaces the old one, unless we have to roll back (in which case we stick with the old app version). The user who initiates the deployment should be considered the administrator for the deployment operation, and results from the cluster nodes should be delivered to the administrator machine in some form (something like "CrimePortalBeans.jar successfully copied to node1, CrimePortalBeans.jar deployed on node1, CrimePortalBeans.jar failed to be copied to node2, rollback..."). The results for each node should also be logged locally on each node to help with troubleshooting.

      Does this sound right? Should we go with a two-phase approach backed by a transaction log, or take a lighter approach to tighten up the current support?
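
      To make the two-phase option concrete, here is a minimal in-memory sketch. It assumes a hypothetical FarmNode interface (copy, deploy, rollback) and a toy InMemoryNode; none of these names are JBoss APIs, and there is no transaction log, so it says nothing about surviving a reboot.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node abstraction for illustration; not a JBoss API.
interface FarmNode {
    String name();
    void copy(String archive) throws Exception;    // phase 1: stage the archive
    void deploy(String archive) throws Exception;  // phase 2: activate the staged archive
    void rollback(String archive);                 // restore the node's previous state
}

// Toy in-memory node so the sketch can run without a cluster.
class InMemoryNode implements FarmNode {
    private final String name;
    private final boolean failCopy;
    String active;            // archive currently deployed, or null
    private String staged;    // archive copied but not yet deployed
    private String previous;  // archive that was active before the switch
    private boolean switched;

    InMemoryNode(String name, String active, boolean failCopy) {
        this.name = name;
        this.active = active;
        this.failCopy = failCopy;
    }
    public String name() { return name; }
    public void copy(String archive) throws Exception {
        if (failCopy) throw new Exception("simulated copy failure");
        staged = archive;
    }
    public void deploy(String archive) {
        previous = active;
        active = staged;
        switched = true;
    }
    public void rollback(String archive) {
        if (switched) { active = previous; switched = false; }
        staged = null;
    }
}

class TwoPhaseFarmDeployer {
    // Returns true only if every node copied and deployed the archive.
    // 'report' collects the per-node messages meant for the administrator.
    static boolean deploy(String archive, List<FarmNode> nodes, List<String> report) {
        List<FarmNode> touched = new ArrayList<>();
        FarmNode current = null;
        try {
            for (FarmNode n : nodes) {          // phase 1: copy everywhere
                current = n;
                touched.add(n);
                n.copy(archive);
                report.add(archive + " successfully copied to " + n.name());
            }
            for (FarmNode n : nodes) {          // phase 2: deploy everywhere
                current = n;
                n.deploy(archive);
                report.add(archive + " deployed on " + n.name());
            }
            return true;
        } catch (Exception e) {
            report.add(archive + " failed on " + current.name() + ", rollback...");
            for (FarmNode n : touched) {        // undo on every node we touched
                n.rollback(archive);
            }
            return false;
        }
    }
}
```

      With node2 set to fail its copy, node1 ends up back on its old archive and the report ends with the rollback message. A rollback that itself fails is not handled here.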

      Some interesting cases might be:
      1. Reboot a server during farm deploy (after the file is copied into the farm folder). Rollback should occur after reboot. Do the same for all nodes in the cluster; rollback should occur after reboot.
      2. Same as #1, but a previous copy of the application already exists and needs to be restored to the farm folder without appearing as a new update.
      3. Read-only farm folders (http url) may present some challenges; I am not sure how they fit into the atomic model.
      4. Start cluster node1 with the app already in its farm folder, then start node2, and make sure that application changes deploy in the right direction. This is difficult because the current information is ambiguous (should the app be removed from node1 or added to node2?). I propose that the decision should always be to add the app to the cluster rather than remove it from the cluster. If you want to remove an application from a cluster, you will have to remove it from a "live" cluster node (a node that is currently part of the cluster).
      5. If we use a transaction log (which would contain changes to the cluster), how does the user manage it?
      6. The rollback operation may fail; how do we handle rollback failures?
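
      The "always add" rule proposed in case 4 can be sketched as a set difference: when two nodes compare farm folders, each one pulls whatever it is missing and nothing is ever removed. The class and method names below are hypothetical, and archives are compared by file name only.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of the farm service.
class AddOnlyReconciler {
    // Archives that 'target' must pull from 'other' so that it ends up
    // with the union of both farm folders. Nothing is ever removed.
    static Set<String> toAdd(Set<String> target, Set<String> other) {
        Set<String> missing = new TreeSet<>(other);
        missing.removeAll(target);
        return missing;
    }
}
```

      If node1 starts with the app and node2 starts empty, node2 pulls the app and node1 pulls nothing, whichever node started first.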

      A third approach might be to support farm deployment from a source control system. This might be nice as you would know exactly what is in use on the cluster and have a nice history of deployed archives. This solves different problems than atomic deployment, but I wanted to mention it in case others thought it would help. The objective would be to maintain a "single truth" as to what should be running on the cluster.

        • 1. Re: JBoss Atomic Farm Deployment

          This discussion really belongs in the deployment forum, so I will move it there.
          It also deals with a number of "cross cutting concerns" so I will deal with
          each individually and give a sort of road-map of our thinking for JBoss5.

          • 2. Re: JBoss Atomic Farm Deployment

            First here's a critique of the farm service.
            The major concern with the farm service is that it lacks the ability to stage deployments.

            1) It just distributes the deployments on a best-effort basis across the cluster
            with no reconciliation on whether the deployment works on each peer.
            2) There is the potential that every node in the cluster could be redeploying the
            application, making it generally unavailable.
            3) There are race conditions where two nodes could deploy things in different orders
            based on farm deployments to different nodes "at the same time".

            • 3. Re: JBoss Atomic Farm Deployment

              The first direction we will be taking in JBoss5 is to make the notions of
              farm, singleton, replication, federation, etc. an aspect of each deployment rather than
              have different folders for these notions.

              This allows the features to be "mixed and matched" more easily.
              e.g. being able to farm a singleton where you deploy it to one node, it is distributed
              to all nodes but only activated on one.

              This is our "aspectized" deployment framework. Here are the original requirements
              docs if you are not familiar with it:
              http://wiki.jboss.org/wiki/Wiki.jsp?page=JBossKernel

              These features will be configurable on individual services, not just whole deployments
              so you can have for example an EAR that has some components as singletons
              and others doing what farm does today.

              • 4. Re: JBoss Atomic Farm Deployment

                The second direction is versioned deployment which will eventually lead
                to "atomic deployments".

                The original aim of this feature is to allow a reference machine (not a production
                machine) to be configured from a known production configuration (e.g. version 5).
                When the admin is happy with the new config, he can make it the production config (version 6).

                At any point, if problems are found, the admin can "rollback" to a known config version.
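
                As a rough picture of the versioning idea, the sketch below keeps a history of promoted configurations and lets the admin point production back at any known version. The class and its methods are hypothetical names for illustration, and a configuration is reduced to a plain string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the names are illustrative and not part of JBoss.
class VersionedConfig {
    private final List<String> history = new ArrayList<>();  // index 0 holds version 1
    private int production = 0;                              // 0 means nothing promoted yet

    // Record a configuration tested on the reference machine as the
    // next production version and return its version number.
    int promote(String config) {
        history.add(config);
        production = history.size();
        return production;
    }

    // Make an earlier, known version the production configuration again.
    void rollbackTo(int version) {
        if (version < 1 || version > history.size()) {
            throw new IllegalArgumentException("unknown version " + version);
        }
        production = version;
    }

    int productionVersion() { return production; }

    String productionConfig() {
        if (production == 0) throw new IllegalStateException("no production config yet");
        return history.get(production - 1);
    }
}
```

                Rolling back only moves the production pointer; the newer versions stay in the history, so the admin can move forward again later.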

                • 5. Re: JBoss Atomic Farm Deployment

                  On Atomic Deployments. The aim here is to replace
                  redeploy == undeploy/deploy
                  with something as near atomic and nondisruptive as possible.

                  The redeploy will become something like:
                  * See whether the deployment can be constructed
                  * If it works, wait for current requests to complete and hold new requests (this is called the "valve")
                  * Once current requests are done, switch from the old to the new deployment, e.g. flip
                  the jndi bindings and other outward facing references - this may also require some
                  handoff of state, e.g. handing over cache/session objects
                  * Remove the old/replaced deployments

                  NOTE: Because of classloading requirements, the cache/session handover
                  will almost certainly require some form of serialization: either passivation of the
                  sessions to disk or replication from the other cluster nodes.
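
                  The redeploy steps above can be sketched with a read/write lock standing in for the "valve": requests hold the read lock, and the switch takes the write lock, which waits for in-flight requests and holds new ones. This is only one possible reading of the idea; ValvedDeployment is a hypothetical name, and state handover is left out entirely.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch of the "valve"; not the JBoss5 implementation.
class ValvedDeployment<D> {
    // Fair mode so a waiting switch is not starved by a stream of new requests.
    private final ReentrantReadWriteLock valve = new ReentrantReadWriteLock(true);
    private D current;  // guarded by 'valve'

    ValvedDeployment(D initial) {
        current = initial;
    }

    // A request runs against whichever deployment is current.
    <R> R invoke(Function<D, R> request) {
        valve.readLock().lock();
        try {
            return request.apply(current);
        } finally {
            valve.readLock().unlock();
        }
    }

    // Returns the replaced deployment so the caller can remove it.
    D redeploy(Supplier<D> constructor) {
        // Step 1: see whether the new deployment can be constructed.
        // If this throws, the old deployment keeps serving untouched.
        D replacement = constructor.get();

        // Step 2: close the valve. The write lock waits for current
        // requests to finish and holds new ones until it is released.
        valve.writeLock().lock();
        D old;
        try {
            // Step 3: flip the outward-facing reference.
            old = current;
            current = replacement;
        } finally {
            valve.writeLock().unlock();
        }
        // Step 4: hand the old deployment back for removal.
        return old;
    }
}
```

                  A request must not call redeploy from inside invoke, because a read lock cannot be upgraded to the write lock and the call would block forever.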

                  • 6. Re: JBoss Atomic Farm Deployment

                    On Transactional Deployment.

                    This is where you want the deployment to work on all nodes or none at all.

                    In general this will require the atomic deployment described above, but it also
                    needs to take into account staging and recovery features; both should be pluggable
                    policy features.

                    • 7. Re: JBoss Atomic Farm Deployment

                      On Staging

                      The most obvious solution is to take each machine to the point where we
                      know the new deployment will work, but then flip each individually in a round robin
                      manner.
                      This means only one node is not serving requests and state can be retrieved
                      from the rest of the cluster.
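
                      A minimal sketch of that staging idea, assuming a hypothetical StagedNode type: every node first builds the new deployment without activating it, and only if all of them succeed are the nodes flipped one after another.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node view for illustration; not a JBoss API.
class StagedNode {
    final String name;
    final boolean canBuild;   // whether the new deployment can be constructed here
    String serving;           // version currently serving requests
    private String prepared;  // version built and waiting for the flip

    StagedNode(String name, String serving, boolean canBuild) {
        this.name = name;
        this.serving = serving;
        this.canBuild = canBuild;
    }

    boolean prepare(String version) {
        if (!canBuild) return false;
        prepared = version;
        return true;
    }

    void flip() {
        serving = prepared;
        prepared = null;
    }

    void discard() {
        prepared = null;
    }
}

class RollingFlip {
    // Returns the names of the nodes in the order they were flipped;
    // the list is empty if staging failed on any node.
    static List<String> run(List<StagedNode> nodes, String version) {
        List<String> flipped = new ArrayList<>();
        for (StagedNode n : nodes) {
            if (!n.prepare(version)) {
                // One node cannot take the new version: drop all staged copies.
                for (StagedNode m : nodes) m.discard();
                return flipped;
            }
        }
        for (StagedNode n : nodes) {
            // Only this node is out of service during its flip;
            // the others keep serving and can supply state.
            n.flip();
            flipped.add(n.name);
        }
        return flipped;
    }
}
```

                      The flip is assumed to always succeed once prepare has succeeded; a real policy would also need to decide what happens if a flip fails halfway through the round.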

                      • 8. Re: JBoss Atomic Farm Deployment

                        On recovery

                        This is a bit more complicated, but again a simple solution is to use the
                        elected cluster "co-ordinator" as a reference (assuming you don't have a reference
                        node as described above in the version processing).

                        If the co-ordinator can deploy something and another node fails, this will generally
                        mean that it has got out-of-sync or has some other problem.

                        The obvious solution then is for that machine to be automatically restarted so
                        it can recover its deployment state from the correct versioned deployment
                        either from the co-ordinator or the reference node.

                        • 9. Re: JBoss Atomic Farm Deployment

                          Like I said, a lot of this should be subject to policy.

                          e.g. in a cluster you don't need to hold new requests while the switch is taking place,
                          instead you can force them to failover to a machine that is not currently making a
                          switch between deployment versions.

                          e.g. you may not be bothered about some state surviving redeployment

                          • 10. Re: JBoss Atomic Farm Deployment

                            Finally, the cross-cutting and deployment versioning must eventually allow
                            for node-specific configuration where the nodes are heterogeneous.

                            In this case, some nodes will need their own local configuration
                            overrides because they are not as capable as other nodes.

                            You can also imagine nodes co-operating more to manage resources.
                            e.g. If one node has idle db connections while another is running out,
                            the idle connections could be closed in favour of the busier node.
                            But we are getting slightly off the topic of deployment here. :-)

                            • 11. Re: JBoss Atomic Farm Deployment
                              belaban

                              All of these features are going to be available in JBoss 5.
                              However, we may need something simple(r) for 3.x and 4.x.

                              I'm proposing the following:
                              - On deployment, call _deploy() across the cluster
                              This returns a list of values. If one of them is an exception, showing that the deployment was not successful, we call _undeploy() across the cluster
                              - This is configurable, e.g. through an attribute "atomicDeployments" in the FarmService, which is off by default
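
                              In code, the proposal might look roughly like the sketch below. ClusterRpc is a hypothetical stand-in for whatever cluster-wide call the FarmService would use, and AtomicFarmSketch is not the real service; only the _deploy/_undeploy names and the "atomicDeployments" attribute come from the proposal.

```java
import java.util.List;

// Hypothetical stand-in for a cluster-wide RPC; not a JBoss or JGroups API.
interface ClusterRpc {
    // Invokes 'method' with 'archive' on every node and returns one
    // result per node; a failed node contributes a Throwable.
    List<Object> callOnAll(String method, String archive);
}

class AtomicFarmSketch {
    private final ClusterRpc rpc;
    private boolean atomicDeployments = false;  // off by default, as proposed

    AtomicFarmSketch(ClusterRpc rpc) {
        this.rpc = rpc;
    }

    void setAtomicDeployments(boolean atomicDeployments) {
        this.atomicDeployments = atomicDeployments;
    }

    // Returns true if every node deployed the archive.
    boolean deploy(String archive) {
        List<Object> results = rpc.callOnAll("_deploy", archive);
        boolean failed = false;
        for (Object result : results) {
            if (result instanceof Throwable) {
                failed = true;
            }
        }
        if (failed && atomicDeployments) {
            // At least one node failed: undo the deployment everywhere.
            rpc.callOnAll("_undeploy", archive);
        }
        return !failed;
    }
}
```

                              With the attribute off, a failure is reported but nothing is undone, which matches today's best-effort behaviour. The results of the _undeploy call are ignored in this sketch.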

                              This solution would be trivial to implement. Do you guys think this adds value to Farming?

                              Bela

                              • 12. Re: JBoss Atomic Farm Deployment

                                Without proper versioning, _deploy can be problematic as well, though, since there is no guarantee that _undeploy will succeed afterwards if the returned values are not all successful.

                                We will need _rollback instead. Of course, without implementing the full feature, the trade-off is deciding where to draw the line.

                                -Ben

                                • 13. Re: JBoss Atomic Farm Deployment
                                  smarlow

                                  Adrian,

                                  I would like to help with the "aspectized" deployment framework implementation, can you assign some tasks to me?

                                  Thanks,
                                  Scott

                                  • 14. Re: JBoss Atomic Farm Deployment

                                    The initial tasks are here:
                                    http://jira.jboss.com/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&pid=12310060&sorter/order=DESC&sorter/field=priority&resolutionIds=-1&component=12310154

                                    I know Dimitris/Scott started on the infrastructure but they have been (like me)
                                    working on higher-priority stuff for JBoss4, so I don't know what state
                                    this is in? Probably nonfunctional?

                                    I plan to do the cutover to the new MicroKernel in JBoss5 over the next
                                    month, so this should start dropping into place.
                                    http://jira.jboss.com/jira/browse/JBAS-1841 + others

                                    The key part that really needs doing is the
                                    VFS - virtual file system
                                    VDF - virtual deployment framework
                                    to better integrate the classloading and deployment framework.

                                    See the discussions in the POJO Server forum, Scott might have
                                    some extra thoughts?
                                    Or knowing him, an uncommitted prototype lying around on his disk :-)
