7 Replies Latest reply on May 31, 2005 11:16 AM by smarlow

    Should nodes joining the cluster pull applications from ever

    smarlow

      This was raised in http://www.jboss.org/index.html?module=bb&op=viewtopic&t=64186 and I wanted to discuss a proposal for resolving the issue without clouding the discussion in the other item.

      I would like to work towards ensuring that there is a consistent set of applications deployed on the cluster as a goal for farm deployment.

      With this goal in mind, I don't think that server nodes joining the JBoss cluster should pull applications from every other cluster node. I believe that there should be a concept of a cluster coordinator, which is the only node that applications are pulled from. The cluster coordinator is the oldest cluster member (replaced by the next oldest during a failure).

      This will make clustered server startup faster as fewer applications will be transferred. I believe that this will also simplify the http://jira.jboss.com/jira/browse/JBCLUSTER-33 task that deals with large file deployments.
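To make the proposal concrete, here is a minimal sketch of the coordinator rule described above. It is not JBoss code: the class name, the method names, and the idea of representing the membership view as a list ordered oldest-first are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not the farm service implementation.
public class CoordinatorSketch {

    /** Assumes the membership view is ordered by join time, oldest member first. */
    static Optional<String> coordinator(List<String> view) {
        return view.isEmpty() ? Optional.empty() : Optional.of(view.get(0));
    }

    /** Under the proposal, a joining node pulls from the coordinator only. */
    static List<String> pullSources(List<String> view, String joiner) {
        List<String> others = new ArrayList<>(view);
        others.remove(joiner); // a node never pulls from itself
        Optional<String> c = coordinator(others);
        if (c.isPresent()) {
            return List.of(c.get());
        }
        return List.of(); // first node in the cluster: nothing to pull
    }

    public static void main(String[] args) {
        List<String> view = new ArrayList<>(List.of("node1", "node2", "node3"));
        System.out.println(pullSources(view, "node3")); // [node1]

        view.remove("node1"); // the coordinator fails
        System.out.println(coordinator(view)); // Optional[node2]
    }
}
```

With N existing members, the joining node contacts one source instead of N, which is where the faster startup would come from.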

      This doesn't address the following issues, which I want to mention:

      1. If you delete a clustered application, you should remove the application manually from any cluster node member that was down at the time of deletion. This would cover the case that the downed node later becomes the cluster coordinator (to prevent the deleted application from coming back to life). I have reproduced this case.

      2. If two server nodes are started while the network is down, when they later see each other, one will become the cluster coordinator but applications are not propagated to the other node. I haven't reproduced this case but I believe it exists.

      These two cases are present with or without the proposed change and are only mentioned here to document them (will create Jira issues for them later.)

        • 1. Re: Should nodes joining the cluster pull applications from
          smarlow

          I created a Jira issue for this http://jira.jboss.com/jira/browse/JBCLUSTER-48.

          • 2. Re: Should nodes joining the cluster pull applications from

            Regarding issues 1 and 2 - shouldn't all nodes propagate farm deployments to/from the cluster coordinator when they come online? They should obtain farm deployments from the coordinator and send their own farm deployments to the coordinator if they're not already located on the coordinator.

            • 3. Re: Should nodes joining the cluster pull applications from
              smarlow

              Issue #1 is about how an application that is deleted from the cluster could currently come back into the cluster. Steps to reproduce:

              a. Create a cluster with more than one node. Let's assume three server nodes named { node1, node2, node3 }

              b. Deploy an application to the cluster (I used WineDemo.war).

              c. Bring node3 down.

              d. Delete WineDemo.war from the cluster. Note that it didn't get deleted from node3's farm folder as node3 is down.

              e. Bring node3 back up. WineDemo.war is still running on node3.

              The workaround for this issue is to manually remove WineDemo.war from node3 before bringing it back up.
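The steps above can be replayed against a toy model. This is only a sketch: the class and its methods are hypothetical, and it models nothing beyond "deploy and undeploy reach only the nodes that are running at that moment".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of farm folders; not JBoss code.
public class StaleDeploymentSketch {
    final Map<String, Set<String>> farm = new HashMap<>(); // node -> farm folder contents
    final Set<String> running = new HashSet<>();

    void start(String node) {
        farm.computeIfAbsent(node, n -> new HashSet<>()); // folder survives restarts
        running.add(node);
    }

    void stop(String node) {
        running.remove(node);
    }

    /** Deploy reaches only the nodes that are currently up. */
    void deploy(String app) {
        for (String n : running) farm.get(n).add(app);
    }

    /** Undeploy also reaches only the nodes that are currently up. */
    void undeploy(String app) {
        for (String n : running) farm.get(n).remove(app);
    }

    public static void main(String[] args) {
        StaleDeploymentSketch c = new StaleDeploymentSketch();
        c.start("node1"); c.start("node2"); c.start("node3"); // step a
        c.deploy("WineDemo.war");                              // step b
        c.stop("node3");                                       // step c
        c.undeploy("WineDemo.war");                            // step d
        c.start("node3");                                      // step e
        System.out.println(c.farm.get("node1")); // []
        System.out.println(c.farm.get("node3")); // [WineDemo.war]
    }
}
```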

              I created task http://jira.jboss.com/jira/browse/JBCLUSTER-49 for this.

              The second problem, issue #2, needs to be tested (I think it might be a problem).

              To answer your question, the nodes only propagate in one direction when a node comes online (nodes pull applications from the cluster during this event but do not push).

              • 4. Re: Should nodes joining the cluster pull applications from
                garu

                Hi Scott,
                In my opinion the first, basic assumption is that the cluster farming service must deploy the same applications at the same version (right now this only means the same modification date) on all nodes. A farm directory must not be allowed to contain, and its farm service therefore to deploy, an application that is not deployed on all the other cluster nodes.
                (I don't want to touch on atomic deployment here, i.e. what happens if a deploy succeeds on one node and fails on the others; that's another movie. I want to look at this just from the service startup point of view.)

                Given that, we can point out the following items:
                1- the first node that comes up is the one that decides which applications, and at what version, will initially be deployed in the cluster

                2- when a node is coming up it must not advertise its deployed applications until it has fully caught up with the rest of the cluster (this is already done)

                3- when a node is joining the cluster it must delete all the applications present in its farm directory, but not present in the application list it's getting from the cluster (this is easy, i tried it and can be done with a few lines of code)

                3a - when a node joins the cluster it must obtain the list of clustered applications from a node in the cluster (more on this later) and pull the applications from that node

                3b- the cluster always takes precedence, i.e. a joining node will always align with the cluster and not vice versa. A node could have been down for a while, and applications could have been added, removed, updated, etc., so when it joins a cluster it must not be allowed to corrupt it with a possibly outdated state.

                4- based on the above items, when a farm service is up and running, its farmed, deployed applications are exactly aligned with all the cluster nodes.

                5- if all the above items are respected, it doesn't matter from which node the applications are pulled since all the nodes will have the same content. A simple GET_FIRST request could be used.
                (Right now, since the node list is based on the startup order, the answers will always be returned in node startup order)
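Items 3, 3a and 3b above amount to a reconciliation step at join time. A minimal sketch follows, assuming each side is described by a map from application name to modification time; the class and method names are hypothetical, and this is not the farm service code.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of "cluster always takes precedence" at join time.
public class JoinReconcileSketch {

    /** Item 3: local applications that are absent from the cluster list get deleted. */
    static Set<String> toDelete(Map<String, Long> local, Map<String, Long> cluster) {
        Set<String> out = new TreeSet<>(local.keySet());
        out.removeAll(cluster.keySet());
        return out;
    }

    /** Items 3a/3b: pull anything missing locally or whose modification time differs. */
    static Set<String> toPull(Map<String, Long> local, Map<String, Long> cluster) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, Long> e : cluster.entrySet()) {
            // local.get() returns null for a missing app, so equals() is false and we pull
            if (!e.getValue().equals(local.get(e.getKey()))) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up names and timestamps, for illustration only.
        Map<String, Long> local = Map.of("WineDemo.war", 100L, "shop.war", 200L);
        Map<String, Long> cluster = Map.of("shop.war", 250L, "new.war", 300L);

        System.out.println(toDelete(local, cluster)); // [WineDemo.war]
        System.out.println(toPull(local, cluster));   // [new.war, shop.war]
    }
}
```

Note that the comparison is "different", not "newer": the joining node takes the cluster's copy even when its own copy has a later modification date, which is what item 3b asks for.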

                Gabriele

                • 5. Re: Should nodes joining the cluster pull applications from

                  Gabriele,

                  Thanks for the suggestion. In terms of the startup sequence, Scott and I have also discussed pulling farm deployments from a single node (e.g., the coordinator, or even a user-designated node). It is in line with your idea.

                  However, the tricky case is a group split followed by a merge. For example, suppose two nodes are initially in one group. A network problem then splits them into two groups, so each node becomes its own coordinator. When the network heals and the groups merge again, only one will become coordinator (by default determined by IP address).

                  So users may not get consistent behavior. But if we can document the consequences, I think we can get by.

                  -Ben

                  • 6. Re: Should nodes joining the cluster pull applications from

                    If nodes only pull deployments from a coordinator and an administrator is deploying a new application to the cluster, how can he determine which node to deploy the application to? Since the designated coordinator will change over time as nodes are recycled, it won't necessarily be the original coordinator.

                    • 7. Re: Should nodes joining the cluster pull applications from
                      smarlow

                      A server that is starting up and joining the cluster for the first time will "pull" deployments from the cluster (they are only copied once from the oldest cluster member instead of from every other cluster member.)

                      There is also the concept of "push", which occurs when the administrator copies a new application to the farm deployment folder on any of the nodes currently part of the cluster (this can also be an HTTP URL). The new application is pushed to every other node in the cluster.

                      The change that we are discussing here doesn't impact how new applications are pushed out to the cluster.

                      The administrator can also undeploy an application by removing it from the farm folder of any running cluster node. This is also not impacted by the change that we are discussing here.