-
1. Re: Farm deployment, cluster merge and offline operations
adrian.brock Oct 4, 2005 2:19 PM (in response to smarlow)This is what jgroups calls a merge problem.
It must be handled by the "application" since only it knows what the state means.
In this case, you need to merge the lists of deployed applications.
The fundamental problem I would think is that the scanner is currently
"timestamp" based which is not very reliable in a cluster because of differences in
system clocks.
If there was some form of reliable timestamp, the problem(s) could be fixed
as follows:
Problem 1:
Take the deployment with the latest timestamp
Problem 2:
E has to remember the timestamp of when the deployment was removed
in case there is a future merge (or a new member joins that still has the deployment)
and it needs to take precedence over other members claiming it should be deployed.
For E just joining the cluster or when it got bounced between the undeployment
and the merge it would need to make that information persistent.
(I assume the timestamp on the directory is not a reliable indicator of the
removal time - which it isn't on all operating systems).
Letting the user decide is always preferable to automagical behaviour that
cannot be overridden. ;-) -
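A minimal sketch of the two fixes above, assuming a reliable, cluster-wide comparable timestamp exists (which is exactly what the scanner lacks today). `DeploymentRecord` and `TimestampMerge` are hypothetical names, not existing JBoss classes. A removal is kept as a record with `deployed == false`, so it can take precedence over older claims that the deployment should exist. Persisting these records across restarts is not shown.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-deployment state: the last change and whether it was a deploy or a removal. */
final class DeploymentRecord {
    final String name;
    final long timestamp;   // assumed reliable and comparable across members
    final boolean deployed; // false = removal marker remembered after an undeploy

    DeploymentRecord(String name, long timestamp, boolean deployed) {
        this.name = name;
        this.timestamp = timestamp;
        this.deployed = deployed;
    }
}

final class TimestampMerge {
    /**
     * Merges two members' views of the farm. For each deployment the record
     * with the latest timestamp wins; on a tie the record from 'a' is kept.
     */
    static Map<String, DeploymentRecord> merge(Map<String, DeploymentRecord> a,
                                               Map<String, DeploymentRecord> b) {
        Map<String, DeploymentRecord> result = new HashMap<>(a);
        for (DeploymentRecord candidate : b.values()) {
            DeploymentRecord current = result.get(candidate.name);
            if (current == null || candidate.timestamp > current.timestamp) {
                result.put(candidate.name, candidate);
            }
        }
        return result;
    }
}
```

With this shape, E's later removal record beats an earlier "deployed" record held by the rest of the cluster, and the merged map tells every member what to deploy or undeploy.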
2. Re: Farm deployment, cluster merge and offline operations
adrian.brock Oct 4, 2005 2:20 PM (in response to smarlow)Can we please move these tasks back into the JBAS project where they belong
(until we decide otherwise).
Farming/clustering is developed as part of the application server and should be on
its roadmap/change log. -
3. Re: Farm deployment, cluster merge and offline operations
smarlow Oct 4, 2005 2:35 PM (in response to smarlow)I will link the tasks to JBAS, does that work?
-
4. Re: Farm deployment, cluster merge and offline operations
adrian.brock Oct 4, 2005 2:42 PM (in response to smarlow)It puts them on the roadmap.
-
5. Re: Farm deployment, cluster merge and offline operations
adrian.brock Oct 4, 2005 2:43 PM (in response to smarlow)Removing deployments offline seems like an intractable problem to me.
Take the pathological example where E is taken offline, a deployment is removed
but E is not restored to the cluster for a week.
What is the timestamp for the deployment's removal, when E left the cluster or
when it rejoined? Both solutions have issues. -
6. Re: Farm deployment, cluster merge and offline operations
smarlow Oct 4, 2005 3:22 PM (in response to smarlow)I agree that this is a hard problem and wanted to bring it out in the open for discussion. Bela Ban had some suggestions similar to yours that I will try to post later (need to reformat it.)
Perhaps we should solve the "offline" issue separately from handling "cluster split/merge".
I can think of two possible ways to handle offline operations:
1. Introduce a command line utility for managing the farm deployment folder of an offline server. If you delete an entry from the farm folder with the command line utility, an entry describing the deletion will be created in a metadata file. This metadata file can be used when the node joins the cluster.
2. Introduce hidden shadow files for each deployed file. When the node joins the cluster, having a shadow file but no corresponding deployment file will mean that we need to undeploy the application.
For example, if I deploy WineStore.ear, a hidden WineStore_Ear.sdw file is generated in each farm deployment folder. If a node goes down and someone deletes WineStore.ear from that machine, we would propagate the undeploy of WineStore.ear to the cluster.
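A rough sketch of the shadow file check, written over a set of file names so it does not depend on how the farm folder is listed. `ShadowScanner` is a hypothetical name. The shadow name here is simply the deployment name plus `.sdw`; that is an assumption made to keep the mapping reversible, and the `WineStore_Ear.sdw` style would need its own mapping.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

final class ShadowScanner {
    static final String SHADOW_SUFFIX = ".sdw"; // hypothetical naming convention

    /**
     * Given all file names in a farm folder, returns the deployments that have
     * a shadow file but no deployment file, i.e. candidates for a cluster-wide undeploy.
     */
    static SortedSet<String> findOfflineRemovals(Set<String> farmFileNames) {
        SortedSet<String> removed = new TreeSet<>();
        for (String name : farmFileNames) {
            if (!name.endsWith(SHADOW_SUFFIX)) {
                continue; // not a shadow file
            }
            String deployment = name.substring(0, name.length() - SHADOW_SUFFIX.length());
            if (!deployment.isEmpty() && !farmFileNames.contains(deployment)) {
                removed.add(deployment);
            }
        }
        return removed;
    }
}
```

A node would run this once when it joins the cluster, before the normal scan, and propagate an undeploy for each returned name.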
Could this fit into the new future virtual file system scheme? Or does the design for the future virtual file system have a solution for this issue?
Should we develop the solution to this problem under the new server micro-kernel environment and back port later? -
7. Re: Farm deployment, cluster merge and offline operations
smarlow Oct 4, 2005 3:59 PM (in response to smarlow)I linked new issue JBAS-2326 to JBCLUSTER-68
-
8. Re: Farm deployment, cluster merge and offline operations
adrian.brock Oct 4, 2005 4:15 PM (in response to smarlow)"ScottMarlowNovell" wrote:
Could this fit into the new future virtual file system scheme? Or does the design for the future virtual file system have a solution for this issue?
Should we develop the solution to this problem under the new server micro-kernel environment and back port later?
This doesn't exist yet. -
9. Re: Farm deployment, cluster merge and offline operations
adrian.brock Oct 4, 2005 4:18 PM (in response to smarlow)"ScottMarlowNovell" wrote:
I can think of two possible ways to handle offline operations:
Like I said, you will need some form of persistence to track/spot offline (or out of cluster)
changes done by the user.
I don't know whether that is shadow files or some other information? It sounds like
an implementation choice to me.
Asking the user to maintain some file correctly sounds error prone,
though they should be able to "fix" it if/when it goes wrong. -
10. Re: Farm deployment, cluster merge and offline operations
belaban Oct 5, 2005 1:21 AM (in response to smarlow)Another option would be to let the application choose what to do when a merge happens; this is in line with what Adrian said. Okay, here is how it could work (this is very similar to the way load balance proxies work in Clustering):
- The application provides a MergePolicy which is given EAR1 and EAR1', and then decides what to do with it, e.g. to pick EAR1 to deploy, or undeploy EAR1'
- This has to be generalized, so we would probably have to compare N *sets* of files, because we can have multi-party merges
- There has to be a possibility to define a handback object from the user, e.g. a timestamp, which then has to be a parameter to the merge reconcile method, so the user can make the right decision
- We would provide default policies, that can be replaced by the user
Hmm, not sure if this makes sense and/or is too complex to implement... -
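One possible shape for such a policy, as a sketch only. `MergePolicy` is the name used above; its signature, `FarmFileVersion`, and `LatestTimestampPolicy` are assumptions for illustration. The handback is modelled as a type parameter `H`, and the method takes a list so that multi-party merges are covered.

```java
import java.util.List;

/** One member's (or subpartition's) view of a farm file, plus the user-supplied handback. */
final class FarmFileVersion<H> {
    final String member;   // who reported this version
    final String fileName;
    final boolean present; // false = this member says the file should be undeployed
    final H handback;      // opaque user data, e.g. a timestamp

    FarmFileVersion(String member, String fileName, boolean present, H handback) {
        this.member = member;
        this.fileName = fileName;
        this.present = present;
        this.handback = handback;
    }
}

/** Hypothetical policy contract: pick the version every member should converge on. */
interface MergePolicy<H> {
    FarmFileVersion<H> reconcile(List<FarmFileVersion<H>> candidates);
}

/** Example default: the highest handback timestamp wins; ties go to the lowest member name. */
final class LatestTimestampPolicy implements MergePolicy<Long> {
    @Override
    public FarmFileVersion<Long> reconcile(List<FarmFileVersion<Long>> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidates to reconcile");
        }
        FarmFileVersion<Long> best = candidates.get(0);
        for (FarmFileVersion<Long> c : candidates) {
            int byTime = Long.compare(c.handback, best.handback);
            if (byTime > 0 || (byTime == 0 && c.member.compareTo(best.member) < 0)) {
                best = c;
            }
        }
        return best;
    }
}
```

The tie-break on member name keeps the result independent of the order in which candidates arrive, so every member evaluating the same candidates reaches the same answer.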
11. Re: Farm deployment, cluster merge and offline operations
smarlow Oct 5, 2005 9:49 AM (in response to smarlow)Below is the ".index file" suggestion from Bela that I mentioned earlier.
* We maintain a .index file in the ./farm directory
* It contains, for each file, the time the file was deployed or undeployed and the number of deployments and undeployments
* Undeployed files are kept for a certain time (e.g. 5 days), so we can record the fact that they were undeployed
* When we do reconciliation (either caused by merge or offline deployment/undeployment), we can have multiple 'substates', e.g. {A,B,C} and {D,E}
  o We assume the dirs on A, B and C are in sync, and the dirs on D and E are in sync too
  o The coordinators of the previous subpartitions (A and D) will do the reconciliation
  o For each file F that is different between A and D (either changed, new file deployed, or file undeployed)
    + Consult a user-defined merge policy on what to do
      # Which F to pick
      # Whether to deploy or undeploy F
    + The policy implementation gets the full pathname of the file, plus some metadata, e.g. timestamp, number of deployments, undeployments
    + We ship with default policies
    + This is similar to Clustering where we have load balance policies
    + As long as the policy picks the right F deterministically, this will always create consistent ./farm dirs
    + The policy could also pop up a GUI where the user has to pick the right files, or it could always pick the files from the member with the lowest rank (oldest)
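To make the .index idea concrete, here is a sketch of the per-file entry and of the two bookkeeping steps that come before the policy is consulted: dropping undeploy records older than the retention period, and finding the files that differ between two coordinators. `IndexEntry`, `FarmIndex`, and the in-memory map representation are assumptions; the on-disk format of .index is not specified here.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.TimeUnit;

/** Hypothetical .index entry for one file in ./farm. */
final class IndexEntry {
    final boolean deployed;      // false = file was undeployed, entry kept as a record
    final long lastChangeMillis; // time of the last deploy or undeploy
    final int deployments;
    final int undeployments;

    IndexEntry(boolean deployed, long lastChangeMillis, int deployments, int undeployments) {
        this.deployed = deployed;
        this.lastChangeMillis = lastChangeMillis;
        this.deployments = deployments;
        this.undeployments = undeployments;
    }

    boolean sameAs(IndexEntry other) {
        return deployed == other.deployed
            && lastChangeMillis == other.lastChangeMillis
            && deployments == other.deployments
            && undeployments == other.undeployments;
    }
}

final class FarmIndex {
    static final long RETENTION_MILLIS = TimeUnit.DAYS.toMillis(5); // the "e.g. 5 days" above

    /** Removes undeploy records older than the retention period (the map must be mutable). */
    static void pruneExpired(Map<String, IndexEntry> index, long nowMillis) {
        index.values().removeIf(e -> !e.deployed && nowMillis - e.lastChangeMillis > RETENTION_MILLIS);
    }

    /** Returns the files whose entries differ between two coordinators' indexes. */
    static Set<String> differingFiles(Map<String, IndexEntry> a, Map<String, IndexEntry> d) {
        Set<String> names = new TreeSet<>(a.keySet());
        names.addAll(d.keySet());
        names.removeIf(n -> a.containsKey(n) && d.containsKey(n) && a.get(n).sameAs(d.get(n)));
        return names;
    }
}
```

Each name returned by `differingFiles` is a file F for which the merge policy would then be asked which version to pick and whether to deploy or undeploy it.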
-
12. Re: Farm deployment, cluster merge and offline operations
smarlow Nov 23, 2005 10:23 AM (in response to smarlow)We had a face to face meeting to discuss a number of topics including farm deployment. I think that we have a decent proposal for handling offline operations. We still need to think more about handling cluster split/merge.
We will have a local per node catalog file that contains the following metadata about applications in the farm folder:
- Application file name
- Application last modification timestamp.
- Name of node that made last change.
During startup, we will handle "offline addition" by checking the local metadata file for each application file in the farm folder. If a file isn't listed in the metadata file, it's a new application that should be pushed to the cluster.
If the application file is in the local metadata file, then we can do further checking to see if we need to push it out to the cluster as a case of "offline modification". We will compare the file timestamp against the metadata modification timestamp. If the timestamps are different, then the file with the higher timestamp will be propagated to the other node (either pull or push operation).
If an application file is in the local metadata file but not in the local file system, we need to propagate a delete to the cluster of that application file.
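The three startup checks can be sketched as a single classification pass. `OfflineChange` and `StartupCatalogCheck` are hypothetical names, and both inputs are reduced to maps from application file name to last modification timestamp; the real catalog would also carry the name of the node that made the last change.

```java
import java.util.Map;
import java.util.TreeMap;

enum OfflineChange { ADDED, MODIFIED, DELETED, UNCHANGED }

final class StartupCatalogCheck {
    /**
     * Compares the local catalog against the farm folder contents.
     * Both maps go from application file name to last modification timestamp.
     */
    static Map<String, OfflineChange> classify(Map<String, Long> catalogTimestamps,
                                               Map<String, Long> fileTimestamps) {
        Map<String, OfflineChange> result = new TreeMap<>();
        for (Map.Entry<String, Long> file : fileTimestamps.entrySet()) {
            Long recorded = catalogTimestamps.get(file.getKey());
            if (recorded == null) {
                result.put(file.getKey(), OfflineChange.ADDED);     // not in catalog: offline addition
            } else if (!recorded.equals(file.getValue())) {
                result.put(file.getKey(), OfflineChange.MODIFIED);  // timestamps differ: offline modification
            } else {
                result.put(file.getKey(), OfflineChange.UNCHANGED);
            }
        }
        for (String name : catalogTimestamps.keySet()) {
            if (!fileTimestamps.containsKey(name)) {
                result.put(name, OfflineChange.DELETED);            // in catalog, not on disk: offline deletion
            }
        }
        return result;
    }
}
```

`ADDED` and `MODIFIED` entries lead to a push (or a pull, if another node has the higher timestamp), `DELETED` entries lead to a delete being propagated, and `UNCHANGED` entries are the ones a startup optimization could skip redeploying.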
For the cluster merge handling, we can make a pass through the set of subcluster groups and build a plan for how we will sync up the nodes. We then execute the plan.
The elected root coordinator will be responsible for building/executing the plan that will reconcile changes after the cluster split.
We will also look at optimizing node startup operations to avoid redeploying applications that haven't changed.