Well, that piece of code helps expose a problem that exists during service startup.
I have been successful in reproducing the following scenario.
1- The server starts up and joins the cluster; /farm applications are pulled from a remote node.
2- farmDeploy() is called for the applications, and remotelyDeployed gets filled with their names.
3- The scanner thread is started and begins calling deploy() for each application.
4- By chance, deploy() executes while the service status is still STARTING, so it calls super.deploy(): the applications are deployed, but their names are not removed from remotelyDeployed.
5- You remove an application from /farm and farmUndeploy() is invoked.
6- You put the application back in /farm and deploy() is called, which calls super.deploy(), BUT remotelyDeployed is still dirty from the preceding deploy, so farmDeploy() is not invoked and the application won't get farmed.
After that, everything cleans up and subsequent deploy/undeploy cycles work correctly.
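The scenario above can be modeled with a heavily simplified sketch. This is not the real FarmMemberService code; the class, the State enum, and the farmed set are my own illustrative stand-ins for the bookkeeping the posting describes:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the dirty-set bug (steps 1-6).
// Only remotelyDeployed, farmDeploy() and deploy() mirror names from
// the actual service; everything else is invented for illustration.
public class DirtySetDemo {
    enum State { STARTING, STARTED }

    State state = State.STARTING;
    final Set<String> remotelyDeployed = new HashSet<>();
    final Set<String> farmed = new HashSet<>();

    // Step 2: apps pulled from the cluster are recorded here.
    void farmDeploy(String app) {
        remotelyDeployed.add(app);
    }

    // Buggy deploy(): while STARTING it short-circuits to the local
    // deploy without clearing remotelyDeployed, leaving the set dirty.
    void deploy(String app) {
        if (state == State.STARTING) {
            return; // super.deploy() only; name stays in remotelyDeployed
        }
        if (!remotelyDeployed.remove(app)) {
            farmed.add(app); // farmDeploy() would be invoked here
        }
    }

    public static void main(String[] args) {
        DirtySetDemo d = new DirtySetDemo();
        d.farmDeploy("app.war");      // step 2
        d.deploy("app.war");          // step 4: runs while STARTING
        d.state = State.STARTED;
        d.deploy("app.war");          // step 6: the stale entry is removed,
                                      // so the app is never farmed
        System.out.println(d.farmed.isEmpty()); // true: app not farmed
    }
}
```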
My question: is this the intended behaviour, with the date check just a forgotten piece of code that was never supposed to work, or is it a bug and the date check should be corrected?
When starting the JBoss server, the cluster applications always override the local applications. This helps avoid having different versions of your application across the cluster.
If the application on your server is newer than what is on the cluster, then you need to copy the new application back to your server farm directory after the server is completely running.
You might read a new posting related to this on the Wiki at http://wiki.jboss.org/wiki/Wiki.jsp?page=JoinTheClusterBeforeUpdatingTheFarmDirectory
I hope this helps.
I'll respond separately to your second posting.
I looked at the code and I agree with your finding. The remotelyDeployed state should be cleared during startup processing but we leave it dirty. Nice find!
The fix might be as simple as moving the "clear remotelyDeployed" step to before the getState() == STARTING test.
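Continuing the simplified model from above, the proposed fix would look something like this. Again, names are illustrative stand-ins, not the actual FarmMemberService code; the point is only that the remotelyDeployed entry is consumed before the STARTING short-circuit:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the proposed fix: consume the remotelyDeployed
// entry *before* the STARTING test, so a deploy() that runs during
// startup can no longer leave a stale name behind.
public class FixedDeployDemo {
    enum State { STARTING, STARTED }

    State state = State.STARTING;
    final Set<String> remotelyDeployed = new HashSet<>();
    final Set<String> farmed = new HashSet<>();

    void farmDeploy(String app) {
        remotelyDeployed.add(app);
    }

    void deploy(String app) {
        // Fix: clear the flag first, regardless of service state.
        boolean wasRemote = remotelyDeployed.remove(app);
        if (state == State.STARTING) {
            return; // super.deploy() only, but the set is already clean
        }
        if (!wasRemote) {
            farmed.add(app); // farmDeploy() runs for genuinely new apps
        }
    }

    public static void main(String[] args) {
        FixedDeployDemo d = new FixedDeployDemo();
        d.farmDeploy("app.war");
        d.deploy("app.war");          // during startup: entry cleared now
        d.state = State.STARTED;
        d.deploy("app.war");          // re-added to /farm: gets farmed
        System.out.println(d.farmed.contains("app.war")); // true
    }
}
```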
Please create a Jira task for this (http://jira.jboss.com/jira/secure/BrowseProjects.jspa) and post the Jira bug report number here.
Opened Bug http://jira.jboss.com/jira/browse/JBCLUSTER-42.
I tested the proposed solution and it looks to be working.
As for my first question, now I see the need for that date check within the pullNewDeployments() method: stopping/starting the service from the jmx console!
Sorry, I didn't think of that.
One more thing: for my own use I'm writing a FarmMemberSingletonService modeled on FarmMemberService, i.e. it farms the applications to all the cluster nodes just like FarmMemberService does, but deploys them only on the singleton master, so that if the master goes down, the new master is sure to have the correct application versions to deploy.
Right now it works, except for the logic to control service start/stop from the jmx console, which I still have to implement.
I wanted to extend FarmMemberService, but it is not easily extendable for such a task unless some refactoring is done in URLDeploymentScanner and FarmMemberService, so I had to duplicate the code and add some more logic.
I'm wondering if you might be interested in including that service in JBoss once it is finished.
That is pretty creative thinking on your part, finding a use for the timestamp checking code :-)
The FarmMemberSingletonService that you are working on sounds really interesting. If you want to post the implementation, just create a Jira task with the code attached when you're done.
The org.jboss.ha.framework.server.FarmMemberService.deploy() fix for JBCLUSTER-42 is checked into the JBoss 4.0.3 repository.
I would encourage you to post your design doc (it can be brief) to the Clustering Design forum. People may have valuable feedback, actually. :-)
Like Scott mentioned, the correct patch process is to create a Jira issue and attach your documentation, patch, and possibly JUnit tests there.
Please post the jira issue here as well once you have that.
Come to think of it, though, I don't understand how your singleton service will work.
If I need to upgrade, should I deploy to that specific farm directory?
BTW, Adrian has a good thread on the farm service regarding the future implementation on the new POJO server.
What I did is simply decouple the farming from the deployment.
The FarmMemberSingletonService is itself a singleton, controlled by the org.jboss.ha.singleton.HASingletonController class, that scans a specific directory, let's call it /farmSingleton.
When the service is started, it immediately activates the farming, i.e. it pulls the deployments from the other nodes and starts the scanner.
The scanner calls an overridden scan() method, which calls a farm() method that, mimicking URLDeploymentScanner's deploy(), puts all the deployment units found in the scanned dir into a farmedSet, so that subsequent scan runs won't call farm() again unless something changes in the deployment unit files.
This means that if I remove a file from the /farmSingleton directory on any node of the cluster, the scanner for that node will call unfarm() and the file is immediately unfarmed on all the nodes. If I add or replace a file in the /farmSingleton directory on any node, the scanner for that node will call farm() and the file is immediately farmed on all the nodes.
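The farmedSet bookkeeping described above can be sketched roughly as follows. This is an assumption-laden toy, not the posted service: it tracks deployment names only (the real scanner would also compare file timestamps to detect replaced files), and farm()/unfarm() are empty placeholders for the cluster-wide calls:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the scan()/farm()/unfarm() bookkeeping.
// farmedSet, farm and unfarm mirror names from the posting; the scan
// input is simplified to a set of file names per pass.
public class FarmScannerSketch {
    final Set<String> farmedSet = new HashSet<>();

    void farm(String name)   { /* placeholder: push unit to all nodes */ }
    void unfarm(String name) { /* placeholder: remove unit everywhere */ }

    // One scan pass over the /farmSingleton directory listing.
    void scan(Set<String> currentFiles) {
        // Newly added files get farmed exactly once.
        for (String f : currentFiles) {
            if (farmedSet.add(f)) {
                farm(f);
            }
        }
        // Files gone from the directory get unfarmed on all nodes.
        farmedSet.removeIf(f -> {
            if (!currentFiles.contains(f)) {
                unfarm(f);
                return true;
            }
            return false;
        });
    }

    public static void main(String[] args) {
        FarmScannerSketch s = new FarmScannerSketch();
        Set<String> files = new HashSet<>();
        files.add("a.war");
        s.scan(files);                   // first pass: a.war gets farmed
        s.scan(files);                   // second pass: no change, no re-farm
        s.scan(new HashSet<>());         // a.war removed: gets unfarmed
        System.out.println(s.farmedSet.isEmpty()); // true
    }
}
```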
This covers the farming part. For the deployment part, as I said before, this service is itself a singleton, and deployment is controlled by the service receiving a startSingleton() call.
When the singleton is started, scan() will always call super.scan(), so the normal deploy cycle is activated, and HASingletonController ensures that this can happen on only one node at a time.
If someone tries to start the singleton from the jmx console, it checks with its controller whether this is the master node and refuses to start if it is not.
If someone stops the singleton, it checks with its controller whether this is the master node; if it is, it simply stops the deploy scan cycle and activates the farm scan without undeploying the applications. If it is not the master node (e.g. if someone stopped its controller), it also undeploys the applications, assuming that another node is becoming master.
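The start/stop rules described above could be sketched like this. Everything here is my own illustration: the boolean master field stands in for a query to HASingletonController, and none of these names are the actual service API:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the jmx-console start/stop rules.
// "master" stands in for asking HASingletonController whether this
// node currently holds the singleton; names are illustrative only.
public class SingletonControlSketch {
    boolean master;
    boolean deployScanActive;                // whether super.scan() runs
    final Set<String> deployed = new HashSet<>();

    void startSingleton() {
        if (!master) {
            // Refuse a jmx-console start on a non-master node.
            throw new IllegalStateException("not the master node");
        }
        deployScanActive = true;
    }

    void stopSingleton() {
        deployScanActive = false;            // fall back to farm-only scan
        if (!master) {
            // Another node is becoming master: drop local deployments.
            deployed.clear();
        }
        // On the master, applications stay deployed.
    }

    public static void main(String[] args) {
        SingletonControlSketch c = new SingletonControlSketch();
        c.master = true;
        c.startSingleton();
        c.deployed.add("app.war");
        c.stopSingleton();               // master: apps stay deployed
        System.out.println(c.deployed.contains("app.war")); // true
    }
}
```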
There are two drawbacks.
First is the behaviour of the master-node algorithm in DistributedReplicantManagerImpl. It is biased by cluster node startup order and is not symmetric, i.e. if I stop HASingletonController on the first-started node because, for whatever reason, I need it to relinquish its master status in favour of another node, I cannot restart HASingletonController for FarmMemberSingletonService without bringing down the whole server, otherwise it will reclaim the master node status, which may not be what I want.
Second, I didn't find any way to efficiently extend FarmMemberService, so I had to duplicate all the code in the new class.
I still have to resolve a couple of doubts I have about the start/stop cycle, then polish the code a bit and add some comments; then I'll post it.
Does it sound twisted enough?
Well, it took longer than foreseen, as I was rather busy with my real job, but here, finally, is the promised service.
It's completely rewritten, because I was not satisfied with the scarce reuse of already existing functionality.
Due to the structure of URLDeploymentScanner and FarmMemberService, the new class ended up being an almost complete duplication of both classes' code.
So I heavily refactored them, and now the new service is about 200 LOCs, including braces and comments!
I'm a bit scared about your reactions to this refactoring, so I'll be waiting here with crossed fingers... ;)
I saw your message but have been tied up with something else. I will try to look at this soon and also talk to my team leader (Ben Wang) about this.
I just wanted to let you know that I'm not purposely ignoring your request. :-)