23 Replies. Latest reply on Jun 9, 2003 10:51 AM by slaboure

    JBoss Scheduler

    vakrishnan

      Is the JBoss Scheduler cluster aware?

      I am trying to deploy a cluster-aware Scheduler. I searched the forums, but no prior post seems to have addressed this issue. Googling for it didn't help either.

      I ran two instances of JBoss on my machine and deployed the Scheduler on one, hoping to see signs of clustering. I killed one of the instances, hoping that the other would pick up where the first left off, but it didn't.

      When I deployed the Scheduler on both instances, both started their own Schedulers.

      Can anybody tell me if I am missing something here?

      Thanks in advance.

        • 1. Re: JBoss Scheduler
          ivelin.ivanov

          I am interested in the exact same problem.
          A link to a source with answers to the following questions would be appreciated:
          Can one expect that tasks will be load balanced in a cluster?
          How can one configure the schedulers so that they run tasks once per cluster instead of once per VM?

          • 2. Re: JBoss Scheduler
            weiqingh

            well, there was a previous post on the same subject:
            http://www.jboss.org/modules/bb/index.html?module=bb&op=viewtopic&t=

            i too am trying to figure out a similar problem. from reading that post, it seems that DistributedReplicantManager is the way to go.

            however, i cannot figure out how to elect the master replica. it seems that when the first node adds a replicant, it will be considered the master. but even after it dies, the other nodes won't be considered master at all. i must be missing something here.

            Sacha, would you enlighten us on this? i have read the JBoss Clustering doc many times....


            • 3. Re: JBoss Scheduler
              weiqingh

              i have given this more thought. (no expert has responded to the topic yet. :) )

              this scheduling issue is a typical distributed locking problem. according to the paid clustering document, jboss clustering does not provide distributed locking.

              think about what happens if there is a network partition between 2 clustered jboss servers. neither of them is down, but each detects that the other server is gone. to them, it doesn't make any difference whether the other server is dead or the network is simply down. so without distributed locking, both will start doing the scheduled work. the conclusion is that clustering (or JavaGroups for that matter) won't give you that control.

              so the solution to this is you have to implement your own distributed locking. e.g. you can put the lock in database and each scheduler has to grab the lock before it starts doing stuff. now db becomes the single point of failure. (well you can cluster db.) then you need to figure out how to clean up dead server's lock....
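              The database-lock idea above, including the cleanup of a dead server's lock, can be sketched with a lease: the lock row carries an expiry time, and a lock whose lease has run out may be taken over by another node. The sketch below is only an illustration under stated assumptions. The class LeaseLockTable, its method tryAcquire, and the in-memory map standing in for the database table are all hypothetical names, not part of JBoss; a real version would do the check-and-update inside a single database transaction.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a lock table; a real version would use one DB transaction per call. */
class LeaseLockTable {
    private static final class Lease {
        final String owner;
        final long expiresAtMillis;

        Lease(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Lease> rows = new HashMap<>();

    /**
     * Try to take or renew the named lock.
     * Succeeds when the lock is free, its lease has expired, or the caller already owns it.
     */
    synchronized boolean tryAcquire(String lockName, String nodeId, long nowMillis, long leaseMillis) {
        Lease current = rows.get(lockName);
        boolean free = current == null || current.expiresAtMillis <= nowMillis;
        boolean ours = current != null && current.owner.equals(nodeId);
        if (free || ours) {
            rows.put(lockName, new Lease(nodeId, nowMillis + leaseMillis));
            return true;
        }
        return false;
    }
}

class LeaseLockDemo {
    public static void main(String[] args) {
        LeaseLockTable table = new LeaseLockTable();
        // Node A takes the scheduler lock at t=0 with a 1000 ms lease.
        System.out.println("A at 0:    " + table.tryAcquire("scheduler", "A", 0L, 1000L));
        // Node B asks while A's lease is still valid and is refused.
        System.out.println("B at 500:  " + table.tryAcquire("scheduler", "B", 500L, 1000L));
        // A dies and stops renewing; once the lease has expired, B can take over.
        System.out.println("B at 1500: " + table.tryAcquire("scheduler", "B", 1500L, 1000L));
    }
}
```

              Each scheduler would call tryAcquire before every run and skip the run when it returns false. The lease length is a trade-off: a short lease gives quick takeover after a crash, but a node that pauses for longer than the lease may still believe it holds the lock.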

              • 4. Re: JBoss Scheduler
                ivelin.ivanov

                Thanks for the analysis.

                Selecting a leader within a network of peers is a very basic distributed computing problem.
                It should be made available in JBoss sooner rather than later. There are efficient algorithms that do not require a single point of failure.
                I will be posting more on this subject once I figure out a possible solution that builds on the current JBoss HA services.

                • 5. Re: JBoss Scheduler
                  weiqingh

                  yes, please post your findings.

                  on the other hand, one has to carefully define what "selecting a leader" really implies. if you want to be really sure, e.g. that no 2 schedulers can ever run the same job, then a network partition is the problem case: you may have 2 servers both running, yet neither of them knows whether the other one is running. that is what i meant in my previous post. i am not sure things like JavaGroups itself would be able to address such issues.

                  of course if your application logic doesn't break when two servers both run the same job, and you just want to minimize the chance of that happening, then JavaGroups could help you select a leader/master, and possibly more than one master.
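                  The partition case described above can be made concrete with a small simulation. It assumes, purely for illustration, that a node treats itself as leader when it is the first entry in the membership list it currently sees; SplitBrainDemo and believesItIsLeader are hypothetical names, not JBoss or JavaGroups API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SplitBrainDemo {
    /** Assumed rule: a node is leader when it is first in the membership list that it itself sees. */
    static boolean believesItIsLeader(String self, List<String> viewSeenBySelf) {
        return !viewSeenBySelf.isEmpty() && viewSeenBySelf.get(0).equals(self);
    }

    public static void main(String[] args) {
        // Healthy cluster: both nodes see the same ordered membership list.
        List<String> sharedView = Arrays.asList("A", "B");
        System.out.println("healthy, A leads: " + believesItIsLeader("A", sharedView));
        System.out.println("healthy, B leads: " + believesItIsLeader("B", sharedView));

        // Network partition: both nodes are still running, but each sees only itself.
        List<String> viewOnA = Collections.singletonList("A");
        List<String> viewOnB = Collections.singletonList("B");
        System.out.println("partition, A leads: " + believesItIsLeader("A", viewOnA));
        System.out.println("partition, B leads: " + believesItIsLeader("B", viewOnB));
    }
}
```

                  With a shared view exactly one node considers itself leader. Once the views diverge, both do, which is why membership-based election alone cannot guarantee that a job runs only once.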

                  • 6. Re: JBoss Scheduler
                    slaboure

                    Currently, there is an election mechanism in place and you can rely on it for your services. For example, if you use the DRM, I think there is even an isMasterNode or something like this.

                    As for partitions, this is something I want to add in 4.0. While partitions can be merged in 3.x, there is currently no way to use any mechanism (such as a central DB) to check if the partition in which you are located is a "viable" partition or a "disconnected" partition.

                    Cheers,


                    sacha

                    • 7. Re: JBoss Scheduler
                      weiqingh

                      Sacha, would you elaborate on this election mechanism? how does it work? did you mean using isMasterReplica?

                      i have read both the paid doc and the source code of DRM, and still couldn't figure out what the method isMasterReplica means.

                      if i have 2 servers in the cluster and one server puts a value (with a common key) in DRM, this server will have isMasterReplica=true and the other server will have isMasterReplica=false. they both see the same value.

                      now if the first server is dead, the 2nd server still sees the value, but its isMasterReplica is still false. i don't see an election process going on (at the javagroup level, i assume) to assign the 2nd server as the master.

                      did i totally miss something here? thanx.

                      • 8. Re: JBoss Scheduler
                        slaboure

                        > Sacha, would you elaborate on this election
                        > mechanism? how does it work? did you mean using
                        > isMasterReplica?

                        yes, DRM.isMasterReplica


                        > i have read both the paid doc and the source code of
                        > DRM, and still couldn't figure out what the method
                        > isMasterReplica means.

                        pfff a shame these open source coders...

                        > if i have 2 servers in the cluster and one server
                        > puts a value (with a common key) in DRM, this server
                        > will have isMasterReplica=true and the other server
                        > will have isMasterReplica=false. they both see the
                        > same value.

                        if both add a value for the same key, we will have one different value for each node for this key.

                        >
                        > now if the first server is dead, the 2nd server still
                        > sees the value, but its isMasterReplica is still
                        > false. i don't see an election process going on (at
                        > the javagroup level, i assume) to assign the 2nd
                        > server as the master.

                        if it dies, a new master is elected and is correctly returned, i.e. the remaining node is the master. the election is done at the javagroups layer: the DRM simply elects as the master the first member from the JG list which has subscribed as a member of the DRM for this key
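                        The rule in the last sentence can be modelled in a few lines. This is only a reading of that sentence, not the DRM source: ReplicantElectionSketch, electMaster and isMaster are hypothetical names, the ordered list stands in for the JG member list, and the set stands in for the nodes that have added a replicant for the key.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReplicantElectionSketch {
    /** First member of the ordered view that has registered for the key, or null if none has. */
    static String electMaster(List<String> orderedView, Set<String> registeredForKey) {
        for (String member : orderedView) {
            if (registeredForKey.contains(member)) {
                return member;
            }
        }
        return null;
    }

    /** Models an isMasterReplica-style check for one node under the rule above. */
    static boolean isMaster(String self, List<String> orderedView, Set<String> registeredForKey) {
        return self.equals(electMaster(orderedView, registeredForKey));
    }

    public static void main(String[] args) {
        List<String> bothUp = Arrays.asList("A", "B");
        List<String> onlyB = Collections.singletonList("B");

        // Both nodes registered for the key: A is master, and B takes over once A leaves the view.
        Set<String> both = new HashSet<>(Arrays.asList("A", "B"));
        System.out.println("A master, both up:   " + isMaster("A", bothUp, both));
        System.out.println("B master, A is gone: " + isMaster("B", onlyB, Collections.singleton("B")));

        // Only A registered: when A leaves, nobody is registered for the key, so B is not master.
        System.out.println("B master, never registered: " + isMaster("B", onlyB, Collections.<String>emptySet()));
    }
}
```

                        Under this reading, a node that never added a replicant for the key is never a candidate, so it would not become master even after the registered node dies.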

                        • 9. Re: JBoss Scheduler
                          weiqingh

                          > if it dies, a new master is elected and is correctly
                          > returned i.e. the remaining node is the master. the
                          > election is done at the javagroups layer, the DRM
                          > simply elects as the master the first member from the
                          > JG list which has subscribed as a member of the
                          > DRM for this key

                          Sacha, i really appreciate your response.

                          unfortunately i still don't understand how DRM will help determine if a server is a master after the other server dies. as said in my previous post, if only one server sets a value in DRM, the other server will never be master replica even if the first server dies. on the other hand, if they both set the values, both are master replica.

                          i understand JG will elect a master. but i cannot see how to get that from DRM. would you provide more guideline on this? thanks a lot.

                          • 10. Re: JBoss Scheduler
                            slaboure

                            Hello,

                            > unfortunately i still don't understand how DRM will
                            > help determine if a server is a master after the other
                            > server dies. as said in my previous post, if only one
                            > server sets a value in DRM, the other server will
                            > never be master replica even if the first server
                            > dies. on the other hand, if they both set the values,
                            > both are master replica.

                            NO! Why do you think so? Re-read the DRM doco. The DRM is a way for a service to advertise its presence on a given node AND give an arbitrary value (optional) that can be used by the other nodes, that's all! Then, it is up to the DRM service to dynamically elect one, and only one, master at any given time. And you can subscribe to receive callbacks when the master node changes.

                            > i understand JG will elect a master. but i cannot see
                            > how to get that from DRM. would you provide more
                            > guideline on this? thanks a lot.

                            There is only a way to determine if you are the master or not, not a way to ask who is the master. It is "isMasterReplica". The latter functionality could be added though if you think it is really useful (most of the time it is not).

                            Cheers,


                            sacha

                            • 11. Re: JBoss Scheduler
                              weiqingh

                              hi Sacha,

                              thanks for your reply. yes, i read the document and what you said is exactly what i understood DRM should be doing. however after i ran my test program, where i have 2 servers, each setting a value (same or different), and then call isMasterReplica on both servers, they both return true.

                              if only one server sets a value, then the other server's isMasterReplica would return false. however even if the first server dies, the 2nd server still has isMasterReplica false.

                              • 12. Re: JBoss Scheduler
                                slaboure

                                then post your code

                                • 13. Re: JBoss Scheduler
                                  ivelin.ivanov

                                  Sacha and I started a development thread on this issue. Please join if you like:

                                  http://www.jboss.org/modules/bb/index.html?module=bb&op=viewtopic&t=

                                  • 14. Re: JBoss Scheduler
                                    weiqingh

                                    Sacha, here is the code. i wrote a simple mbean that i can invoke from jmx for the testing:

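                                    import javax.naming.InitialContext;
                                    import javax.naming.NamingException;
                                    import org.jboss.ha.framework.interfaces.DistributedReplicantManager;
                                    import org.jboss.ha.framework.interfaces.HAPartition;

                                    // look up the default partition in JNDI; returns null if the lookup fails
                                    private HAPartition getHAPartition () {
                                        try {
                                            InitialContext lContext = new InitialContext();
                                            return (HAPartition)lContext.lookup("HAPartition/DefaultPartition");
                                        } catch (NamingException e) {
                                            logger.info("no HAPartition found " + e);
                                            return null;
                                        }
                                    }

                                    // add this node's value for the key, then list all replicants (add may throw)
                                    public String updateReplicatedValue () throws Exception {
                                        HAPartition p = getHAPartition();
                                        DistributedReplicantManager drm = p.getDistributedReplicantManager();
                                        drm.add("replicated_key", "new_value");
                                        return drm.lookupReplicants("replicated_key").toString();
                                    }

                                    // report whether this node is the master replica for the key, plus all replicants
                                    public String displayPartitionInfo () {
                                        HAPartition p = getHAPartition();
                                        StringBuffer buf = new StringBuffer();
                                        DistributedReplicantManager drm = p.getDistributedReplicantManager();
                                        buf.append(" isMasterReplica = " + drm.isMasterReplica("replicated_key"));
                                        buf.append(" replicants= " + drm.lookupReplicants("replicated_key"));
                                        return buf.toString();
                                    }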

                                    i have machines A and B in the cluster, both with the same test mbean deployed.

                                    i have two test cases. in the first test, both servers set the same value for "replicated_key" by each invoking updateReplicatedValue once. i then invoke displayPartitionInfo from both servers and i got the same result on both machines:
                                    isMasterReplica = true replicants= [new_value, new_value]

                                    the problem here is that they are both considered master for the same key.

                                    in the 2nd test case, i only invoke updateReplicatedValue once from machine A, and then invoke displayPartitionInfo on both machines. i got:
                                    A: isMasterReplica = true replicants= [new_value]
                                    B: isMasterReplica = false replicants= [new_value]

                                    this is what i expected. however if i shutdown machine A (which is the master), and then invoke displayPartitionInfo on machine B, i still get the same result: isMasterReplica = false replicants= [new_value]

                                    i.e. machine B doesn't become a master. this is the 2nd problem i am seeing.

                                    maybe i totally miss out on something here. any suggestion is highly appreciated.
