4 Replies. Latest reply on Jul 19, 2006 10:09 PM by chanyungwan

    Singleton Failover

    chanyungwan

      I have a JBoss cluster with 2 members. The OS is Red Hat Linux Advanced Server 4 and the JBoss version is 3.2.5. Three singletons are deployed in the deploy-hasingleton folder on each member. I want to test singleton failover in this cluster, and I observed 2 cases:

      Case 1) On the master node, I use "ps -auxwww | grep run.sh" to find the JBoss process and kill it. The 3 singletons are launched on the 2nd node within 2 minutes, which is acceptable for us.

      Case 2) On the master node, I stop the JBoss service with a normal exit, i.e. "Ctrl-C". The 3 singletons are again launched on the 2nd node, but the launch takes about 10-15 minutes to complete. I tried decreasing Linux kernel parameters such as /proc/sys/net/ipv4/tcp_rmem, /proc/sys/net/ipv4/tcp_wmem and /proc/sys/net/ipv4/tcp_retries2, but none of them helped.


      Please advise on how to tackle the timing issue in the 2nd case.

        • 1. Re: Singleton Failover
          brian.stansberry

          Suggest you take a look at http://wiki.jboss.org/wiki/Wiki.jsp?page=FDVersusFD_SOCK, particularly the portion at the bottom that discusses combining FD and FD_SOCK.

          • 2. Re: Singleton Failover
            chanyungwan

            Thanks for the reply.
            I tried modifying cluster-service.xml by adding FD_SOCK as follows:

            :
            <FD_SOCK up_thread="true" down_thread="true" />
            <FD shun="true" up_thread="true" down_thread="true"
            timeout="2500" max_tries="5" />
            :
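As a rough sanity check on the FD values in this fragment: JGroups FD suspects a member after roughly `timeout` × `max_tries` milliseconds of missed heartbeats, so `timeout="2500"` and `max_tries="5"` should flag a dead master in about 12.5 seconds, far below the 10-15 minutes observed. The sketch below only does that arithmetic; the `FdEstimate` class and its method are hypothetical, and the formula is an approximation of FD's behavior, not an exact model (FD_SOCK is expected to notice a closed socket sooner still).

```java
// Hypothetical helper: estimates how long JGroups FD waits before suspecting
// a silent member. Assumption: suspicion after ~timeout * max_tries ms.
public class FdEstimate {

    static long suspectAfterMillis(long timeoutMs, int maxTries) {
        return timeoutMs * maxTries;
    }

    public static void main(String[] args) {
        // Values taken from the cluster-service.xml fragment in this post
        long ms = suspectAfterMillis(2500, 5);
        System.out.println("FD suspects after ~" + ms + " ms (" + ms / 1000.0 + " s)");
    }
}
```

If the failover still takes minutes with detection in this range, the delay is probably not in failure detection itself.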

            I also tried replacing jgroups.jar with different versions:
            a) 2.2.4 (the default in 3.2.5)
            b) 2.2.8
            c) 2.3 SP1

            However, the timing issue in the 2nd case is still unresolved. Is this a bug in the 3.2.5 release? By the way, we do not intend to upgrade.

            So, is there any other solution? Many thanks.

            • 3. Re: Singleton Failover
              brian.stansberry

              Not sure how much help I can be without detailed log information about what happens.

              I realize now I may have misread your original post -- you describe the time it takes to "launch" the singletons. Is this the time it takes to *begin* the process, i.e. how long it takes to recognize that the 2nd server has become the master, or the time it takes to *complete* the process, including whatever it takes for your singletons to deploy and/or become active?

              My previous question was targeted toward the former.

              • 4. Re: Singleton Failover
                chanyungwan

                There are 3 singletons in total. To keep it simple, let's use 1 singleton to illustrate the scenario. By the "long" launch time in the 2nd case, I mean that the singleton needs about 2-3 minutes to finish running its constructor and the "startSingleton" method defined in jboss-service.xml, and only then is the new cluster view built. You can see the following log on the 2nd node after I quit the master node using "Ctrl-C":

                :
                :
                :
                10:02:17,628 INFO [Http11Protocol] Starting Coyote HTTP/1.1 on http-0.0.0.0-8080
                10:02:18,030 INFO [ChannelSocket] JK2: ajp13 listening on /0.0.0.0:8009
                10:02:18,143 INFO [JkMain] Jk running ID=0 time=0/249 config=null
                10:03:20,704 INFO [STDOUT] calling singleton constructor
                10:05:21,489 INFO [STDOUT] starting singleton...
                10:07:21,572 INFO [ServiceModule] Registration is not done -> stop
                10:07:21,641 INFO [EmmgClusterPartition] New cluster view (id: 2, delta: -1) : [192.168.10.231:1099]
                10:07:21,642 INFO [EmmgClusterPartition:ReplicantManager] Dead members: 1
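Measuring the gap between consecutive lines makes the delay in this log easier to see. The sketch below is a hypothetical helper (`LogGaps` is not part of JBoss); it assumes every line starts with a 12-character `HH:mm:ss,SSS` timestamp, as these lines do. Run on the lines from 10:02:18,143 onward, it shows about 62.6 s before the constructor message, then two gaps of almost exactly 120 s (120785 ms and 120083 ms) between the singleton's own messages, and only 69 ms from "Registration is not done -> stop" to the new cluster view.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prints the time between consecutive log lines.
public class LogGaps {
    // Matches timestamps such as "10:02:17,628"; the comma is a literal
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    static List<Duration> gaps(List<String> lines) {
        List<Duration> out = new ArrayList<>();
        LocalTime prev = null;
        for (String line : lines) {
            // Assumes the first 12 characters are the timestamp
            LocalTime t = LocalTime.parse(line.substring(0, 12), TS);
            if (prev != null) {
                out.add(Duration.between(prev, t));
            }
            prev = t;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "10:02:18,143 INFO [JkMain] Jk running ID=0 time=0/249 config=null",
            "10:03:20,704 INFO [STDOUT] calling singleton constructor",
            "10:05:21,489 INFO [STDOUT] starting singleton...",
            "10:07:21,572 INFO [ServiceModule] Registration is not done -> stop",
            "10:07:21,641 INFO [EmmgClusterPartition] New cluster view (id: 2, delta: -1) : [192.168.10.231:1099]");
        List<Duration> g = gaps(log);
        for (int i = 0; i < g.size(); i++) {
            // Label each gap with the message that ended it
            System.out.printf("%6d ms before: %s%n", g.get(i).toMillis(), log.get(i + 1).substring(13));
        }
    }
}
```

The timestamps come straight from the log in this post; the same helper could be pointed at a fuller server.log excerpt if more detail is needed.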


                Since there are 3 singletons in my case, the total time to complete their activation is about 10 minutes. It's really a big problem for us!