6 Replies Latest reply on Jan 29, 2004 9:31 PM by javajedi

    Delayed joining of partition

    michael.daleiden

       

      "michael.daleiden" wrote:
      I have a JBoss cluster set up between two nodes (Win2K and HP/UX). When I start the first node, everything goes smoothly. When I then go to start the second node, things get strange. It appears from the logs on the second node that the second node does not immediately join the partition (the log reports only one partition member):

      2003-08-06 11:14:05,838 INFO [org.jboss.ha.framework.server.ClusterPartition] Starting channel
      2003-08-06 11:14:05,839 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Number of cluster members: 1
      2003-08-06 11:14:05,840 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Other members: 0

      Later on in the log (after the server startup is complete and just before the farming service starts polling for new deployments), it reports that the cluster view has changed to include the other node:

      2003-08-06 11:15:46,562 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] New cluster view: 1 ([mdaleiden2:1705, codvap02:51245] delta: 1)
      2003-08-06 11:15:46,568 INFO [DefaultPartition:ReplicantManager] Merging partitions...
      2003-08-06 11:15:46,568 INFO [DefaultPartition:ReplicantManager] Dead members:0

      Configuration:
      Win2K node: Win2K SP2, JDK 1.4.1_01, JBoss 3.2.2RC1
      HP/UX node: HP/UX 11.0, JDK 1.3.1_08, JBoss 3.2.2RC1

      Nodes are on separate subnets, but multicast is enabled across the routers.

      Any ideas as to why this delay occurs? I really need to get the clustering issues worked out ASAP, as we are trying to get prepared for a pilot launch of our application.
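To put a number on the delay described in the post above, the two timestamps can be subtracted directly: the "Starting channel" line and the "New cluster view" line. The sketch below is only an illustration. The class and method names (`JoinDelay`, `millisBetween`) are hypothetical, and it assumes both log lines come from the same node's clock.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class JoinDelay {
    // Timestamp layout at the start of each quoted log line, e.g. "2003-08-06 11:14:05,838".
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the milliseconds elapsed between two log timestamps. */
    static long millisBetween(String earlier, String later) {
        LocalDateTime start = LocalDateTime.parse(earlier, LOG_TIME);
        LocalDateTime end = LocalDateTime.parse(later, LOG_TIME);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        // "Starting channel" vs. "New cluster view" timestamps from the second node's log.
        long delay = millisBetween("2003-08-06 11:14:05,838", "2003-08-06 11:15:46,562");
        System.out.println("Join delay: " + delay + " ms");
    }
}
```

For the timestamps quoted in the post this works out to 100724 ms, so the second node ran as a single-member partition for roughly 100 seconds before the views merged.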


        • 1. Re: Delayed joining of partition
          heartbit

           

          "HeartBit" wrote:
          IMHO, I think this is normal and by design.


          • 2. Re: Delayed joining of partition
            michael.daleiden

             

            "michael.daleiden" wrote:
            This is not normal behavior. If I cluster two Win2K machines that are on the same subnet, the second node joins the partition immediately on startup, as soon as the ClusterPartition starts the JG channel. All docs and posts that I have seen regarding clustering on JBoss state that this immediate join is the norm, but they do not provide any detailed troubleshooting information for the case where it does not occur.


            • 3. Re: Delayed joining of partition
              michael.daleiden

               

              "michael.daleiden" wrote:
              Does anyone have any idea why this is happening? I am struggling to get our systems fully configured and tested for a pilot launch of our application!

              The delayed join of the nodes is preventing the HAPartition from initializing the DistributedState with the current values from the partition (which the other node set during startup/operation, before the second node was started).


              • 4. Re: Delayed joining of partition

                 

                "javajedi" wrote:
                I know this doesn't help, but for what it's worth, I'm seeing the same problem. Did you ever figure out a solution?


                • 5. Re: Delayed joining of partition
                  milton_quranda

                   

                  "milton_quranda" wrote:
                  Hi,
                  In your current config it takes some seconds to detect a crashed
                  member. I'd suggest you use timeout=1500 and max_tries=3. The caveat:
                  each member will send a heartbeat every 1.5 secs. Also, you increase the
                  probability of a 'false' suspicion. If VERIFY_SUSPECT doesn't catch
                  that, your suspected member will be shunned and then re-admitted later.
                  If you have a slow member that's just above the 1500 ms in response time,
                  it will constantly be shunned and re-joined.
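As a rough illustration of the trade-off described above, the sketch below estimates how long a silent member goes unnoticed under a simplified model. The model is an assumption: a member is suspected after `max_tries` consecutive heartbeats each go unanswered for `timeout` milliseconds. The class and method names are hypothetical, and any extra time spent in VERIFY_SUSPECT is not included.

```java
class FailureDetectionEstimate {
    /**
     * Approximate time in milliseconds before a silent member is suspected,
     * assuming (simplified model) that suspicion requires maxTries consecutive
     * heartbeats that each go unanswered for timeoutMillis.
     */
    static long suspectAfterMillis(long timeoutMillis, int maxTries) {
        if (timeoutMillis <= 0 || maxTries <= 0) {
            throw new IllegalArgumentException("timeout and max_tries must be positive");
        }
        return timeoutMillis * maxTries;
    }

    public static void main(String[] args) {
        // A 1.5-second heartbeat interval with max_tries=3, as suggested in the post.
        System.out.println(suspectAfterMillis(1500, 3) + " ms");
    }
}
```

Under this model, a 1.5-second timeout with three tries flags a crashed member after about 4.5 seconds. Lowering either value shortens detection but makes a slow, still-healthy member more likely to be suspected.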

                  Post the current configuration of your cluster-service.xml file and I'll probably be able to help you further.

                  Milton


                  • 6. Re: Delayed joining of partition

                    I'm using the cluster-service.xml right out of the box.