6 Replies Latest reply on Aug 2, 2007 4:16 AM by atomicgun

    Out of the box clustering won't work

    mmelsen

      Hi everyone,

      I hope you can help me out. The following is the case:

      At work we want to set up a cluster made up of four machines. Two machines run our front-office software and the other two run our mid-office software. If one machine fails, its counterpart should take over.

      I've read everywhere that JBoss should detect another JBoss instance out of the box, without any configuration. I tried this and it worked on my Windows computer, which I clustered with a colleague's computer. In the log I saw that it detected two cluster members and that they found each other.

      After this I wanted to set up the same clustering on our Linux machines, but for some reason they can't find each other. Each one reports in the log that it is the first in the cluster and that the cluster has only one member.

      I've checked for firewalls; they're off.
      I've checked whether ports 1100 and 1102 were listening; they were.
      I've checked whether both machines could ping each other; they can.
      I've checked the log, but it didn't show any errors.
      I've added bindaddr="ip1, ip2" to cluster-service.xml, but that didn't work either.

      But for some reason clustering just won't start out of the box.

      I hope someone can help me, because I really don't know what to do anymore.

        • 1. Re: Out of the box clustering won't work
          brian.stansberry

          See http://wiki.jboss.org/wiki/Wiki.jsp?page=TestingJBoss. Note in particular the links at the bottom that take you to a page that tells you how to test multicast.

          • 2. Re: Out of the box clustering won't work
            mmelsen

            Hi, thanks for your reply.

            I've downloaded the JGroups binary distribution and extracted it. I've added the location of the jar files (javagroups-all.jar, jms.jar, etc.) to the classpath. When I execute this command:

            java McastReceiverTest.class -mcast_addr 228.1.2.3 -port 45566

            it tells me:

            Exception in thread "main" java.lang.NoClassDefFoundError: McastReceiverTest/class

            My directory is /home/jboss/JavaGroups-2.1.1.bin, and these are the PATH and CLASSPATH settings:


            PATH=/usr/kerberos/bin:/usr/java/j2sdk1.4.2_13//bin:/usr/local/bin:/bin:/usr/bin:/home/jboss/bin:/home/jboss/JavaGroups-2.1.1.bin


            CLASSPATH=/usr/java/j2sdk1.4.2_13/:/home/jboss/JavaGroups-2.1.1.bin

            Any idea?
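The error above is what the `java` launcher reports when it is given a file name instead of a class name: `McastReceiverTest.class` is read as a class called `class` in a package called `McastReceiverTest`. A second likely issue is that the CLASSPATH lists the directory rather than the jar files inside it. The following is a minimal sketch of a corrected invocation, not a verified recipe. It assumes the jars sit directly in the directory from the post, and it assumes the test class lives in the package `org.javagroups.tests` for this 2.1.1 release (later JGroups releases use `org.jgroups.tests`). The helper `build_classpath` and the variables `JG_HOME` and `TEST_CLASS` are made up for illustration.

```shell
# Directory taken from the post; override JG_HOME if the jars live elsewhere.
JG_HOME="${JG_HOME:-/home/jboss/JavaGroups-2.1.1.bin}"

# Join every *.jar in a directory with ':' so that each jar is listed
# explicitly. A bare directory on the classpath only exposes loose .class
# files, not the jars stored in it.
build_classpath() {
    local cp="" jar
    for jar in "$1"/*.jar; do
        [ -e "$jar" ] || continue      # skip the unexpanded pattern if no jars exist
        cp="${cp:+$cp:}$jar"
    done
    printf '%s\n' "$cp"
}

# Assumed package name for the 2.1.1 release; newer releases use org.jgroups.tests.
TEST_CLASS="${TEST_CLASS:-org.javagroups.tests.McastReceiverTest}"

# Print the command instead of running it, so it can be reviewed first.
# Note: a fully qualified class name, with no ".class" suffix.
echo java -cp "$(build_classpath "$JG_HOME")" "$TEST_CLASS" -mcast_addr 228.1.2.3 -port 45566
```

Running the printed command on one machine and the matching sender test on the other should show whether multicast packets actually cross the network.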

            • 3. Re: Out of the box clustering won't work
              mmelsen

              Hi,

              Forget my last reply; I've got the multicast tests working and I'm seeing all kinds of activity.

              Also, I've added the bind_addr attribute in cluster-service.xml, set to the IP of the local machine. When I start JBoss it says it has found two members, but unfortunately both are the local machine.

              What do I have to do to let it find the other machine?

              • 4. Re: Out of the box clustering won't work
                mmelsen

                This is the log:

                -------------------------------------------------------
                GMS: address is 10.0.10.144:37553 (additional data: 14 bytes)
                -------------------------------------------------------
                2007-03-13 10:21:11,659 DEBUG [org.jboss.ha.framework.server.ClusterPartition] Starting channel
                2007-03-13 10:21:11,660 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] get nodeName
                2007-03-13 10:21:11,667 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Get current members
                2007-03-13 10:21:11,668 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Number of cluster members: 2
                2007-03-13 10:21:11,669 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.DefaultPartition] New cluster view for partition DefaultPartition (id: 1, delta: 0) : [127.0.0.1:1099, 127.0.0.1:1099]
                2007-03-13 10:21:11,669 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Other members: 1
                2007-03-13 10:21:11,669 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Fetching state (will wait for 30000 milliseconds):
                2007-03-13 10:21:11,678 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] membership changed from 2 to 2
                2007-03-13 10:21:11,678 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Begin notifyListeners, viewID: 1
                2007-03-13 10:21:11,679 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] I am (null) received membershipChanged event:
                2007-03-13 10:21:11,679 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] Dead members: 0 ([])
                2007-03-13 10:21:11,679 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] New Members : 0 ([])
                2007-03-13 10:21:11,679 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] All Members : 2 ([127.0.0.1:1099, 127.0.0.1:1099])
                2007-03-13 10:21:11,679 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] End notifyListeners, viewID: 1
                2007-03-13 10:21:11,755 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] State was retrieved successfully
                2007-03-13 10:21:11,755 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] setState called
                2007-03-13 10:21:11,760 DEBUG [org.jboss.ha.framework.server.ClusterPartition] Started ClusterPartition: DefaultPartition
                2007-03-13 10:21:11,760 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Begin AsynchViewChangeHandler
                2007-03-13 10:21:11,760 DEBUG [org.jboss.ha.framework.server.ClusterPartition] Started jboss:service=DefaultPartition
                2007-03-13 10:21:11,760 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] Begin notifyListeners, viewID: 1
                2007-03-13 10:21:11,760 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] End notifyListeners, viewID: 1
                2007-03-13 10:21:11,760 DEBUG [org.jboss.system.ServiceController] Starting dependent components for: jboss:service=DefaultPartition dependent components: [ObjectName: jboss:service=HASessionState
                 State: CREATED
                 I Depend On:
                 jboss:service=DefaultPartition
                , ObjectName: jboss:service=HAJNDI
                 State: CREATED
                 I Depend On:
                 jboss:service=DefaultPartition
                 jboss.system:service=ThreadPool
                , ObjectName: jboss.cache:service=InvalidationBridge,type=JavaGroups
                 State: CREATED
                


                • 5. Re: Out of the box clustering won't work
                  mmelsen

                  I've found the problem.

                  It seems I had to add a route (net 224.0.0.0, netmask 240.0.0.0) and also add bind_addr="ip" in tc5-cluster-service.xml. I had only changed cluster-service.xml, which is why it didn't work. So I changed cluster-service.xml back to its old values and added bind_addr in tc5-cluster-service.xml. This seems to be necessary on my Linux machines.
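For readers who want to reproduce the route part of this fix, here is a rough sketch of the command it appears to describe. The network and netmask cover the IPv4 multicast range 224.0.0.0/4. The interface name `eth0` and the helper `multicast_route_cmd` are assumptions for illustration; the post does not say which interface was used.

```shell
# Hypothetical interface name; use the NIC that carries the cluster traffic.
IFACE="${IFACE:-eth0}"

# Build the route command for the whole IPv4 multicast range (224.0.0.0/4),
# sending multicast traffic out of the given interface.
multicast_route_cmd() {
    printf 'route add -net 224.0.0.0 netmask 240.0.0.0 dev %s\n' "$1"
}

# Print rather than execute: the real command needs root and changes routing.
multicast_route_cmd "$IFACE"
```

The printed command has to be run as root on each node. `route -n` should then list a 224.0.0.0 entry for that interface, and the route does not survive a reboot unless it is also added to the distribution's network configuration.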

                  • 6. Re: Out of the box clustering won't work
                    atomicgun

                    I am having this problem too and wonder whether anyone can help me with it.

                    I have set up four cluster nodes on four different machines (A, B, C, D) for testing. Machine A is not detected as a cluster member (i.e. A forms its own cluster), while B, C, and D can see each other. I was wondering what had happened, then I realized that A was connected to a different network switch. I moved A's cable to the same switch that B, C, and D use, and now A gets detected.

                    Any idea why this happens, and how to resolve it if A has to stay on a different switch?