11 Replies Latest reply on Dec 4, 2012 11:44 AM by wdfink

    jboss 3.2.6 clustering problem

    tumux

      Hello,

      I have a working cluster of 3 JBoss 3.2.6 nodes (a version upgrade is not an option). Now I have to add one more JBoss node to the cluster.

      I have a 4th JBoss 3.2.6 instance configured on a new server. It starts and runs OK, but the old cluster nodes are not accepting the new "friend" into the cluster.

       

      jgroups version : 2.2.7( $Id: Version.java,v 1.13 2004/08/19 12:37:36 belaban Exp $)

       

      Here are a few lines from the node 4 startup log:

       

      2012-12-02 12:37:22,658 INFO  [org.jboss.ha.framework.interfaces.HAPartition.WorkflowPartition] Initializing

      2012-12-02 12:37:22,690 INFO  [org.jgroups.protocols.UDP] unicast sockets will use interface 192.168.21.14

      2012-12-02 12:37:22,693 INFO  [org.jgroups.protocols.UDP] socket information:

      local_addr=node4:34587 (additional data: 18 bytes), mcast_addr=228.1.2.3:45566, bind_addr=/192.168.21.14, ttl=32

      sock: bound to 192.168.21.14:34587, receive buffer size=131071, send buffer size=131071

      mcast_recv_sock: bound to 192.168.21.14:45566, send buffer size=131071, receive buffer size=131071

      mcast_send_sock: bound to 192.168.21.14:49913, send buffer size=131071, receive buffer size=131071

      2012-12-02 12:37:25,724 INFO  [org.jboss.ha.framework.interfaces.HAPartition.WorkflowPartition] Number of cluster members: 1

      2012-12-02 12:37:25,725 INFO  [org.jboss.ha.framework.interfaces.HAPartition.WorkflowPartition] Other members: 0

      2012-12-02 12:37:25,725 INFO  [org.jboss.ha.framework.interfaces.HAPartition.WorkflowPartition] Fetching state (will wait for 60000 milliseconds):

      2012-12-02 12:37:25,781 INFO  [org.jboss.ha.jndi.HANamingService] Listening on /0.0.0.0:1050

       

      So at startup I can see that it is initialising incorrectly: it is not counting all the cluster members (Number of cluster members: 1).

       

      After startup, I see the following in server.log on the other nodes:

       

      2012-12-02 12:53:58,647 WARN  [org.jgroups.protocols.pbcast.NAKACK] [node3:44707 (additional data: 18 bytes)] discarded message from non-member 192.168.21.14:52874 (additional data: 18 bytes)

      2012-12-02 12:53:58,851 WARN  [org.jgroups.protocols.pbcast.NAKACK] [node3:44707 (additional data: 18 bytes)] discarded message from non-member 192.168.21.14:52874 (additional data: 18 bytes)

      2012-12-02 12:54:36,572 WARN  [org.jgroups.protocols.pbcast.NAKACK] [node3:44707 (additional data: 18 bytes)] discarded message from non-member 192.168.21.14:52874 (additional data: 18 bytes)

      2012-12-02 12:54:40,972 WARN  [org.jgroups.protocols.pbcast.NAKACK] [node3:44707 (additional data: 18 bytes)] discarded message from non-member 192.168.21.14:52874 (additional data: 18 bytes)

       

      cluster-service.xml - configuration is a copy from one working node.

       

      I've made tests with org.jgroups.tests.McastSenderTest/org.jgroups.tests.McastReceiverTest. As described in documentation - it means multicast is working fine.

       

      Can someone explain, or point me to some documentation or a discussion thread where I could read about what to do next? I need to turn the existing 3-node cluster into a 4-node cluster.

       

      Thank you in advance, will try to add more info if requested.

       

      Tomas

        • 1. Re: jboss 3.2.6 clustering problem
          tumux

          I've enabled debugging of JGroups in JBoss. Here is the output of the startup part on node4:

           

          2012-12-02 17:33:05,198 DEBUG [org.jgroups.protocols.UDP] created unicast receiver thread

          2012-12-02 17:33:05,202 DEBUG [org.jgroups.protocols.PING] FIND_INITIAL_MBRS

          2012-12-02 17:33:05,205 DEBUG [org.jgroups.protocols.PING] waiting for initial members: time_to_wait=2000, got 0 rsps

          2012-12-02 17:33:05,205 DEBUG [org.jgroups.protocols.UDP] sending message to 228.1.2.3:45566 (src=192.168.21.14:55508 (additional data: 18 bytes)), headers are {UDP=[UDP:group_addr=WorkflowPartition], PING=[PING: type=GET_MBRS_REQ, arg=null]}

          2012-12-02 17:33:05,206 DEBUG [org.jgroups.protocols.UDP] received (mcast) 121 bytes from /192.168.21.14:46757 (size=121 bytes)

          2012-12-02 17:33:05,209 DEBUG [org.jgroups.protocols.UDP] message is [dst: 228.1.2.3:45566, src: node4:55508 (additional data: 18 bytes) (2 headers), size = 0 bytes], headers are {UDP=[UDP:group_addr=WorkflowPartition], PING=[PING: type=GET_MBRS_REQ, arg=null]}

          2012-12-02 17:33:07,205 DEBUG [org.jgroups.protocols.PING] initial mbrs are []

          2012-12-02 17:33:07,205 DEBUG [org.jgroups.protocols.pbcast.ClientGmsImpl] initial_mbrs are []

          2012-12-02 17:33:07,205 DEBUG [org.jgroups.protocols.pbcast.ClientGmsImpl] no initial members discovered: creating group as first member

           

          And this output was produced while the other 3 nodes were running as a live, working cluster.

          • 2. Re: jboss 3.2.6 clustering problem
            wdfink

            Hi Tomas,

             

            That's quite an old version, so it is difficult to remember the details.

             

            I suppose that the configuration is correct, as you still have three other members which are working (do you see Members=3 for the others?).

             

            In most cases the problem is related to IP multicasting.

            Is there network hardware or a firewall in between? If yes, the mcast address 228.1.2.3 might be blocked.

            What if you copy the instance to the same server as an existing node and start it there? Does it join the cluster?

            • 3. Re: jboss 3.2.6 clustering problem
              tumux

              Hi, Wolf,

               

              That was my last test case; I ran it today. I started a new JBoss instance on one of the old nodes, and it joined the cluster fine. From this I concluded that the problem is at the IP/network level on the new node. The system admin checked and changed some parameters in the node4 firewall, and now it's working. Even a restart of the node was not necessary.

               

              So, conclusion: the case is solved.

               

              I was just misled by the results of this test:

               

              "I've made tests with org.jgroups.tests.McastSenderTest/org.jgroups.tests.McastReceiverTest. As described in documentation - it means multicast is working fine."

               

              This test should have shown that the problem is at the network level, but it did not.

               

              Anyway, thank you for the reply. I hope this thread will help someone.

              • 4. Re: jboss 3.2.6 clustering problem
                wdfink

                Did you run the Mcast test only on this host?

                You have to run the sender on one cluster node and the receiver on another. Also, different IP or mcast addresses might work without problems on the same physical box.

                 

                You can use the test described in the TestingJBoss wiki.

                • 5. Re: jboss 3.2.6 clustering problem
                  tumux

                  I ran the mcast test on all nodes, just with a different mcast port than in production. Maybe this (the different port) was the reason why the test worked.
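
The point about the port can be checked with a small probe that uses the production group address and port (228.1.2.3:45566, taken from the node4 startup log above) rather than test defaults. What follows is only a minimal Java sketch, not part of JBoss or JGroups: the class name McastProbe, its methods, and the payload format are hypothetical, and it assumes a Java 7 or later JDK is available for running the probe. The receiver would run on an existing cluster node and the sender on the new node. The production port may already be bound by a running JBoss instance on the same host, so results there may depend on the operating system.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.NetworkInterface;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not a JBoss or JGroups class.
class McastProbe {
    // Group address and port as shown in the node4 startup log.
    static final String GROUP = "228.1.2.3";
    static final int PORT = 45566;

    // True if the given literal is an IP multicast address.
    static boolean isMulticast(String address) {
        try {
            return InetAddress.getByName(address).isMulticastAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    // Payload format (an assumption of this sketch): "probe from <sender>".
    static byte[] payload(String sender) {
        return ("probe from " + sender).getBytes(StandardCharsets.UTF_8);
    }

    // Sends one datagram to the group through the interface owning bindAddr.
    static void send(String bindAddr) throws Exception {
        NetworkInterface nif =
                NetworkInterface.getByInetAddress(InetAddress.getByName(bindAddr));
        try (MulticastSocket sock = new MulticastSocket()) {
            if (nif != null) {
                sock.setNetworkInterface(nif);
            }
            sock.setTimeToLive(32); // same ttl as in the cluster log
            byte[] data = payload(bindAddr);
            sock.send(new DatagramPacket(
                    data, data.length, InetAddress.getByName(GROUP), PORT));
        }
    }

    // Joins the group and prints every datagram received, until interrupted.
    static void receive(String bindAddr) throws Exception {
        NetworkInterface nif =
                NetworkInterface.getByInetAddress(InetAddress.getByName(bindAddr));
        try (MulticastSocket sock = new MulticastSocket(PORT)) {
            sock.joinGroup(
                    new InetSocketAddress(InetAddress.getByName(GROUP), PORT), nif);
            byte[] buf = new byte[512];
            while (true) {
                DatagramPacket p = new DatagramPacket(buf, buf.length);
                sock.receive(p);
                String text = new String(
                        p.getData(), 0, p.getLength(), StandardCharsets.UTF_8);
                System.out.println(text + " (from " + p.getAddress() + ")");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: McastProbe send|recv <bind-address>");
            return;
        }
        if ("send".equals(args[0])) {
            send(args[1]);
        } else {
            receive(args[1]);
        }
    }
}
```

A possible run would be `java McastProbe recv <old-node-ip>` on an existing node, followed by `java McastProbe send 192.168.21.14` on node4. If nothing is printed on the receiver, traffic for this exact address and port is probably being dropped somewhere between the hosts.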

                  • 6. Re: jboss 3.2.6 clustering problem
                    tumux

                    But the story is not over. This is a completely different problem (I think), and maybe it has already been answered somewhere, but anyway I'm posting it at the end of the current thread:

                     

                     

                    After adding the new node (to be precise, it's actually the 5th node), all cluster farming becomes very slow...

                     

                    1. I start the new node. It starts up and joins the cluster.

                    2. I restart the old cluster nodes (one by one), and each restart takes about 12-15 minutes instead of 1 minute (as it does when the new cluster node is absent).

                    All this time JBoss is doing something in

                    INFO  [FarmMemberService] **** pullNewDeployments ****

                    deploying...

                     

                    3. I remove the new node from the cluster (by changing parameters in cluster-service.xml).

                    4. A restart of the old nodes then takes the usual 1 minute, with no delay in the farming step.

                     

                     

                    So I'm searching for an answer again... If someone can point me in the right direction, I would be very thankful.

                    • 7. Re: jboss 3.2.6 clustering problem
                      rhusar

                      Actually, this could be simple to fix: I would discourage use of the farming service. There were known issues in the implementation even in AS 4.0 (see e.g. https://issues.jboss.org/browse/JBAS-6879). The service was IIRC never supported and is discontinued as of AS 7.

                       

                      Even though you are using an old version, this is a safe change: copy the deployment to all nodes manually and don't use the farming service.

                      • 8. Re: jboss 3.2.6 clustering problem
                        tumux

                        Hi, Radoslav,

                         

                        Thank you for the reply. I'm still new to JBoss, so one question: will clustering work fine without farming?

                         

                        Are those two features, clustering and farming, not connected?

                        • 9. Re: jboss 3.2.6 clustering problem
                          rhusar

                          Session clustering itself and farming are independent:

                           

                          Farming distributes deployments from the /farm folder to the other members of the cluster, so it's enough to deploy on only one server. You can achieve the same thing by copying the jar on the filesystem.

                          Clustering takes care of HA in the form of HTTP/EJB session replication, singleton deployments, the HAPartition abstraction, etc.

                           

                          (I hope I am remembering this correctly; it's been a while.)
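
The manual alternative to farming, copying the same archive into each node's deploy directory, could be scripted along the following lines. This is a minimal Java sketch under stated assumptions: the class ManualDeploy and its method are hypothetical, the deploy directories of all nodes are assumed to be reachable as filesystem paths from where it runs (for example through mounted shares), and a Java 7 or later JDK is assumed for the tool itself. It is not how the farming service works internally.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of JBoss.
class ManualDeploy {

    // Copies the archive into every given deploy directory, replacing any
    // older copy, and returns the paths that were written.
    static List<Path> copyToAll(Path archive, List<Path> deployDirs)
            throws IOException {
        List<Path> written = new ArrayList<>();
        for (Path dir : deployDirs) {
            Path target = dir.resolve(archive.getFileName());
            Files.copy(archive, target, StandardCopyOption.REPLACE_EXISTING);
            written.add(target);
        }
        return written;
    }

    // Usage: ManualDeploy <archive> <deploy-dir> [<deploy-dir> ...]
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: ManualDeploy <archive> <deploy-dir>...");
            return;
        }
        List<Path> dirs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            dirs.add(Paths.get(args[i]));
        }
        for (Path p : copyToAll(Paths.get(args[0]), dirs)) {
            System.out.println("copied to " + p);
        }
    }
}
```

Any copy tool would do the same job; the only point is that each node receives its own copy of the archive directly instead of pulling it through the farm.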

                          • 10. Re: jboss 3.2.6 clustering problem
                            tumux

                            Thank you for the reply. You may well have saved me a lot of time.

                            • 11. Re: jboss 3.2.6 clustering problem
                              wdfink

                              You might remove the farming service as well; I think it is deploy/deploy.last/farm-service.xml in AS4.

                              It might be similar in AS3.