11 Replies Latest reply on Nov 13, 2012 11:43 AM by rhusar

    Jboss 4 clustering problem

      Hi, 

       

      We have two servers running JBoss 4.0.0 (on Red Hat Enterprise Linux 5.0), joined as a cluster.  It worked fine until we upgraded the server CPU from one quad-core CPU to two quad-core CPUs.  Since the upgrade, the cluster stops working after running for 20-30 minutes, and the two servers then run as independent servers. Any idea what might be wrong and how it can be fixed? I would really appreciate any help.

       

      Here is the JGroups configuration from cluster-service.xml (see attached file):

      ...

      <!-- The JGroups protocol configuration -->

          <attribute name="PartitionConfig">

            <Config>

              <!-- UDP: if you have a multihomed machine,

                   set the bind_addr attribute to the appropriate NIC IP address -->

              <!-- UDP: On Windows machines, because of the media sense feature

                   being broken with multicast (even after disabling media sense)

                   set the loopback attribute to true -->

              <UDP mcast_addr="225.0.0.10" mcast_port="45566"

                   ip_ttl="32" ip_mcast="true"

                   mcast_send_buf_size="800000" mcast_recv_buf_size="150000"

                   ucast_send_buf_size="800000" ucast_recv_buf_size="150000"

                   loopback="false" />

              <PING timeout="2000" num_initial_members="3"

                    up_thread="true" down_thread="true" />

              <MERGE2 min_interval="10000" max_interval="20000" />

              <FD shun="true" up_thread="true" down_thread="true"

                  timeout="2500" max_tries="5" />

              <VERIFY_SUSPECT timeout="3000" num_msgs="3"

                              up_thread="true" down_thread="true" />

              <pbcast.NAKACK gc_lag="50" retransmit_timeout="300,600,1200,2400,4800"

                             max_xmit_size="8192"

                             up_thread="true" down_thread="true" />

              <UNICAST timeout="300,600,1200,2400,4800" window_size="100" min_threshold="10"

                       down_thread="true" />

              <pbcast.STABLE desired_avg_gossip="20000"

                             up_thread="true" down_thread="true" />

              <FRAG frag_size="8192"

                    down_thread="true" up_thread="true" />

              <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"

                          shun="true" print_local_addr="true" />

              <pbcast.STATE_TRANSFER up_thread="true" down_thread="true" />

            </Config>

          </attribute>

      ...

       

      Thanks a lot!

        • 1. Re: Jboss 4 clustering problem
          wdfink

          Hello, welcome to the forum.

           

          I suppose the CPU was not the only change.

          To analyze the problem, you need to provide a bit more information.

          How do you start JBoss? Do you use the standard configuration, or have you applied changes? Do you see errors or warnings in the log files?

          • 2. Re: Jboss 4 clustering problem
            rhusar

            +1 

            I suppose the CPU was not the only change.

            It's very unlikely that changing only the CPU would make a difference in clustering. Try to think again, or get some information, about what else changed. If nothing did, then it is probably a runtime setting that changed after the previous boot, e.g. a firewall that remained enabled.

             

            Rado

            • 3. Re: Jboss 4 clustering problem

              Thank you both for answering my question.  The server is hosted by a third party.  From our point of view, the only difference between the day the cluster worked fine and the next day, when it had problems, is the server CPU (unless something else changed on the hosting side).  We can't see any clustering-related messages in the log files. We do see the database error "Row was updated or deleted by another transaction" in the logs; that is because the two servers are working as independent servers rather than as a cluster.  From your experience, what kind of circumstances can break a cluster? Do you think any of the settings in cluster-service.xml need tuning? Is there any program we can run to test the cluster?  I really appreciate your help.

              • 4. Re: Jboss 4 clustering problem
                wdfink

                If you use the standard cluster configuration, node discovery uses IP multicast. This is often the problem when a cluster cannot find all of its nodes.

                The behavior may change with a different multicast address (-u option) or a different bind address (-b option).

                 

                You can use native JGroups to test multicast; see TestingJBoss.
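
                As a minimal sketch of the binding point, assuming a multihomed host: the comment in the posted configuration suggests setting bind_addr on the UDP element to the NIC that should carry cluster traffic. The address 172.16.0.1 below is only a placeholder for that NIC's IP; the other attributes are copied from the posted configuration.

                ```xml
                <!-- Sketch only: the bind_addr value is a placeholder for the cluster-facing NIC -->
                <UDP bind_addr="172.16.0.1"
                     mcast_addr="225.0.0.10" mcast_port="45566"
                     ip_ttl="32" ip_mcast="true"
                     mcast_send_buf_size="800000" mcast_recv_buf_size="150000"
                     ucast_send_buf_size="800000" ucast_recv_buf_size="150000"
                     loopback="false" />
                ```

                Each node would use its own address here, while both nodes keep the same mcast_addr and mcast_port so that they can find each other.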

                • 5. Re: Jboss 4 clustering problem

                  Yes, we use the -b option to start the server; the IP is the server's own IP.  So the reason could be a network issue between these two servers?  Do you think TCP would be better than UDP?  We will try the native JGroups test later.  Today, after the extra CPU was removed, the cluster seems to be working fine again.  It is really hard to explain what the problem was.   Thank you very much for your help.

                  • 6. Re: Jboss 4 clustering problem
                    wdfink

                    TCP is only recommended if you cannot use UDP. I would recommend using the (default) UDP if possible.

                     

                    At the moment I do not understand what role the CPU change plays. Maybe there were other changes under the hood (e.g. the complete system was replaced with one of similar configuration; I often see that people understand such a change very differently).
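
                    If multicast turns out not to be usable between the two hosts, a TCP-based stack would replace the UDP and PING elements with TCP and TCPPING, which uses a static host list instead of multicast discovery. The fragment below is a sketch only: the addresses 172.16.0.1 and 172.16.0.2 and the port 7800 are placeholder assumptions, not tested values.

                    ```xml
                    <!-- Sketch only: addresses and port 7800 are assumptions -->
                    <TCP bind_addr="172.16.0.1" start_port="7800" loopback="true" />
                    <TCPPING initial_hosts="172.16.0.1[7800],172.16.0.2[7800]" port_range="3"
                             timeout="3500" num_initial_members="2"
                             up_thread="true" down_thread="true" />
                    ```

                    This shows only the transport and discovery elements; the rest of the stack may need adjusting as well, and every node has to be listed in initial_hosts.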

                    • 7. Re: Jboss 4 clustering problem

                      Thanks for the recommendation.  Yes, we are using UDP, and the cluster has been working fine since we reversed the CPU change.  Something else could have changed on the server when the CPU was added, but that is out of our control.

                       

                      Since we are using 4.0.0, do you think JBoss clustering will be more robust in version 4.2.3?  We may need to upgrade JBoss in the near future.

                       

                      Thanks again. 

                      • 8. Re: Jboss 4 clustering problem
                        wdfink

                        You will get bug fixes and new features, but nothing is completely different. There may be fixes for some clustering cases, but they will not help if there are network or configuration problems in general.

                        In my experience, when a cluster does not detect other nodes, it is an environment issue in most cases.

                        • 9. Re: Jboss 4 clustering problem
                          rhusar

                          Something else could have changed on the server when the CPU was added, but that is out of our control.

                          Sounds to me like they are swapping the entire machine in the lab, so you run into networking problems where multicast is not routed between these two machines. I find it really unlikely that someone is shutting down a machine and only putting the CPU in and taking it out again.

                           

                          Since we are using 4.0.0, do you think JBoss clustering will be more robust in version 4.2.3?  We may need to upgrade JBoss in the near future.

                          Absolutely. Don't forget to test on the new version before deploying to production.


                          R

                          • 10. Re: Jboss 4 clustering problem

                            Thanks for the reply.  Recently we also found that under heavy load (for example, heavy database-related activity), the cluster breaks (see the following logs). The slave server restarted itself, but after the restart it could not join the cluster any more.  We had to stop both servers, start the master server, and start the other server only after the master was fully started; then the cluster worked fine. I just wonder whether there is any configuration change (e.g. increasing a timeout) that can prevent the cluster from breaking. Do you think adding a third server would help in this case?  Eventually we will migrate to JBoss 4.2.3, but we need more testing before that.

                             

                            [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.DefaultPartition] New cluster view for partition DefaultPartition (id: 0, delta: -1) : [172.16.0.1:11009]

                            [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] I am (172.16.0.1:1100) received membershipChanged event:

                            [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] Dead members: 1 ([172.16.0.2:1100])

                            [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] New Members : 0 ([])

                            [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] All Members : 1 ([172.16.0.1:11009])

                            • 11. Re: Jboss 4 clustering problem
                              rhusar

                              With such an old and unmaintained version of AS, it could easily be a bug. Can you post more logs so we can figure out why that member is dead?
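
                              On the question of timeouts: in the posted stack, failure detection is driven by the FD element, where a member is suspected after roughly timeout × max_tries without a heartbeat reply (2500 ms × 5 = 12.5 s with the posted values). If a node stalls for longer than that under heavy load, it may be suspected and, with shun="true", excluded from the group. A hedged sketch of a more tolerant setting follows; the value 10000 is an illustrative assumption, not a tested recommendation.

                              ```xml
                              <!-- Sketch only: timeout raised from 2500 to 10000 ms (assumed value) -->
                              <FD shun="true" up_thread="true" down_thread="true"
                                  timeout="10000" max_tries="5" />
                              ```

                              The trade-off is that a node that has really crashed is detected correspondingly later (about 50 s with these values).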