18 Replies. Latest reply on Aug 28, 2002 9:06 AM by slaboure

    Cluster Member Starvation

    groovesoftware

      I am experiencing a situation where some members of a cluster are not getting any requests. The following is my configuration.

      OS: Solaris
      Java version: 1.3
      JBoss version: jboss_3.0.1RC1-tomcat_4.0.4

      I have 2 partitions. One called ProdEJB has only SFSB and SLSB clustering turned on. The other called ProdJNDI has only JNDI clustering turned on. At present all servers live on the same physical computer. Each server participating in the ProdJNDI partition is running HA-JNDI on a different port.

      5 or 6 servers participate in the ProdEJB partition. All of the 5 or 6 servers that participate in the ProdEJB partition also participate in the ProdJNDI partition. We refer to these servers as EJB servers.

      There is one other server which runs servlets only and no EJBs that participates in the ProdJNDI partition. The servlets in this server use ejb-ref declarations to talk to EJBs in the ProdEJB partition. Each ejb-ref points to a JNDI name in HA-JNDI on this server. We refer to this server as the web server.

      So for each EJB the web server looks up, the lookup talks to HA-JNDI running in the web server. HA-JNDI will not find the EJB name in the global namespace or the local namespace, so it will begin asking the other (EJB) servers if they have the name in their local namespace. The first EJB server it asks will have the name and will return a clustered home interface. The web server now has a reference to the clustered home of an EJB. Calls to the create methods in the home interface should be made round-robin across the 5 or 6 EJB servers. For SLSBs, calls to any remote method should also be made round-robin across the 5 or 6 EJB servers.
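
      As a point of reference for the distribution expected here, the following is a minimal sketch of textbook round-robin selection. It is not JBoss's load-balancing code. The class name RoundRobinSketch, the distribute helper, and the server names ejb1 through ejb6 are assumptions made for illustration, and it uses current Java syntax rather than Java 1.3.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a textbook round-robin selector, not the JBoss policy.
final class RoundRobinSketch {
    private final List<String> targets;
    private int cursor = 0;

    RoundRobinSketch(List<String> targets) {
        this.targets = targets;
    }

    // Returns the next target in list order, wrapping around after the last one.
    synchronized String next() {
        String chosen = targets.get(cursor);
        cursor = (cursor + 1) % targets.size();
        return chosen;
    }

    // Sends `calls` requests through one selector and counts hits per server.
    static int[] distribute(int serverCount, int calls) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < serverCount; i++) {
            names.add("ejb" + (i + 1)); // hypothetical server names
        }
        RoundRobinSketch selector = new RoundRobinSketch(names);
        int[] counts = new int[serverCount];
        for (int i = 0; i < calls; i++) {
            counts[names.indexOf(selector.next())]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints [10, 10, 10, 10, 10, 10]
        System.out.println(Arrays.toString(distribute(6, 60)));
    }
}
```

      With a single shared selector, 60 calls over 6 servers land 10 on each, which is the even spread the setup described above is expected to approach over time.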

      One final note is that the web server holds on to the home reference in a Service Locator object rather than looking it up each time it is needed.
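
      The caching described here can be sketched roughly as follows. This is an assumption about the general Service Locator pattern, not the actual class used in this application. The name ServiceLocatorSketch is hypothetical, and the JNDI lookup is abstracted as a function so the example runs on its own; a real locator would call javax.naming.InitialContext.lookup instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative Service Locator: looks a home up once, then reuses the reference.
final class ServiceLocatorSketch {
    private final Map<String, Object> homes = new HashMap<>();
    private final Function<String, Object> jndiLookup; // stand-in for a JNDI lookup
    private int lookups = 0;

    ServiceLocatorSketch(Function<String, Object> jndiLookup) {
        this.jndiLookup = jndiLookup;
    }

    // Returns the cached home for this name, performing the lookup only on first use.
    synchronized Object getHome(String jndiName) {
        Object home = homes.get(jndiName);
        if (home == null) {
            home = jndiLookup.apply(jndiName);
            lookups++;
            homes.put(jndiName, home);
        }
        return home;
    }

    // Number of real lookups performed so far.
    synchronized int lookupCount() {
        return lookups;
    }
}
```

      The consequence for this thread is that every create call from the web server goes through the same home proxy instance, so any load balancing of create calls depends entirely on that one proxy.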

      With this setup we are seeing the following behavior.
      1. With 4 or fewer servers, every server receives calls made to create methods in the home or business methods in the remote in a more or less fair manner. Over time all servers get roughly the same number of requests.
      2. With 5 servers, 1 of the servers gets starved. In other words, it never gets any requests. In addition, some other server gets about twice as many requests as the other servers. The remaining 3 servers get roughly the same number of requests.
      3. With 6 servers, 2 of the servers get starved and some other server gets about twice as many requests as the other servers.

      The particular servers that get starved seem fairly random, but somewhat related to the order in which I start the servers.

      I realize that this is a somewhat more complicated clustering situation than most people are using, but it is reasonable, and suits our needs. It enables us to have 1 (and later more) servers handling servlets and talking to a cluster of EJB servers. This will perform well in our environment.

      Has anyone else seen problems like this with the Round Robin algorithms? Any ideas?


        • 1. Re: Cluster Member Starvation
          groovesoftware

          I have narrowed this down further to being a problem with the round-robin load balancing on home interfaces. Here's how to reproduce it.

          - Create 6 server instances on the same machine. These servers can be the same configuration as 'default'. You will need to change port numbers for each server so that none conflict.
          - In 5 of the configurations place the attached ejb-cluster-service.xml file. I call these servers EJB servers.
          - The 6th server I call the web server.
          - In all 6 servers place the attached jndi-cluster-service.xml file. You will need to change the port in this file for each server so that there are no conflicts.
          - Un-jar the attached jar. Change the jnp url in rr/resources/war/jnpport/WEB-INF/jboss-web.xml to use the port specified for HA-JNDI on the web server.
          - Run Ant in rr.
          - Copy rr/build/jnpport/jnpport.jar to the deploy directory of the EJB servers.
          - Copy rr/build/jnpport/jnpport.war to the deploy directory of the web server.
          - Start all 6 servers.
          - To test you can use a URL of the form
          http://server:port/jnpport/hello.jsp?homeCount=1&beanCount=1&callCount=1
          You can change the parameters to change the behavior. The JSP looks up the home interface to an ejb homeCount times. For each home it looks up it creates beanCount beans. For each bean it creates it calls a method callCount times. All the bean does is log some output so that you can see which server handled the call by looking at the log files.
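
          The loop structure of the JSP described above can be sketched like this. It is a reconstruction from the description, not the code in the attached jar. The interfaces Hello, HelloHome, and HomeLookup and the method names are hypothetical stand-ins for the real EJB interfaces and the JNDI lookup.

```java
// Illustrative reconstruction of the test driver's three nested loops.
final class HelloDriverSketch {
    interface Hello { void sayHello(); }        // hypothetical remote interface
    interface HelloHome { Hello create(); }     // hypothetical home interface
    interface HomeLookup { HelloHome lookup(); } // stand-in for the JNDI lookup

    // Looks up the home homeCount times, creates beanCount beans per home,
    // and calls each bean callCount times. Returns the total number of calls.
    static int run(HomeLookup jndi, int homeCount, int beanCount, int callCount) {
        int calls = 0;
        for (int h = 0; h < homeCount; h++) {
            HelloHome home = jndi.lookup();
            for (int b = 0; b < beanCount; b++) {
                Hello bean = home.create();
                for (int c = 0; c < callCount; c++) {
                    bean.sayHello();
                    calls++;
                }
            }
        }
        return calls;
    }
}
```

          The total number of business-method calls is homeCount * beanCount * callCount, so homeCount=1, beanCount=50, callCount=50 produces 50 create calls and 2500 method calls.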

          I have found the following behavior with this example.

          - homeCount=50 beanCount=1 callCount=1
          All calls go to the same server. This is expected and I do not believe it is a bug.

          - homeCount=1 beanCount=1 callCount=50
          All calls are evenly distributed to all EJB servers.

          - homeCount=1 beanCount=50 callCount=50
          Calls are not evenly distributed to the EJB servers. 1 or more servers will not receive any calls at all. This is a bug because some of the servers are being left out of the load balancing.
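
          One way to check for starvation without reading each log by hand is to tally which server handled each call. The sketch below assumes the server name for every call has already been extracted from the log files into a list. The class StarvationCheckSketch and its methods are hypothetical helpers, not part of the attached example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative helper: counts calls per server and reports servers with none.
final class StarvationCheckSketch {

    // Starts every known server at zero, then counts each observed call.
    static Map<String, Integer> tally(List<String> allServers, List<String> observedCalls) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String server : allServers) {
            counts.put(server, 0);
        }
        for (String server : observedCalls) {
            counts.merge(server, 1, Integer::sum);
        }
        return counts;
    }

    // Returns the servers that received no calls at all, in name order.
    static List<String> starved(Map<String, Integer> counts) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == 0) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```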

          Matt

          • 2. Re: Cluster Member Starvation
            groovesoftware

            Files are attached.

            • 3. Re: Cluster Member Starvation
              ironstorm

              This is perhaps a silly question, but does this problem
              still occur if you split out an instance (or two) on to a
              separate machine?

              • 4. Re: Cluster Member Starvation
                groovesoftware

                I do not know. We do not plan on running on separate machines at this time. We use multiple servers running on the same machine because of the better performance it gives us on a multi-processor machine.

                • 5. Re: Cluster Member Starvation
                  joao.clemente

                  Sorry for not helping you, but I would like some info on what needs to be done to get other instances running on the same machine. As you've got several instances on a single machine, you should know these details from the bottom up...

                  And I've got another question: What balances your jndi calls? As I see, the jndi gives you a home proxy that knows every ejb on the cluster and so is able to balance... But how are the multiple jndi servers reached?

                  Thank you
                  Joao Clemente
                  INESC - Portugal

                  • 6. Re: Cluster Member Starvation
                    groovesoftware

                    > ...I would like some info
                    > on what need to be done to get other instances
                    > running on the same machine....

                    These are the basics of how I accomplished this.
                    1. Make recursive copies of server/default to server/??? and give them appropriate names.
                    2. Change ports in server/???/conf/jboss-service.xml, server/???/deploy/tomcat4-service.xml and server/???/deploy/hsqldb-service.xml so that no two instances use the same ports.
                    3. Use 'bin/run.sh --configuration ???' to run the new configurations.

                    >
                    > And I've got another question: What balances your
                    > jndi calls? As I see, the jndi gives you a home proxy
                    > that knows every ejb on the cluster and so is able to
                    > balance... But how are the multiple jndi servers
                    > reached?

                    I am using an HA-JNDI cluster as defined in the file I attached. The HA-JNDI tree does not know about the EJBs, but when it doesn't find a name it delegates to the local JNDI on every server in the cluster. This allows me to get a clustered home. In this configuration the HA-JNDI does not do any load balancing, only failover.
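
                    The lookup order described in this paragraph can be sketched as follows. This models the behavior as stated in the post, with plain maps standing in for the JNDI trees. It is not the JBoss HA-JNDI implementation, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Illustrative model of the described fallback: HA tree first, then each
// server's local JNDI tree in turn. No load balancing, only fallback.
final class HaJndiLookupSketch {

    static Optional<Object> lookup(String name,
                                   Map<String, Object> haTree,
                                   List<Map<String, Object>> localTrees) {
        // 1. The replicated HA-JNDI tree is consulted first.
        if (haTree.containsKey(name)) {
            return Optional.ofNullable(haTree.get(name));
        }
        // 2. Otherwise ask each server's local JNDI; the first hit wins.
        for (Map<String, Object> local : localTrees) {
            if (local.containsKey(name)) {
                return Optional.ofNullable(local.get(name));
            }
        }
        // 3. Nobody in the cluster has the name bound.
        return Optional.empty();
    }
}
```

                    In this model the web server's local tree has no EJB bindings, so the name is resolved by the first EJB server that is asked, and what comes back is that server's clustered home proxy.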

                    Is anyone from the JBoss group who is familiar with clustering reading this? This is a real problem in a real-world situation which is intended to go into production soon. Any ideas why the load balancing does not work on large numbers of servers?

                    Matt

                    • 7. Re: Cluster Member Starvation
                      slaboure

                      Yes, I agree, it is a problem if you see this behaviour. Do you have more info or an easily reproducible case, so that I can spend time on debugging instead of re-creating your case?

                      thank you. Cheers,



                      sacha

                      • 8. Re: Cluster Member Starvation
                        slaboure

                        Oops, I went too quickly to the end of the thread. I see that you have already provided all the necessary files. I will take a look at that.

                        Cheers,


                        sacha

                        • 9. Re: Cluster Member Starvation
                          joao.clemente

                          This is probably nothing, but in the instructions you gave me (groovesoftware) you said you did

                          1. make recursive copies of server/default to server/???

                          and I just checked the article Bill Burke and Sacha published at O'Reilly (July 10), and they state in Clustering JBoss that
                          "Clustering is only enabled in the all configuration"

                          As you have been able to run it with fewer instances, the article may be wrong, but I wondered whether the "all" configuration has something that is missing from yours...

                          Can you comment on this article issue, Sacha? Thank you

                          Joao Clemente

                          • 10. Re: Cluster Member Starvation
                            joao.clemente

                            I apologize to both of you for my last post.
                            Please ignore it.
                            I just read the QuickStartGuide that was on sourceforge and understood the nonsense I was saying.
                            Shame on me...

                            • 11. Re: Cluster Member Starvation
                              skysea

                              I have the same problem, except that my JBoss servers are running on different computers. The behavior is very strange: all requests from the client are answered by one node while the others sit idle. However, if you turn off that node, the others take over. Why?

                              • 12. Re: Cluster Member Starvation
                                seanx

                                In my case, I also had to change ports in the following files to avoid port conflicts:
                                deploy/jmx-html-adaptor.sar
                                conf/jacorb.properties
                                conf/standardjboss.xml
                                deploy/jbossmq-service.xml
                                Also specify a different HA-JNDI port for each instance in cluster-services.xml or jboss-service.xml.

                                I am just curious. Did you need to change these ports too?

                                thanks.

                                • 13. Re: Cluster Member Starvation
                                  groovesoftware

                                  Most of these sound right, although I don't believe some of these exist in the default configuration which is what I used. Basically you need to change all conflicting ports in each of your servers.

                                  • 14. Re: Cluster Member Starvation
                                    seanx

                                    I confirmed the same behavior. I have two servers installed on one host, and two simultaneous clients are both sent to the first server.

                                    I also observed some other weird behavior. Please refer to my later posting for details.
