5 Replies Latest reply on Aug 25, 2005 1:06 PM by davewebb

    My application hangs when more than one node in the cluster

    gshekar

      Hi,

      My application hangs when more than one node in the cluster is active. The application works fine when there is only one active node in the cluster. Please help.

      I have clustered 3 instances of jboss running on 3 solaris 10 servers.
      I am using mod_jk with apache to talk to the cluster. Find below the mod_jk configuration.

      httpd.conf
      ------------
      LoadModule jk_module /data/bronze-web/modules/mod_jk.so

      <IfModule mod_jk.c>
      JkWorkersFile /data/bronze-web/node02/workers.properties
      JkLogFile /logs/bronze-web/node02/mod_jk.log
      JkShmFile /logs/bronze-web/node02/jk-runtime-status
      JkLogLevel debug
      JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "
      JkOptions +ForwardKeySize +ForwardURICompat -ForwardDirectories
      JkRequestLogFormat "%w %V %T"
      JkMount /* loadbalancer
      </IfModule>


      workers.properties
      ----------------------
      worker.list=loadbalancer
      worker.loadbalancer.type=lb
      worker.loadbalancer.balanced_workers=node01,node02,node03

      worker.node01.port=8009
      worker.node01.host=bronze-app-dev-01.am.health.ge.com
      worker.node01.type=ajp13
      worker.node01.lbfactor=1

      worker.node02.port=8009
      worker.node02.host=bronze-app-dev-02.am.health.ge.com
      worker.node02.type=ajp13
      worker.node02.lbfactor=1

      worker.node03.port=8009
      worker.node03.host=bronze-app-dev-03.am.health.ge.com
      worker.node03.type=ajp13
      worker.node03.lbfactor=1


      Thanks & Regards,
      -GnanaShekar-
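A quick sanity check for a workers.properties like the one above can be sketched in Python. This is only an illustration, not part of mod_jk: the function names `parse_properties` and `check_workers` are made up for this example. It reports balanced workers that lack a host, port, or type, and workers that point at the same host:port.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_workers(text, balancer="loadbalancer"):
    """Return a list of human-readable problems (empty if none were found)."""
    props = parse_properties(text)
    problems = []
    raw_names = props.get("worker.%s.balanced_workers" % balancer, "")
    names = [n.strip() for n in raw_names.split(",") if n.strip()]
    if not names:
        problems.append("%s: no balanced_workers listed" % balancer)
    seen = {}  # (host, port) -> first worker name that used it
    for name in names:
        for field in ("host", "port", "type"):
            if "worker.%s.%s" % (name, field) not in props:
                problems.append("%s: missing %s" % (name, field))
        host = props.get("worker.%s.host" % name)
        port = props.get("worker.%s.port" % name)
        if host is None or port is None:
            continue
        if (host, port) in seen:
            problems.append("%s: same host:port as %s" % (name, seen[(host, port)]))
        else:
            seen[(host, port)] = name
    return problems
```

Calling `check_workers(open("workers.properties").read())` on the file posted above should return an empty list, since each of the three workers has its own host, port, and type.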

        • 1. Re: My application hangs when more than one node in the clus

          I assume you are running from the "all" config? You can make sure mod_jk works first by running from "default" (without HTTP session replication). If that works, turn on logging for org.jboss.web.tomcat and you can see the session replication info to troubleshoot.

          -Ben

          • 2. Re: My application hangs when more than one node in the clus
            gshekar

            Hi Ben,

            All the nodes are running from the "all" config. I have done the basic cluster test following the guidelines in: http://wiki.jboss.org/wiki/Wiki.jsp?page=BasicClusterTest .

            When I first enter the URL, the web page does not appear, but when I refresh, it appears. So the application does not actually hang; it responds when I click twice or press the refresh button in the web browser.

            Thanks & Regards,
            -GnanaShekar-

            • 3. Re: My application hangs when more than one node in the clus
              gshekar

              Hi,

              We are using 3 identical Solaris 10 servers for the 3 JBoss nodes. On one of the nodes, JBoss throws the following exceptions during startup. This is the node on which the application works fine when it is isolated from the cluster. I have set up all 3 nodes identically, and I don't know what makes one of the JBoss instances throw these exceptions. Please suggest/help.

              Thanks & Regards,
              -GnanaShekar-

              2005-08-18 09:45:45,896 DEBUG [org.jboss.iiop.CorbaORBService] Ignoring sunJDK14IsLocalBugFix=true due to inability to load org.jboss.iiop.SunJDK14IsLocalBugFix
              java.lang.ClassNotFoundException: Unexpected error during load of: org.jboss.iiop.SunJDK14IsLocalBugFix, msg=com/sun/corba/se/internal/iiop/ShutdownUtilDelegate
              at org.jboss.mx.loading.RepositoryClassLoader.loadClassImpl(RepositoryClassLoader.java:512)


              2005-08-18 09:45:55,069 DEBUG [org.jboss.mq.pm.jdbc2.PersistenceManager] Could not create table with SQL: CREATE CACHED TABLE JMS_MESSAGES ( MESSAGEID INTEGER NOT NULL, DESTINATION VARCHAR(255) NOT NULL, TXID INTEGER, TXOP CHAR(1), MESSAGEBLOB OBJECT, PRIMARY KEY (MESSAGEID, DESTINATION) )
              java.sql.SQLException: Table already exists: JMS_MESSAGES in statement [CREATE CACHED TABLE JMS_MESSAGES]
              at org.hsqldb.jdbc.Util.throwError(Unknown Source)
              at org.hsqldb.jdbc.jdbcPreparedStatement.executeUpdate(Unknown Source)

              2005-08-18 09:45:55,071 DEBUG [org.jboss.mq.pm.jdbc2.PersistenceManager] Could not create table with SQL: CREATE CACHED TABLE JMS_TRANSACTIONS ( TXID INTEGER, PRIMARY KEY (TXID) )
              java.sql.SQLException: Table already exists: JMS_TRANSACTIONS in statement [CREATE CACHED TABLE JMS_TRANSACTIONS]
              at org.hsqldb.jdbc.Util.throwError(Unknown Source)
              at org.hsqldb.jdbc.jdbcPreparedStatement.executeUpdate(Unknown Source)

              • 4. Re: My application hangs when more than one node in the clus
                yaronr

                It's been a while since I looked at JBoss clustering code, but it sounds like a problem I encountered in the past:

                The problem is with the HTTP session replication:
                When you login (or, do something that reads or writes to the HTTP Session context), JGroups tries to replicate it to all other nodes in the cluster.
                JGroups sends a message to all the cluster members and waits for an ACK from each.

                If, for some reason, it fails to receive enough ACKs (if there are 10 cluster members, it should receive 10 ACKs), it waits until a timeout expires.

                This timeout used to be hard-coded to 60 seconds, and I think it was changed to be configurable from the XML file.

                Also, this used to happen when, for some reason, we had the same server multiple times in the node list of the cluster (and therefore the sender didn't receive enough ACKs).

                Try moving some of the JGroups logs to DEBUG and see if all the servers have the same view of the cluster.
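The blocking behaviour described above can be modelled with a small Python sketch. This is a toy simulation, not JGroups code: `replicate`, its parameters, and the node names are hypothetical. The sender expects one ACK per entry in its view of the cluster, so a view that lists a server twice (or lists a dead server) never collects enough ACKs and the call blocks until the timeout.

```python
import queue
import threading
import time


def replicate(members, responders, timeout_s):
    """Wait for one ACK per member in the sender's view.

    members:    node names in the sender's view of the cluster
    responders: node names that actually send an ACK
    Returns (acked_names, elapsed_seconds).
    """
    acks = queue.Queue()
    for name in responders:
        # Each live node acknowledges from its own thread.
        threading.Thread(target=acks.put, args=(name,)).start()

    acked = []
    start = time.monotonic()
    deadline = start + timeout_s
    while len(acked) < len(members):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timed out before every expected ACK arrived
        try:
            acked.append(acks.get(timeout=remaining))
        except queue.Empty:
            break  # nothing more arrived within the timeout
    return acked, time.monotonic() - start
```

With a consistent view, `replicate(["node01", "node02", "node03"], ["node01", "node02", "node03"], 60)` returns almost immediately. With a view such as `["node01", "node02", "node02"]` and only two real responders, the same call blocks for the full timeout, which from the browser looks like a hang.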

                • 5. Re: My application hangs when more than one node in the clus
                  davewebb

                  You have to use sticky sessions with mod_jk, and make sure session replication in tc5-cluster-service.xml is set to REPL_ASYNC for performance gains.

                  Add this to your mod_jk configuration (workers.properties):

                  worker.loadbalancer.sticky_session=1


                  Use this in your tc5-cluster-service.xml:

                  <attribute name="CacheMode">REPL_ASYNC</attribute>


                  shutdown apache
                  shutdown cluster
                  startup cluster
                  start apache

                  That should do it... works for me! Good luck!
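For anyone wondering what sticky sessions change, here is a simplified Python model of the routing decision. It is a sketch under assumptions, not mod_jk's actual code: `route_of` and `pick_worker` are made-up names. It assumes each JBoss node appends a route suffix to the session ID (for example `ABC123.node02`), which on the Tomcat side normally requires a jvmRoute setting that matches the worker name; please verify that against the documentation for your version.

```python
def route_of(session_id):
    """Return the route suffix after the first '.', or None if there is none."""
    _, dot, route = session_id.partition(".")
    return route if dot and route else None


def pick_worker(session_id, workers):
    """Return the worker that owns the session, or None to fall back to balancing."""
    route = route_of(session_id)
    return route if route in workers else None
```

In this model, `pick_worker("ABC123.node02", ["node01", "node02", "node03"])` sends every request for that session back to node02. A session ID without a suffix gives `None`, so the request is balanced normally and may land on a node that has not received the replicated session yet.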