5 Replies Latest reply on Mar 30, 2011 5:55 AM by joaocunhalopes

    mod_cluster 1.1.1 Moving to Production

    joaocunhalopes

      After testing mod_cluster 1.1.1 on several different environments we decided to move it to production.

      On the frontend we are running one Apache HTTP 2.2.17 server, running on Windows Server 2008 R2 (64 bit).

      The installed Apache is the 32 bit version. The mod_cluster modules installed are also the 32 bit version modules.

       

      On the backend we have two app servers running JBoss 5.1.

       

      Between the frontend and the backend we have a firewall, but there are no known rules implemented that would make this test fail. The Apache server is able to talk to the JBoss servers and the JBoss servers are able to talk to the Apache server.
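A minimal sketch of how the reachability claim above could be re-checked from the Apache machine, assuming Python is available there. The host and port pairs are the AJP endpoints listed in the mod_cluster-manager output later in this post; the function names are hypothetical. A successful TCP connect only shows that the port opens. It does not rule out a firewall or security product interfering with traffic once load increases.

```python
import socket

# AJP endpoints taken from the mod_cluster-manager node list in this post.
AJP_ENDPOINTS = [
    ("192.168.150.43", 8009), ("192.168.150.43", 8109),
    ("192.168.150.43", 8209), ("192.168.150.43", 8309),
    ("192.168.150.44", 8009), ("192.168.150.44", 8209),
    ("192.168.150.44", 8309),
]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map every (host, port) pair to the result of a single connect attempt."""
    return {(host, port): can_connect(host, port, timeout) for host, port in endpoints}
```

Calling `check_endpoints(AJP_ENDPOINTS)` on the frontend machine returns a dictionary keyed by endpoint. The same function could be run on each JBoss machine against `("192.168.150.6", 6666)` to check the reverse direction.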

       

      Here's what our tests are showing, when trying out the load balancing demo app.

      1.JPG

       

      2.JPG

       

       

      It seems that the Apache server starts well and that after 10 sessions it just stops responding.

      Our Apache test configuration is:

       

      ServerRoot "D:/Apache2.2"

       

      # Required for Apache startup

      LoadModule authz_host_module modules/mod_authz_host.so

       

      # Module for server-status

      LoadModule status_module modules/mod_status.so

      ExtendedStatus On

       

      # Modules for JBoss mod_cluster

      LoadModule proxy_module modules/mod_proxy.so

      LoadModule proxy_ajp_module modules/mod_proxy_ajp.so

      LoadModule slotmem_module modules/mod_slotmem.so

      LoadModule manager_module modules/mod_manager.so

      LoadModule proxy_cluster_module modules/mod_proxy_cluster.so

      LoadModule advertise_module modules/mod_advertise.so

       

      Listen 192.168.150.6:6666

      <VirtualHost 192.168.150.6:6666>

          KeepAliveTimeout 60

          MaxKeepAliveRequests 0

       

          ManagerBalancerName ApacheHttpdBalancer

          ServerAdvertise Off

      </VirtualHost>

       

      Listen 192.168.150.6:80

      <VirtualHost *:80>

          <Location /mcm>

              SetHandler mod_cluster-manager

       

              Order deny,allow

              Deny from all

              #Allow from 192.168.120. 192.168.150.

              Allow from all

          </Location>

       

          <Location /server-status>

              SetHandler server-status

       

              Order deny,allow

              Deny from all

              #Allow from 192.168.120. 192.168.150.

              Allow from all

          </Location>

       

          <Location /load-demo>

              Order deny,allow

              Deny from all

              #Allow from 192.168.120. 192.168.150.

              Allow from all

          </Location>

      </VirtualHost>

       

      Please notice that we are not using the advertise feature.

      On the JBoss end we have configured the servers so they don't use advertise. The following changes were made to the file "mod_cluster-jboss-beans.xml" (the bean changed was "ModClusterConfig"):

       

          <!--<property name="proxyList">${jboss.mod_cluster.proxyList,jboss.modcluster.proxyList:}</property>-->

          <property name="proxyList">192.168.150.6:6666</property>

       

      and

       

          <!--<property name="advertise">${jboss.mod_cluster.advertise:true}</property>-->

          <property name="advertise">false</property>

       

      The changes above were made according to "What to do if I don't want to use Advertise (multicast)":

       

      http://docs.jboss.org/mod_cluster/1.1.0/html/faq.html#d0e4112

       

      I checked the Apache HTTP log files for errors (after the problem described above) and couldn't find any relevant to this problem:

       

      httpd.exe: Could not reliably determine the server's fully qualified domain name, using 192.168.150.6 for ServerName

      [Mon Mar 28 22:50:23 2011] [notice] Advertise initialized for process 2864

      [Mon Mar 28 22:50:23 2011] [notice] Apache/2.2.17 (Win32) mod_cluster/1.1.x configured -- resuming normal operations

      [Mon Mar 28 22:50:23 2011] [notice] Server built: Oct 18 2010 01:58:12

      [Mon Mar 28 22:50:23 2011] [notice] Parent: Created child process 1868

      httpd.exe: Could not reliably determine the server's fully qualified domain name, using 192.168.150.6 for ServerName

      httpd.exe: Could not reliably determine the server's fully qualified domain name, using 192.168.150.6 for ServerName

      [Mon Mar 28 22:50:23 2011] [notice] Child 1868: Child process is running

      [Mon Mar 28 22:50:23 2011] [notice] Child 1868: Acquired the start mutex.

      [Mon Mar 28 22:50:23 2011] [notice] Child 1868: Starting 64 worker threads.

      [Mon Mar 28 22:50:23 2011] [notice] Child 1868: Starting thread to listen on port 80.

      [Mon Mar 28 22:50:23 2011] [notice] Child 1868: Starting thread to listen on port 6666.

       

      Also, no errors on the event log.

       

      After the problem, mod_cluster seems normal:

       

      Node: [1],Name: si_part1_node1,Balancer: ApacheHttpdBalancer,LBGroup: ,Host: 192.168.150.43,Port: 8109,Type: ajp,Flushpackets: Off,Flushwait: 10,Ping: 10,Smax: 65,Ttl: 60,Elected: 0,Read: 0,Transfered: 0,Connected: 0,Load: 96

      Node: [2],Name: pc_part1_node2,Balancer: ApacheHttpdBalancer,LBGroup: ,Host: 192.168.150.44,Port: 8209,Type: ajp,Flushpackets: Off,Flushwait: 10,Ping: 10,Smax: 65,Ttl: 60,Elected: 0,Read: 0,Transfered: 0,Connected: 0,Load: 97

      Node: [3],Name: pc_part1_node1,Balancer: ApacheHttpdBalancer,LBGroup: ,Host: 192.168.150.43,Port: 8209,Type: ajp,Flushpackets: Off,Flushwait: 10,Ping: 10,Smax: 65,Ttl: 60,Elected: 0,Read: 0,Transfered: 0,Connected: 0,Load: 97

      Node: [4],Name: ne_part1_node2,Balancer: ApacheHttpdBalancer,LBGroup: ,Host: 192.168.150.44,Port: 8309,Type: ajp,Flushpackets: Off,Flushwait: 10,Ping: 10,Smax: 65,Ttl: 60,Elected: 138,Read: 3588,Transfered: 0,Connected: 0,Load: 97

      Node: [5],Name: ne_part1_node1,Balancer: ApacheHttpdBalancer,LBGroup: ,Host: 192.168.150.43,Port: 8309,Type: ajp,Flushpackets: Off,Flushwait: 10,Ping: 10,Smax: 65,Ttl: 60,Elected: 130,Read: 3380,Transfered: 0,Connected: 0,Load: 97

      Node: [6],Name: ph_part1_node2,Balancer: ApacheHttpdBalancer,LBGroup: ,Host: 192.168.150.44,Port: 8009,Type: ajp,Flushpackets: Off,Flushwait: 10,Ping: 10,Smax: 65,Ttl: 60,Elected: 0,Read: 0,Transfered: 0,Connected: 0,Load: 97

      Node: [7],Name: ph_part1_node1,Balancer: ApacheHttpdBalancer,LBGroup: ,Host: 192.168.150.43,Port: 8009,Type: ajp,Flushpackets: Off,Flushwait: 10,Ping: 10,Smax: 65,Ttl: 60,Elected: 0,Read: 0,Transfered: 0,Connected: 0,Load: 96

      Vhost: [1:1:1], Alias: localhost

      Vhost: [2:1:2], Alias: localhost

      Vhost: [3:1:3], Alias: localhost

      Vhost: [4:1:4], Alias: localhost

      Vhost: [5:1:5], Alias: localhost

      Vhost: [6:1:6], Alias: localhost

      Vhost: [7:1:7], Alias: localhost

      Context: [1:1:1], Context: /femss, Status: ENABLED

      Context: [2:1:2], Context: /fepcwcm, Status: ENABLED

      Context: [3:1:3], Context: /fepcwcm, Status: ENABLED

      Context: [4:1:4], Context: /load-demo, Status: ENABLED

      Context: [5:1:5], Context: /load-demo, Status: ENABLED

      Context: [6:1:6], Context: /fephn, Status: ENABLED

      Context: [7:1:7], Context: /fephn, Status: ENABLED
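The node lines above are hard to compare by eye. A small sketch, assuming the text has been copied from the mod_cluster-manager page into a list of strings, that splits each `Node:` line into fields and reports the nodes whose `Elected` counter is above zero. The function names are hypothetical; the two sample lines are copied from the output above.

```python
def parse_node_line(line):
    """Split one 'Node: [n],Name: ...,Load: 97' line into a dict of strings."""
    fields = {}
    for part in line.strip().split(","):
        # Split on the first colon only, so empty values such as 'LBGroup: ' survive.
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def nodes_with_traffic(lines):
    """Return (name, elected, read) for every node with Elected above zero."""
    busy = []
    for line in lines:
        if not line.strip().startswith("Node:"):
            continue  # skip Vhost and Context lines
        node = parse_node_line(line)
        elected = int(node.get("Elected", "0"))
        if elected > 0:
            busy.append((node["Name"], elected, int(node.get("Read", "0"))))
    return busy


SAMPLE = [
    "Node: [3],Name: pc_part1_node1,Balancer: ApacheHttpdBalancer,LBGroup: ,"
    "Host: 192.168.150.43,Port: 8209,Type: ajp,Flushpackets: Off,Flushwait: 10,"
    "Ping: 10,Smax: 65,Ttl: 60,Elected: 0,Read: 0,Transfered: 0,Connected: 0,Load: 97",
    "Node: [4],Name: ne_part1_node2,Balancer: ApacheHttpdBalancer,LBGroup: ,"
    "Host: 192.168.150.44,Port: 8309,Type: ajp,Flushpackets: Off,Flushwait: 10,"
    "Ping: 10,Smax: 65,Ttl: 60,Elected: 138,Read: 3588,Transfered: 0,Connected: 0,Load: 97",
]

print(nodes_with_traffic(SAMPLE))
```

In the full output above, only nodes 4 and 5 have a non-zero `Elected` counter, and those are the two nodes serving the `/load-demo` context.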

       

      Really puzzled about this problem, since preliminary tests went great.

      Will look into this tomorrow.

      Some possibilities:

       

      The ASA firewall is cutting the traffic.

      Some wrong configuration.

      32bit vs 64bit.

       

      Any pointer/suggestion on where to start and what to look for would be great.

      Thank you.

        • 1. mod_cluster 1.1.1 Moving to Production
          jfclere

          - try with LogLevel debug

           

          - See http://httpd.apache.org/docs/2.2/platform/windows.html

           

          - check that the AJP Connector in the AS side is configured with enough threads.

          • 2. Re: mod_cluster 1.1.1 Moving to Production
            joaocunhalopes

            Jean-Frederic,

             

            thank you for your reply.

            I spent all day on this and I haven't found a solution.

            Here are some links to the Apache HTTP logs that I captured during the problem. They are large so I have zipped them:

             

            <links removed since a solution was found (see below) and these files are not relevant to the solution>

             

            Couldn't find anything wrong in the logs.

            Tomorrow I will look into the JBoss side, specifically the AJP connectors, and I will try to monitor them.

             

            From what I have seen, Apache takes the HTTP requests and places them in WAIT state until it receives an answer from the JBoss server.

            That answer is sent on the first requests until it reaches a point where they (the JBoss servers?) can't reply.

            I will definitely look into the thread number for the AJP connector on the AS side.

             

            Today, when I tested with Apache HTTP debug logging on, the failure happened later. Because Apache HTTP was doing intense logging, the delay between requests was higher, which may have allowed the JBoss servers to reply in a timely fashion. It seems Apache is flooding the JBoss servers to a point where they can't reply.

             

            I will post tomorrow about the AS side monitoring/tuning.

            • 3. mod_cluster 1.1.1 Moving to Production
              joaocunhalopes

              After my last post I read

               

              http://community.jboss.org/wiki/OptimalModjk12Configuration

               

               

              and

               

              https://access.redhat.com/kb/docs/DOC-15866

               

               

              Changed the AJP config from

               

                    <Connector protocol="AJP/1.3" port="8009" address="${jboss.bind.address}"

                       redirectPort="8443" />

               

              to

               

                    <Connector port="8009" address="${jboss.bind.address}" protocol="AJP/1.3"

                       emptySessionPath="true" enableLookups="false" redirectPort="8443"

                       maxThreads="600" connectionTimeout="600000" />

               

              and retested.

              The result is almost the same:

               

              3.jpg

               

              After the crash the line is no longer flat. It's getting better.
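Since several JBoss instances are involved, the connector change above has to be repeated on each one. A sketch, assuming the content of each instance's `server.xml` can be read as text and that `Connector` elements carry no XML namespace, that lists the AJP connectors with their thread and timeout settings. The function name and the wrapper elements in the sample are assumptions for illustration; the connector itself is the one shown above.

```python
import xml.etree.ElementTree as ET


def ajp_connectors(server_xml_text):
    """Return port, maxThreads and connectionTimeout for every AJP connector."""
    root = ET.fromstring(server_xml_text)
    found = []
    for conn in root.iter("Connector"):
        if conn.get("protocol", "").upper().startswith("AJP"):
            found.append({
                "port": conn.get("port"),
                # None means the attribute is absent and the server default applies.
                "maxThreads": conn.get("maxThreads"),
                "connectionTimeout": conn.get("connectionTimeout"),
            })
    return found


# Hypothetical wrapper elements around the connector from this post.
SAMPLE_XML = """
<Server>
  <Service name="jboss.web">
    <Connector port="8009" address="${jboss.bind.address}" protocol="AJP/1.3"
       emptySessionPath="true" enableLookups="false" redirectPort="8443"
       maxThreads="600" connectionTimeout="600000" />
  </Service>
</Server>
"""

print(ajp_connectors(SAMPLE_XML))
```

A connector that reports `None` for `maxThreads` is still running with the default, which is a quick way to spot an instance that was missed.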

              • 4. mod_cluster 1.1.1 Moving to Production
                jfclere

                That looks like a problem on the AS side... Maybe you are running out of resources... Any errors in the server.log file?

                • 5. mod_cluster 1.1.1 Moving to Production
                  joaocunhalopes

                  Jean-Frederic, you are right, it was on the AS side.

                  I found the problem. For future reference here is the expected test result and the cause of the problem:

                   

                  Test Result (this is what I was aiming for)

                   

                  4.JPG

                  5.JPG

                   

                  And now the culprit (in this image it's off; it was on)

                   

                  6.JPG

                  We are using new machines and they were delivered with a Microsoft product called "Microsoft Forefront Endpoint Protection 2010".

                  More about this product here:

                   

                  http://en.wikipedia.org/wiki/Microsoft_Forefront

                   

                  Anyway, I decided to turn this off on both machines that have the AS installed and it worked.

                  Here's how you turn it off:

                   

                  http://technet.microsoft.com/en-us/library/ff823868.aspx

                   

                  So, for future reference, possible culprits:

                   

                  Firewall (hardware)

                  Security products (software firewall or security products on the Apache machine or the AS machine(s)).

                   

                  Jean-Frederic, thank you for your help on this.

                  Regards.

                   

                  John