4 Replies Latest reply on Apr 29, 2014 5:45 PM by mustaq.pradhan

    load-balancer cannot access application because mod_cluster cannot find node

    mustaq.pradhan

      We are running a multi-host domain of JBoss EAP 6.1.0.GA (AS 7.2.0.Final-redhat-8) on Red Hat Enterprise Linux Server release 6.5.

       

      We have the following modules:

       

      mod_cluster-native-1.2.4-1.Final.redhat_1.ep6.el6.x86_64

      mod_cluster-1.2.4-1.Final_redhat_1.ep6.el6.noarch

      jboss-as-modcluster-7.2.0-8.Final_redhat_8.ep6.el6.noarch

       

      The mod_cluster.conf is:

      LoadModule slotmem_module  modules/mod_slotmem.so

      LoadModule proxy_cluster_module modules/mod_proxy_cluster.so

      LoadModule advertise_module modules/mod_advertise.so
      LoadModule manager_module  modules/mod_manager.so

       

      <Location /mod_cluster_manager>

      SetHandler mod_cluster-manager

      </Location>

       

      Listen ut-j10-01.appu.test.det.nsw.edu.au:10001

      <VirtualHost ut-j10-01.appu.test.det.nsw.edu.au:10001>

         KeepAliveTimeout 60

         MaxKeepAliveRequests 0

         ManagerBalancerName mycluster

       

         ServerAdvertise On

         AdvertiseFrequency 5

         AdvertiseGroup 228.1.10.1:23364

       

         EnableMCPMReceive

       

      </VirtualHost>

       

      AllowCmd Off

       

      Maxhost 1000

      Maxnode 100

       

      We are getting the following errors in server.log:

       

      24-04-2014 10:19:14.394 +1000 ERROR [org.jboss.modcluster] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) MODCLUSTER000042: Error MEM sending STATUS command to ut-j10-03.appu.test.det.nsw.edu.au/153.107.90.149:10001, configuration will be reset: MEM: Can't read node

      24-04-2014 10:19:14.396 +1000 ERROR [org.jboss.modcluster] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) MODCLUSTER000042: Error MEM sending STATUS command to ut-j10-01.appu.test.det.nsw.edu.au/153.107.90.147:10001, configuration will be reset: MEM: Can't read node

      24-04-2014 10:24:34.666 +1000 ERROR [org.jboss.modcluster] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) MODCLUSTER000042: Error MEM sending STATUS command to ut-j10-02.appu.test.det.nsw.edu.au/153.107.90.148:10001, configuration will be reset: MEM: Can't read node

      24-04-2014 10:24:44.679 +1000 ERROR [org.jboss.modcluster] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) MODCLUSTER000042: Error MEM sending STATUS command to ut-j10-03.appu.test.det.nsw.edu.au/153.107.90.149:10001, configuration will be reset: MEM: Can't read node

      24-04-2014 10:24:44.681 +1000 ERROR [org.jboss.modcluster] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) MODCLUSTER000042: Error MEM sending STATUS command to ut-j10-01.appu.test.det.nsw.edu.au/153.107.90.147:10001, configuration will be reset: MEM: Can't read node

       

      There are also warnings in the Apache error.log:

       

      [Thu Apr 24 10:24:44 2014] [warn] manager_handler STATUS error: MEM: Can't read node

       

      The web & modcluster subsystems are configured as:

       

                  <subsystem xmlns="urn:jboss:domain:web:1.4" default-virtual-server="default-host" instance-id="${jboss.node.name}" native="false">

                      <connector name="http" protocol="HTTP/1.1" scheme="http" socket-binding="http"/>

                      <connector name="ajp" protocol="AJP/1.3" scheme="https" socket-binding="ajp" proxy-port="443" secure="true"/>

                      <virtual-server name="default-host" enable-welcome-root="true">

                          <alias name="localhost"/>

                          <alias name="example.com"/>

                      </virtual-server>

                  </subsystem>

       

                  <subsystem xmlns="urn:jboss:domain:modcluster:1.1">

                      <mod-cluster-config advertise-socket="modcluster" balancer="mycluster" load-balancing-group="mycluster" connector="ajp">

                          <dynamic-load-provider>

                              <load-metric type="busyness"/>

                          </dynamic-load-provider>

                      </mod-cluster-config>

                  </subsystem>

       

      When this error occurs, I can still access the application by going directly to that server's Apache (e.g., http://ud-j10-03.appu.dev.det.nsw.edu.au/smu/).

      But I cannot access it through the load-balancer URL.

       

      Is there any way to fix this?

      Is it possible to get the application back without restarting the server? I have tried restarting httpd, but that didn't help.

       

      Thanks for your help.

        • 1. Re: load-balancer cannot access application because mod_cluster cannot find node
          mbabacek

          Hmm, strange. Isn't there some <Directory /> directive missing, one that allows your worker nodes access to the balancer's virtual host with EnableMCPMReceive?
          What happens if you fake the worker's message, e.g. by sending this to your balancer:

           

          { echo "CONFIG / HTTP/1.0"; echo "Content-length: 105"; echo ""; echo "JVMRoute=FakeNode-1&Host=myfakeworker-node-1.example.edu&Maxattempts=1&Port=8009&Type=ajp&ping=100"; sleep 1; } | telnet ut-j10-01.appu.test.det.nsw.edu.au 10001
          
          

           

          Send that command and tell us what shows up in error_log and access_log. It would be best to have Apache set to LogLevel debug, if you can.
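The fake CONFIG message above can also be assembled by a small helper that derives Content-length from the body instead of hard-coding it. This is only a sketch: `build_mcmp_config` is a hypothetical name, not something shipped with mod_cluster, and it assumes an ASCII body and the same bare-newline line endings that the echo commands produce.

```shell
#!/bin/sh
# Hypothetical helper (not part of mod_cluster): print a raw CONFIG request
# for the given body, with Content-length computed from the body itself.
build_mcmp_config() {
  body=$1
  printf 'CONFIG / HTTP/1.0\n'
  # ${#body} is the body length in characters; for ASCII this equals bytes.
  printf 'Content-length: %s\n' "${#body}"
  # Blank line separating headers from body.
  printf '\n'
  # Body is sent without a trailing newline so the length stays exact.
  printf '%s' "$body"
}

# Usage, with the same fake worker values as in the command above:
# { build_mcmp_config "JVMRoute=FakeNode-1&Host=myfakeworker-node-1.example.edu&Maxattempts=1&Port=8009&Type=ajp&ping=100"; sleep 1; } | telnet ut-j10-01.appu.test.det.nsw.edu.au 10001
```

Keeping the request building separate from the telnet call makes it possible to inspect the exact bytes before sending them to the balancer.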

          • 2. Re: load-balancer cannot access application because mod_cluster cannot find node
            jfclere

            I think that the CONFIG should also give a "Can't read node" error message. My guess is that it is related to SELinux.

            • 3. Re: load-balancer cannot access application because mod_cluster cannot find node
              mustaq.pradhan

              Yes, we do get "Can't read node" errors. But we have SELinux turned off.

              • 4. Re: Re: load-balancer cannot access application because mod_cluster cannot find node
                mustaq.pradhan

                Not sure if we need Directory. Since the server instance that was in error got restarted, the error is no longer there and I cannot reproduce it. But it is sure to happen again; it has happened at random in the past.

                 

                I have tried sending the request from one of the nodes to the LB.

                ut-j10-02> (echo "CONFIG / HTTP/1.0"; echo "Content-length: 105"; echo ""; echo "JVMRoute=tca-102&Host=j10-lb.test.det.nsw.edu.au&Maxattempts=1&Port=8009&Type=ajp&ping=100"; sleep 1; )| telnet ut-j10-02.appu.test.det.nsw.edu.au 10001

                Trying 153.107.90.148...

                Connected to ut-j10-02.appu.test.det.nsw.edu.au.

                Escape character is '^]'.

                HTTP/1.1 200 OK

                Date: Tue, 29 Apr 2014 05:21:16 GMT

                Server: Apache/2.2.22 (Red Hat Enterprise Web Server)

                Connection: close

                Content-Type: httpd/unix-directory

                 

                Connection closed by foreign host.

                I will try this again when I next see a server instance in error. The strange bit is that this is not happening for all the server instances in the same domain that are accessed through the same load-balancer. It is only happening for some of the servers/apps. Somehow that particular server is getting de-registered from mod_cluster.
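Since only some server instances are affected, it may help to tally the MODCLUSTER000042 lines from each instance's server.log per balancer address. The following is a rough sketch, assuming the log line format quoted earlier in this thread; `count_mcmp_errors` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read a server.log on stdin and print, for each proxy
# address, how many MODCLUSTER000042 errors were logged against it.
count_mcmp_errors() {
  awk '/MODCLUSTER000042/ {
         # The proxy address sits between "command to " and the next comma.
         if (match($0, /command to [^,]+/)) {
           proxy = substr($0, RSTART + 11, RLENGTH - 11)
           count[proxy]++
         }
       }
       END { for (p in count) print count[p], p }' | sort -k2
}

# Usage (the log path depends on the installation):
# count_mcmp_errors < server.log
```

Comparing the output across the affected and unaffected instances shows whether an instance fails against every balancer or only some of them.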