4 Replies Latest reply on Apr 29, 2014 5:45 PM by mustaq.pradhan

load-balancer cannot access application because mod_cluster cannot find node

mustaq.pradhan Apr 23, 2014 11:46 PM

We are running a multi-host domain of JBoss EAP 6.1.0.GA (AS 7.2.0.Final-redhat-8) on Red Hat Enterprise Linux Server release 6.5.

We have the following modules installed:

mod_cluster-native-1.2.4-1.Final.redhat_1.ep6.el6.x86_64

mod_cluster-1.2.4-1.Final_redhat_1.ep6.el6.noarch

jboss-as-modcluster-7.2.0-8.Final_redhat_8.ep6.el6.noarch

The mod_cluster.conf is:

LoadModule slotmem_module modules/mod_slotmem.so

LoadModule proxy_cluster_module modules/mod_proxy_cluster.so

LoadModule advertise_module	modules/mod_advertise.so
LoadModule manager_module	modules/mod_manager.so

SetHandler mod_cluster-manager

</Location>

Listen ut-j10-01.appu.test.det.nsw.edu.au:10001

KeepAliveTimeout 60

MaxKeepAliveRequests 0

ManagerBalancerName mycluster

ServerAdvertise On

AdvertiseFrequency 5

AdvertiseGroup 228.1.10.1:23364

EnableMCPMReceive

</VirtualHost>

AllowCmd Off

Maxhost 1000

Maxnode 100

We are getting the following errors in server.log:

24-04-2014 10:19:14.394 +1000 ERROR [org.jboss.modcluster] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) MODCLUSTER000042: Error MEM sending STATUS command to ut-j10-03.appu.test.det.nsw.edu.au/153.107.90.149:10001, configuration will be reset: MEM: Can't read node

24-04-2014 10:19:14.396 +1000 ERROR [org.jboss.modcluster] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) MODCLUSTER000042: Error MEM sending STATUS command to ut-j10-01.appu.test.det.nsw.edu.au/153.107.90.147:10001, configuration will be reset: MEM: Can't read node

24-04-2014 10:24:34.666 +1000 ERROR [org.jboss.modcluster] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) MODCLUSTER000042: Error MEM sending STATUS command to ut-j10-02.appu.test.det.nsw.edu.au/153.107.90.148:10001, configuration will be reset: MEM: Can't read node

24-04-2014 10:24:44.679 +1000 ERROR [org.jboss.modcluster] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) MODCLUSTER000042: Error MEM sending STATUS command to ut-j10-03.appu.test.det.nsw.edu.au/153.107.90.149:10001, configuration will be reset: MEM: Can't read node

24-04-2014 10:24:44.681 +1000 ERROR [org.jboss.modcluster] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) MODCLUSTER000042: Error MEM sending STATUS command to ut-j10-01.appu.test.det.nsw.edu.au/153.107.90.147:10001, configuration will be reset: MEM: Can't read node
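To see at a glance which proxies are rejecting STATUS commands, log lines like the ones above can be tallied per proxy. This is only a minimal sketch: the regular expression is written against the exact line format in this excerpt, and `count_failures` and `SAMPLE` are hypothetical names used for illustration.

```python
import re
from collections import Counter

# Matches MODCLUSTER000042 lines in the format shown in the server.log excerpt.
PATTERN = re.compile(
    r"MODCLUSTER000042: Error (?P<kind>\w+) sending (?P<command>\w+) command to "
    r"(?P<host>[^/\s]+)/(?P<ip>[\d.]+):(?P<port>\d+), "
    r"configuration will be reset: (?P<detail>.*)$"
)


def count_failures(lines):
    """Count MODCLUSTER000042 errors per (proxy host, port)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group("host"), int(match.group("port")))] += 1
    return counts


# Two of the lines from the excerpt, used here as sample input.
SAMPLE = [
    "24-04-2014 10:19:14.394 +1000 ERROR [org.jboss.modcluster] "
    "(ContainerBackgroundProcessor[StandardEngine[jboss.web]]) "
    "MODCLUSTER000042: Error MEM sending STATUS command to "
    "ut-j10-03.appu.test.det.nsw.edu.au/153.107.90.149:10001, "
    "configuration will be reset: MEM: Can't read node",
    "24-04-2014 10:24:34.666 +1000 ERROR [org.jboss.modcluster] "
    "(ContainerBackgroundProcessor[StandardEngine[jboss.web]]) "
    "MODCLUSTER000042: Error MEM sending STATUS command to "
    "ut-j10-02.appu.test.det.nsw.edu.au/153.107.90.148:10001, "
    "configuration will be reset: MEM: Can't read node",
]

if __name__ == "__main__":
    for (host, port), n in count_failures(SAMPLE).most_common():
        print(f"{host}:{port} -> {n} error(s)")
```

Running the same function over the full server.log of each server instance would show whether the errors are tied to particular proxies or spread across all of them.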

There are also warnings in the Apache error.log:

[Thu Apr 24 10:24:44 2014] [warn] manager_handler STATUS error: MEM: Can't read node

The web & modcluster subsystems are configured as:

<virtual-server name="default-host" enable-welcome-root="true">

</virtual-server>

</subsystem>

<mod-cluster-config advertise-socket="modcluster" balancer="mycluster" load-balancing-group="mycluster" connector="ajp">

<dynamic-load-provider>

<load-metric type="busyness"/>

</dynamic-load-provider>

</mod-cluster-config>

</subsystem>

While this error is occurring, I can still access the application directly through the server's own Apache (e.g., http://ud-j10-03.appu.dev.det.nsw.edu.au/smu/).

But I cannot access it through the load-balancer URL.

Is there any way to fix this?

Is it possible to get the application back without restarting the server? I have tried restarting httpd, but it didn't help.

Thanks for your help.

1. Re: load-balancer cannot access application because mod_cluster cannot find node

mbabacek Apr 24, 2014 3:15 AM (in response to mustaq.pradhan)
Hmm, strange. Isn't there some <Directory /> directive missing, one that allows your worker nodes access to the balancer's virtual host with EnableMCPMReceive?
What happens if you fake the worker's message, e.g. by sending this to your balancer:

{ echo "CONFIG / HTTP/1.0"; echo "Content-length: 105"; echo ""; echo "JVMRoute=FakeNode-1&Host=myfakeworker-node-1.example.edu&Maxattempts=1&Port=8009&Type=ajp&ping=100"; sleep 1; } | telnet ut-j10-01.appu.test.det.nsw.edu.au 10001

Send that command and tell us what shows up in error_log and access_log. It would be best to have Apache set to LogLevel debug, if you can.
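If telnet is not convenient, the same fake CONFIG message could be sent with a short script. This is only a sketch under assumptions: the request line and parameter names are copied from the command above, `build_config_request` and `send_request` are hypothetical helper names, and the Content-length is computed from the body instead of being hard-coded.

```python
import socket
from urllib.parse import urlencode


def build_config_request(params):
    """Build a CONFIG request like the one in the command above."""
    body = urlencode(params).encode("ascii")
    head = (
        "CONFIG / HTTP/1.0\r\n"
        f"Content-length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


def send_request(host, port, request, timeout=5.0):
    """Send the raw request and return whatever the balancer answers."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


# Same fake worker parameters as in the telnet command.
FAKE_NODE = {
    "JVMRoute": "FakeNode-1",
    "Host": "myfakeworker-node-1.example.edu",
    "Maxattempts": "1",
    "Port": "8009",
    "Type": "ajp",
    "ping": "100",
}

if __name__ == "__main__":
    request = build_config_request(FAKE_NODE)
    reply = send_request("ut-j10-01.appu.test.det.nsw.edu.au", 10001, request)
    print(reply.decode("latin-1"))
```

The reply printed at the end, together with error_log and access_log, is what would be worth posting back here.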
2. Re: load-balancer cannot access application because mod_cluster cannot find node

jfclere Apr 24, 2014 5:36 AM (in response to mbabacek)

I think that the CONFIG should also give a "Can't read node" error message. My guess is that it is related to SELinux.
3. Re: load-balancer cannot access application because mod_cluster cannot find node

mustaq.pradhan Apr 29, 2014 1:15 AM (in response to jfclere)

Yes, we do get "Can't read node" errors. But we have SELinux turned off.
4. Re: Re: load-balancer cannot access application because mod_cluster cannot find node

mustaq.pradhan Apr 29, 2014 5:45 PM (in response to mbabacek)

I am not sure whether we need the Directory directive. Since the server instance that was in error has been restarted, the error is gone and I cannot reproduce it. But it is sure to happen again; it has happened at random in the past.

I have tried sending the request from one of the nodes to the LB.

ut-j10-02> (echo "CONFIG / HTTP/1.0"; echo "Content-length: 105"; echo ""; echo "JVMRoute=tca-102&Host=j10-lb.test.det.nsw.edu.au&Maxattempts=1&Port=8009&Type=ajp&ping=100"; sleep 1; )| telnet ut-j10-02.appu.test.det.nsw.edu.au 10001

Trying 153.107.90.148...

Connected to ut-j10-02.appu.test.det.nsw.edu.au.

Escape character is '^]'.

HTTP/1.1 200 OK

Date: Tue, 29 Apr 2014 05:21:16 GMT

Server: Apache/2.2.22 (Red Hat Enterprise Web Server)

Connection: close

Content-Type: httpd/unix-directory

Connection closed by foreign host.

I will try this again when I next see a server instance in error. The strange part is that this does not happen to all the server instances in the same domain that are accessed through the same load-balancer; it only happens to some of the servers/apps. Somehow that particular server is getting de-registered from mod_cluster.