Sorry, what happens if you simulate the failover and the request was about to be routed to the node that has failed?
Also, the test is not exactly right: DROP in iptables does not correspond 1:1 to a pulled cable or a dead process. You could try kill -9 on the java process.
Thanks for your answer.
What do you mean by "simulate the failover"?
About iptables vs kill -9: note that I'm not talking about a dead process but a dead instance.
If I kill the process (with the -9 option), mod-cluster marks the node as NOTOK very quickly (which is the expected behavior).
But when a VM instance dies, the behavior is exactly what I get with iptables.
The main difference, for instance:
nc -vz xx.rr.tt.ww 8009
returns an error immediately if the process is down. But if xx.rr.tt.ww:8009 is blocked by a firewall, the nc command times out after 60 sec. The same thing happens if the instance is dead (the nc command times out after 60 sec).
So I suppose that if the process is dead but the instance is reachable, mod-cluster (like nc) can quickly detect that the Tomcat process is not running. If the instance is not reachable (because of a firewall or because it is dead), mod-cluster (like nc) needs a lot of time (my PING value?) to detect that Tomcat is not reachable. In the meantime... something bad happens.
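The difference between the two failure modes can be reproduced outside nc and mod-cluster with a plain TCP connect. This is only a minimal sketch: the function name probe and the 5 second default timeout are my own choices for illustration, and it says nothing about how mod-cluster itself implements its checks.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify one TCP connect attempt (hypothetical helper, not part of mod-cluster)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host is up but nothing listens on the port: the kernel answers
        # with a reset, so the failure is reported almost immediately.
        return "refused"
    except (socket.timeout, TimeoutError):
        # Packets are silently dropped (iptables DROP) or the host is gone:
        # nothing comes back, so we only learn about it when the timeout expires.
        return "timeout"
    except OSError:
        # Any other network error, e.g. "no route to host".
        return "error"


# Example call (10.2.2.2 and 8009 are the node address and AJP port used in this thread):
# print(probe("10.2.2.2", 8009))
```

With the process killed I would expect "refused" right away; with the port blocked by DROP or the instance dead I would expect "timeout" only after the full timeout has elapsed.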
at 21:01:16 I ran the iptables command.
at 21:02:21 mod-cluster marked the node as NOTOK
at 21:02:39 the node is marked as OK
at 21:02:55 the node is marked as NOTOK
at 21:03:43 the node is marked as OK
at 21:03:55 the node is marked as NOTOK
at 21:05:57 the node is marked as OK
at 21:06:58 the node is marked as NOTOK
at 21:08:53 the node is marked as OK
at 21:09:54 the node is marked as NOTOK
Meanwhile some requests were forwarded to the blocked node; they waited 1 min (I suppose because of my PING value) and were then sent to the working node.
I just ran a test with PING=10:
at 07:17:33 firewall closed
at 07:17:53 NOTOK
at 07:18:12 OK
at 07:18:24 NOTOK
at 07:18:30 OK
at 07:18:39 NOTOK
at 07:18:42 OK
at 07:18:51 NOTOK
...and so on...
As before, meanwhile some requests were forwarded to the blocked node...
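To make the flapping easier to see, the gaps between the state changes in the PING=10 test can be computed from the timestamps listed above. This is just a small sketch; the timestamps are copied from this post, and the function name intervals is made up for the example.

```python
from datetime import datetime

# Timestamps and states copied from the PING=10 test in this post.
events = [
    ("07:17:33", "firewall closed"),
    ("07:17:53", "NOTOK"),
    ("07:18:12", "OK"),
    ("07:18:24", "NOTOK"),
    ("07:18:30", "OK"),
    ("07:18:39", "NOTOK"),
    ("07:18:42", "OK"),
    ("07:18:51", "NOTOK"),
]


def intervals(events):
    """Return (new_state, seconds_since_previous_event) for each transition."""
    times = [datetime.strptime(t, "%H:%M:%S") for t, _ in events]
    return [
        (events[i][1], int((times[i] - times[i - 1]).total_seconds()))
        for i in range(1, len(events))
    ]


for state, secs in intervals(events):
    print(f"{secs:>3}s until {state}")
```

The first NOTOK arrives 20 s after the firewall is closed, and after that the node flips state every 3 to 19 s instead of staying NOTOK.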
Could you try with mod_cluster-1.2.6? Set LogLevel to debug and send the trace.
What is the httpd -V output?
Ok, I'll try the latest version.
In the meantime, this is the requested output:
[root]# httpd -V
Server version: Apache/2.2.24 (Unix)
Server built: Mar 19 2013 14:33:22
Server's Module Magic Number: 20051115:31
Server loaded: APR 1.4.6, APR-Util 1.4.1
Compiled using: APR 1.4.6, APR-Util 1.4.1
Server MPM: Worker
threaded: yes (fixed thread count)
forked: yes (variable process count)
Server compiled with....
-D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
I replicated the issue with mod-cluster 1.2.6.
Here you can find the log: http://tny.cz/eb0e6021
(I didn't find a way to attach it)
- at 15:57:28 I started httpd
- at 15:59:31 I ran iptables on NODE02 (10.2.2.2) (both nodes were OK before that)
- at 15:59:52 NODE02 was marked as NOTOK
- at 16:00:21 NODE02 was marked as OK
Looking at the trace, there are unexpected STATUS messages from NODE02. The cping/cpong retries the node and also marks it OK in order to do the retry, and during that time some requests go to NODE02.
That is a bug triggered by your weird test. You should create a JIRA (and attach the log file to it).
Just to clarify, I reproduced the issue by closing traffic on port 666 (my iptables command was incomplete since it didn't block already existing connections). So in the end the behaviour is the same, but there are no unexpected STATUS commands from NODE02.