8 Replies Latest reply on Nov 11, 2013 5:02 AM by nichele

    Tomcat instance lost and mod-cluster behaviour.  modcluster-97 ?

    nichele

      Hi all,

      I'm using mod-cluster 1.2.0 with Tomcat 7. My environment is deployed on Amazon and, to keep the discussion simple, I have 1 httpd instance + 2 Tomcat instances (3 different VMs).

       

      I have noticed that if a Tomcat instance is lost (stopped or terminated from the Amazon console), my environment becomes unstable.

      httpd still reports the instance in the mod_cluster-manager view, sometimes with status OK and sometimes with status NOTOK.

      During this time I have seen some requests still sent to the dead node and, after 1 min, re-sent to the other, live node.

       

      I have reproduced this behavior using only iptables on my Tomcat instance, blocking outgoing connections to port 6666 and incoming traffic on port 8009:

      iptables -A OUTPUT -p tcp -m state --state NEW -m tcp --dport 6666 -j DROP

      iptables -A INPUT -m state --state NEW,ESTABLISHED -p tcp --dport 8009 -j DROP

       

      My concern is that mod-cluster should know that the node is not working as expected (at least because it is no longer sending status information) and should remove it from the list of nodes.
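      To make the expectation concrete, here is a toy model of the behavior I would expect. It is only an illustration and not how mod_cluster is implemented; the class name, the method names and the `max_silence` threshold are all made up for this sketch.

```python
class NodeTable:
    """Toy model (NOT mod_cluster code): forget nodes whose STATUS messages stop."""

    def __init__(self, max_silence: float):
        # Hypothetical threshold: seconds a node may stay silent before removal.
        self.max_silence = max_silence
        self.last_seen = {}  # node name -> time of its last STATUS message

    def on_status(self, node: str, now: float) -> None:
        """Record that a STATUS message arrived from `node` at time `now`."""
        self.last_seen[node] = now

    def evict_silent(self, now: float) -> list:
        """Remove and return the nodes that have been silent for too long."""
        silent = [n for n, t in self.last_seen.items() if now - t > self.max_silence]
        for n in silent:
            del self.last_seen[n]
        return silent
```

      In this model a node that stops reporting is dropped once it exceeds the threshold, and it never flips back to OK unless a new STATUS message actually arrives from it.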

       

      Just found this issue:

      [MODCLUSTER-97] httpd should remove workers who crashed - JBoss Issue Tracker

      which is very similar to my case. Is it a regression?

       

      This is my tomcat configuration

        <Listener className="org.jboss.modcluster.container.catalina.standalone.ModClusterListener"

                  advertise="false"

                  proxyList="zzzzz:6666"

                  maxAttempts="3"

                  nodeTimeout="600"

                  workerTimeout="-1"

                  ping="60"

                  stickySession="true"

                  stickySessionRemove="false"

                  stickySessionForce="false"

                  loadMetricClass="org.jboss.modcluster.load.metric.impl.AverageSystemLoadMetric"

                  loadMetricCapacity="20"

        />

       

      and this is my httpd conf:

      Listen *:6666

      <VirtualHost *:6666>

       

         <Directory />

            Order deny,allow

            #Deny from all

            Allow from all

         </Directory>

       

         KeepAliveTimeout 60

         MaxKeepAliveRequests 0

         ManagerBalancerName mycluster

         ServerAdvertise Off

         EnableMCPMReceive

       

      </VirtualHost>

       

      Many thanks in advance.

      ste

        • 1. Re: Tomcat instance lost and mod-cluster behaviour.  modcluster-97 ?
          rhusar

          Sorry, what happens if you simulate the failover and the request was about to be routed to the node that is failed?

           

          Also, the testing is not exactly right, DROP in iptables does not correspond 1:1 to pulled cable/dead process. You could try kill -9 of the java process.

          • 2. Re: Re: Tomcat instance lost and mod-cluster behaviour.  modcluster-97 ?
            nichele

            Thanks for your answer.

            What do you mean by "simulate the failover"?

             

            About iptables vs kill -9, note that I'm not talking about a dead process but about a dead instance.

            If I kill the process (with the -9 option), mod-cluster marks the node as NOTOK very quickly (which is indeed the expected behavior).

             

            But when a VM instance dies, the behavior is exactly what I get with iptables.

            The main difference, for instance:

            nc -vz xx.rr.tt.ww 8009

            returns an error immediately if the process is down. But if xx.rr.tt.ww:8009 is blocked by a firewall, the nc command times out after 60 sec. Same thing if the instance is dead (the nc command times out after 60 sec).

            So I suppose that if the process is dead but the instance is reachable, modcluster (like nc) is able to detect quickly that the tomcat process is not running. If the instance is not reachable (because of a firewall or because it is dead), modcluster (like nc) needs a lot of time (my PING value ?) to detect that tomcat is not reachable. In the meantime... something bad happens.
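            The refused-vs-timeout difference can be reproduced outside nc with a plain TCP connect. This is a minimal sketch that assumes nothing about mod_cluster internals; the function name `probe` and the 3-second default timeout are my own choices for illustration.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connect attempt, similar to what `nc -vz host port` shows."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # something is listening and accepted the connection
    except ConnectionRefusedError:
        # Host is up but nothing listens on the port: the kernel answers with
        # a reset right away (the "process killed" case).
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all: packets dropped by a firewall or the host is gone
        # (the "iptables DROP" / "instance terminated" case).
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host or name resolution failure.
        return f"error: {exc}"
```

            Calling `probe("xx.rr.tt.ww", 8009)` with the real address should return "refused" almost immediately when only the process is down, and "timeout" only after the full timeout when the traffic is dropped or the instance is dead.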

             

            For instance:

            at 21:01:16 I ran the iptables commands

            at 21:02:21 mod-cluster marked the node as NOTOK

            at 21:02:39 the node is marked as OK

            at 21:02:55 the node is marked as NOTOK

            at 21:03:43 the node is marked as OK

            at 21:03:55 the node is marked as NOTOK

            at 21:05:57 the node is marked as OK

            at 21:06:58 the node is marked as NOTOK

            at 21:08:53 the node is marked as OK

            at 21:09:54 the node is marked as NOTOK

             

            Meanwhile some requests were forwarded to the blocked node, where they waited 1 min (I suppose because of my PING value) before being sent to the working node.
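            To quantify the flapping, the timeline above can be turned into durations per state. This is just a sketch over the timestamps I listed; the "BLOCKED" label for the moment iptables was run and the helper names are made up for the example.

```python
from datetime import datetime

# Timeline from this post (ping="60"). The first entry is when iptables was run;
# the second timestamp is taken to be 21:02:21.
EVENTS = [
    ("21:01:16", "BLOCKED"),
    ("21:02:21", "NOTOK"),
    ("21:02:39", "OK"),
    ("21:02:55", "NOTOK"),
    ("21:03:43", "OK"),
    ("21:03:55", "NOTOK"),
    ("21:05:57", "OK"),
    ("21:06:58", "NOTOK"),
    ("21:08:53", "OK"),
    ("21:09:54", "NOTOK"),
]


def seconds_in_state(events):
    """Return (state, seconds) for each interval between consecutive events."""
    times = [datetime.strptime(t, "%H:%M:%S") for t, _ in events]
    return [
        (state, int((nxt - cur).total_seconds()))
        for (_, state), cur, nxt in zip(events, times, times[1:])
    ]


def total_ok_seconds(events):
    """Total time the node was shown as OK between the listed events."""
    return sum(secs for state, secs in seconds_in_state(events) if state == "OK")


if __name__ == "__main__":
    for state, secs in seconds_in_state(EVENTS):
        print(f"{state:8s} {secs:4d} s")
    print("total OK while blocked:", total_ok_seconds(EVENTS), "s")
```

            So it takes 65 s for the first NOTOK, and afterwards the blocked node is still shown as OK for 150 s in total over the observed period.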

             

            Ideas ?

             

            thx

            ste

            • 3. Re: Tomcat instance lost and mod-cluster behaviour.  modcluster-97 ?
              nichele

              I just ran a test with PING=10:

               

              at 07:17:33 firewall closed

              at 07:17:53 NOTOK

              at 07:18:12 OK

              at 07:18:24 NOTOK

              at 07:18:30 OK

              at 07:18:39 NOTOK

              at 07:18:42 OK

              at 07:18:51 NOTOK

              ...and so on...

               

              As before, some requests were forwarded to the blocked node in the meantime...

               

              thx

              ste

              • 4. Re: Tomcat instance lost and mod-cluster behaviour.  modcluster-97 ?
                jfclere

                Could you try with mod_cluster-1.2.6? Set LogLevel to debug and send the trace.

                What is the httpd -V output?

                • 5. Re: Tomcat instance lost and mod-cluster behaviour.  modcluster-97 ?
                  nichele

                  OK, I'll try the latest version.

                   

                  In the meantime, this is the requested output:

                   

                  [root]# httpd -V

                  Server version: Apache/2.2.24 (Unix)

                  Server built:   Mar 19 2013 14:33:22

                  Server's Module Magic Number: 20051115:31

                  Server loaded:  APR 1.4.6, APR-Util 1.4.1

                  Compiled using: APR 1.4.6, APR-Util 1.4.1

                  Architecture:   64-bit

                  Server MPM:     Worker

                    threaded:     yes (fixed thread count)

                      forked:     yes (variable process count)

                  Server compiled with....

                  -D APACHE_MPM_DIR="server/mpm/worker"

                  -D APR_HAS_SENDFILE

                  -D APR_HAS_MMAP

                  -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)

                  -D APR_USE_SYSVSEM_SERIALIZE

                  -D APR_USE_PTHREAD_SERIALIZE

                  -D APR_HAS_OTHER_CHILD

                  -D AP_HAVE_RELIABLE_PIPED_LOGS

                  -D DYNAMIC_MODULE_LIMIT=128

                  -D HTTPD_ROOT="/usr/local/httpd-2.2.24"

                  -D SUEXEC_BIN="/usr/local/httpd-2.2.24/bin/suexec"

                  -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"

                  -D DEFAULT_ERRORLOG="logs/error_log"

                  -D AP_TYPES_CONFIG_FILE="conf/mime.types"

                  -D SERVER_CONFIG_FILE="conf/httpd.conf"

                   

                  thanks again

                  ste

                  • 6. Re: Tomcat instance lost and mod-cluster behaviour.  modcluster-97 ?
                    nichele

                    I replicated the issue with mod-cluster 1.2.6

                     

                    Here you can find the log: http://tny.cz/eb0e6021

                    (I didn't find a way to attach it)

                     

                    To help with reading it:

                    - at 15:57:28 I started httpd

                    - at 15:59:31 I ran iptables on NODE02 (10.2.2.2) (before that, both nodes were OK)

                    - at 15:59:52 NODE02 was marked as NOTOK

                    - at 16:00:21 NODE02 was marked as OK

                     

                    thx

                    ste

                    • 7. Re: Tomcat instance lost and mod-cluster behaviour.  modcluster-97 ?
                      jfclere

                      After looking at the trace: there are unexpected STATUS messages from NODE02. The cping/cpong logic retries the node and marks it OK in order to do the retry, and during that time some requests go to NODE02.

                      That is a bug triggered by your weird test. You should create a JIRA (and attach the log file to it).

                      • 8. Re: Tomcat instance lost and mod-cluster behaviour.  modcluster-97 ?
                        nichele

                        Created [MODCLUSTER-369] httpd should remove lost node/worker - JBoss Issue Tracker.

                        Just to clarify, I reproduced the issue by closing all traffic on port 6666 (my iptables command was incomplete since it did not block already existing connections). In the end the behaviour is the same, but there are no unexpected STATUS commands from NODE02.

                         

                        ste