4 Replies Latest reply on Jan 5, 2009 12:01 PM by Brian Stansberry

    Slow failover

    Mohit Anchlia Novice

      We have a Front End box and several Back End boxes. The Back End boxes host a clustered EJB stateless session bean, and the Front End box calls a business method on that bean.

      The problem is that if one of the Back End machines goes down or reboots, all Front End requests to the stateless bean grind to a halt. It looks like the EJB proxy stub that does the round robin is not failing over efficiently. Could someone help me diagnose this issue in detail? I really want to understand what goes on inside clustering when a box gets rebooted. We use "jnp.partitionName" to do the lookup so that we get the interceptor proxy Context.

      P.S. When we just shut down JBoss on a node, everything works as normal.
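For readers unfamiliar with this kind of lookup, here is a minimal sketch of the JNDI environment a client might build when it relies on "jnp.partitionName". The class and method names are hypothetical, and "DefaultPartition" in the usage note is only the usual default partition name, not necessarily the one used in this setup.

```java
import java.util.Properties;

// Hypothetical helper: builds the JNDI environment for a lookup that
// discovers HA-JNDI by partition name instead of a fixed provider URL.
final class ClusterLookupEnv {
    static Properties haJndiEnvironment(String partitionName) {
        Properties env = new Properties();
        // Standard JBoss naming client settings.
        env.setProperty("java.naming.factory.initial",
                "org.jnp.interfaces.NamingContextFactory");
        env.setProperty("java.naming.factory.url.pkgs",
                "org.jboss.naming:org.jnp.interfaces");
        // The property mentioned in the post; the value is an assumption.
        env.setProperty("jnp.partitionName", partitionName);
        return env;
    }
}
```

The result would be passed to `new javax.naming.InitialContext(env)`, for example with `ClusterLookupEnv.haJndiEnvironment("DefaultPartition")`. That step needs the JBoss client jars on the classpath and a reachable cluster, so it is left out of the sketch.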

        • 1. Re: Slow failover
          Mohit Anchlia Novice

          Could it have something to do with TCP sockets? I even tried setting max tries in FD to 1, but it didn't help. We are using the default out-of-the-box cluster-service.xml file.

          • 2. Re: Slow failover
            Mohit Anchlia Novice

            I did more analysis. It looks like after the back end nodes detect that one of the nodes is down, they re-elect the master (after calling ElectionPolicy), and after that JBoss just pauses; it simply halts. Could there be a bug where the proxy stub on the Front End box is not able to make the remote call and just hangs?

            I am wondering why rebooting one box would cause all the other boxes to hang, and why the EJB call would not round robin in this case.

            • 3. Re: Slow failover - Plz read
              Mohit Anchlia Novice

              I got the logs, and they show that JBoss is continuously trying to connect to the host that just got rebooted or crashed. I thought that when an EJB is @Clustered, invocations would round robin and not try the same host over and over again. But what I am seeing is that JBoss repeatedly tries to connect to the failed host. Could someone tell me what's going on?

              ---
              2008-12-31 12:20:14,287 DEBUG [transport.socket.MicroSocketClientInvoker:ajp-0.0.0.0-8009-99] - SocketClientInvoker[18b118, socket://10.10.8.77:3873] got Exception java.net.NoRouteToHostException: No route to host, creation attempt took 3001 ms
              2008-12-31 12:20:14,287 DEBUG [transport.socket.MicroSocketClientInvoker:ajp-0.0.0.0-8009-86] - SocketClientInvoker[18b118, socket://10.10.8.77:3873] got Exception java.net.NoRouteToHostException: No route to host, creation attempt took 3001 ms
              2008-12-31 12:20:17,291 DEBUG [transport.socket.MicroSocketClientInvoker:ajp-0.0.0.0-8009-86] - SocketClientInvoker[18b118, socket://10.10.8.77:3873] got Exception java.net.NoRouteToHostException: No route to host, creation attempt took 3001 ms
              2008-12-31 12:20:17,292 DEBUG [transport.socket.MicroSocketClientInvoker:ajp-0.0.0.0-8009-99] - SocketClientInvoker[18b118, socket://10.10.8.77:3873] got Exception java.net.NoRouteToHostException: No route to host, creation attempt took 3002 ms


              ..........
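To make the expectation in this post concrete, here is a minimal sketch of the behaviour being asked for: a round-robin selector that drops a target after one failed connection attempt instead of retrying it. This is not JBoss's proxy or Remoting code; the class name, the `pick` method and the `canConnect` probe are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; not how the JBoss clustered proxy is implemented.
final class RoundRobinFailover {
    private final List<String> targets;
    private int next = 0;

    RoundRobinFailover(List<String> targets) {
        this.targets = new ArrayList<>(targets);
    }

    // Returns the next reachable target in round-robin order.
    // A target that fails the probe is removed, so it is not tried again.
    synchronized String pick(Predicate<String> canConnect) {
        while (!targets.isEmpty()) {
            if (next >= targets.size()) {
                next = 0; // wrap around to the start of the list
            }
            String candidate = targets.get(next);
            if (canConnect.test(candidate)) {
                next++;
                return candidate;
            }
            // Dead target: drop it; 'next' now points at the following entry.
            targets.remove(next);
        }
        throw new IllegalStateException("no reachable targets");
    }
}
```

With targets `node-a`, `node-b`, `node-c` and a probe that fails only for `node-b`, successive calls return `node-a`, `node-c`, `node-a`, `node-c`, and `node-b` is probed exactly once. The sketch holds its lock while probing, which a real client would avoid; if each probe blocks for about 3 seconds, as in the log above, every waiting request thread would stall behind it.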

              • 4. Re: Slow failover
                Brian Stansberry Master

                For others who are interested in this question, please see the thread mohitanchlia opened on the Remoting forum:

                http://www.jboss.com/index.html?module=bb&op=viewtopic&t=148017