20 Replies. Latest reply on Aug 5, 2005 1:46 AM by starksm64
      • 15. Re: Failover problem

        vignesh76,

        darranl has outlined exactly what we were experiencing making HA calls to our middleware servers. The scenario was this:

        1) My servers start up and make an HA-JNDI lookup and acquire a remote to the already running middleware servers.
        2) My servers issue several calls per minute all day long. During this time, provided at least one server in the middleware cluster remains running across two consecutive invocations by my servers, the calls never fail. Likewise, the middleware engineers can add or remove servers from the middleware cluster without issue, subject to the same condition.
        3) The traffic from my servers against the middleware servers slows down around 11 PM, ceases altogether around 2 AM, and does not pick up again until around 6 AM.
        4) The middleware engineers decided to reboot all their servers starting at 4:30 AM. They think that if they stagger the reboots there will be no interruption, but since my servers make no calls during that window, when 6 AM rolls around all of my servers' middleware calls fail with a ServiceUnavailable exception.
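        The failure in steps 1 to 4 can be reproduced with a small toy model. This is a hypothetical sketch, not JBoss code: it assumes the client-side proxy holds a list of server "incarnations" and only refreshes that list when a call succeeds, which matches the behaviour described in step 2. The names `RollingRebootModel`, `simulate`, `call` and `reboot` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model of the rolling-reboot scenario (hypothetical, not JBoss code).
 * Assumption: the client proxy caches a list of server incarnations and
 * only learns the current cluster membership when a call succeeds.
 */
public class RollingRebootModel {

    // Server incarnations currently alive in the middleware cluster, e.g. "A#1".
    static Set<String> alive = new HashSet<>();
    // The client's cached view of the cluster (its stubs).
    static List<String> clientView = new ArrayList<>();

    /** Tries each cached target; on success, refreshes the view from the cluster. */
    static boolean call() {
        for (String target : clientView) {
            if (alive.contains(target)) {
                clientView = new ArrayList<>(alive); // assumed: new membership comes back with the reply
                return true;
            }
        }
        return false; // every cached stub is stale: the "ServiceUnavailable" case
    }

    /** Reboots one server: its old incarnation disappears and a new one starts. */
    static void reboot(String server, int generation) {
        alive.remove(server + "#" + (generation - 1));
        alive.add(server + "#" + generation);
    }

    /** Returns whether the first call after a staggered reboot of A, B and C succeeds. */
    public static boolean simulate(boolean callBetweenReboots) {
        alive = new HashSet<>(List.of("A#1", "B#1", "C#1"));
        clientView = new ArrayList<>(alive); // step 1: HA-JNDI lookup at start-up
        for (String server : List.of("A", "B", "C")) { // step 4: staggered reboots
            reboot(server, 2);
            if (callBetweenReboots) {
                call(); // daytime traffic keeps the client's view current
            }
        }
        return call(); // the 6 AM call
    }

    public static void main(String[] args) {
        System.out.println("traffic during reboots: " + simulate(true));
        System.out.println("no traffic during reboots: " + simulate(false));
    }
}
```

        Running `main` prints `true` for the first case and `false` for the second: with no calls between the reboots, every entry in the cached view refers to a server incarnation that no longer exists, even though the cluster itself was never fully down.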

        Our resolution was to always trap the invocation exception, determine whether the cause was ServiceUnavailable, and if so, reinitialize the [cached] remote. Poof. Problem solved.

        There was a little bit of overhead when the re-init occurred, on account of the HA-JNDI lookup, but since it only occurred once per server per day, it was negligible. If the call fails again after the re-init, we throw a fatal exception and the bat-phone rings.
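        The trap, re-init and retry-once approach might look something like the following. This is a minimal sketch under several assumptions: `ReinitOnUnavailable`, `RemoteCall` and `lookupCount` are hypothetical names; the `Supplier` stands in for whatever HA-JNDI lookup (and home `create()`) the application already performs; and the failure is recognised by walking the cause chain for a class whose simple name is `ServiceUnavailableException`, since the exact exception type may vary between JBoss versions.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: caches a remote reference and, when a call fails with
 * a ServiceUnavailable cause, redoes the lookup once and retries the call.
 */
public class ReinitOnUnavailable<T> {

    /** A remote call that may throw a checked exception (e.g. java.rmi.RemoteException). */
    public interface RemoteCall<T, R> {
        R apply(T remote) throws Exception;
    }

    private final Supplier<T> lookup; // assumed to wrap the HA-JNDI lookup
    private T cached;
    private int lookups = 0;

    public ReinitOnUnavailable(Supplier<T> lookup) {
        this.lookup = lookup;
    }

    /** Returns the cached remote, performing a fresh lookup if none is cached or refresh is set. */
    private synchronized T remote(boolean refresh) {
        if (cached == null || refresh) {
            cached = lookup.get();
            lookups++;
        }
        return cached;
    }

    /** Number of lookups performed so far (useful for checking the re-init overhead). */
    public synchronized int lookupCount() {
        return lookups;
    }

    public <R> R invoke(RemoteCall<T, R> call) throws Exception {
        try {
            return call.apply(remote(false));
        } catch (Exception first) {
            if (!isServiceUnavailable(first)) {
                throw first; // unrelated failure: propagate it unchanged
            }
            try {
                return call.apply(remote(true)); // exactly one retry on a fresh remote
            } catch (Exception second) {
                // Still failing after the re-init: escalate instead of looping.
                throw new IllegalStateException("Call failed after re-initialising remote", second);
            }
        }
    }

    /** Walks the cause chain; matching on the simple class name is an assumption. */
    static boolean isServiceUnavailable(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c.getClass().getSimpleName().equals("ServiceUnavailableException")) {
                return true;
            }
        }
        return false;
    }
}
```

        Retrying only once, and only for the unavailable case, keeps the lookup overhead to the rare re-init and lets any second failure surface as the fatal "bat-phone" exception rather than being retried indefinitely.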

        //Nicholas

        • 16. Re: Failover problem

           

          "nickman" wrote:

          darranl has outlined exactly what we were experiencing making HA calls to our middleware servers.


          What it describes is an orphaned client. It only affects protocols like RMI
          where the stub can be invalidated by a server reboot, i.e. the RMI server is
          available, but it no longer accepts requests from previous clients.

          Rebooting the entire cluster will invalidate JBoss's JRMPHA proxy,
          i.e. all the stubs are broken. :-(


          Our resolution was to always trap the invocation exception, determine whether the cause was ServiceUnavailable, and if so, reinitialize the [cached] remote. Poof. Problem solved.


          That is the well-known solution to the RMI issue, clustered or otherwise.

          For JBoss working around the problem for you "automagically", see:
          http://www.jboss.org/?module=bb&op=viewtopic&t=58984
          http://docs.jboss.org/jbossas/javadoc/4.0.2/org/jboss/proxy/ejb/RetryInterceptor.html
          http://jira.jboss.com/jira/browse/JBAS-1330

          Of course, you could always use another protocol like HTTP(S) HA,
          which doesn't have the stub invalidation problem at server reboot.

          • 17. Re: Failover problem
            vignesh76

            Hi Guys,

            Thanks, I really appreciate your input. Nick, I am facing exactly the problem you described, and I believe the failover should be transparent rather than something the client has to handle. We are not caching the interfaces either. In fact, I have just tested with the latest production version of JBoss (4.0.2), and the problem does not occur in that version. I just needed a straightforward answer from the guys who knew it, without the big hoo-ha!

            The problem happens in JBoss (MX MicroKernel) [4.0.1 (build: CVSTag=JBoss_4_0_1 date=200412230944)]. I do not know whether it's due to a bug, the OS, the JVM or some other issue, but it's definitely not a configuration problem, as I used the exact same configuration while testing with 4.0.2. This may have been fixed in 4.0.1 SP1, though I don't know, and I don't have time to test that version. Perhaps someone who already has could clarify. Thanks again.

            • 18. Re: Failover problem
              vignesh76

              Hey Guys,

              I still encounter that error even with the latest release [4.0.2 (build: CVSTag=JBoss_4_0_2 date=200505022023)]. I have to go now, but I shall do detailed testing tomorrow to confirm my results and post my detailed analysis.

              Thanks.

              • 19. Re: Failover problem
                vignesh76

                Hi Nick,

                Since this seems to be a long-existing bug in JBoss, I will have to look into handling the failover in the client, as you have done, for the time being. I am not too sure about the patch posted in JIRA, and I don't have time to test it.

                Thanks.

                • 20. Re: Failover problem
                  starksm64