Failover problem | JBoss.org Content Archive (Read Only)

15. Re: Failover problem

nickman Aug 3, 2005 5:57 PM (in response to vignesh76)

vignesh76;

darranl has outlined exactly what we were experiencing making HA calls to our middleware servers. The scenario was this:

1) My servers start up, make an HA-JNDI lookup and acquire a remote reference to the already running middleware servers.
2) My servers issue several calls per minute all day long. During this time, the calls never fail, provided at least one server in the middleware cluster stays up between any two consecutive invocations from my servers. Under that same condition, the middleware engineers can add or remove servers from the middleware cluster without issue.
3) The traffic from my servers against the middleware servers slows down around 11 PM, ceases altogether around 2 AM, and does not pick up again until around 6 AM.
4) The middleware engineers decided to reboot all their servers starting at 4:30 AM. They think that staggering the reboots avoids any interruption, but since my servers make no calls during that window, when 6 AM rolls around, all of my servers' middleware calls fail with a ServiceUnavailable exception.

Our resolution was to always trap the invocation exception, determine whether the cause was ServiceUnavailable and, if so, reinitialize the [cached] remote. Poof. Problem solved.

There was a little overhead when the re-init occurred, on account of the HA-JNDI lookup, but since it only occurred once per server per day, it was negligible. If the call fails again after the re-init, we throw a fatal exception and the bat-phone rings.

//Nicholas
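nickman's workaround (trap the invocation exception, check whether the cause is ServiceUnavailable, reinitialize the cached remote, retry once, and fail hard if the retry also fails) can be sketched in plain Java as below. This is a hypothetical sketch, not nickman's code and not a JBoss API: `ServiceUnavailableException` here is a local stand-in for whatever exception the JBoss HA proxy actually surfaces, and `Middleware`, `RemoteCall`, `CachedRemote` and the `Supplier` that would wrap the HA-JNDI lookup are illustrative names.

```java
import java.rmi.RemoteException;
import java.util.function.Supplier;

public class ReinitOnServiceUnavailable {

    /** Hypothetical stand-in for the "ServiceUnavailable" failure raised by an orphaned HA stub. */
    static class ServiceUnavailableException extends RemoteException {
        ServiceUnavailableException(String msg) { super(msg); }
    }

    /** Hypothetical remote business interface exposed by the middleware cluster. */
    interface Middleware {
        String call(String arg) throws RemoteException;
    }

    /** One invocation against a remote reference of type R. */
    interface RemoteCall<R, T> {
        T invoke(R remote) throws RemoteException;
    }

    /** Caches a remote reference and re-acquires it when a call fails with ServiceUnavailable. */
    static class CachedRemote<R> {
        private final Supplier<? extends R> lookup; // would wrap an HA-JNDI lookup (assumed)
        private volatile R remote;
        int lookups = 0;                            // number of lookups performed so far

        CachedRemote(Supplier<? extends R> lookup) {
            this.lookup = lookup;
            this.remote = doLookup();
        }

        <T> T call(RemoteCall<R, T> c) throws RemoteException {
            R current = remote;
            try {
                return c.invoke(current);
            } catch (RemoteException e) {
                if (!isServiceUnavailable(e)) {
                    throw e;                        // unrelated failure: do not mask it
                }
                // Stub is orphaned (e.g. the whole cluster was rebooted): re-init once and retry.
                // A second failure propagates to the caller, i.e. "the bat-phone rings".
                return c.invoke(reinit(current));
            }
        }

        /** Re-looks up the remote unless another thread already replaced the stale one. */
        private synchronized R reinit(R stale) {
            if (remote == stale) {
                remote = doLookup();
            }
            return remote;
        }

        private R doLookup() {
            lookups++;
            return lookup.get();
        }

        /** Walks the cause chain; RemoteException.getCause() returns its detail exception. */
        private static boolean isServiceUnavailable(Throwable t) {
            for (; t != null; t = t.getCause()) {
                if (t instanceof ServiceUnavailableException) {
                    return true;
                }
            }
            return false;
        }
    }
}
```

Retrying only when ServiceUnavailable is in the cause chain keeps ordinary application errors from triggering a lookup, and the identity check in `reinit` means that when several threads hit the orphaned stub at once, only one of them pays for the HA-JNDI lookup.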

16. Re: Failover problem

adrian.brock Aug 3, 2005 6:20 PM (in response to vignesh76)

"nickman" wrote:

darranl has outlined exactly what we were experiencing making HA calls to our middleware servers.

What it describes is an orphaned client. It only affects protocols like RMI
where the stub can be invalidated by a server reboot, i.e. the RMI server is
available, but it no longer accepts requests from previous clients.

Rebooting the entire cluster will invalidate JBoss's JRMPHA proxy,
i.e. all the stubs are broken. :-(

Our resolution was to always trap the invocation exception, determine if the cause was ServiceUnavailable, and if so, we reinitialized the [cached] remote and poof. Problem solved.

That is the well-known solution to the RMI issue, clustered or otherwise.

For JBoss working around the problem for you "automagically", see:
http://www.jboss.org/?module=bb&op=viewtopic&t=58984
http://docs.jboss.org/jbossas/javadoc/4.0.2/org/jboss/proxy/ejb/RetryInterceptor.html
http://jira.jboss.com/jira/browse/JBAS-1330
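The "automagic" approach hides the same re-lookup from the calling code. The sketch below is not JBoss's `RetryInterceptor` and does not use its API; it only illustrates the idea with a plain `java.lang.reflect.Proxy`, assuming the caller supplies a `lookup` (for example one that performs the HA-JNDI lookup) and an `orphaned` predicate that recognizes the stale-stub failure. `ReacquiringProxy` and its parameters are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Predicate;
import java.util.function.Supplier;

public final class ReacquiringProxy {

    private ReacquiringProxy() {}

    /**
     * Returns a proxy for iface that, when a call fails with an error the orphaned
     * predicate accepts, re-acquires its target once and retries the call.
     * lookup:   obtains a fresh target, e.g. via an HA-JNDI lookup (assumed)
     * orphaned: decides whether a failure means "stub invalidated, re-acquire"
     */
    public static <T> T wrap(Class<T> iface, Supplier<? extends T> lookup,
                             Predicate<Throwable> orphaned) {
        InvocationHandler handler = new InvocationHandler() {
            private volatile T target = lookup.get();

            @Override
            public Object invoke(Object proxy, Method m, Object[] args) throws Throwable {
                T current = target;
                try {
                    return m.invoke(current, args);
                } catch (InvocationTargetException e) {
                    Throwable cause = e.getCause();
                    if (!orphaned.test(cause)) {
                        throw cause;                // ordinary failure: rethrow unchanged
                    }
                    try {
                        return m.invoke(reacquire(current), args); // one retry on a fresh target
                    } catch (InvocationTargetException again) {
                        throw again.getCause();     // still failing: surface it to the caller
                    }
                }
            }

            /** Only the first thread to see a given stale target performs the lookup. */
            private synchronized T reacquire(T stale) {
                if (target == stale) {
                    target = lookup.get();
                }
                return target;
            }
        };
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(),
                new Class<?>[] {iface}, handler));
    }
}
```

Callers use the returned object exactly like the original interface, which is the transparency asked for later in this thread. The trade-off is that a call matched by `orphaned` is sent twice, so the predicate should only accept failures where the server cannot already have processed the request.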

Of course, you could always use another protocol like HTTP(S) HA,
which doesn't have the stub invalidation problem at server reboot.

17. Re: Failover problem

vignesh76 Aug 3, 2005 6:27 PM (in response to vignesh76)

Hi Guys,

Thanks, I really appreciate your input. Nick, I am facing exactly the problem you described, and I believe failover should be transparent rather than something the client has to handle. We are not caching the interfaces either. In fact, I have just tested with the latest production version of JBoss (4.0.2), and this problem does not happen in that version. I just needed a straightforward answer from the guys who knew it, without the big hoo-ha!

The problem happens in JBoss (MX MicroKernel) [4.0.1 (build: CVSTag=JBoss_4_0_1 date=200412230944)]. I do not know whether it's due to a bug, the OS, the JVM or some other issue, but it's definitely not a configuration problem, as I used the exact same configuration while testing with 4.0.2. This may have been fixed in 4.0.1 SP1, though I don't know and don't have time to test that version. Perhaps someone who has already done so could clarify. Thanks again.
