26 Replies Latest reply on May 31, 2007 4:11 AM by timfox
      • 15. Re: 1.2.0.GA transparent node failover does not always work
        bander

         

        "timfox" wrote:
        Since it seems you're not interested in the load balancing/automatic failover abilities of JBM, you could just use the non clustered connection factory at /NonClusteredConnectionFactory to create connections.


        We certainly are interested in the load balancing and automatic failover abilities of JBM but we have to verify that all our JBM 1.0.1 issues have been resolved.

        • 16. Re: 1.2.0.GA transparent node failover does not always work
          timfox

          Ben-

          I return to the UK next week (am in US right now), and I'll spend some time trying to get to the bottom of what is happening in your case.

          • 17. Re: 1.2.0.GA transparent node failover does not always work
            sergeypk

            I investigated this with help from Tim. The example worked fine for me, transparent failover was really transparent, and non-transparent failover also worked.

            To make it handle the situation when the entire cluster goes down, I modified the example to re-lookup the connection factory from JNDI when reconnecting.
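The reconnect logic described above can be sketched generically. This is a minimal illustration (not JBM code): the JNDI lookup is abstracted behind a Supplier so the key point stands out - every attempt fetches a fresh factory instead of reusing a cached reference. The names (Reconnector, maxAttempts) are illustrative assumptions.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Sketch of "re-lookup the connection factory when reconnecting".
// Each attempt calls freshFactoryLookup.get() - in real code that would
// be a brand-new JNDI lookup - rather than holding on to the old factory.
final class Reconnector<F> {
    private final Supplier<F> freshFactoryLookup; // e.g. a new JNDI lookup
    private final int maxAttempts;

    Reconnector(Supplier<F> freshFactoryLookup, int maxAttempts) {
        this.freshFactoryLookup = freshFactoryLookup;
        this.maxAttempts = maxAttempts;
    }

    <C> C reconnect(Function<F, C> createConnection) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                F factory = freshFactoryLookup.get(); // never the old reference
                return createConnection.apply(factory);
            } catch (RuntimeException e) {
                last = e; // node (or the whole cluster) still down; retry
            }
        }
        throw last;
    }
}
```

In real code the Supplier would do something like `(ConnectionFactory) new InitialContext().lookup(...)` against the factory's JNDI name, and the Function would call `factory.createConnection()`.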

            • 18. Re: 1.2.0.GA transparent node failover does not always work
              timfox

              Yes, thanks Sergey :)

              One thing we noticed with Ben's code is that it tries to use the same old connection factory after failure has occurred.

              If you're doing the "old style" manual reconnect on failure, you need to throw away the connection factory after failure, or it may not know about the new cluster topology.

              Also, for this kind of thing, the user code should be using HAJNDI to ensure the new JNDI lookup works on a different node after failure of the original node.
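As a sketch of the HAJNDI point above: the InitialContext is given a list of candidate nodes rather than a single host, so the lookup still works after the original node dies. The host names and port 1100 (the JBoss AS HA-JNDI default) are illustrative assumptions.

```java
import java.util.Properties;
import javax.naming.Context;

// Builds a JNDI environment pointed at HA-JNDI instead of a single node.
public final class HaJndiEnv {
    static Properties haJndiEnv(String providerList) {
        Properties env = new Properties();
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "org.jnp.interfaces.NamingContextFactory");
        env.put(Context.URL_PKG_PREFIXES,
                "org.jboss.naming:org.jnp.interfaces");
        // Comma-separated list of cluster members; HA-JNDI tries each in turn.
        env.put(Context.PROVIDER_URL, providerList);
        return env;
    }
}
```

Then `new InitialContext(haJndiEnv("nodeA:1100,nodeB:1100")).lookup(...)` can succeed as long as at least one listed node is up.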

              • 19. Re: 1.2.0.GA transparent node failover does not always work
                bander

                 

                "timfox" wrote:
                Yes, thanks Sergey :)

                One thing we noticed with Ben's code is that it tries to use the same old connection factory after failure has occurred.

                If you're doing the "old style" manual reconnect on failure, you need to throw away the connection factory after failure, or it may not know about the new cluster topology.



                Is it standard practice not to reuse a connection factory reference, or is this a JBM-specific requirement? (I'm asking because we're trying to keep our code vendor neutral.)

                • 20. Re: 1.2.0.GA transparent node failover does not always work
                  timfox

                   

                  "bander" wrote:
                  "timfox" wrote:
                  Yes, thanks Sergey :)

                  Is it standard practice not to reuse a connection factory reference, or is this a JBM-specific requirement? (I'm asking because we're trying to keep our code vendor neutral.)


                  It certainly was standard practice with JBoss MQ, and if you think about it, you can't guarantee that provider XYZ doesn't encode information about their topology into the connection factory, so it makes sense to do it for all providers.


                  • 21. Re: 1.2.0.GA transparent node failover does not always work
                    sergeypk

                    Tim, please correct me if I'm wrong on this:

                    I don't think there's anything in the JMS spec about the behavior of clustering and failover. In any case, you only need to re-look up the connection factory if the entire cluster goes down and is restarted later, not if just one node fails.

                    Basically, the factory remembers the last node that was alive and will try to create connections targeting that node. If the node doesn't come back up, the attempts will keep failing, even though there could be other nodes already alive that could be used in its place.

                    • 22. Re: 1.2.0.GA transparent node failover does not always work
                      timfox

                      But, in the general case you need to look it up every time, since you can't make assumptions about how a specific provider implements their clustering.

                      • 23. Re: 1.2.0.GA transparent node failover does not always work
                        bander

                         

                        "timfox" wrote:
                        But, in the general case you need to look it up every time, since you can't make assumptions about how a specific provider implements their clustering.


                        I'm showing my ignorance of JNDI here - what exactly does looking up the connection factory do (other than getting an object reference)? Is a new connection factory object created each time?

                        • 24. Re: 1.2.0.GA transparent node failover does not always work
                          timfox

                          It gets whatever is bound in the JNDI tree at that time.

                          • 25. Re: 1.2.0.GA transparent node failover does not always work
                            bander

                             

                            "timfox" wrote:
                            It gets whatever is bound in the JNDI tree at that time.


                            So JBM is actively changing whatever is bound as nodes go up and down etc?

                            • 26. Re: 1.2.0.GA transparent node failover does not always work
                              timfox

                               

                              "bander" wrote:

                              So JBM is actively changing whatever is bound as nodes go up and down etc?


                              Yes. But it's actually more than that. The connection factory maintains a list of nodes to fail over onto and to load balance connections across. When the cluster topology changes (a node joins or leaves), two things happen:

                              1) The connection factory is rebound in JNDI with the updated list

                              2) An update message is sent to all the clients which already have their connection factories to make them update their internal lists.

                              I suspect other providers might also use a similar approach.

                              So if you reuse a CF from before the crash, it's likely it won't know about the changed topology - at least you can't guarantee it without making big assumptions about the implementation details of the particular messaging system you're using. The safest thing to do is to throw it away and start again.
