Server side HA and failover | JBoss.org Content Archive (Read Only)

45. Re: Server side HA and failover

timfox Nov 23, 2006 6:36 AM (in response to timfox)

"clebert.suconic@jboss.com" wrote:
I've created a new class called LeaveClusterRequest.
This request is sent over the cluster when PostOffice.stop is called.

Yep, that's the way to do it for now, as discussed.

While chatting with Bela at JBW, he said he's going to look at extending the JGroups API so we can have the information of whether the node crashed or not directly.

This won't be available for a while, though.
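A minimal sketch of how a clean-leave notification could be used to tell a crash from an orderly shutdown on a view change. DepartureTracker and its method names are hypothetical and node ids are plain strings here; this is not the actual PostOffice or JGroups code, only an illustration of the idea under those assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not the actual JBoss Messaging implementation.
final class DepartureTracker {
    // Nodes that announced a clean shutdown (e.g. via a LeaveClusterRequest).
    private final Set<String> leftCleanly = new HashSet<>();

    // Called when a leave request arrives from a node that is stopping.
    synchronized void onLeaveRequest(String nodeId) {
        leftCleanly.add(nodeId);
    }

    // Called on a view change. Returns the nodes that disappeared from the
    // view without announcing a clean leave, i.e. the ones to treat as crashed.
    synchronized List<String> crashedNodes(List<String> oldView, List<String> newView) {
        List<String> crashed = new ArrayList<>();
        for (String node : oldView) {
            // remove() returns true when the node had announced its departure,
            // and also clears the entry so a later rejoin starts fresh.
            if (!newView.contains(node) && !leftCleanly.remove(node)) {
                crashed.add(node);
            }
        }
        return crashed;
    }
}
```

Only the nodes returned by crashedNodes would need failover; a node that sent its leave request first is simply dropped from the view.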

46. Re: Server side HA and failover

timfox Nov 23, 2006 7:27 AM (in response to timfox)

"clebert.suconic@jboss.com" wrote:

Now... viewAccept will perform a failOver if a view is changed. I still need to add some logic to have only one server accepting the failOver but that is pretty easy.

The most trivial way of doing this is to define the failover node F of a failed node A as the node to the right (or left, it doesn't matter) of the failed node in the jgroups view. We also need to think of the view as a "ring".

E.g. if the jgroups view has addresses:

A
B
C
D
E

then node A fails over to node B, node B fails over to node C, node C fails onto node D, node D fails onto node E, node E fails onto node A
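The ring rule can be sketched roughly as follows, assuming the view is available as an ordered list of node ids (plain strings here). The FailoverPolicy interface and NextNodeFailoverPolicy class are hypothetical names used only for illustration; the interface shows what a pluggable policy might look like, not an existing API.

```java
import java.util.List;

// Hypothetical pluggable policy; name and signature are assumptions.
interface FailoverPolicy {
    // Returns the node that takes over for failedNode, given the view
    // as it was before the failure.
    String failoverNodeFor(List<String> view, String failedNode);
}

// "Node to the right" policy, treating the view as a ring.
final class NextNodeFailoverPolicy implements FailoverPolicy {
    @Override
    public String failoverNodeFor(List<String> view, String failedNode) {
        int index = view.indexOf(failedNode);
        if (index < 0) {
            throw new IllegalArgumentException("Node not in view: " + failedNode);
        }
        if (view.size() < 2) {
            throw new IllegalStateException("No other node available for failover");
        }
        // The modulo wraps the last node around to the first one.
        return view.get((index + 1) % view.size());
    }
}
```

With a rule like this, every surviving node can evaluate the policy against the old view and perform the failover only when the result is its own id, so exactly one server accepts it.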

Questions:

Is this sufficient for our needs?

Is there any sense in supporting multiple failover nodes for a single node? Or does that make no sense?

Should the policy really be pluggable? (Probably yes)

47. Re: Server side HA and failover

timfox Nov 23, 2006 7:34 AM (in response to timfox)

Ok, so it looks like good progress is being made on failover :)

What is there left to do?

My understanding is the remaining pieces are:

Load balancing policy for determining which server to initially connect to. (Re-use from remoting).

Mechanism for propagating changes in the client side server list from the server to the client when a jgroups view change occurs. I.e. when a new node joins / leaves. (Do we re-use from remoting here too?)

"Valve" functionality to stall any activity on connections when server failover is occurring.

Replaying of delivered messages to the ServerConsumerEndpoint so the delivery list can be recreated.

Anything else?

48. Re: Server side HA and failover

timfox Nov 23, 2006 7:56 AM (in response to timfox)

"clebert" wrote:

We might need a design session to discuss what we have accomplished though.

Yes, we should definitely do this ASAP so we can evaluate where we are and what remains to be done.

When does America come back after Thanksgiving?

49. Re: Server side HA and failover

clebert.suconic Nov 27, 2006 9:41 AM (in response to timfox)

then node A fails over to node B, node B fails over to node C, node C fails onto node D, node D fails onto node E, node E fails onto node A

Questions:

Is this sufficient for our needs?

The logic you described above is what I had in mind for failover.

Is there any sense in supporting multiple failover nodes for a single node? Or does that make no sense?

As the implementation stands now, I guess it's not possible to have multiple nodes taking care of a single failure.

Should the policy really be pluggable? (Probably yes)

I can implement it through an interface/abstract class, however I don't see other policies being implemented.

50. Re: Server side HA and failover

timfox Nov 27, 2006 9:56 AM (in response to timfox)

"clebert.suconic@jboss.com" wrote:

As the implementation stands now, I guess it's not possible to have multiple nodes taking care of a single failure.

Fine. But when we do in-memory message replication this becomes more important. We will want to be able to replicate messages to more than one other server, so that if the failover server fails too, we still have the messages on the second failover node. (This is similar to buddy replication groups in JBossCache)

I can implement it through an interface/abstract class, however I don't see other policies being implemented.

See previous comment.
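One way to picture multiple failover nodes is to extend the ring rule so that each node gets an ordered list of successors rather than a single one. The sketch below assumes the view is an ordered list of string node ids; RingBuddies and failoverNodesFor are hypothetical names, and nothing here reflects how replication would actually be wired in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, shown only to illustrate the idea.
final class RingBuddies {
    private RingBuddies() {
    }

    // Returns up to `count` nodes that follow `node` in the ring, in failover order.
    static List<String> failoverNodesFor(List<String> view, String node, int count) {
        int index = view.indexOf(node);
        if (index < 0) {
            throw new IllegalArgumentException("Node not in view: " + node);
        }
        // A node can never be its own failover node, so cap at size - 1.
        int available = Math.min(count, view.size() - 1);
        List<String> buddies = new ArrayList<>();
        for (int i = 1; i <= available; i++) {
            buddies.add(view.get((index + i) % view.size()));
        }
        return buddies;
    }
}
```

With count set to 1 this reduces to the single "next node" rule; a larger count gives the second and further failover nodes that replication to more than one server would need.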

51. Re: Server side HA and failover

clebert.suconic Nov 27, 2006 10:03 AM (in response to timfox)

Tim wrote:
Ok, so it looks like good progress is being made on failover :)

What is there left to do?

...

Load balancing policy for determining which server to initially connect to. (Re-use from remoting).

At this point, there is an HAConnectionFactory, registered in JNDI, that will load-balance connections on createConnection.

Tim wrote:
Mechanism for propagating changes in the client side server list from the server to the client when a jgroups view change occurs. I.e. when a new node joins / leaves. (Do we re-use from remoting here too?)

There is one thing I've done in this direction. If you do a lookup on HAConnectionFactory, you will have the updated list on the newly de-serialized HAConnectionFactory. I don't know yet how we could update existing instances.

Tim wrote:
"Valve" functionality to stall any activity on connections when server failover is occurring.

I'm not sure... but I guess we are locking write on failOver. Isn't that enough? We shall test it anyway.
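For discussion, a minimal sketch of what a "valve" built on a read/write lock could look like: normal activity takes the read lock, and failover takes the write lock so that it waits for in-flight calls and stalls new ones. FailoverValve and its methods are hypothetical names; this is an assumption about one possible shape, not the code currently in the tree.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical valve, shown only to illustrate the locking idea.
final class FailoverValve {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Normal connection activity: many callers may run concurrently,
    // but all of them block while a failover holds the write lock.
    <T> T invoke(Supplier<T> operation) {
        lock.readLock().lock();
        try {
            return operation.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Failover: waits for in-flight operations to finish, then runs exclusively.
    void failover(Runnable failoverAction) {
        lock.writeLock().lock();
        try {
            failoverAction.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    // True while a failover is in progress.
    boolean isClosed() {
        return lock.isWriteLocked();
    }
}
```

Whether locking writes alone is enough depends on every path that touches the connection going through something like invoke, which is what testing would need to confirm.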

Tim wrote:
Replaying of delivered messages to the ServerConsumerEndpoint so the delivery list can be recreated.

What do you mean? I didn't understand.

Tim wrote:
Anything else?

- Finish implementing the FailOverPolicy (Next node assumes the previous node)
- Establish a relationship between HAConnectionFactories and specific ConnectionFactories. (In case there is more than one Connection Factory, e.g. HTTP, JMS...)
- Failure detection for HA at this point only happens when ConnectionListener receives an event, which is pretty much only at leasing. We should also change the interceptors to react when Exceptions occur, taking actions such as retry... retry... failOver
- We should have a MAP of failed nodes. (When a node assumes another node)
- The redirect protocol. Are you ready for this node? (What would be a good method name BTW?)
- Go over the current design on HAConnectionFactories.
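The "retry... retry... failOver" item in the list above could look roughly like this inside an interceptor. RetryThenFailover, the attempt count and the failover callback are all hypothetical, and only unchecked exceptions are handled to keep the sketch short.

```java
import java.util.function.Supplier;

// Hypothetical helper, shown only to illustrate the retry-then-failover flow.
final class RetryThenFailover {
    private RetryThenFailover() {
    }

    // Tries the call up to maxAttempts times; if every attempt throws,
    // triggers failover once and makes a final attempt on the new node.
    static <T> T invoke(Supplier<T> call, int maxAttempts, Runnable failover) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                // Ignore and retry; a real interceptor would inspect the
                // exception type and probably wait between attempts.
            }
        }
        failover.run();
        // Any exception from this last attempt propagates to the caller.
        return call.get();
    }
}
```

A real version would have to distinguish connection failures from application exceptions, since only the former should lead to failover.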

52. Re: Server side HA and failover

clebert.suconic Nov 27, 2006 10:12 AM (in response to timfox)

Tim wrote:
Fine. But when we do in-memory message replication this becomes more important. We will want to be able to replicate messages to more than one other server, so that if the failover server fails too, we still have the messages on the second failover node. (This is similar to buddy replication groups in JBossCache)

We (or simply I) will need to understand how the local queue will be affected when we have that implemented. That's the only open point on failover with multiple nodes, as the queue is transferred to the failedNode.