Server side HA and failover| JBoss.org Content Archive (Read Only)

30. Re: Server side HA and failover

clebert.suconic Nov 3, 2006 12:18 AM (in response to timfox)

"ovidiu.feodorov@jboss.com" wrote:
ummmmm, Clebert .... this is a public forum, everybody can read it :)

Never mind, you go ahead, I am sure you'll figure out a way to overcome those invading Bindings, nodeIDs and routers :)

Eh eh... a little bit of humor...

BTW... I have already committed some changes for this.

31. Re: Server side HA and failover

clebert.suconic Nov 9, 2006 5:59 PM (in response to timfox)

I just changed DefaultRouter on HA Branch.

I'm doing round robin between local Queues and FailedQueues now.

One thing we need to decide is when to delete a failed-over queue. Say a queue failed over a week ago: it won't be accessed any more, because once the client bounces there is no way to reach a failed queue again.
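As an illustration only, round robin over local and failed-over queues could be sketched as below. `RoundRobinRouter` and its methods are hypothetical names, not the actual DefaultRouter code on the HA branch, and queues are represented by plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names are illustrative, not the real DefaultRouter API.
class RoundRobinRouter {
    private final List<String> localQueues = new ArrayList<>();
    private final List<String> failedOverQueues = new ArrayList<>();
    private int next = 0; // index of the next target across both lists

    synchronized void addLocal(String name) { localQueues.add(name); }

    synchronized void addFailedOver(String name) { failedOverQueues.add(name); }

    // Returns the next queue, cycling through the local queues first and then
    // the failed-over queues, or null when no queue is registered.
    synchronized String route() {
        int total = localQueues.size() + failedOverQueues.size();
        if (total == 0) {
            return null;
        }
        int i = next % total;
        next = (i + 1) % total;
        return i < localQueues.size()
                ? localQueues.get(i)
                : failedOverQueues.get(i - localQueues.size());
    }
}
```

Deleting a stale failed-over queue would then amount to removing it from the second list; when that is safe is the open question above.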

32. Re: Server side HA and failover

clebert.suconic Nov 13, 2006 7:59 PM (in response to timfox)

I have made a couple of changes to get the actual RemotingLocator into NodeAddressInfo.

- I have made the dependency between ServerPeer and Remoting JMX explicit.

- ServerPeer now looks up the Remoting Locator.

Tomorrow I will get this into the ConnectionFactories and start doing some load balancing and failure detection on the client side.

Anyway... I had a talk with Tim today, and we are missing:

- failure detection (server side)
- client side load balancing
- client side failure detection
- reconnection protocol

For most of this I needed the actual Remoting Locator, so I started making these changes today.

So... so far so good towards the HA release, which is great.

_____________
Clebert Suconic

33. Re: Server side HA and failover

clebert.suconic Nov 16, 2006 6:10 PM (in response to timfox)

I changed NodeAddressInfo a little bit. It now holds the ConnectionFactories for all the nodes. (I even had to convert the factories to bytes before sending them to the cluster.)

I had to do that because just the Locator is not enough to perform a connection creation.

I have also started creating a ClusteredConnectionFactory. So far I'm requiring a cluster to also have a regular ConnectionFactory. ClusteredConnectionFactory will take the ConnectionFactories already registered on ServerPeer and wrap them in a list that serves both for round-robin connection creation and for failover when a connection fails.

There are some issues I don't know how to solve yet, like what to do when a server has multiple connectors. We will probably need two ClusteredConnectionFactory instances (one for each type of connector, for example HTTP/Socket). (I will get the basics working first and worry about multiple datasources right after this.)

Also... we have at this point two PostOffices (queue/topic). To manage the Datasource we need only one, so I will configure ClusteredDataSource to use the Queue ClusteredPostOffice, as we don't need to deal with both PostOffices. (Let me know if there is any objection.)

34. Re: Server side HA and failover

timfox Nov 17, 2006 9:07 AM (in response to timfox)

Clebert - I'm confused here.

Can you explain in more detail why you think you need a ClusteredConnectionFactory and why you need to replicate more than the locatorURI?

35. Re: Server side HA and failover

clebert.suconic Nov 17, 2006 10:42 AM (in response to timfox)

Tim wrote:
...why you think you need a ClusteredConnectionFactory...

- How would you, as a user, decide whether you want to use an HAConnection or not?
- The ClientConnectionFactory that deals with HA is called ClusteredClientConnectionFactory, and it aggregates a ClientConnectionFactory[]. I will round-robin over the array of child factories.
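A minimal sketch of that aggregation, assuming each child factory either returns a connection or throws when its node is unreachable. `ChildFactory` and `ClusteredFactorySketch` are hypothetical stand-ins, not the JBoss Messaging classes, and a connection is represented by a plain string.

```java
import java.util.List;

// Hypothetical stand-in for a per-node connection factory.
interface ChildFactory {
    String createConnection(); // returns a connection label, throws if the node is down
}

// Hypothetical sketch of a factory that aggregates one child per node.
class ClusteredFactorySketch {
    private final List<ChildFactory> children;
    private int next = 0; // round-robin cursor

    ClusteredFactorySketch(List<ChildFactory> children) {
        this.children = children;
    }

    // Starts at the next child in round-robin order; on failure, tries the
    // remaining children once each before giving up.
    synchronized String createConnection() {
        RuntimeException last = null;
        for (int attempt = 0; attempt < children.size(); attempt++) {
            ChildFactory f = children.get(next);
            next = (next + 1) % children.size();
            try {
                return f.createConnection();
            } catch (RuntimeException e) {
                last = e; // this node looks dead, fail over to the next one
            }
        }
        throw new IllegalStateException("no node available", last);
    }
}
```

With three children where the second one always throws, successive calls would land on the first, the third, and then the first again.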

Tim wrote:
...and why you need to replicate more than the locatorURI?

ClientConnectionFactory also has an equivalent ServerConnectionFactoryEndpoint, right?

Looking at properties from ClientConnectionFactoryDelegate:

 //... from superClass DelegateSupport:
 protected int id;

 //... from the class itself

 protected String serverLocatorURI;
 protected Version serverVersion;
 protected int serverID;
 protected boolean clientPing;

I would need to replicate at least the LocatorURI and the serverID and DelegateSupport::id, assuming serverVersion and clientPing will never change.

Well... as I had to store almost everything, why not put this into a real instance of ClientConnectionFactoryDelegate? It made much more sense to me.

Anyway... I'm going to change ClusteredConnectionFactoryClient a lot today. I had created two classes:

One is ClusteredClientConnectionFactoryDelegateServer and the other is ClusteredConnectionFactoryClient. The server version was supposed to use serialization replaceObject to swap in the Client version, updating the list of servers from one PostOffice. This didn't work, of course.

I will change the PostOffice to re-register the ClusteredConnectionFactory when the View changes.

36. Re: Server side HA and failover

timfox Nov 17, 2006 11:36 AM (in response to timfox)

I am still baffled.

We should talk about this.

Unfortunately I am maxed out right now :(

37. Re: Server side HA and failover

timfox Nov 17, 2006 11:47 AM (in response to timfox)

"clebert.suconic@jboss.com" wrote:
Tim wrote:
...why you think you need a ClusteredConnectionFactory...

- How would you, as a user, decide whether you want to use an HAConnection or not?

What do you mean by HAConnection? I am not aware of this term in JBoss Messaging.

Looking at properties from ClientConnectionFactoryDelegate:
 //... from superClass DelegateSupport:
 protected int id;

 //... from the class itself

 protected String serverLocatorURI;
 protected Version serverVersion;
 protected int serverID;
 protected boolean clientPing;
I would need to replicate at least the LocatorURI and the serverID and DelegateSupport::id, assuming serverVersion and clientPing will never change.

All you need to replicate is locator uri and server id, the rest stays the same as you mention

Well... as I had to store almost everything,

Well, not almost everything, just a string and an int.

You should create a class as specified in the wiki page I created some time ago.

Replicating the whole connection factory is not a good solution IMHO.
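For illustration, replicating "just a string and an int" might look like the sketch below. `ReplicatedNodeInfo` is a hypothetical name and is not the class specified on the wiki page; the byte encoding via `DataOutputStream` is likewise only an assumption about how the pair could be sent over the cluster.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical value object carrying only the two fields that differ per node.
class ReplicatedNodeInfo {
    final String serverLocatorURI;
    final int serverID;

    ReplicatedNodeInfo(String serverLocatorURI, int serverID) {
        this.serverLocatorURI = serverLocatorURI;
        this.serverID = serverID;
    }

    // Encodes the pair as bytes so it can be sent to the other cluster nodes.
    byte[] toBytes() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(serverLocatorURI);
            out.writeInt(serverID);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Decodes bytes produced by toBytes() on another node.
    static ReplicatedNodeInfo fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String uri = in.readUTF();
            int id = in.readInt();
            return new ReplicatedNodeInfo(uri, id);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The receiving side would still have to build a delegate from these values, which is the cost weighed against replicating the whole factory.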

38. Re: Server side HA and failover

clebert.suconic Nov 17, 2006 1:05 PM (in response to timfox)

Tim wrote:
Replicating the whole connection factory is not a good solution IMHO.

I don't see a problem... it's replicated only when a new server joins the cluster.
I don't think it's needed to create a new class for this.

Anyway, a ConnectionFactory will have to deal with multiple delegates. The best way to achieve this is to have these delegates on the factory, replicated.

Tim wrote:
What do you mean by HAConnection, I am not aware of this term in JBoss Messaging

Do you expect every JMS ConnectionFactory to be able to round-robin across other servers? What about local connections?

I think the user would have to choose whether he wants a ConnectionFactory with HA features or not. This is just a matter of deciding the use case.

39. Re: Server side HA and failover

clebert.suconic Nov 17, 2006 1:06 PM (in response to timfox)

Tim wrote:
All you need to replicate is locator uri and server id, the rest stays the same as you mention

I would still need to clone a delegate and replace objectIds to reflect a delegate on that server. It's simpler/cleaner to just use the ClientConnectionFactoryDelegate directly.

40. Re: Server side HA and failover

clebert.suconic Nov 17, 2006 1:08 PM (in response to timfox)

All you need to replicate is locator uri and server id, the rest stays the same as you mention

One tiny correction... the Delegate::ID needs to be replicated also.

41. Re: Server side HA and failover

timfox Nov 17, 2006 1:16 PM (in response to timfox)

Have you considered just downloading the new connection factory from JNDI from the new server?

Then you'll just need the URL of the jndi server on the failover box.

Just an idea. Don't know if it's any good.

42. Re: Server side HA and failover

clebert.suconic Nov 20, 2006 6:29 PM (in response to timfox)

Tim wrote:
Have you considered just downloading the new connection factory from JNDI from the new server?

We could consider that idea if we see any issues. Right now the load balancing is working fine. The PostOffice re-registers the changes when a new node arrives. It's working really nicely.

Today I played around with remoting.Client.setConnectionListener on the client side. It looks like we have what we need so far. We might need a design session to discuss what we have accomplished, though. I will be talking with Ovidiu over the next few days, with him acting as a user, and I hope to have the design session with Tim/Ovidiu/others next week when Tim is back from JBW.

43. Re: Server side HA and failover

clebert.suconic Nov 22, 2006 1:01 PM (in response to timfox)

I was talking to Tom Elrod about ConnectionListeners and stuff..

I wanted to know when a ConnectionListener would be called. Specifically, I wanted to know if the event is fired during an invocation.

Tom shed some light on this, and I want to keep it recorded on this thread:

"Tom Elrod over IM" wrote:
The ConnectionListener can be registered on the client and server side. The server side uses leasing to tell if a client has not contacted the server within a certain amount of time and thus considers it dead and notifies the ConnectionListener. On the client side, the ConnectionListener is only there to tell the client that server has died when the client is idle (meaning not making invocations). If an invocation is made and the server is dead, there will be a CannotConnectException thrown from the invoke() method.

"Tom Elrod over IM" wrote:
on the client side, the ConnectionListener is independent of invocation calls. is just a pinger that runs concurrently with any client invocations that might be made.

For more information on ping in Remoting:

http://labs.jboss.com/portal/jbossremoting/docs/guide/ch08.html
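Since, per Tom's description, the pinger and an invocation can each notice a dead server, a client would plausibly want both paths to lead to the same failover, run only once. The sketch below shows one way to do that; `FailureDetectionSketch` and its methods are hypothetical and do not use JBoss Remoting types, and the "run once" guard is an assumption rather than something stated in the thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch; none of these are JBoss Remoting classes.
class FailureDetectionSketch {
    private final AtomicBoolean failoverStarted = new AtomicBoolean(false);
    private final AtomicInteger failoverRuns = new AtomicInteger();

    // Path 1: called by the pinger while the client is idle
    // (the role the ConnectionListener callback plays).
    void onPingFailure(Throwable cause) {
        startFailoverOnce();
    }

    // Path 2: wraps an invocation; a dead server surfaces as an exception
    // thrown from the call itself, which is rethrown after triggering failover.
    String invoke(Supplier<String> call) {
        try {
            return call.get();
        } catch (RuntimeException e) {
            startFailoverOnce();
            throw e;
        }
    }

    // Only the first detector to get here actually starts the failover.
    private void startFailoverOnce() {
        if (failoverStarted.compareAndSet(false, true)) {
            failoverRuns.incrementAndGet(); // placeholder for the real failover work
        }
    }

    int failoverRuns() {
        return failoverRuns.get();
    }
}
```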

44. Re: Server side HA and failover

clebert.suconic Nov 22, 2006 6:29 PM (in response to timfox)

I've created a new class called LeaveClusterRequest.
This request is sent over the cluster when PostOffice.stop is called.

I have added a boolean parameter to stop; if you set it to false, it will emulate a crash.

Now... viewAccept will perform a failOver if the view changes. I still need to add some logic so that only one server accepts the failOver, but that is pretty easy. I will probably do it over the holidays.
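One way to read the above: a node that announced its departure with a LeaveClusterRequest is a clean shutdown, while a node that simply disappears from the view is a crash and needs failover. The sketch below encodes that reading. It is an assumption about the intended behavior, and `ViewTrackerSketch` and its method names are hypothetical, not the PostOffice code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of telling clean leaves apart from crashes.
class ViewTrackerSketch {
    private final Set<Integer> currentView = new HashSet<>();
    private final Set<Integer> cleanLeavers = new HashSet<>();

    ViewTrackerSketch(Set<Integer> initialView) {
        currentView.addAll(initialView);
    }

    // Received when a node shuts down cleanly and announces it first.
    synchronized void onLeaveClusterRequest(int nodeId) {
        cleanLeavers.add(nodeId);
    }

    // Called when the group view changes. Returns the nodes that vanished
    // without announcing it; these are the candidates for failover.
    synchronized List<Integer> viewAccepted(Set<Integer> newView) {
        List<Integer> crashed = new ArrayList<>();
        for (Integer id : currentView) {
            boolean gone = !newView.contains(id);
            // remove() returns true if the node had announced its departure
            if (gone && !cleanLeavers.remove(id)) {
                crashed.add(id);
            }
        }
        currentView.clear();
        currentView.addAll(newView);
        return crashed;
    }
}
```

Electing the single server that accepts the failover is not covered here.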

Cheers,