5 Replies Latest reply on Jul 1, 2011 5:47 AM by ronsen

Failover performance

ronsen Jun 27, 2011 5:31 AM

Since EJB failover is handled on the client side (if I'm not mistaken), the failover after a node crash shouldn't be measurable, should it?

The client downloads the DP object and handles it itself, so will there be a measurable "delay" if a server crash occurs and it fails over to another node?

My test was to deploy an EJB on 3 nodes and send consecutive numbers to the cluster. Whenever a node receives a number, it writes the receipt to the database, and the database adds the timestamp, so that the times don't depend on the individual server clocks.

So, whenever a failover occurs, there should be a measurable delay between No. X and Y, if the failover causes any.
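
One way to evaluate the stored timestamps from the test above is to look for the largest gap between consecutive entries. This is only a minimal sketch: the class name `GapFinder`, its methods, and the sample values are made up for illustration, and it assumes the database timestamps have already been read into an array ordered by the counter value.

```java
// Hypothetical helper, not part of the original test setup.
class GapFinder {

    /**
     * Returns the index i (>= 1) whose distance to entry i-1 is the largest,
     * or -1 if there are fewer than two timestamps.
     */
    static int largestGapIndex(long[] timestampsNanos) {
        int best = -1;
        long bestGap = Long.MIN_VALUE;
        for (int i = 1; i < timestampsNanos.length; i++) {
            long gap = timestampsNanos[i] - timestampsNanos[i - 1];
            if (gap > bestGap) {
                bestGap = gap;
                best = i;
            }
        }
        return best;
    }

    /** Gap in nanoseconds between entry i-1 and entry i. */
    static long gapBefore(long[] timestampsNanos, int i) {
        return timestampsNanos[i] - timestampsNanos[i - 1];
    }

    public static void main(String[] args) {
        // Made-up sample: one number every 50 ms, with a longer pause before No. 3.
        long[] ts = {0L, 50_000_000L, 100_000_000L, 400_000_000L, 450_000_000L};
        int i = largestGapIndex(ts);
        System.out.println("Largest gap is before No. " + i
                + ": " + gapBefore(ts, i) + " ns");
    }
}
```

If a failover costs anything, the largest gap should sit right at the number that was sent while the node went down and should be clearly above the normal send interval.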

The result was that (even at nanosecond resolution) there is actually none...

Did I misunderstand something, or does a measurable delay only occur if something has to be restored from replicated data?

Thanks in advance,

1. Re: Failover performance

wdfink Jun 27, 2011 9:33 AM (in response to ronsen)

EJB load balancing and failover work with a client proxy together with server-side communication.

The servers communicate a shutdown or detect a full stop (e.g. a complete hang or a very long full GC).
A crash, e.g. a JVM core dump, is also detected.
How long this takes depends on the situation.

The facts are:
- all transactions on the crashed server are not committed
- the client proxy might hang for a few milliseconds and then try the next server in its list
- the next server provides the new cluster view without the crashed server.

So there is no measurable time for such a failover in the best case, and only a few milliseconds in the worst.
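
The client-side part of this can be pictured roughly as follows. This is a simplified sketch and not the actual JBoss proxy code: the class `FailoverProxySketch` and its methods are hypothetical, each node is modelled as a plain `Function`, and instead of receiving a new cluster view from the next server the sketch simply drops the failed node from its local list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of "try the next server in the list".
class FailoverProxySketch<Req, Res> {
    private final List<Function<Req, Res>> nodes; // local view of the cluster
    private int next = 0;                         // round-robin position

    FailoverProxySketch(List<Function<Req, Res>> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    Res invoke(Req request) {
        RuntimeException last = null;
        int attempts = nodes.size();
        for (int i = 0; i < attempts && !nodes.isEmpty(); i++) {
            int idx = next % nodes.size();
            try {
                Res result = nodes.get(idx).apply(request);
                next = idx + 1;          // continue with the following node next time
                return result;
            } catch (RuntimeException e) {
                last = e;
                nodes.remove(idx);       // assumption: a failed call means the node is gone
                next = idx;              // the following node has moved into this slot
            }
        }
        throw new IllegalStateException("no node reachable", last);
    }

    int viewSize() {
        return nodes.size();
    }

    public static void main(String[] args) {
        List<Function<String, String>> cluster = new ArrayList<>();
        cluster.add(r -> { throw new RuntimeException("node A is down"); });
        cluster.add(r -> "B handled " + r);
        cluster.add(r -> "C handled " + r);

        FailoverProxySketch<String, String> proxy = new FailoverProxySketch<>(cluster);
        for (int n = 1; n <= 3; n++) {
            System.out.println(proxy.invoke("No. " + n));
        }
    }
}
```

In this model the only extra cost of a failover is the one failed call, which matches the "few milliseconds" above as long as the failure is noticed quickly.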
2. Re: Failover performance

ronsen Jun 27, 2011 10:02 AM (in response to wdfink)

So that means the detection is done on the server side and the client is informed?
Because there must be at least an attempt to connect to the crashed server; nothing comes back -> next server in the list (DP). So there should be a measurable time for it?

Still, good to know, thanks for clarifying. But what about failover with session replication, would that be measurable?
3. Re: Failover performance

wdfink Jun 27, 2011 1:00 PM (in response to ronsen)

For the internal communication you should have a look at:
http://community.jboss.org/wiki/Shunning
http://community.jboss.org/wiki/FDVersusFDSOCK
http://community.jboss.org/wiki/JGroupsPbcastGMS
http://community.jboss.org/wiki/JGroupsFD
You will find a lot of information there about how it works internally.

I am not working with HTTP session replication at the moment.
I know that the most common approach is buddy replication, where only two nodes keep the state of the session.
If the node the session is attached to fails, another server will process the next call. If this is not the 'buddy', the session must be copied to the current instance, and this takes time depending on the size of the session data.
4. Re: Failover performance

ronsen Jun 28, 2011 3:12 AM (in response to wdfink)

Great, thanks. I'm going to take a look at this and will try to dig into the replication.
5. Re: Failover performance

ronsen Jul 1, 2011 5:47 AM (in response to ronsen)

Hey, can somebody explain (please only if you are sure) why, with the load-balancing policy RandomRobin/RoundRobin, only the first nodes-1 requests are slow and everything afterwards becomes much faster? Is something cached, and is there a timeout? When are these values invalidated?

As an example, send consecutive numbers to a cluster with a round-robin policy and a 50 ms pause in between, and measure the time it takes to print the first clustersize-1 values. I found that the time was higher by a factor of ~4.
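
The measurement could be set up roughly like this. It is a sketch under assumptions: `WarmupTiming` and its methods are hypothetical names, the remote EJB call is replaced by a `Runnable` stand-in, and the factor is computed as the mean latency of the first clustersize-1 calls divided by the mean latency of the remaining calls.

```java
// Hypothetical timing harness; the Runnable stands in for the real remote call.
class WarmupTiming {

    /** Times `count` calls with a pause in between; returns per-call latency in ns. */
    static long[] measure(Runnable call, int count, long pauseMillis) {
        long[] latencies = new long[count];
        for (int i = 0; i < count; i++) {
            long start = System.nanoTime();
            call.run();
            latencies[i] = System.nanoTime() - start;
            if (pauseMillis > 0) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag, stop pausing
                    pauseMillis = 0;
                }
            }
        }
        return latencies;
    }

    /** Mean of the first clusterSize-1 latencies divided by the mean of the rest. */
    static double warmupFactor(long[] latenciesNanos, int clusterSize) {
        int warm = clusterSize - 1;
        if (warm < 1 || warm >= latenciesNanos.length) {
            throw new IllegalArgumentException("need more samples than clusterSize-1");
        }
        return mean(latenciesNanos, 0, warm) / mean(latenciesNanos, warm, latenciesNanos.length);
    }

    private static double mean(long[] values, int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += values[i];
        }
        return sum / (double) (to - from);
    }

    public static void main(String[] args) {
        int clusterSize = 3;                                // assumption: 3 nodes as in the first test
        Runnable call = () -> Math.sqrt(System.nanoTime()); // stand-in for the EJB invocation
        long[] latencies = measure(call, 20, 50);
        System.out.println("warm-up factor: " + warmupFactor(latencies, clusterSize));
    }
}
```

Timing each call separately, instead of the whole block, helps to tell whether every one of the first clustersize-1 calls is slow or only a single one.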

Thanks a lot,