Puzzles about load balancing and failover in a JBoss cluster
menkun Jul 21, 2005 10:27 AM
I just went through the book 'JBoss Clustering', but I still have several questions about the load-balancing and session bean failover capabilities of a JBoss cluster.
1) It seems that a JBoss cluster cannot rebalance running processes. For example, I have a two-node cluster (A and B) with my EJB deployed on both nodes in a clustered configuration, and three identical client requests from a third computer C are distributed with the "Round-Robin" policy: two on node A and one on node B. If I kill node A (Ctrl+C), all processes on A are transferred to B seamlessly. However, after I restart A and it successfully rejoins the cluster, all processes running on B remain there. It seems that a JBoss cluster cannot migrate running processes back. Is my conclusion right, or have I missed some configuration?
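The distribution described above (two requests on A, one on B) is what a plain round-robin rotation over two nodes produces. A minimal sketch in plain Java, assuming a simple rotating cursor; the class `RoundRobinSketch` is hypothetical and is not the JBoss load-balance policy implementation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not JBoss code.
public class RoundRobinSketch {
    private final List<String> nodes;
    private int cursor = 0;

    public RoundRobinSketch(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    // Return the current node, then advance the cursor, wrapping at the end.
    public String next() {
        String node = nodes.get(cursor);
        cursor = (cursor + 1) % nodes.size();
        return node;
    }

    public static void main(String[] args) {
        RoundRobinSketch policy = new RoundRobinSketch(Arrays.asList("A", "B"));
        // Three requests from client C: A, B, A.
        for (int i = 1; i <= 3; i++) {
            System.out.println("request " + i + " -> node " + policy.next());
        }
    }
}
```

Note that a rotation like this only decides where a new request goes; it says nothing about moving work that is already assigned to a node.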
2) Here is my understanding of session bean failover: if we kill a server (say, node A) with Ctrl+C or by shutting down its OS normally, the OS of the dying node sends a specific message to the other nodes, and this message marks A as "dead". The other nodes therefore learn of its death immediately and take over the sessions that were running on it. My question is: does the "smart proxy" on the client side also receive and use this message? My guess is that the "smart proxy" does not need it. When node A is unreachable, the interceptor catches an RMI call exception, the proxy then picks another target node and forwards the RMI call to it, and the new node takes over the session and returns the response together with the new view of the cluster. If my understanding is right, the proxy forwards its RMI call to another target node as soon as it catches an RMI call exception, so no matter how A fails, failing over a session should take only a very short time. But we found a problem, described in question 3).
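The guess above about client-side failover can be sketched in plain Java as follows. This is only an illustration of the described idea, not the actual JBoss proxy or interceptor code: `FailoverProxySketch`, `NodeUnreachableException`, and the node names are all hypothetical, and the remote call is simulated by a function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; this is not the JBoss smart proxy.
public class FailoverProxySketch {
    // Stand-in for the exception a failed remote call would raise.
    public static class NodeUnreachableException extends RuntimeException {
        public NodeUnreachableException(String node) {
            super("unreachable: " + node);
        }
    }

    private final List<String> targets;

    public FailoverProxySketch(List<String> targets) {
        this.targets = new ArrayList<>(targets);
    }

    // Try the first known target; on failure drop it from the local view and retry on the next.
    public <R> R invoke(Function<String, R> call) {
        while (!targets.isEmpty()) {
            String node = targets.get(0);
            try {
                return call.apply(node);
            } catch (NodeUnreachableException e) {
                targets.remove(0);
            }
        }
        throw new IllegalStateException("no reachable node left");
    }

    // Copy of the targets the proxy currently believes are alive.
    public List<String> view() {
        return new ArrayList<>(targets);
    }

    public static void main(String[] args) {
        FailoverProxySketch proxy = new FailoverProxySketch(Arrays.asList("A", "B"));
        // Simulated remote call: node A is down, node B answers.
        String result = proxy.invoke(node -> {
            if (node.equals("A")) {
                throw new NodeUnreachableException(node);
            }
            return "handled by " + node;
        });
        System.out.println(result + ", view now " + proxy.view());
    }
}
```

In this sketch the retry is immediate only because the simulated call fails immediately. How quickly a real remote call reports that its target is unreachable is a separate question from how the proxy reacts once it has the exception.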
3) Suppose again that we have a two-node cluster (A and B). If we unplug the network cable of A, no signal can be sent to B to mark A as "dead". Node B cannot identify A as a "dead" member immediately, because it is hard to distinguish right away between network congestion and an unplugged cable. Basically, node B will ping node A several times until it times out, to confirm its suspicion that node A is dead. With the default timeout configuration in "cluster-service.xml", it may take minutes to fail over a session. I have decreased those timeout parameters and the number of retries included in the tag, and also set the "shun" attribute of "pbcast.GMS" to "True". From the log output, we can see that node B now detects the death of A immediately, but it still takes another 20~30 seconds to fail over the session to B. So I am puzzled: if failover is handled exclusively by the proxy on the client side, as I described above, there should be no such delay (even if node B does not know that A is dead).
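A rough way to reason about the server-side detection delay, assuming it is simply the per-try timeout multiplied by the number of tries (a simplification; the real failure-detection protocol may behave differently). The class name and the numbers below are hypothetical and are not taken from any actual "cluster-service.xml":

```java
// Hypothetical illustration only; values are made up for the arithmetic.
public class DetectionTimeSketch {
    // Worst-case time before a silent node is suspected, assuming one full timeout per failed try.
    public static long worstCaseMillis(long timeoutMillis, int maxTries) {
        return timeoutMillis * maxTries;
    }

    public static void main(String[] args) {
        // A longer configuration: 5 tries of 2500 ms each.
        System.out.println(worstCaseMillis(2500, 5) + " ms"); // prints "12500 ms"
        // A shortened configuration: 2 tries of 1000 ms each.
        System.out.println(worstCaseMillis(1000, 2) + " ms"); // prints "2000 ms"
    }
}
```

Under this assumption, shortening the timeout and the retry count only reduces the time B needs to suspect A; it does not by itself account for any additional delay seen on the client side.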
4) My last question: suppose we have an extreme case where a client makes an RMI call to node A at t=0, and this call needs 100 minutes to finish its computation. At t=2 minutes, node A crashes. What happens then? Can this call also fail over to node B without being restarted? Can the computation that was running on node A be continued on node B?
I am not sure whether my questions are clear and correct, and I may still misunderstand the mechanism of the JBoss cluster. Any help will be highly appreciated. Thanks a lot!