1. Re: JBoss Cluster issue
ben.wang Jun 24, 2006 10:31 PM (in response to tterm)
The best way to troubleshoot is to turn on log tracing for both Tomcat (org.jboss.web.tomcat.tc5.session) and EJB3 (I assume? org.jboss.ejb3). This way you can be sure whether the state has been replicated or not.
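As a rough sketch of what that could look like, assuming a JBoss 4.0.x server whose logging is configured in conf/log4j.xml (the file location and the XLevel class are assumptions about that version, not something stated in this thread), the two categories might be added like this:

```xml
<!-- Assumed location: server/<your-config>/conf/log4j.xml -->
<!-- Trace the Tomcat clustered session manager -->
<category name="org.jboss.web.tomcat.tc5.session">
  <priority value="TRACE" class="org.jboss.logging.XLevel"/>
</category>

<!-- Trace the EJB3 container, including the SFSB cache -->
<category name="org.jboss.ejb3">
  <priority value="TRACE" class="org.jboss.logging.XLevel"/>
</category>
```

If an appender has a Threshold set higher than TRACE, it will still filter these messages out, so check the appender settings as well.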
2. Re: JBoss Cluster issue
brian.stansberry Jun 28, 2006 6:19 PM (in response to tterm)
Please also make sure your SFSB is clustered, i.e. has the @Clustered annotation if EJB3 or
<clustered>true</clustered>
in jboss.xml if EJB 2.
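For the EJB 2 case, a minimal sketch of where that element sits in jboss.xml might look like the following. The bean name MyStatefulBean is hypothetical and must match the ejb-name declared in ejb-jar.xml:

```xml
<jboss>
  <enterprise-beans>
    <session>
      <!-- Hypothetical name; use the ejb-name from your ejb-jar.xml -->
      <ejb-name>MyStatefulBean</ejb-name>
      <!-- Marks the bean as clustered so its state is replicated -->
      <clustered>true</clustered>
    </session>
  </enterprise-beans>
</jboss>
```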
3. Re: JBoss Cluster issue
tterm Jun 29, 2006 6:37 AM (in response to tterm)
OK, thank you all for the fast replies!
I tried it again with tracing and a small sample application.
The node (node pluto) which serves the user did the following during a normal request:
12:20:03,195 DEBUG [JBossCacheManager] Creating an empty ClusteredSession
12:20:03,196 DEBUG [JBossCacheManager] loadSession(): session kenH3CFYvQjvouUiK8UxYQ** not found in distributed cache
12:20:03,196 DEBUG [JBossCacheManager] Creating an empty ClusteredSession
12:20:03,197 DEBUG [JBossCacheManager] Session with id=kenH3CFYvQjvouUiK8UxYQ**.node1 added. Current active sessions 1
12:20:03,200 DEBUG [JBossCacheManager] Created a ClusteredSession with id: kenH3CFYvQjvouUiK8UxYQ**.node1
12:20:03,537 DEBUG [JBossCacheManager] check to see if needs to store and replicate session with id kenH3CFYvQjvouUiK8UxYQ**.node1
12:20:03,541 DEBUG [ClusteredSession] processSessionRepl(): session is dirty. Will increment version from: 0 and replicate.
12:20:11,003 DEBUG [JvmRouteValve] checkJvmRoute(): check if need to re-route based on JvmRoute. Session id: kenH3CFYvQjvouUiK8UxYQ**.node1 jvmRoute: node1
12:20:11,686 DEBUG [ClusterSFBean] myInit
12:20:11,769 DEBUG [ExtendedPersistenceContextPropagationInterceptor] ++++ LongLivedSessionPropagationInterceptor
12:20:12,046 DEBUG [JBossCacheManager] check to see if needs to store and replicate session with id kenH3CFYvQjvouUiK8UxYQ**.node1
12:20:12,047 DEBUG [ClusteredSession] processSessionRepl(): session is dirty. Will increment version from: 1 and replicate.
After that I killed the node pluto and tried to get the values from the replicated session (there is only a remote reference to a stateful session bean in it).
Then this is the output from node mars, which tried to serve the request:
12:26:19,812 DEBUG [JBossCacheManager] Creating an empty ClusteredSession
12:26:19,947 DEBUG [JBossCacheManager] Session with id=kenH3CFYvQjvouUiK8UxYQ**.node1 added. Current active sessions 1
12:26:19,959 DEBUG [JBossCacheManager] loadSession(): id= kenH3CFYvQjvouUiK8UxYQ**, session=SessionBasedClusteredSession[id: kenH3CFYvQjvouUiK8UxYQ**.node1 lastAccessedTime: 1151576411002 version: 3 lastOutdated: 0]
12:26:19,973 DEBUG [JvmRouteValve] checkJvmRoute(): check if need to re-route based on JvmRoute. Session id: kenH3CFYvQjvouUiK8UxYQ**.node1 jvmRoute: node2
12:26:19,973 DEBUG [JvmRouteValve] handleJvmRoute(): We have detected a failover with different jvmRoute. old one: node1 new one: node2. Will reset the session id.
12:26:19,974 DEBUG [JvmRouteValve] resetSessionId(): changed catalina session to= [kenH3CFYvQjvouUiK8UxYQ**.node2] old one= [kenH3CFYvQjvouUiK8UxYQ**.node1]
12:26:19,995 DEBUG [JBossCacheManager] Setting cookie with session id:kenH3CFYvQjvouUiK8UxYQ**.node2 & name:JSESSIONID
12:26:20,046 DEBUG [ClusterSFServlet] jboss.j2ee:ear=cluster.ear,jar=cluster.jar,name=ClusterSFBean,service=EJB3:5c4o03-5yvadc-ep0ys7kt-1-ep0yv20w-6
12:26:21,436 DEBUG [ExtendedPersistenceContextPropagationInterceptor] ++++ LongLivedSessionPropagationInterceptor
12:26:30,191 INFO [TreeCache] viewAccepted(): [mars:32835|2] [mars:32835]
12:26:30,191 INFO [TreeCache] viewAccepted(): [mars:32835|2] [mars:32835]
12:26:31,590 ERROR [[ClusterSF]] Servlet.service() for servlet ClusterSF threw exception
java.lang.RuntimeException: org.jboss.cache.ReplicationException: rsp=sender=pluto:32886, retval=null, received=false, suspected=true
    at org.jboss.ejb3.cache.tree.StatefulTreeCache.remove(StatefulTreeCache.java:115)
    at org.jboss.ejb3.stateful.StatefulInstanceInterceptor.invoke(StatefulInstanceInterceptor.java:89)
    ..... .....
12:26:31,692 INFO [TreeCache] viewAccepted(): [mars:32826|2] [mars:32826]
12:26:31,692 INFO [TreeCache] viewAccepted(): [mars:32826|2] [mars:32826]
12:26:31,811 DEBUG [JBossCacheManager] check to see if needs to store and replicate session with id kenH3CFYvQjvouUiK8UxYQ**.node2
12:26:31,820 DEBUG [ClusteredSession] processSessionRepl(): session is dirty. Will increment version from: 3 and replicate.
12:26:37,661 INFO [TreeCache] viewAccepted(): [mars:32833|2] [mars:32833]
12:26:37,661 INFO [TreeCache] viewAccepted(): [mars:32833|2] [mars:32833]
12:26:38,763 WARN [FD] ping_dest is null: members=[pluto:32891 (additional data: 16 bytes), mars:32831 (additional data: 16 bytes)], pingable_mbrs=[mars:32831 (additional data: 16 bytes)], local_addr=mars:32831 (additional data: 16 bytes)
12:26:38,763 WARN [FD] ping_dest is null: members=[pluto:32891 (additional data: 16 bytes), mars:32831 (additional data: 16 bytes)], pingable_mbrs=[mars:32831 (additional data: 16 bytes)], local_addr=mars:32831 (additional data: 16 bytes)
12:26:39,264 INFO [DefaultPartition] Suspected member: pluto:32891 (additional data: 16 bytes)
12:26:39,276 INFO [DefaultPartition] New cluster view for partition DefaultPartition (id: 2, delta: -1) : [192.168.0.4:1099]
12:26:39,284 INFO [DefaultPartition] I am (192.168.0.4:1099) received membershipChanged event:
12:26:39,291 INFO [DefaultPartition] Dead members: 1 ([192.168.0.3:1099])
12:26:39,291 INFO [DefaultPartition] New Members : 0 ([])
12:26:39,291 INFO [DefaultPartition] All Members : 1 ([192.168.0.4:1099])
12:26:39,292 DEBUG [JGCacheInvalidationBridge] The list of replicant for the JG bridge has changed, computing and updating local info...
12:26:39,293 DEBUG [JGCacheInvalidationBridge] ... No bridge info was associated to this node
But the thing is, if I wait longer than just a few seconds after the kill before sending the next request, it works fine. So what is the problem? Maybe I have misunderstood something. If you need more information, please ask!
Thanks in advance!
Thomas
4. Re: JBoss Cluster issue
brian.stansberry Jun 30, 2006 8:17 PM (in response to tterm)
It looks like when you fail over to the other server, the failover server doesn't yet know that the first one is dead and tries to replicate to it. That replication then fails.
If you wait a few seconds, the second server knows the first is dead and doesn't try to replicate to it.
I suggest you use a combination of FD and FD_SOCK in your JGroups configs. See http://wiki.jboss.org/wiki/Wiki.jsp?page=FDVersusFD_SOCK, particularly the bit at the bottom.
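As an illustration only, the failure-detection part of a JGroups protocol stack combining the two might look roughly like the fragment below. The timeout and max_tries values are placeholders rather than recommendations from the wiki page, and the rest of the stack is left out:

```xml
<!-- Fragment of a JGroups protocol stack; surrounding protocols omitted -->

<!-- Socket-based detection: notices quickly when a member's process dies -->
<FD_SOCK/>

<!-- Heartbeat-based detection: catches hung nodes and network failures.
     Values are illustrative placeholders. -->
<FD timeout="2500" max_tries="5" shun="true"/>

<!-- Double-checks a suspected member before it is excluded from the view -->
<VERIFY_SUSPECT timeout="1500"/>
```

The logs above show several separate channels (the TreeCache instances and DefaultPartition), so the same change would presumably be needed in each service's JGroups config. In a stock JBoss 4.0.x "all" configuration those are likely cluster-service.xml, tc5-cluster-service.xml and ejb3-clustered-sfsbcache-service.xml, but treat the file names as an assumption and check your own deploy directory.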