4 Replies Latest reply on Jun 30, 2006 8:17 PM by brian.stansberry

    JBoss Cluster issue

    tterm

      Hello,

      I tried to set up a cluster with 3 nodes (A, B, C). The application to run is the xpetstore from the JBoss CVS, which I updated for clustering support. The app server is the newest JBoss 4.0.4GA (with session replication). I used the Apache web server as load balancer with sticky_session=1. This works fine!

      But I get some errors which I don't fully understand. The scenario is the following: the application starts on node A. I log in a user and put something in his cart, so the session is created and also the stateful cart bean. After that I kill node A and try to show the shopping cart of the current user, which should have an entry. But then I get the following error:

      13:32:51,703 WARN [RequestProcessor] Unhandled Exception thrown: class javax.ejb.EJBNoSuchObjectException
      13:32:51,704 ERROR [[action]] Servlet.service() for servlet action threw exception
      javax.ejb.EJBNoSuchObjectException: Could not find Stateful bean: 5c4o03-7t25rj-eoqw5osv-1-eoqx37wg-g


      I thought the session would be replicated, and the stateful bean as well. Is this right?

      The next question: what does the following warning mean?
      13:20:18,471 WARN [CacheListener] Possible concurrency problem: Replicated version id 84 matches in-memory version for session w9Zpgssg8wgqjKMC2N099w**
      13:20:23,246 WARN [CacheListener] Possible concurrency problem: Replicated version id 85 matches in-memory version for session w9Zpgssg8wgqjKMC2N099w**


      I hope someone can help me! If you need more information, please ask; I didn't want to post too much!

      Thanks in advance!
      Thomas


        • 1. Re: JBoss Cluster issue

          The best way to troubleshoot is to turn on trace logging for both Tomcat (org.jboss.web.tomcat.tc5.session) and EJB3 (I assume? org.jboss.ejb3). That way you can see whether the state has been replicated or not.

          • 2. Re: JBoss Cluster issue
            brian.stansberry

            Please also make sure your SFSB is clustered, i.e. has the @Clustered annotation if EJB3, or

            <clustered>true</clustered>
            in jboss.xml if EJB 2.
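
            For the EJB 2 case, a minimal sketch of where that element goes in jboss.xml. The bean name CartEJB and the JNDI name are hypothetical placeholders, so substitute the values from your own ejb-jar.xml:

            ```xml
            <jboss>
              <enterprise-beans>
                <session>
                  <!-- hypothetical name; must match the ejb-name in ejb-jar.xml -->
                  <ejb-name>CartEJB</ejb-name>
                  <!-- hypothetical JNDI name -->
                  <jndi-name>ejb/CartEJB</jndi-name>
                  <!-- marks the bean as clustered -->
                  <clustered>true</clustered>
                </session>
              </enterprise-beans>
            </jboss>
            ```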

            • 3. Re: JBoss Cluster issue
              tterm

              Ok thank you all for the fast replies!

              I tried it again with tracing and a small sample application.

              The node serving the user (pluto) logged the following during a normal request:

              12:20:03,195 DEBUG [JBossCacheManager] Creating an empty ClusteredSession
              12:20:03,196 DEBUG [JBossCacheManager] loadSession(): session kenH3CFYvQjvouUiK8UxYQ** not found in distributed cache
              12:20:03,196 DEBUG [JBossCacheManager] Creating an empty ClusteredSession
              12:20:03,197 DEBUG [JBossCacheManager] Session with id=kenH3CFYvQjvouUiK8UxYQ**.node1 added. Current active sessions 1
              12:20:03,200 DEBUG [JBossCacheManager] Created a ClusteredSession with id: kenH3CFYvQjvouUiK8UxYQ**.node1
              12:20:03,537 DEBUG [JBossCacheManager] check to see if needs to store and replicate session with id kenH3CFYvQjvouUiK8UxYQ**.node1
              12:20:03,541 DEBUG [ClusteredSession] processSessionRepl(): session is dirty. Will increment version from: 0 and replicate.
              12:20:11,003 DEBUG [JvmRouteValve] checkJvmRoute(): check if need to re-route based on JvmRoute. Session id: kenH3CFYvQjvouUiK8UxYQ**.node1 jvmRoute: node1
              12:20:11,686 DEBUG [ClusterSFBean] myInit
              12:20:11,769 DEBUG [ExtendedPersistenceContextPropagationInterceptor] ++++ LongLivedSessionPropagationInterceptor
              12:20:12,046 DEBUG [JBossCacheManager] check to see if needs to store and replicate session with id kenH3CFYvQjvouUiK8UxYQ**.node1
              12:20:12,047 DEBUG [ClusteredSession] processSessionRepl(): session is dirty. Will increment version from: 1 and replicate.


              After that I killed the node pluto and tried to get the values from the replicated session (there is only a remote reference to a stateful session bean in it).

              This is the log from node mars, which then tried to serve the request:

              12:26:19,812 DEBUG [JBossCacheManager] Creating an empty ClusteredSession
              12:26:19,947 DEBUG [JBossCacheManager] Session with id=kenH3CFYvQjvouUiK8UxYQ**.node1 added. Current active sessions 1
              12:26:19,959 DEBUG [JBossCacheManager] loadSession(): id= kenH3CFYvQjvouUiK8UxYQ**, session=SessionBasedClusteredSession[id: kenH3CFYvQjvouUiK8UxYQ**.node1 lastAccessedTime: 1151576411002 version: 3 lastOutdated: 0]
              12:26:19,973 DEBUG [JvmRouteValve] checkJvmRoute(): check if need to re-route based on JvmRoute. Session id: kenH3CFYvQjvouUiK8UxYQ**.node1 jvmRoute: node2
              12:26:19,973 DEBUG [JvmRouteValve] handleJvmRoute(): We have detected a failover with different jvmRoute. old one: node1 new one: node2. Will reset the session id.
              12:26:19,974 DEBUG [JvmRouteValve] resetSessionId(): changed catalina session to= [kenH3CFYvQjvouUiK8UxYQ**.node2] old one= [kenH3CFYvQjvouUiK8UxYQ**.node1]
              12:26:19,995 DEBUG [JBossCacheManager] Setting cookie with session id:kenH3CFYvQjvouUiK8UxYQ**.node2 & name:JSESSIONID
              12:26:20,046 DEBUG [ClusterSFServlet] jboss.j2ee:ear=cluster.ear,jar=cluster.jar,name=ClusterSFBean,service=EJB3:5c4o03-5yvadc-ep0ys7kt-1-ep0yv20w-6
              12:26:21,436 DEBUG [ExtendedPersistenceContextPropagationInterceptor] ++++ LongLivedSessionPropagationInterceptor
              12:26:30,191 INFO [TreeCache] viewAccepted(): [mars:32835|2] [mars:32835]
              12:26:30,191 INFO [TreeCache] viewAccepted(): [mars:32835|2] [mars:32835]
              12:26:31,590 ERROR [[ClusterSF]] Servlet.service() for servlet ClusterSF threw exception
              java.lang.RuntimeException: org.jboss.cache.ReplicationException: rsp=sender=pluto:32886, retval=null, received=false, suspected=true
               at org.jboss.ejb3.cache.tree.StatefulTreeCache.remove(StatefulTreeCache.java:115)
               at org.jboss.ejb3.stateful.StatefulInstanceInterceptor.invoke(StatefulInstanceInterceptor.java:89)
               .....
               .....
              12:26:31,692 INFO [TreeCache] viewAccepted(): [mars:32826|2] [mars:32826]
              12:26:31,692 INFO [TreeCache] viewAccepted(): [mars:32826|2] [mars:32826]
              12:26:31,811 DEBUG [JBossCacheManager] check to see if needs to store and replicate session with id kenH3CFYvQjvouUiK8UxYQ**.node2
              12:26:31,820 DEBUG [ClusteredSession] processSessionRepl(): session is dirty. Will increment version from: 3 and replicate.
              12:26:37,661 INFO [TreeCache] viewAccepted(): [mars:32833|2] [mars:32833]
              12:26:37,661 INFO [TreeCache] viewAccepted(): [mars:32833|2] [mars:32833]
              12:26:38,763 WARN [FD] ping_dest is null: members=[pluto:32891 (additional data: 16 bytes), mars:32831 (additional data: 16 bytes)], pingable_mbrs=[mars:32831 (additional data: 16 bytes)], local_addr=mars:32831 (additional data: 16 bytes)
              12:26:38,763 WARN [FD] ping_dest is null: members=[pluto:32891 (additional data: 16 bytes), mars:32831 (additional data: 16 bytes)], pingable_mbrs=[mars:32831 (additional data: 16 bytes)], local_addr=mars:32831 (additional data: 16 bytes)
              12:26:39,264 INFO [DefaultPartition] Suspected member: pluto:32891 (additional data: 16 bytes)
              12:26:39,276 INFO [DefaultPartition] New cluster view for partition DefaultPartition (id: 2, delta: -1) : [192.168.0.4:1099]
              12:26:39,284 INFO [DefaultPartition] I am (192.168.0.4:1099) received membershipChanged event:
              12:26:39,291 INFO [DefaultPartition] Dead members: 1 ([192.168.0.3:1099])
              12:26:39,291 INFO [DefaultPartition] New Members : 0 ([])
              12:26:39,291 INFO [DefaultPartition] All Members : 1 ([192.168.0.4:1099])
              12:26:39,292 DEBUG [JGCacheInvalidationBridge] The list of replicant for the JG bridge has changed, computing and updating local info...
              12:26:39,293 DEBUG [JGCacheInvalidationBridge] ... No bridge info was associated to this node
              



              But if I wait longer than just a few seconds after the kill before sending the next request, it works fine. What is the problem, then? Maybe I have misunderstood something. If you need more information, please ask again!

              Thanks in advance!
              Thomas

              • 4. Re: JBoss Cluster issue
                brian.stansberry

                It looks like when you fail over to the other server, the failover server doesn't know the first one is dead yet and tries to replicate to it. This then fails.

                If you wait a few seconds, the 2nd server knows the 1st is dead and doesn't try to replicate to it.

                I suggest you use a combination of FD and FD_SOCK in your JGroups configs. See http://wiki.jboss.org/wiki/Wiki.jsp?page=FDVersusFD_SOCK, particularly the bit at the bottom.
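
                As a rough sketch, the combination could look something like this in a JGroups 2.x style XML protocol stack. The timeout and max_tries values are illustrative assumptions, not tuned recommendations, and the rest of the stack is omitted:

                ```xml
                <config>
                  <!-- transport, discovery and merge protocols omitted -->

                  <!-- socket-based detection: suspects a member when its TCP connection closes -->
                  <FD_SOCK down_thread="false" up_thread="false"/>

                  <!-- heartbeat-based detection; values below are assumptions for illustration -->
                  <FD timeout="10000" max_tries="5" shun="true"
                      down_thread="false" up_thread="false"/>

                  <!-- double-checks a suspected member before it is excluded from the view -->
                  <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>

                  <!-- remaining protocols omitted -->
                </config>
                ```

                FD_SOCK tends to notice a killed process quickly because its socket closes, while FD's heartbeats cover cases such as a hung node where the socket stays open.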