Session failover in clustered
dschlenk Dec 3, 2012 12:03 PM
Should a session that originated on one node in a cluster be replicated to the other node, so that the session continues if the original node shuts down? I'm trying to test that and haven't been successful. This is the console log when I shut down the original node in the cluster:
16:04:34,382 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index closed: /opt/gatein/standalone/data/gatein/jcr/lucene/portal-system_portal
16:04:34,439 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index closed: /opt/gatein/standalone/data/gatein/jcr/lucene/portal-work_portal
16:04:34,514 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index closed: /opt/gatein/standalone/data/gatein/jcr/lucene/wsrp-system_portal
16:04:34,584 WARN [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] (Thread-3 (HornetQ-client-global-threads-1674379319)) Connection failure has been detected: The connection was disconnected because of server shutdown [code=4]
16:04:34,602 WARN [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] (Thread-3 (HornetQ-client-global-threads-1674379319)) Connection failure has been detected: The connection was disconnected because of server shutdown [code=4]
16:04:34,602 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index closed: /opt/gatein/standalone/data/gatein/jcr/lucene/pc-system_portal
16:04:34,617 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl] (Thread-12 (HornetQ-server-HornetQServerImpl::serverUUID=42a015b1-303a-11e2-8cfe-000c29fbc252-1584009536)) stopped bridge sf.my-cluster.379a88eb-332f-11e2-8d4f-000c2912e688
16:04:34,691 INFO [exo.jcr.component.core.SearchIndex] (Incoming-2,null) Index closed: /opt/gatein/standalone/data/gatein/jcr/lucene/system_portal_system
16:04:34,746 INFO [exo.jcr.component.core.SearchIndex] (Incoming-2,null) Index closed: /opt/gatein/standalone/data/gatein/jcr/lucene/system_portal
16:04:35,186 INFO [org.jboss.cache.RPCManagerImpl] (Incoming-2,null) Received new cluster view: [appserver2-dev-55198|2] [appserver2-dev-55198]
16:04:35,213 WARNING [org.jgroups.protocols.pbcast.NAKACK] (Incoming-2,null) appserver2-dev-56973: dropped message from appserver-dev-25458 (not in table [appserver2-dev-56973]), view=[appserver2-dev-56973|2] [appserver2-dev-56973]
16:04:35,513 INFO [org.jboss.cache.RPCManagerImpl] (Incoming-1,null) Received new cluster view: [appserver2-dev-56973|2] [appserver2-dev-56973]
16:04:35,561 WARNING [org.jgroups.protocols.pbcast.NAKACK] (Incoming-1,null) appserver2-dev-20744: dropped message from appserver-dev-41880 (not in table [appserver2-dev-20744]), view=[appserver2-dev-20744|2] [appserver2-dev-20744]
16:04:35,862 INFO [org.jboss.cache.RPCManagerImpl] (Incoming-2,null) Received new cluster view: [appserver2-dev-20744|2] [appserver2-dev-20744]
16:04:36,328 INFO [exo.jcr.component.core.WorkspaceResumer] (Thread-121) Setting workspace repository_portal-system online
16:04:36,329 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index created: /opt/gatein/standalone/data/gatein/jcr/lucene/portal-system_portal
16:04:36,317 INFO [exo.jcr.component.core.WorkspaceResumer] (Thread-122) Setting workspace repository_portal-work online
16:04:36,332 INFO [exo.jcr.component.core.WorkspaceResumer] (Thread-120) Setting workspace repository_system online
16:04:36,333 INFO [exo.jcr.component.core.WorkspaceResumer] (Thread-123) Setting workspace repository_wsrp-system online
16:04:36,350 INFO [exo.jcr.component.core.WorkspaceResumer] (Thread-124) Setting workspace repository_pc-system online
16:04:36,531 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Initializing RecoveryFilters.
16:04:36,533 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index initialized: /opt/gatein/standalone/data/gatein/jcr/lucene/portal-system_portal Version: 4
16:04:36,536 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index created: /opt/gatein/standalone/data/gatein/jcr/lucene/portal-work_portal
16:04:36,589 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Initializing RecoveryFilters.
16:04:36,590 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index initialized: /opt/gatein/standalone/data/gatein/jcr/lucene/portal-work_portal Version: 4
16:04:36,594 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index created: /opt/gatein/standalone/data/gatein/jcr/lucene/system_portal_system
16:04:36,629 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Initializing RecoveryFilters.
16:04:36,630 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index initialized: /opt/gatein/standalone/data/gatein/jcr/lucene/system_portal_system Version: 4
16:04:36,633 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index created: /opt/gatein/standalone/data/gatein/jcr/lucene/wsrp-system_portal
16:04:36,660 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Initializing RecoveryFilters.
16:04:36,660 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index initialized: /opt/gatein/standalone/data/gatein/jcr/lucene/wsrp-system_portal Version: 4
16:04:36,663 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index created: /opt/gatein/standalone/data/gatein/jcr/lucene/pc-system_portal
16:04:36,672 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Initializing RecoveryFilters.
16:04:36,672 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index initialized: /opt/gatein/standalone/data/gatein/jcr/lucene/pc-system_portal Version: 4
16:04:36,682 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index created: /opt/gatein/standalone/data/gatein/jcr/lucene/system_portal
16:04:36,692 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Initializing RecoveryFilters.
16:04:36,693 INFO [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index initialized: /opt/gatein/standalone/data/gatein/jcr/lucene/system_portal Version: 4
16:04:37,087 INFO [org.jboss.as.clustering.impl.CoreGroupCommunicationService.lifecycle.web] (Incoming-12,null) JBAS010247: New cluster view for partition web (id: 2, delta: -1, merge: false) : [appserver2-dev.example.com/web]
16:04:37,090 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-12,null) ISPN000094: Received new cluster view: [appserver2-dev.example.com/web|2] [appserver2-dev.example.com/web]
16:04:46,277 INFO [org.jboss.cache.RPCManagerImpl] (Incoming-2,null) Received new cluster view: [appserver2-dev-65221|2] [appserver2-dev-65221]
16:04:46,304 INFO [org.jboss.cache.RPCManagerImpl] (Incoming-1,null) Received new cluster view: [appserver2-dev-59376|2] [appserver2-dev-59376]
16:04:46,305 INFO [org.jboss.cache.RPCManagerImpl] (Incoming-2,null) Received new cluster view: [appserver2-dev-22286|2] [appserver2-dev-22286]
16:04:46,355 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-3,null) ISPN000094: Received new cluster view: [appserver2-dev.example.com-49199|2] [appserver2-dev.example.com-49199]
16:04:46,365 INFO [org.jboss.cache.RPCManagerImpl] (Incoming-1,null) Received new cluster view: [appserver2-dev-62749|2] [appserver2-dev-62749]
It's a pretty stock GateIn 3.5.0.Beta02, except that I copied the configuration from the standalone-ha.xml profile into standalone-full-ha.xml so I could take advantage of HornetQ and some other things that are only available in the full profile. I'm using mod_cluster with Apache 2 to front the cluster, and I've disabled the local firewall on the two GateIn nodes.
I can still access GateIn after the original node shuts down, but I have to log in again. Is that the expected behavior, or should the session be preserved? I don't know whether the warnings about dropped messages mean anything. I originally had some read and write buffer warnings in the logs as well; I got rid of those with sysctl settings, but the problem persists.
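To make the log above easier to read, here is a minimal Python sketch that pulls out the "Received new cluster view" lines and counts the "dropped message" warnings per sender. The regular expressions, the `summarize` function, and the `SAMPLE_LOG` constant are my own assumptions based only on the line formats quoted in this post; they are not part of GateIn, JGroups, or Infinispan, and other versions may log differently.

```python
import re
from collections import Counter

# Patterns are assumptions derived from the log lines quoted in this post.
VIEW_RE = re.compile(
    r"Received new cluster view: "
    r"\[(?P<coord>[^|\]]+)\|(?P<view_id>\d+)\] \[(?P<members>[^\]]*)\]"
)
DROPPED_RE = re.compile(r"dropped message from (?P<sender>\S+) \(not in table")

# A few lines copied from the console log in this post.
SAMPLE_LOG = """\
16:04:35,186 INFO [org.jboss.cache.RPCManagerImpl] (Incoming-2,null) Received new cluster view: [appserver2-dev-55198|2] [appserver2-dev-55198]
16:04:35,213 WARNING [org.jgroups.protocols.pbcast.NAKACK] (Incoming-2,null) appserver2-dev-56973: dropped message from appserver-dev-25458 (not in table [appserver2-dev-56973]), view=[appserver2-dev-56973|2] [appserver2-dev-56973]
16:04:35,513 INFO [org.jboss.cache.RPCManagerImpl] (Incoming-1,null) Received new cluster view: [appserver2-dev-56973|2] [appserver2-dev-56973]
16:04:37,090 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-12,null) ISPN000094: Received new cluster view: [appserver2-dev.example.com/web|2] [appserver2-dev.example.com/web]
"""


def summarize(lines):
    """Return (views, dropped) for an iterable of log lines.

    views   -- list of (coordinator, view_id, [members]) tuples, in log order
    dropped -- Counter mapping sender address to number of dropped messages
    """
    views = []
    dropped = Counter()
    for line in lines:
        match = VIEW_RE.search(line)
        if match:
            members = [m.strip() for m in match.group("members").split(",") if m.strip()]
            views.append((match.group("coord"), int(match.group("view_id")), members))
            continue
        match = DROPPED_RE.search(line)
        if match:
            dropped[match.group("sender")] += 1
    return views, dropped


if __name__ == "__main__":
    views, dropped = summarize(SAMPLE_LOG.splitlines())
    for coord, view_id, members in views:
        print(f"view {view_id} from {coord}: {len(members)} member(s)")
    for sender, count in dropped.items():
        print(f"{count} dropped message(s) from {sender}")
```

On the lines quoted here, every view has a single member and the dropped messages come from appserver-dev addresses, which would be consistent with late messages from the node that had just left. That only describes cluster membership; by itself it says nothing about whether HTTP sessions were being replicated.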
Attachment: Mod_cluster Status.html.zip (1.2 KB)