6 Replies. Latest reply on Dec 3, 2012 12:04 PM by dschlenk

    Session failover in clustered

    dschlenk

      Should a session that originated from one node in a cluster be replicated to the other so that the session continues if the original node shuts down? I'm attempting to test that and haven't been successful. This is the console log when I shut down the original node in the cluster:

       

      16:04:34,382 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index closed: /opt/gatein/standalone/data/gatein/jcr/lucene/portal-system_portal
      16:04:34,439 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index closed: /opt/gatein/standalone/data/gatein/jcr/lucene/portal-work_portal
      16:04:34,514 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index closed: /opt/gatein/standalone/data/gatein/jcr/lucene/wsrp-system_portal
      16:04:34,584 WARN  [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] (Thread-3 (HornetQ-client-global-threads-1674379319)) Connection failure has been detected: The connection was disconnected because of server shutdown [code=4]
      16:04:34,602 WARN  [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] (Thread-3 (HornetQ-client-global-threads-1674379319)) Connection failure has been detected: The connection was disconnected because of server shutdown [code=4]
      16:04:34,602 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index closed: /opt/gatein/standalone/data/gatein/jcr/lucene/pc-system_portal
      16:04:34,617 INFO  [org.hornetq.core.server.cluster.impl.BridgeImpl] (Thread-12 (HornetQ-server-HornetQServerImpl::serverUUID=42a015b1-303a-11e2-8cfe-000c29fbc252-1584009536)) stopped bridge sf.my-cluster.379a88eb-332f-11e2-8d4f-000c2912e688
      16:04:34,691 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-2,null) Index closed: /opt/gatein/standalone/data/gatein/jcr/lucene/system_portal_system
      16:04:34,746 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-2,null) Index closed: /opt/gatein/standalone/data/gatein/jcr/lucene/system_portal
      16:04:35,186 INFO  [org.jboss.cache.RPCManagerImpl] (Incoming-2,null) Received new cluster view: [appserver2-dev-55198|2] [appserver2-dev-55198]
      16:04:35,213 WARNING [org.jgroups.protocols.pbcast.NAKACK] (Incoming-2,null) appserver2-dev-56973: dropped message from appserver-dev-25458 (not in table [appserver2-dev-56973]), view=[appserver2-dev-56973|2] [appserver2-dev-56973]
      16:04:35,513 INFO  [org.jboss.cache.RPCManagerImpl] (Incoming-1,null) Received new cluster view: [appserver2-dev-56973|2] [appserver2-dev-56973]
      16:04:35,561 WARNING [org.jgroups.protocols.pbcast.NAKACK] (Incoming-1,null) appserver2-dev-20744: dropped message from appserver-dev-41880 (not in table [appserver2-dev-20744]), view=[appserver2-dev-20744|2] [appserver2-dev-20744]
      16:04:35,862 INFO  [org.jboss.cache.RPCManagerImpl] (Incoming-2,null) Received new cluster view: [appserver2-dev-20744|2] [appserver2-dev-20744]
      16:04:36,328 INFO  [exo.jcr.component.core.WorkspaceResumer] (Thread-121) Setting workspace repository_portal-system online
      16:04:36,329 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index created: /opt/gatein/standalone/data/gatein/jcr/lucene/portal-system_portal
      16:04:36,317 INFO  [exo.jcr.component.core.WorkspaceResumer] (Thread-122) Setting workspace repository_portal-work online
      16:04:36,332 INFO  [exo.jcr.component.core.WorkspaceResumer] (Thread-120) Setting workspace repository_system online
      16:04:36,333 INFO  [exo.jcr.component.core.WorkspaceResumer] (Thread-123) Setting workspace repository_wsrp-system online
      16:04:36,350 INFO  [exo.jcr.component.core.WorkspaceResumer] (Thread-124) Setting workspace repository_pc-system online
      16:04:36,531 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Initializing RecoveryFilters.
      16:04:36,533 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index initialized: /opt/gatein/standalone/data/gatein/jcr/lucene/portal-system_portal Version: 4
      16:04:36,536 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index created: /opt/gatein/standalone/data/gatein/jcr/lucene/portal-work_portal
      16:04:36,589 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Initializing RecoveryFilters.
      16:04:36,590 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index initialized: /opt/gatein/standalone/data/gatein/jcr/lucene/portal-work_portal Version: 4
      16:04:36,594 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index created: /opt/gatein/standalone/data/gatein/jcr/lucene/system_portal_system
      16:04:36,629 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Initializing RecoveryFilters.
      16:04:36,630 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index initialized: /opt/gatein/standalone/data/gatein/jcr/lucene/system_portal_system Version: 4
      16:04:36,633 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index created: /opt/gatein/standalone/data/gatein/jcr/lucene/wsrp-system_portal
      16:04:36,660 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Initializing RecoveryFilters.
      16:04:36,660 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index initialized: /opt/gatein/standalone/data/gatein/jcr/lucene/wsrp-system_portal Version: 4
      16:04:36,663 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index created: /opt/gatein/standalone/data/gatein/jcr/lucene/pc-system_portal
      16:04:36,672 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Initializing RecoveryFilters.
      16:04:36,672 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index initialized: /opt/gatein/standalone/data/gatein/jcr/lucene/pc-system_portal Version: 4
      16:04:36,682 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index created: /opt/gatein/standalone/data/gatein/jcr/lucene/system_portal
      16:04:36,692 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Initializing RecoveryFilters.
      16:04:36,693 INFO  [exo.jcr.component.core.SearchIndex] (Incoming-1,null) Index initialized: /opt/gatein/standalone/data/gatein/jcr/lucene/system_portal Version: 4
      16:04:37,087 INFO  [org.jboss.as.clustering.impl.CoreGroupCommunicationService.lifecycle.web] (Incoming-12,null) JBAS010247: New cluster view for partition web (id: 2, delta: -1, merge: false) : [appserver2-dev.example.com/web]
      16:04:37,090 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-12,null) ISPN000094: Received new cluster view: [appserver2-dev.example.com/web|2] [appserver2-dev.example.com/web]
      16:04:46,277 INFO  [org.jboss.cache.RPCManagerImpl] (Incoming-2,null) Received new cluster view: [appserver2-dev-65221|2] [appserver2-dev-65221]
      16:04:46,304 INFO  [org.jboss.cache.RPCManagerImpl] (Incoming-1,null) Received new cluster view: [appserver2-dev-59376|2] [appserver2-dev-59376]
      16:04:46,305 INFO  [org.jboss.cache.RPCManagerImpl] (Incoming-2,null) Received new cluster view: [appserver2-dev-22286|2] [appserver2-dev-22286]
      16:04:46,355 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-3,null) ISPN000094: Received new cluster view: [appserver2-dev.example.com-49199|2] [appserver2-dev.example.com-49199]
      16:04:46,365 INFO  [org.jboss.cache.RPCManagerImpl] (Incoming-1,null) Received new cluster view: [appserver2-dev-62749|2] [appserver2-dev-62749]

       

       

      It's a pretty stock 3.5.0.Beta02, other than that I copied the clustering configuration from the standalone-ha.xml profile into standalone-full-ha.xml so I could take advantage of HornetQ and some other things only available in the full profile. I'm using mod_cluster with Apache 2 to front the cluster, and I disabled the local firewall on the two GateIn nodes. I can still access GateIn after the original node shuts down, but I have to log in again. Is that the expected behavior, or should the session be preserved? I don't know whether those warnings about dropped messages mean anything. I originally had some read and write buffer warnings in the logs; I resolved those via sysctl, but the problem persists.
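      For context, the clustering pieces copied over from standalone-ha.xml include the JGroups subsystem and the Infinispan "web" cache container that backs HTTP session replication. Below is a trimmed sketch of what that cache container typically looks like in the stock HA profile; the exact defaults vary by AS version, so treat it as illustrative rather than the configuration actually in use:

      <cache-container name="web" aliases="standard-session-cache" default-cache="repl"
                       module="org.jboss.as.clustering.web.infinispan">
          <transport lock-timeout="60000"/>
          <!-- replicated cache holding HTTP session state -->
          <replicated-cache name="repl" mode="ASYNC" batching="true">
              <file-store/>
          </replicated-cache>
          <!-- cache backing the SSO valve referenced by the web subsystem -->
          <replicated-cache name="sso" mode="SYNC" batching="true"/>
      </cache-container>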

        • 1. Re: Session failover in clustered
          mvanco

          Hi David,

           

          The most important thing for session replication is the following configuration:

              <subsystem xmlns="urn:jboss:domain:web:1.2" default-virtual-server="default-host" native="false">
                  <connector name="http" protocol="HTTP/1.1" scheme="http" socket-binding="http"/>
                  <connector name="ajp" protocol="AJP/1.3" scheme="http" socket-binding="ajp"/>
                  <virtual-server name="default-host" enable-welcome-root="true">
                      <alias name="localhost"/>
                      <alias name="example.com"/>
                      <sso cache-container="web" cache-name="sso" reauthenticate="false"/>
                  </virtual-server>
              </subsystem>

           

          With this setup, session replication works by default and the session should survive failover.
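          One thing worth double-checking as well (an assumption on my part, it is not visible from the snippet above) is that the web application itself is marked distributable in its web.xml, since the container only replicates sessions for distributable web apps. A minimal example:

          <web-app xmlns="http://java.sun.com/xml/ns/javaee"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
                                       http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
                   version="3.0">
              <!-- tells the container to replicate this application's HTTP sessions across the cluster -->
              <distributable/>
          </web-app>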

           

          Regards,

          Michal Vančo

          • 2. Re: Session failover in clustered
            dschlenk

            I have that configuration present on both nodes. Do I need to change something, such as changing or adding an alias?

            • 3. Re: Session failover in clustered
              dschlenk

              Maybe a better question is: is the "classic" portal configured for session replication? I deployed a sample clustered app and that appears to replicate properly, but if I log into the classic portal and then shut down the node the session was created on, I have to log in again.

              • 4. Re: Session failover in clustered
                mvanco

                Hi David,

                Can you check with a browser developer tool what your JSESSIONID cookie looks like when going through the load balancer? It should be something like JSESSIONID=<jsessionid>.<jvmRoute/nodeName>.

                After failover, the jsessionid part should remain the same, while the jvmRoute should change to the new active node.

                And how is your mod_cluster set up (sticky sessions, etc.)?
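                For example, sticky sessions can be set explicitly on the mod-cluster-config element of the modcluster subsystem (they should already be on by default); the snippet below is only an illustrative sketch, and the proxy-list value is a placeholder:

                <!-- illustrative only: the proxy-list host and port are placeholders -->
                <mod-cluster-config advertise-socket="modcluster" proxy-list="your-httpd-host:8001"
                                    sticky-session="true" sticky-session-force="false"/>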

                 

                Please provide as many details as possible so that we can try to reproduce and, if needed, report a possible issue. How do you start the cluster nodes (which parameters)?

                 

                Regards,
                Michal

                • 5. Re: Session failover in clustered
                  mvanco

                  Yes, session replication should work on the "classic" portal site.

                  • 6. Re: Session failover in clustered
                    dschlenk

                    The jsessionid does not remain the same, but the jvmRoute does change as expected.

                     

                    I have mod_cluster set up and am accessing the portal through it.

                     

                    The output of the status page when both nodes are active is attached to the original post.

                     

                    As far as configuration, I have:

                     

                    <subsystem xmlns="urn:jboss:domain:modcluster:1.0">
                        <mod-cluster-config advertise-socket="modcluster" proxy-list="httpd-dev.pe.spanlink.com:8001">
                            <dynamic-load-provider>
                                <load-metric type="busyness"/>
                            </dynamic-load-provider>
                        </mod-cluster-config>
                    </subsystem>

                     

                    on each JBoss node. On the Apache side:

                     

                    Listen 172.17.13.155:8001
                    MemManagerFile /var/cache/httpd

                    <VirtualHost *:8001>

                      <Location />
                        Order allow,deny
                        Allow from all
                      </Location>

                      <Location /mod_cluster_manager>
                        SetHandler mod_cluster-manager
                        Order deny,allow
                        Deny from all
                        Allow from 192.168.
                        Allow from 10.
                        Allow from 172.17.
                      </Location>

                      KeepAliveTimeout 60
                      MaxKeepAliveRequests 0

                      ManagerBalancerName mycluster
                      AdvertiseFrequency 5
                      ServerAdvertise On

                    </VirtualHost>

                     

                    Also attached are the full standalone-full-ha.xml files for each node. Note that I define the node name in those files so I don't need to specify it on the command line. The nodes are separate machines, so I don't run with any port offsets. Both nodes and the Apache server are on the same physical VMware host, so there shouldn't be any problems with network equipment blocking multicast or anything.
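                    For reference, a minimal sketch of how the node name can be set on the root server element of standalone-full-ha.xml; the namespace version depends on the exact AS release, and "node1" is just a placeholder name:

                    <!-- namespace version varies by AS release; "node1" is a placeholder node name -->
                    <server xmlns="urn:jboss:domain:1.2" name="node1">
                        <!-- extensions, profile, interfaces and socket bindings as in the stock profile -->
                    </server>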

                     

                    The startup command I use is just standalone.sh -c standalone-full-ha.xml