2 Replies Latest reply on Jun 2, 2006 4:06 AM by fotero

    Rollback during session replication

    fotero

      Hi,

      I'm trying to set up a 2-node cluster with HTTP session replication. Everything seems correct: the applications get deployed, the users can access them, and so on. The problem is that I'm receiving rollback exceptions during session replication:

      10:04:28,371 ERROR [DummyTransaction] beforeCompletion() failed for tx=org.jboss.cache.transaction.DummyTransaction@e3e4fafe, handlers=[TxInterceptor.LocalSynchronizationHandler(gtx=GlobalTransaction:<172.31.5.65:37591>:167, tx=org.jboss.cache.transaction.DummyTransaction@e3e4fafe)]
      java.lang.RuntimeException:
       at org.jboss.cache.interceptors.TxInterceptor$LocalSynchronizationHandler.beforeCompletion(TxInterceptor.java:1065)
       at org.jboss.cache.interceptors.OrderedSynchronizationHandler.beforeCompletion(OrderedSynchronizationHandler.java:72)
       at org.jboss.cache.transaction.DummyTransaction.notifyBeforeCompletion(DummyTransaction.java:247)
       at org.jboss.cache.transaction.DummyTransaction.commit(DummyTransaction.java:54)
       at org.jboss.cache.transaction.DummyBaseTransactionManager.commit(DummyBaseTransactionManager.java:61)
       at org.jboss.web.tomcat.tc5.session.JBossCacheManager.endTransaction(JBossCacheManager.java:1038)
       at org.jboss.web.tomcat.tc5.session.JBossCacheManager.processSessionRepl(JBossCacheManager.java:1017)
       at org.jboss.web.tomcat.tc5.session.JBossCacheManager.storeSession(JBossCacheManager.java:637)
       at org.jboss.web.tomcat.tc5.session.InstantSnapshotManager.snapshot(InstantSnapshotManager.java:52)
       at org.jboss.web.tomcat.tc5.session.ClusteredSessionValve.invoke(ClusteredSessionValve.java:105)
       at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74)
       at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
       at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
       at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
       at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
       at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
       at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
       at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
       at org.apache.tomcat.util.net.MasterSlaveWorkerThread.run(MasterSlaveWorkerThread.java:112)
       at java.lang.Thread.run()V(Unknown Source)
      Caused by: org.jboss.cache.ReplicationException: rsp=sender=172.31.5.66:37857, retval=null, received=false, suspected=false
       at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:3747)
       at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:3672)
       at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:3770)
       at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:87)
       at org.jboss.cache.interceptors.ReplicationInterceptor.runPreparePhase(ReplicationInterceptor.java:143)
       at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:61)
       at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:67)
       at org.jboss.cache.interceptors.TxInterceptor.runPreparePhase(TxInterceptor.java:781)
       at org.jboss.cache.interceptors.TxInterceptor$LocalSynchronizationHandler.beforeCompletion(TxInterceptor.java:1043)
       at org.jboss.cache.interceptors.OrderedSynchronizationHandler.beforeCompletion(OrderedSynchronizationHandler.java:72)
       at org.jboss.cache.transaction.DummyTransaction.notifyBeforeCompletion(DummyTransaction.java:247)
       at org.jboss.cache.transaction.DummyTransaction.commit(DummyTransaction.java:54)
       at org.jboss.cache.transaction.DummyBaseTransactionManager.commit(DummyBaseTransactionManager.java:61)
       at org.jboss.web.tomcat.tc5.session.JBossCacheManager.endTransaction(JBossCacheManager.java:1038)
       at org.jboss.web.tomcat.tc5.session.JBossCacheManager.processSessionRepl(JBossCacheManager.java:1017)
       at org.jboss.web.tomcat.tc5.session.JBossCacheManager.storeSession(JBossCacheManager.java:637)
       at org.jboss.web.tomcat.tc5.session.InstantSnapshotManager.snapshot(InstantSnapshotManager.java:52)
       at org.jboss.web.tomcat.tc5.session.ClusteredSessionValve.invoke(ClusteredSessionValve.java:105)
       at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74)
       at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
       at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
       at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
       at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
       at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
       at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
       at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
      Caused by: org.jboss.cache.lock.TimeoutException: timeout for 172.31.5.66:37857
       at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:3745)
       ... 25 more
      10:04:28,387 WARN [JBossCacheManager] JBossCacheManager.endTransaction(): rolling back transaction with exception: javax.transaction.RollbackException: outcome is false stats: 1
      


      I'm using JBoss 4.0.4 GA with JBoss Cache 1.3.0 and JGroups 2.2.9.2, Solaris 10 SPARC and JRockit VM 1.5.0_06. I changed the JGroups configuration in tc-cluster.sar/META-INF/jboss-service.xml based on the fc-fast.xml, as follows:

      <?xml version="1.0" encoding="UTF-8"?>
      
      <!-- ===================================================================== -->
      <!-- -->
      <!-- Customized TreeCache Service Configuration for Tomcat 5 Clustering -->
      <!-- -->
      <!-- ===================================================================== -->
      
      <server>
      
       <!-- ==================================================================== -->
       <!-- Defines TreeCache configuration -->
       <!-- ==================================================================== -->
      
       <!-- Note we are using TreeCacheAop -->
       <mbean code="org.jboss.cache.aop.TreeCacheAop"
       name="jboss.cache:service=TomcatClusteringCache">
      
       <depends>jboss:service=Naming</depends>
       <depends>jboss:service=TransactionManager</depends>
       <!-- We need the AspectDeployer to deploy our FIELD granularity aspects -->
       <depends>jboss.aop:service=AspectDeployer</depends>
      
       <!-- Name of cluster. Needs to be the same for all nodes in the
       cluster, in order to find each other
       -->
       <attribute name="ClusterName">Tomcat-${jboss.partition.name:Cluster}</attribute>
      
       <!--
       Isolation level : SERIALIZABLE
       REPEATABLE_READ (default)
       READ_COMMITTED
       READ_UNCOMMITTED
       NONE
       -->
       <attribute name="IsolationLevel">REPEATABLE_READ</attribute>
      
       <!-- Valid modes are LOCAL, REPL_ASYNC and REPL_SYNC
      
       If you use REPL_SYNC and a UDP-based ClusterConfig
       we recommend you comment out the FC (flow control)
       protocol in the ClusterConfig section below.
       -->
       <attribute name="CacheMode">REPL_SYNC</attribute>
      
       <!-- Configuration options for use with JBossCache 1.2.4 and later.
       Comment out and replace with the JBossCache 1.2.3 options below
       if you are using JBossCache version 1.2.3.1 or earlier.
      
       UseMarshalling
      
       Indicates whether the cache should unmarshal objects replicated
       from other cluster nodes, or store them internally as a byte[]
       until a web app requests them. Must be "true" if session replication
       granularity "FIELD" is used in any webapp, otherwise "false" is
       recommended.
      
       InactiveOnStartup
      
       Whether or not the entire tree is inactive upon startup, only
       responding to replication messages after activateRegion() is
       called to activate one or more parts of the tree when a webapp is
       deployed. Must have the same value as "UseMarshalling".
      
       TransactionManagerLookupClass
      
       Make sure to specify BatchModeTransactionManager only!
       -->
       <attribute name="UseMarshalling">false</attribute>
       <attribute name="InactiveOnStartup">false</attribute>
       <attribute name="TransactionManagerLookupClass">org.jboss.cache.BatchModeTransactionManagerLookup</attribute>
      
       <!-- Configuration to use with JBossCache 1.2.3 and earlier.
       Uncomment and comment out the JBossCache 1.2.4 options above
       if you are using JBossCache version 1.2.3.1 or earlier.
      
       Any valid implementation of TransactionManagerLookup can be used.
      
       <attribute name="TransactionManagerLookupClass">org.jboss.cache.JBossTransactionManagerLookup</attribute>
       -->
      
       <!-- JGroups protocol stack properties. Can also be a URL,
       e.g. file:/home/bela/default.xml
       <attribute name="ClusterProperties"></attribute>
       -->
      
       <attribute name="ClusterConfig">
      
       <Config>
       <UDP bind_addr="${jboss.sync.bind.address}"
       mcast_send_buf_size="10000000"
       mcast_addr="${jboss.partition.udpGroup}"
       mcast_port="45577"
       tos="16"
       ucast_recv_buf_size="10000000"
       receive_on_all_interfaces="false"
       loopback="false"
       mcast_recv_buf_size="10000000"
       max_bundle_size="64000"
       max_bundle_timeout="30"
       use_incoming_packet_handler="false"
       use_outgoing_packet_handler="true"
       ucast_send_buf_size="10000000"
       ip_ttl="32"
       enable_bundling="true"/>
       <PING timeout="2000"
       down_thread="false"
       num_initial_members="3"/>
       <MERGE2 max_interval="10000"
       down_thread="false"
       min_interval="5000"/>
       <FD_SOCK srv_sock_bind_addr="${jboss.sync.bind.address}"
       down_thread="false"/>
       <VERIFY_SUSPECT timeout="1500" down_thread="false"/>
       <pbcast.NAKACK max_xmit_size="60000"
       down_thread="false"
       use_mcast_xmit="true"
       gc_lag="50"
       retransmit_timeout="300,600,1200,2400,4800"/>
       <UNICAST timeout="300,600,1200,2400,3600" down_thread="false"/>
       <pbcast.STABLE stability_delay="1000"
       desired_avg_gossip="5000"
       down_thread="false"
       max_bytes="250000"/>
       <VIEW_SYNC avg_send_interval="60000"
       down_thread="false" up_thread="false" />
       <pbcast.GMS print_local_addr="true"
       join_timeout="3000"
       down_thread="false"
       join_retry_timeout="2000"
       shun="true"/>
       <!--FC max_credits="1000000"
       down_thread="false"
       min_threshold="0.10"/-->
       <FRAG2 frag_size="60000" down_thread="false" up_thread="true"/>
       <!--COMPRESS down_thread="false"
       min_size="500"
       compression_level="3"
       up_thread="true"/-->
       <pbcast.STATE_TRANSFER down_thread="false" up_thread="false"/>
       </Config>
      
       </attribute>
      
       <!--
       Number of milliseconds to wait until all responses for a
       synchronous call have been received.
       -->
       <attribute name="SyncReplTimeout">5000</attribute>
      
       <!-- Max number of milliseconds to wait for a lock acquisition -->
       <attribute name="LockAcquisitionTimeout">15000</attribute>
      
       </mbean>
      
      </server>
      


      After a while of getting this exception, the server slows down and I receive the following messages:

      16:19:38,401 WARN [TimeScheduler] task org.jgroups.protocols.TP$Bundler$BundlingTimer@ebb8f15c took 6783ms to execute, please check why it is taking so long. It is delaying other tasks
      16:21:50,139 WARN [TimeScheduler] task org.jgroups.protocols.pbcast.STABLE$StabilitySendTask@ebc73c3f took 6187ms to execute, please check why it is taking so long. It is delaying other tasks
      


      Am I missing something or doing something wrong?

      Thanks in advance,
      Fernando

        • 1. Re: Rollback during session replication
          brian.stansberry

          Server 172.31.5.66 is not responding to replication messages within 5 seconds. A couple of *possible* causes of this:

          1) You're under very high load and it's taking more than 5 seconds to process replications. Increasing SyncReplTimeout may help.

          2) You're not using sticky sessions, and the same session is being accessed simultaneously on both servers, which causes locking conflicts. If SyncReplTimeout were set to a value > LockAcquisitionTimeout, you would be seeing TimeoutException instead of ReplicationException. Still not what you want, but at least a more meaningful diagnostic. The solution there is to use sticky sessions.
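
          To make the timeout relationship concrete, here is a sketch of the two attributes from the posted configuration (which has SyncReplTimeout at 5000 and LockAcquisitionTimeout at 15000), with SyncReplTimeout raised above LockAcquisitionTimeout. The value 20000 is an illustrative assumption only, not a tuned recommendation:

          ```xml
          <!-- Illustrative values only (20000 is an assumption, not a recommendation).
               With SyncReplTimeout greater than LockAcquisitionTimeout, the remote
               node's lock acquisition gives up before the caller stops waiting for
               the replication response. -->
          <attribute name="SyncReplTimeout">20000</attribute>
          <attribute name="LockAcquisitionTimeout">15000</attribute>
          ```

          This only changes which exception is reported; it does not remove the underlying lock contention.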

          • 2. Re: Rollback during session replication
            fotero

            Yes, I was not using sticky sessions. With sticky sessions, everything is working properly.
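
            For reference, a minimal sketch of the JBoss-side settings that sticky sessions typically rely on, assuming an Apache mod_jk front end (the load balancer actually used is not stated in this thread). The node name "node1" is a placeholder; each node needs its own unique value, matching the worker name defined on the load balancer side:

            ```xml
            <!-- jbossweb-tomcat55.sar/server.xml (assumes a mod_jk front end).
                 jvmRoute is appended to the session id so the balancer can send
                 follow-up requests to the same node. "node1" is a placeholder. -->
            <Engine name="jboss.web" defaultHost="localhost" jvmRoute="node1">
              <!-- existing Realm and Host elements stay unchanged -->
            </Engine>

            <!-- jbossweb-tomcat55.sar/META-INF/jboss-service.xml.
                 Tells JBoss that a JK-style balancer is in front, so the jvmRoute
                 suffix of the session id is handled on failover. -->
            <attribute name="UseJK">true</attribute>
            ```

            Stickiness itself is switched on in the load balancer, for example through mod_jk's sticky_session worker property.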

            Thanks,
            Fernando