6 Replies Latest reply on Oct 23, 2009 3:00 PM by Brian Stansberry

    Failure when cleaning up HTTP Session in JBoss AS Cluster?

    Richard Taylor Newbie

      Let me know if this should be posted somewhere else.

      JBoss 5.1.0
      JBoss Cache 3.2.1
      Three servers in cluster, HTTP Session replication, UDP, total replication, passivation enabled

      We recently rolled out two changes to our system, HTTP Session passivation and an upgrade to JBoss Cache 3.2.1. The changes were tested for several days under load on two other systems, but now that we've deployed to our primary data center we've run into an issue.

      Things ran fine for about 22 hours, then server A and server B went into a loop where each server getting the following two stack traces repeatedly. This happened twice in 24 hours. We've not disabled session passivation and have not seen the issue.

      In order to get the servers out of the loop we had to restart jboss on both servers (one of them twice). JBoss was still responding but the cluster was in an obviously degraded state.

      2009-10-20 15:40:29,709 ERROR [org.jboss.web.tomcat.service.session.distributedcache.impl.jbc.AttributeBasedJBossCacheService] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) IOException occurred unmarshalling value
      java.io.OptionalDataException
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
       at javolution.util.FastMap.readObject(Unknown Source)
       at sun.reflect.GeneratedMethodAccessor362.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:597)
       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1849)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1947)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
       at org.jboss.ha.framework.server.SimpleCachableMarshalledValue.get(SimpleCachableMarshalledValue.java:94)
       at org.jboss.web.tomcat.service.session.distributedcache.impl.jbc.AbstractJBossCacheService.getUnMarshalledValue(AbstractJBossCacheService.java:660)
       at org.jboss.web.tomcat.service.session.distributedcache.impl.jbc.AttributeBasedJBossCacheService.getSessionAttributes(AttributeBasedJBossCacheService.java:166)
       at org.jboss.web.tomcat.service.session.distributedcache.impl.jbc.AbstractJBossCacheService.getDistributableSessionData(AbstractJBossCacheService.java:581)
       at org.jboss.web.tomcat.service.session.distributedcache.impl.jbc.AbstractJBossCacheService.getSessionData(AbstractJBossCacheService.java:364)
       at org.jboss.web.tomcat.service.session.JBossCacheManager.loadSession(JBossCacheManager.java:1832)
       at org.jboss.web.tomcat.service.session.JBossCacheManager.findSession(JBossCacheManager.java:489)
       at org.jboss.web.tomcat.service.session.JBossCacheManager.processExpirationPassivation(JBossCacheManager.java:1603)
       at org.jboss.web.tomcat.service.session.JBossManager.backgroundProcess(JBossManager.java:385)
       at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1327)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1612)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1621)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1621)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1601)
       at java.lang.Thread.run(Thread.java:619)
      
      2009-10-20 15:40:29,713 ERROR [org.jboss.web.tomcat.service.session.JBossCacheManager.] (ContainerBackgroundProcessor[StandardEngine[jboss.web]]) processExpirationPassivation(): failed handling unloaded session O-lgMwkUSLW1vVpTRQzExg__
      java.lang.NullPointerException
       at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
       at java.util.concurrent.ConcurrentHashMap.putAll(ConcurrentHashMap.java:909)
       at org.jboss.web.tomcat.service.session.ClusteredSession.populateAttributes(ClusteredSession.java:1661)
       at org.jboss.web.tomcat.service.session.ClusteredSession.update(ClusteredSession.java:1120)
       at org.jboss.web.tomcat.service.session.JBossCacheManager.loadSession(JBossCacheManager.java:1835)
       at org.jboss.web.tomcat.service.session.JBossCacheManager.findSession(JBossCacheManager.java:489)
       at org.jboss.web.tomcat.service.session.JBossCacheManager.processExpirationPassivation(JBossCacheManager.java:1603)
       at org.jboss.web.tomcat.service.session.JBossManager.backgroundProcess(JBossManager.java:385)
       at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1327)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1612)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1621)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1621)
       at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1601)
       at java.lang.Thread.run(Thread.java:619)
      


      It appears that both servers were trying to clean up the exact same "unloaded" session at the same time. That might be normal, but something in the session data seemed to have gotten corrupted? It's not clear yet, I'm currently looking at the JBossCacheManager code.

      Relevant configs:
      From jboss-web.xml
      <passivation-config>
       <use-session-passivation>true</use-session-passivation>
       <passivation-min-idle-time>-1</passivation-min-idle-time>
       <passivation-max-idle-time>300</passivation-max-idle-time>
      </passivation-config>
      


      From jboss-cache-manager-jboss-beans.xml
      <property name="cacheLoaderConfig">
       <bean class="org.jboss.cache.config.CacheLoaderConfig">
       <!-- Do not change these -->
       <property name="passivation">true</property>
       <property name="shared">false</property>
      
       <property name="individualCacheLoaderConfigs">
       <list>
       <bean class="org.jboss.cache.loader.FileCacheLoaderConfig">
       <!-- Where passivated sessions are stored -->
       <property name="location">${jboss.server.data.dir}${/}session</property>
       <!-- Do not change these -->
       <property name="async">false</property>
       <property name="fetchPersistentState">true</property>
       <property name="purgeOnStartup">true</property>
       <property name="ignoreModifications">false</property>
       <property name="checkCharacterPortability">false</property>
       </bean>
       </list>
       </property>
       </bean>
       </property>
      



      Has anyone seen this issue? I see other people referencing java.io.OptinalDataException in the context of JBoss, but nothing that sounds like this exact same issue. Any input would be greatly appreciated, we'd really like to get HTTP session passivation enabled again.



        • 1. Re: Failure when cleaning up HTTP Session in JBoss AS Cluste
          Brian Stansberry Master

          Please clarify what you mean by "went into a loop." I want to make sure what you saw happen is what I think would happen.

          What I think is:

          1) The ContainerBackgroundProcessor runs every 10 seconds.
          2) When it operates on JBossCacheManager it loops through a set of passivated sessions to see if they are overaged and need to be expired. This fails on a session as shown in the logging in your post.

          If whatever condition caused that failure also exists in multiple other sessions, you'll get a lot of such logging; i.e. a loop over the affected sessions.

          When the ContainerBackgroundProcessor runs again 10 seconds later, the logging will repeat.

          • 2. Re: Failure when cleaning up HTTP Session in JBoss AS Cluste
            Brian Stansberry Master

            The core issue here is to figure out the cause of the java.io.OptionalDataException. What's being deserialized here is a value object your application is putting into the session via setAttribute().

            The handling of this by the AS could certainly be better.

            1) When the deserialization failure occurs, it's being handled in a poor manner:

            https://jira.jboss.org/jira/browse/JBCLUSTER-243

            2) The background processing logic shouldn't keep trying to work with the session cleanly for ever; it should detect a session with a recurring problem and just hard delete it:

            https://jira.jboss.org/jira/browse/JBAS-7397

            • 3. Re: Failure when cleaning up HTTP Session in JBoss AS Cluste
              Brian Stansberry Master

              You can experiment a bit to try to figure out what the problem attribute is. On your classpath you'd need to have jboss-ha-client.jar and jboss-ha-server-api.jar (found in $JBOSS_HOME/common/lib) as well as jboss-common-core.jar and jboss-logging-spi.jar (found in $JBOSS_HOME/lib).

              Then, in pseudo-code:

              import org.jbossha.framework.server.*;
              .
              Object toTest = ... whatever attribute value you want to test
              
              SimpleCachableMarshelledValue mv = new SimpleCachableMarshalledValue(toTest);
              mv.toByteArray(); // internally serializes toTest
              mv.get(); // deserializes


              • 4. Re: Failure when cleaning up HTTP Session in JBoss AS Cluste
                Richard Taylor Newbie

                After looking closer at the logs, the exceptions were being logged every 10 seconds, not in a tight loop like I had originally thought. It felt like a tight loop while watching two production servers throw new errors ;)

                So it appears that it was a single session that got corrupted and the ContainerBackgroundProcessor was trying to clean up the same bad session on 2 of the 3 servers in the cluster. It's unclear how the third did not get caught up in the problem as well.

                Finding something invalid that is put into our sessions is a bit of a needle in a haystack, it's a fairly large app. We've been doing session replication (with failover / deserialization of replicated sessions) working well for several month. It's just when we turned on passivation that this error occurs.

                I would think that if a session were serialized ok/without errors (granted errors might be happening and not caught on serialization), that it could be deserialized ok. To me this implies that it was corrupted / changed while in its serialized form.

                Is it possible that using the org.jboss.cache.loader.FileCacheLoader could cause such corruption of a session? The docs do mention using caution with this in high load environments. I'm currently working on testing the JDBC based loader for passivation.

                Thanks for writing up those two jira issues, those directly describe what we ran into.

                • 5. Re: Failure when cleaning up HTTP Session in JBoss AS Cluste
                  Richard Taylor Newbie

                  Just to clarify, we saw this issue twice, both within two hours. The following lines might help describe what happened in terms of which servers got caught up in the problem. Just want to communicate that it was a different session each time, but in each failure, two servers were trying to clean up the same session if that makes sense.

                  First time, i.e. 3:40pm
                  Server A - failing to clean up session "ABC"
                  Server B - failing to clean up session "ABC"
                  Server C - seemingly OK, no errors being logged

                  Second time, i.e. 5:05pm
                  Server A - seemingly OK, no errors being logged
                  Server B - failing to clean up session "XYZ"
                  Server C - failing to clean up session "XYZ"

                  • 6. Re: Failure when cleaning up HTTP Session in JBoss AS Cluste
                    Brian Stansberry Master

                    Re: FileCacheLoader, I think it would be strange if some corruption happened to the same session on 2 different systems.

                    The limitiations of the FCL are:

                    * Due to the way the FileCacheLoader represents a tree structure on disk (directories and files) traversal is inefficient for deep trees.
                    * Usage on shared filesystems like NFS, Windows shares, etc. should be avoided as these do not implement proper file locking and can cause data corruption.
                    * Usage with an isolation level of NONE can cause corrupt writes as multiple threads attempt to write to the same file.
                    * File systems are inherently not transactional, so when attempting to use your cache in a transactional context, failures when writing to the file (which happens during the commit phase) cannot be recovered.

                    I assume you aren't using a shared filesystem for the session store. The isolation level NONE caveat doesn't apply. The problem with not being able to recover if a write failure occurs in a tx is semi-relevant, but I'd expect then you'd have seen some errors in the logs when the session was persisted.

                    Re: the pattern of failures, how close is the timing of the logging between the servers?

                    You're using buddy replication, right? That would explain only 2 servers being involved -- the server handling the session and its buddy.