6 Replies Latest reply on Feb 12, 2006 7:27 PM by Brian Stansberry

    Consider removing snapshot mode from http session replication

    Brian Stansberry Master

      Discussion thread for JBAS-2447.

      Want to open a discussion of the idea of removing the snapshot mode configuration from http session replication, and just using instant mode. This came up in a recent public clustering training Bela and I did.

      Since we moved to using JBossCache, asynchronous replication can be achieved via using REPL_ASYNC in the cache. Queuing can be achieved by using a replication queue in the cache. Message bundling at the JGroups level is also available. So, having another configuration option to achieve largely the same purpose seems like overkill.

      Pros of removing this option:

      1) Any exceptions thrown in the session replication layer can *potentially* be communicated to the user via an exception on the web request. Need to test this, as by the time the replication code kicks in the request response may already be committed.

      2) Removes another config option.

      Cons of removing this option:

      1) Removes a config option some people may be accustomed to. Have to train users that REPL_ASYNC largely serves the same purpose. Because of this I wouldn't want to remove this option in the 4.0.x series, only in 5.x.

      2) There is some local overhead associated with session replication (i.e. marshalling session data and storing in the cache). Removing "interval" mode forces the time for this operation into the request response time. This could be significant if another thread has locks on the session's cache nodes. Typically there shouldn't be lock conflicts, as only one thread would be writing to a session.

      3) If we decided to communicate replication problems to the end user (by having Tomcat return a 500), some applications may not want this; i.e. they consider session replication a low priority issue and don't want users seeing errors because of this. So, we might have to add a config switch to prevent this -- and now we have another config switch :)

      I started this post thinking we should get rid of "interval" mode. As I've written, I've come to the opinion that we first need to check whether a session repl failure can be communicated to the user via an HTTP 500. If not, there is no net benefit in getting rid of the option. If so, I'd say get rid of interval mode and add a switch to configure whether session repl problems should result in a 500.

        • 1. Re: Consider removing snapshot mode from http session replic
          Adrian Brock Master

          If they are using "snapshot" then there is already an expectation that it is asynchronous.
          Just keep the config option, but add a warning so that if somebody configures it,
          it tells them how to configure the replacement, REPL_ASYNC.

          e.g. here's some code from JCA

           protected void startService() throws Exception
           {
              if (transactionManagerService != null)
                 tm = (TransactionManager)getServer().getAttribute(transactionManagerService, "TransactionManager");
              else
              {
                 // Old-style config still works, but warn and point at the replacement
                 log.warn("Please change your datasource setup to use <depends optional-attribute-name=\"TransactionManagerService\">jboss:service=TransactionManager</depends>");
                 log.warn("instead of <attribute name=\"TransactionManager\">java:/TransactionManager</attribute>");
                 log.warn("Better still, use a *-ds.xml file");
                 tm = (TransactionManager)new InitialContext().lookup(tmName);
              }
           }

          • 2. Re: Consider removing snapshot mode from http session replic
            Bela Ban Master

            I agree with Adrian, let's do the warning and keep the option for 1-2 releases, then dump it. Similar to @deprecated

            • 3. Re: Consider removing snapshot mode from http session replic
              Brian Stansberry Master

              Discussed this with Ben a week or so ago and he concurred with deprecating snapshot mode.

              • 4. Re: Consider removing snapshot mode from http session replic
                Brian Stansberry Master

                Following some recent load-testing work, I no longer advocate deprecating snapshot-mode "interval" and propose removing in 4.0.4 final the "deprecated" message that was added in RC1.

                Session replication involves making a serialized byte[] copy of the session object and sticking it in a TreeCache, which replicates it. Adding any kind of asynchronous behavior in the process of getting an already created byte[] out on the wire risks an OOM condition. As sessions are accessed, byte arrays are created; the risk is that these will be created faster than they can be transmitted, and will pile up in a queue, causing an OOM. In recent tests with large (1MB) sessions and a replication queue, this problem was clearly seen; even when a bound was placed on the replication queue, OOM conditions still occurred.
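
A rough illustration of the pile-up described above. This is only a sketch with hypothetical names (`EagerQueueSketch`, `enqueue`, `sendOne`); it is not how the JBossCache replication queue is implemented. It just shows that when the byte[] is created on the request thread and sent later, every pending request keeps a full serialized copy on the heap:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for any "serialize now, send later" queue.
class EagerQueueSketch {
    private final Deque<byte[]> queue = new ArrayDeque<>();
    private long pendingBytes;

    /** Request thread: the session has already been serialized; park the copy. */
    synchronized void enqueue(byte[] serializedSession) {
        queue.addLast(serializedSession);
        pendingBytes += serializedSession.length;
    }

    /** Sender thread: take one copy off the queue; returns false if nothing is pending. */
    synchronized boolean sendOne() {
        byte[] next = queue.pollFirst();
        if (next == null) {
            return false;
        }
        pendingBytes -= next.length;
        return true;
    }

    /** Bytes currently held on the heap waiting to be sent. */
    synchronized long pendingBytes() {
        return pendingBytes;
    }
}
```

If requests call `enqueue` faster than the sender calls `sendOne`, `pendingBytes()` grows by the full session size per request, which is the condition that leads to the OOM with large sessions.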

                Snapshot-mode "interval" is the one mechanism for moving the replication process off the request thread that doesn't have this weakness, as the asynchronous aspect is prior to the creation of the byte[]. With interval mode, when a request returns, a reference to the session is simply placed in a map. The only data generated by this process is the map entry. When the interval timer executes, the timer thread iterates through the map entries, telling the session manager to replicate the sessions one by one. The timer thread is the only thread creating the byte arrays.
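
The mechanism can be sketched roughly as follows. All names here (`IntervalSnapshotSketch`, `Session`, `snapshot`, `flush`) are hypothetical and chosen for illustration; this is not the actual session manager code, and a string-to-bytes conversion stands in for real session serialization:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of interval snapshot mode; not the JBoss API.
class IntervalSnapshotSketch {
    /** Stand-in for an HTTP session: an id plus mutable state. */
    static final class Session {
        final String id;
        volatile String state;
        Session(String id, String state) { this.id = id; this.state = state; }
    }

    // Dirty sessions awaiting replication. Holds references only, no byte[].
    private final Map<String, Session> dirty = new ConcurrentHashMap<>();
    // Records what was "replicated"; stands in for the put into the cache.
    final List<byte[]> replicated = new ArrayList<>();

    /** Request thread, when the request returns: just remember the session. */
    void snapshot(Session session) {
        dirty.put(session.id, session);
    }

    /** Interval timer thread: the only place a byte[] is created. */
    void flush() {
        for (String id : new ArrayList<>(dirty.keySet())) {
            Session s = dirty.remove(id);
            if (s != null) {
                replicated.add(s.state.getBytes(StandardCharsets.UTF_8));
            }
        }
    }

    /** Number of sessions waiting for the next flush. */
    int pending() {
        return dirty.size();
    }
}
```

Because the map is keyed by session id, a session touched several times between two flushes occupies a single entry, and only its state at flush time is serialized.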

                With interval snapshot mode, no problems were seen in the load tests that failed with an OOM when the replication queue was used.

                As an aside, JGroups-based mechanisms for replacing interval mode (i.e. message bundling) may not be appropriate in the long run. Our intent is to move towards a single JGroups channel, with multiple services on top of it. Message bundling is less appropriate in such a config, as messages from different services will be bundled together.

                • 5. Re: Consider removing snapshot mode from http session replic
                  Ben Wang Master

                  OK, you are saying that interval snapshot mode will throttle the replication load by only allowing 1 thread to process it.

                  • 6. Re: Consider removing snapshot mode from http session replic
                    Brian Stansberry Master

                    Yes, that's right.

                    The danger with interval mode is that under high load, lots of "dirty" sessions could pile up in the map, waiting for the interval timer thread to replicate them. It could be a long time before they get processed. This of course is a danger with any approach that makes replication asynchronous from the http request. But with interval mode you won't have OOM issues.

                    Actually there's a side benefit to interval mode. Say the interval timer thread is very busy and it takes 10 secs before a given session gets pulled from the map and replicated. If the user visits the session again during those 10 secs, only the session state following the second visit will be replicated.