4 Replies Latest reply on Jun 28, 2007 5:37 AM by uuderzo

    Apparently losing data

    uuderzo

      I'm using JBoss Cache (JBC) to store user sessions in a rich client application that runs on the JBoss and OC4J application servers.

      We cannot use the JBC bundled with the JBoss installation because the application also has to run on OC4J, so we ship the JBC jars as libraries of our application.

      Currently the cache runs in LOCAL mode. When a user logs on, a session object is stored. Every time the session object is requested, it is retrieved from the cache, updated with the last access time, stored back in the cache, and returned to the caller.

      This seems to work well, BUT sometimes the mechanism breaks and stops working until the AS (in my case JBoss) is shut down and restarted. The problem is that the data is put into the cache, but when it is retrieved it is NULL. No errors are reported; the data simply seems to disappear from the cache.

      In some production environments this happens about once a month.

      There is no eviction policy configured, so I cannot guess what the cause might be.

      Here is how I access the cache:

      private static TreeCacheMBean getJBossCache() throws AutEXCSessionException {
          try {
              if( staticJBossCache != null ) return staticJBossCache;

              try {
                  InitialContext ctx = new InitialContext();
                  Object value = ctx.lookup( "AutJBossCache" ); // if the name is not found, this throws a "not bound" exception
                  staticJBossCache = (TreeCacheMBean)value;
                  if( staticJBossCache != null ) {
                      return staticJBossCache;
                  }
              } catch( NamingException exc ) { /* do nothing: ignore it and carry on */ }

              staticJBossCache = new TreeCache();

              // Read the configuration properties from the file.
              PropertyConfigurator config = new PropertyConfigurator(); // configure tree cache. Needs to be in the classpath
              config.configure(
                  staticJBossCache,
                  CdtBLGPersistenceConfiguration.getInstallationPath() + "jbosscache-service.xml"
              );

              // Start the service
              staticJBossCache.createService();
              staticJBossCache.startService(); // kick start tree cache
              return staticJBossCache;
          }
          catch( ConfigureException exc ) { throw new AutEXCSessionException( new SagaException( exc )); }
          catch( Exception exc ) { throw new AutEXCSessionException( new SagaException( exc )); }
      }


      where staticJBossCache is a static member of my class.
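
The accessor above checks and assigns the static field without any synchronization, so two threads calling it at the same moment before initialization could each build and start their own cache. Whether that is related to the missing data is not established in this thread. A minimal sketch of a synchronized lazy initializer, where `StandInCache` and `LazyCacheSketch` are hypothetical names standing in for the real `TreeCache` setup:

```java
import java.util.concurrent.atomic.AtomicInteger;

class LazyCacheSketch {
    /** Hypothetical stand-in for the real cache; it only counts how often it is built. */
    static final class StandInCache {
        static final AtomicInteger CREATED = new AtomicInteger();
        StandInCache() { CREATED.incrementAndGet(); }
    }

    private static StandInCache cache;

    /** synchronized makes the null check and the assignment one atomic step. */
    static synchronized StandInCache getCache() {
        if (cache == null) {
            cache = new StandInCache(); // the lookup/configure/start steps would go here
        }
        return cache;
    }

    static int createdCount() {
        return StandInCache.CREATED.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Hit the accessor from several threads at once.
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() { getCache(); }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("caches created: " + createdCount());
    }
}
```

The trade-off is a lock on every call. For a session lookup that is usually negligible next to the cache access itself.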

      The configuration file follows:
      <?xml version="1.0" encoding="UTF-8"?>
      
      <!-- ===================================================================== -->
      <!-- -->
      <!-- Sample TreeCache Service Configuration -->
      <!-- -->
      <!-- ===================================================================== -->
      
      <server>
       <classpath codebase="./lib" archives="jboss-cache.jar, jgroups.jar"/>
      
      
       <!-- ==================================================================== -->
       <!-- Defines TreeCache configuration -->
       <!-- ==================================================================== -->
       <mbean
       code="org.jboss.cache.TreeCache"
       name="jboss.cache:service=AutJBossTreeCache">
      
       <depends>jboss:service=Naming</depends>
       <depends>jboss:service=TransactionManager</depends>
      
       <!-- Configure the TransactionManager -->
       <attribute name="TransactionManagerLookupClass">org.jboss.cache.GenericTransactionManagerLookup</attribute>
      
       <!--
       Isolation level : SERIALIZABLE
       REPEATABLE_READ (default)
       READ_COMMITTED
       READ_UNCOMMITTED
       NONE
       -->
       <attribute name="IsolationLevel">NONE</attribute>
      
       <!-- Valid modes are LOCAL, REPL_ASYNC and REPL_SYNC.
       Use REPL_SYNC for JBoss clustering -->
       <attribute name="CacheMode">LOCAL</attribute>
      
       <!-- Just used for async repl: use a replication queue -->
       <attribute name="UseReplQueue">false</attribute>
      
       <!-- Replication interval for replication queue (in ms) -->
       <attribute name="ReplQueueInterval">0</attribute>
      
       <!-- Max number of elements which trigger replication -->
       <attribute name="ReplQueueMaxElements">0</attribute>
      
       <!-- Name of cluster. Needs to be the same for all clusters, in order
       to find each other -->
       <attribute name="ClusterName">AutJBossTreeCache-DefaultCluster</attribute>
      
       <!-- JGroups protocol stack properties. Can also be a URL,
       e.g. file:/home/bela/default.xml
       <attribute name="ClusterProperties"></attribute> -->
      
       <attribute name="ClusterConfig">
       <config>
       <!-- UDP: if you have a multihomed machine,
       set the bind_addr attribute to the appropriate NIC IP address,
       e.g bind_addr="192.168.0.2" -->
       <!-- UDP: On Windows machines, because of the media sense feature
       being broken with multicast (even after disabling media sense)
       set the loopback attribute to true -->
       <UDP
       mcast_addr="228.1.2.3"
       mcast_port="48866"
       ip_ttl="64" ip_mcast="true"
       mcast_send_buf_size="150000"
       mcast_recv_buf_size="80000"
       ucast_send_buf_size="150000"
       ucast_recv_buf_size="80000"
       loopback="true"/>
       <PING
       timeout="2000"
       num_initial_members="3"
       up_thread="false"
       down_thread="false"/>
       <MERGE2
       min_interval="10000"
       max_interval="20000"/>
       <FD
       shun="true"
       up_thread="true"
       down_thread="true" />
       <VERIFY_SUSPECT
       timeout="1500"
       up_thread="false"
       down_thread="false"/>
       <pbcast.NAKACK
       gc_lag="50"
       retransmit_timeout="600,1200,2400,4800"
       max_xmit_size="8192"
       up_thread="false"
       down_thread="false"/>
       <UNICAST
       timeout="600,1200,2400"
       window_size="100"
       min_threshold="10"
       down_thread="false"/>
       <pbcast.STABLE
       desired_avg_gossip="20000"
       up_thread="false"
       down_thread="false"/>
       <FRAG
       frag_size="8192"
       down_thread="false"
       up_thread="false"/>
       <pbcast.GMS
       join_timeout="5000"
       join_retry_timeout="2000"
       shun="true"
       print_local_addr="true"/>
       <pbcast.STATE_TRANSFER
       up_thread="true"
       down_thread="true"/>
       </config>
       </attribute>
      
       <!-- Whether or not to fetch state on joining a cluster -->
       <attribute name="FetchStateOnStartup">true</attribute>
      
       <!-- The max amount of time (in milliseconds) we wait until the
       initial state (ie. the contents of the cache) are retrieved from
       existing members in a clustered environment -->
       <attribute name="InitialStateRetrievalTimeout">5000</attribute>
      
       <!-- Number of milliseconds to wait until all responses for a
       synchronous call have been received. -->
       <attribute name="SyncReplTimeout">10000</attribute>
      
       <!-- Max number of milliseconds to wait for a lock acquisition -->
       <attribute name="LockAcquisitionTimeout">15000</attribute>
      
       <!-- Name of the eviction policy class. -->
       <attribute name="EvictionPolicyClass"></attribute>
      
       <!--
       <attribute name="CacheLoaderClass">org.jboss.cache.loader.bdbje.BdbjeCacheLoader</attribute>
       <attribute name="CacheLoaderConfig">
       location=c:\\tmp\\bdbje
       </attribute>
       <attribute name="CacheLoaderShared">true</attribute>
       <attribute name="CacheLoaderPreload">/</attribute>
       -->
      
       <!--
       <attribute name="CacheLoaderClass">org.jboss.cache.loader.FileCacheLoader</attribute>
       <attribute name="CacheLoaderConfig">
       location=c:\\tmp
       </attribute>
       <attribute name="CacheLoaderShared">true</attribute>
       <attribute name="CacheLoaderPreload">/</attribute>
       -->
       </mbean>
      
       <!-- Uncomment to get a graphical view of the TreeCache MBean above -->
       <!-- <mbean code="org.jboss.cache.TreeCacheView" name="jboss.cache:service=TreeCacheView">-->
       <!-- <depends>jboss.cache:service=TreeCache</depends>-->
       <!-- <attribute name="CacheService">jboss.cache:service=TreeCache</attribute>-->
       <!-- </mbean>-->
      
       <!--mbean
       code="org.jboss.invocation.jrmp.server.JRMPProxyFactory"
       name="mydomain:service=proxyFactory,type=jrmp,target=factory"
       >
       <attribute name="InvokerName">jboss:service=invoker,type=jrmp</attribute>
       <attribute name="TargetName">jboss.cache:service=SicraCache</attribute>
       <attribute name="JndiName">AutJBossCache</attribute>
       <attribute name="InvokeTargetMethod">true</attribute>
       <attribute name="ExportedInterface">org.jboss.cache.TreeCacheMBean</attribute>
       <attribute name="ClientInterceptors">
       <interceptors>
       <interceptor>org.jboss.proxy.ClientMethodInterceptor</interceptor>
       <interceptor>org.jboss.proxy.SecurityInterceptor</interceptor>
       <interceptor>org.jboss.invocation.InvokerInterceptor</interceptor>
       </interceptors>
       </attribute>
       <depends>jboss:service=invoker,type=jrmp</depends>
       <depends>jboss.cache:service=AutJBossTreeCache</depends>
       </mbean-->
      </server>


      Can somebody give me a hint to solve my headache?

      Thanks in advance.

        • 1. Re: Apparently losing data
          genman



          This seems to work well, BUT sometimes the mechanism breaks and stops working until the AS (in my case JBoss) is shut down and restarted. The problem is that the data is put into the cache, but when it is retrieved it is NULL. No errors are reported; the data simply seems to disappear from the cache.


          So, you mean that calls to TreeCache.put(x,y); stop working? How do you know this?

          I assume your problem is the data is somehow not being updated in the BDBJE store (or the store is deleted), and upon restart the cache is missing data. Is this the problem?

          By the way, putting important data in /tmp in UNIX would be a bad plan.


          • 2. Re: Apparently losing data
            uuderzo

             

            So, you mean that calls to TreeCache.put(x,y); stop working? How do you know this?


            I don't mean that those calls stop working, but something starts going wrong.

            Let me try to explain the flow of data in the tree cache better:

            1. User logon => TreeCache.put( x, y ), where x is the user session "ticket" and y is the user session data.

            2. Business logic needs user session data => TreeCache.get( x, y ); y is updated with the last call time. After that => TreeCache.put( x, y ), and y is returned.

            3. Iterate step 2 for the entire life of a user session

            4. When user logs off => TreeCache.remove( x )

            This is a simplified flow, but it is how our system works. It works well until something breaks and step 2 starts returning NULL instead of the user session data. I can't tell whether the data is corrupted when it is put into the cache in step 1, or when it is re-put in step 2, or whether the tree cache "forgets" the node and starts returning NULL.
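
The four steps above can be sketched roughly as follows. This is only an illustration: a `ConcurrentHashMap` stands in for the tree cache, and `SessionFlowSketch`, `SessionData`, `logon`, `touch` and `logoff` are hypothetical names, not the application's real classes. The one addition is that a miss in step 2 is reported at the point where it happens, together with the number of entries still held.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SessionFlowSketch {
    /** Hypothetical session payload; the real session object is not shown in the thread. */
    static final class SessionData {
        final String user;
        volatile long lastAccessMillis;

        SessionData(String user, long now) {
            this.user = user;
            this.lastAccessMillis = now;
        }
    }

    // Stand-in for the cache: a plain map keyed by the session ticket.
    private final Map<String, SessionData> cache = new ConcurrentHashMap<String, SessionData>();

    /** Step 1: store the session data under the ticket at logon. */
    void logon(String ticket, String user) {
        cache.put(ticket, new SessionData(user, System.currentTimeMillis()));
    }

    /** Steps 2 and 3: get, update the last access time, put back, return. */
    SessionData touch(String ticket) {
        SessionData data = cache.get(ticket);
        if (data == null) {
            // Report the miss where it happens, with some context for later diagnosis.
            System.err.println("Session miss for ticket " + ticket
                    + ", entries in cache: " + cache.size());
            return null;
        }
        data.lastAccessMillis = System.currentTimeMillis();
        cache.put(ticket, data);
        return data;
    }

    /** Step 4: remove the session at logoff. */
    void logoff(String ticket) {
        cache.remove(ticket);
    }
}
```

Logging the entry count alongside the miss would help distinguish a single lost node from a cache that has become empty or been replaced altogether.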

            The problem does not happen frequently (about once a month), so it is difficult for me to trace the data path and understand what is happening.

            What I know is that in 99.9% of cases it works great, and that if I can get an object from the cache and set a field on it, the object cannot be null. After updating the field I put it back into the cache, and the cycle repeats.

            Another thing I know is that the system can work well for weeks, but once the malfunction appears it can no longer be used: after the first malfunctioning session, further new sessions show the same malfunction. Restarting the application server solves the problem until it reappears some time later. That makes me think something is going wrong inside the cache.

            I just want to know whether my approach has something "buggy" in it, or whether there is a real possibility that the cache starts forgetting after "something" happens, even with no eviction policy installed.

            Thanks again.

            • 3. Re: Apparently losing data
              genman

              If there is no eviction policy, then regardless of the cache loader configuration, all the cache data is kept in memory. Now, the data might fail to be stored using a put().

              I would suggest creating a separate log4j appender that captures all the debug logs for org.jboss.cache, that rolls over (so you don't fill your disk). Then, when you receive back a null, alert your operator, or automatically turn off debug. (This can be done using calls to the log4j API.) You should be able to see what's happening then.
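
A separate rolling appender for the cache's debug output might look like the fragment below. It is a sketch that assumes log4j 1.x configured through a properties file. JBoss typically configures log4j through an XML file, where the equivalent appender and category elements would be used instead. The appender name `CACHE`, the file name and the size limits are placeholders.

```properties
# Send org.jboss.cache debug output to its own appender only
log4j.logger.org.jboss.cache=DEBUG, CACHE
log4j.additivity.org.jboss.cache=false

# Rolling file: the size and backup limits keep the log from filling the disk
log4j.appender.CACHE=org.apache.log4j.RollingFileAppender
log4j.appender.CACHE.File=jbosscache-debug.log
log4j.appender.CACHE.MaxFileSize=10MB
log4j.appender.CACHE.MaxBackupIndex=5
log4j.appender.CACHE.layout=org.apache.log4j.PatternLayout
log4j.appender.CACHE.layout.ConversionPattern=%d %-5p [%t] %c - %m%n
```

With these limits the log holds at most about 60 MB, so the limits would need to be large enough to still cover the moment the first null appears.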

              Perhaps it's your application.

              • 4. Re: Apparently losing data
                uuderzo

                 

                Perhaps it's your application.


                Maybe... but the behaviour looks weird.
                I'll add more debug information and wait for the malfunction to appear again.

                I'll post more information on the forum when I discover something new.

                In the meantime, if someone has a hint, I'll be glad to read it.

                Thanks.