4 Replies · Latest reply on Nov 29, 2007 6:32 AM by mircea.markus

    Cache loader questions

    aditsu

      Hi, I previously had a lot of issues with cache loaders, so I decided not to use them *at all*, in favor of manual preloading, listeners for persisting changes, and cache replication with memory state transfer.
      This has worked pretty well; however, we're running into problems when trying to ensure failover and to add/remove caches in a running cluster. JBC has a lot of built-in functionality (related to cache loaders) for dealing with this kind of situation, so it seems difficult and somewhat pointless to rewrite these features using different mechanisms. Therefore I'm (reluctantly) looking again at using a cache loader.

      Before I can do that, I need to understand some details about how cache loaders work and what they can/can't do. The idea is to use a custom cache loader that uses a database as a persistent store. Here are the most important questions:

      1. Is it possible to do reads and writes in batches? Obviously, if I have 1 million records and I want to preload them all, it's better to do one "SELECT *" rather than one "SELECT ID" followed by 1 million "SELECT * WHERE ID=?".
      Similarly, if I have 1 million updates to perform, it's better to group them in batches (asynchronously), and reduce the number of db transactions.
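      For illustration, this is roughly the kind of batched write I mean (just a sketch of mine; the table and columns "my_store", "id" and "data" are placeholders, not an actual JBC schema):

      import java.sql.Connection;
      import java.sql.PreparedStatement;
      import java.util.Map;

      public class BatchedWrites {

          // Push a map of pending updates to the db as one statement batch inside
          // one transaction, instead of one round trip and transaction per record.
          public static void flush(final Connection con, final Map<String, byte[]> updates) throws Exception {
              con.setAutoCommit(false);
              final PreparedStatement ps = con.prepareStatement("UPDATE my_store SET data = ? WHERE id = ?");
              try {
                  for (final Map.Entry<String, byte[]> e : updates.entrySet()) {
                      ps.setBytes(1, e.getValue());
                      ps.setString(2, e.getKey());
                      ps.addBatch();
                  }
                  ps.executeBatch(); // one round trip for all the updates
                  con.commit();      // one db transaction for the whole batch
              } catch (final Exception ex) {
                  con.rollback();
                  throw ex;
              } finally {
                  ps.close();
              }
          }
      }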

      2. When I add a new cache to the replicated cluster (with a shared cache loader), can it preload all the necessary data from the cache loader (not blocking the cluster in any way during this operation), and then get the latest changes from the cluster (*all* and *only* the changes) so that it is up to date?
      Specifically:
      2.1. Can writes (in the cluster and cache loader) happen while the new cache is loading data?
      2.2. Can the new cache get from the cluster all the changes that happened while it was loading data from the db, plus any earlier changes that had not yet been flushed because of the async operation, without receiving any data it already has (i.e. data that is identical in the cache and the db)?

      3. I understand that with a shared cache loader, the cache that originates a change is also the one that writes it to the cache loader. What happens if the cache loader writes asynchronously, and that cache instance goes down after completing the transaction but before flushing the data to the db? Is it possible to have another cache write it instead?

      Thanks
      Adrian

        • 1. Re: Cache loader questions
          mircea.markus

           

          1. Is it possible to do reads and writes in batches? Obviously, if I have 1 million records and I want to preload them all, it's better to do one "SELECT *" rather than one "SELECT ID" followed by 1 million "SELECT * WHERE ID=?".

          JDBCCacheLoader works like that (starting with 2.0).

          Similarly, if I have 1 million updates to perform, it's better to group them in batches (asynchronously), and reduce the number of db transactions.

          Nice one, I've created http://jira.jboss.com/jira/browse/JBCACHE-1221 for this.


          2. When I add a new cache to the replicated cluster (with a shared cache loader), can it preload all the necessary data from the cache loader (not blocking the cluster in any way during this operation), and then get the latest changes from the cluster (*all* and *only* the changes) so that it is up to date?

           You could disable state transfer and preload data from the cache loader on startup instead. If other caches write asynchronously, though, this won't work. We have some ideas for handling state transfer more efficiently, but no clear date for when those will be implemented.
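           For example, something along these lines (just a sketch, assuming the 2.0 Configuration API and its setFetchInMemoryState property; the config file name is a placeholder for one that already defines the shared JDBCCacheLoader with <preload>/</preload>):

           import org.jboss.cache.Cache;
           import org.jboss.cache.CacheFactory;
           import org.jboss.cache.DefaultCacheFactory;

           public class PreloadOnStartup {

               public static void main(final String[] args) {
                   final CacheFactory factory = DefaultCacheFactory.getInstance();
                   // Create the cache without starting it; "cache-config.xml" is a placeholder.
                   final Cache cache = factory.createCache("cache-config.xml", false);
                   // Skip the in-memory state transfer from the cluster...
                   cache.getConfiguration().setFetchInMemoryState(false);
                   // ...and let the cache loader's preload populate the cache from the shared db.
                   cache.create();
                   cache.start();
               }
           }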

          3. I understand that with a shared cache loader, the cache that originates a change is also the one that writes it to the cache loader.

           Yes.
          What happens if the cache loader writes asynchronously, and that cache instance goes down after completing the transaction but before flushing the data to the db? Is it possible to have another cache write it instead?

           No. That's the drawback of async writes: you won't be notified whether they failed or not. If it's critical for you to know the data is persisted, I think you should use sync replication.


          • 2. Re: Cache loader questions
            aditsu

            Hi, thanks for the answer :)

            1. This is good news; I'll look at the JDBCCacheLoader code.
            For the writes, I already have a custom implementation.

            2. This was not answered properly, but from what you say it looks like at least 2.2 is not currently possible.

            3. I'm using sync replication anyway, at least for now, but the question was about the cache loader (which has to be async for performance reasons). I can implement this feature manually (to have each instance attempt to flush the changes from the whole cluster to db, asynchronously), but I was hoping JBC already had this option.
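            Simplified sketch of that manual approach, for the record: a cache listener on every instance enqueues the modifications it sees (local and replicated), and a background task drains the queue to the db in batches. "Change" and flushToDb are placeholders of mine, not JBC classes.

            import java.util.ArrayList;
            import java.util.List;
            import java.util.concurrent.BlockingQueue;
            import java.util.concurrent.Executors;
            import java.util.concurrent.LinkedBlockingQueue;
            import java.util.concurrent.ScheduledExecutorService;
            import java.util.concurrent.TimeUnit;

            public class WriteBehindQueue {

                public static class Change {
                    final String fqn;
                    final byte[] data;
                    Change(final String fqn, final byte[] data) { this.fqn = fqn; this.data = data; }
                }

                private final BlockingQueue<Change> queue = new LinkedBlockingQueue<Change>();
                private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

                // Called from a cache listener for every modification.
                public void enqueue(final String fqn, final byte[] data) {
                    queue.add(new Change(fqn, data));
                }

                public void start() {
                    scheduler.scheduleWithFixedDelay(new Runnable() {
                        public void run() {
                            final List<Change> batch = new ArrayList<Change>();
                            queue.drainTo(batch);
                            if (!batch.isEmpty()) {
                                flushToDb(batch); // e.g. the batched JDBC write from my first post
                            }
                        }
                    }, 1, 1, TimeUnit.SECONDS);
                }

                private void flushToDb(final List<Change> batch) {
                    // batched UPDATE/INSERT in a single db transaction; omitted here
                }
            }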

            • 3. Re: Cache loader questions
              aditsu

              Hi, I think you're wrong about point 1. I wrote a simple test:

              package aditsu;
              
              import java.io.IOException;
              import java.util.HashMap;
              import java.util.Map;
              
              import org.apache.log4j.BasicConfigurator;
              import org.jboss.cache.Cache;
              import org.jboss.cache.DefaultCacheFactory;
              import org.jboss.cache.Fqn;
              
               public class JDBCTest {

                   public static void main(final String... args) throws IOException {
                       BasicConfigurator.configure();
                       // Populate the cache (and, via the JDBCCacheLoader, the db) with a few nodes.
                       Cache c = DefaultCacheFactory.getInstance().createCache("aditsu/cache-jdbc.xml");
                       Map m = new HashMap();
                       c.put(new Fqn("a"), m);
                       c.put(new Fqn("b"), m);
                       c.put(new Fqn("c"), m);
                       c.stop();
                       c.destroy();
                       // Wait for Enter, then bring the cache back up so it preloads from the db.
                       System.in.read();
                       c.create();
                       c.start();
                   }
               }


              and this is my cache config:

               <?xml version="1.0" encoding="UTF-8"?>

               <server>
                 <mbean code="org.jboss.cache.jmx.CacheJmxWrapper" name="jboss.cache:service=TreeCache">
                   <attribute name="TransactionManagerLookupClass">org.jboss.cache.transaction.DummyTransactionManagerLookup</attribute>
                   <attribute name="IsolationLevel">REPEATABLE_READ</attribute>
                   <attribute name="CacheMode">LOCAL</attribute>
                   <attribute name="CacheLoaderConfig">
                     <config>
                       <passivation>false</passivation>
                       <preload>/</preload>
                       <cacheloader>
                         <class>org.jboss.cache.loader.JDBCCacheLoader</class>
                         <properties>
                           cache.jdbc.table.name=jbosscache
                           cache.jdbc.table.create=true
                           cache.jdbc.table.drop=false
                           cache.jdbc.table.primarykey=jbosscache_pk
                           cache.jdbc.fqn.column=fqn
                           cache.jdbc.fqn.type=varchar(255)
                           cache.jdbc.node.column=node
                           cache.jdbc.node.type=bytea
                           cache.jdbc.parent.column=parent
                           cache.jdbc.driver=org.postgresql.Driver
                           cache.jdbc.url=jdbc:postgresql://localhost/jbc
                           cache.jdbc.user=postgres
                           cache.jdbc.password=
                           cache.jdbc.sql-concat=concat(1,2)
                         </properties>
                         <async>false</async>
                         <fetchPersistentState>true</fetchPersistentState>
                         <ignoreModifications>false</ignoreModifications>
                         <purgeOnStartup>false</purgeOnStartup>
                       </cacheloader>
                     </config>
                   </attribute>
                 </mbean>
               </server>


              When I start the cache the 2nd time (after pressing enter), I get these log entries:

              5226 [main] DEBUG org.jboss.cache.loader.JDBCCacheLoader - executing sql: select node from jbosscache where fqn=? (/)
              5229 [main] DEBUG org.jboss.cache.CacheImpl.JBossCache-Cluster - cache mode is local, will not create the channel
              5229 [main] DEBUG org.jboss.cache.loader.CacheLoaderManager - preloading transient state from cache loader org.jboss.cache.loader.JDBCCacheLoader@39ab89
              5229 [main] DEBUG org.jboss.cache.loader.JDBCCacheLoader - executing sql: select fqn from jbosscache where parent=? (/)
              5232 [main] DEBUG org.jboss.cache.loader.JDBCCacheLoader - executing sql: select node from jbosscache where fqn=? (/a)
              5235 [main] DEBUG org.jboss.cache.loader.JDBCCacheLoader - executing sql: select fqn from jbosscache where parent=? (/a)
              5238 [main] DEBUG org.jboss.cache.loader.JDBCCacheLoader - executing sql: select node from jbosscache where fqn=? (/c)
              5241 [main] DEBUG org.jboss.cache.loader.JDBCCacheLoader - executing sql: select fqn from jbosscache where parent=? (/c)
              5245 [main] DEBUG org.jboss.cache.loader.JDBCCacheLoader - executing sql: select node from jbosscache where fqn=? (/b)
              5266 [main] DEBUG org.jboss.cache.loader.JDBCCacheLoader - executing sql: select fqn from jbosscache where parent=? (/b)

              So it's executing not one but TWO queries for every single node! That's definitely not what I want, and contradicts what you said.
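               What I was hoping the preload would look like is a single scan over the table, roughly like this plain-JDBC sketch of mine (using the jbosscache table and fqn/node columns from my config above; not actual JBC code):

               import java.sql.Connection;
               import java.sql.PreparedStatement;
               import java.sql.ResultSet;
               import java.util.HashMap;
               import java.util.Map;

               public class SingleScanPreload {

                   // Read every row in one query and rebuild the fqn -> serialized-node map
                   // from it, instead of issuing two queries per node.
                   public static Map<String, byte[]> loadAll(final Connection con) throws Exception {
                       final Map<String, byte[]> nodes = new HashMap<String, byte[]>();
                       final PreparedStatement ps = con.prepareStatement("SELECT fqn, node FROM jbosscache");
                       final ResultSet rs = ps.executeQuery();
                       try {
                           while (rs.next()) {
                               nodes.put(rs.getString("fqn"), rs.getBytes("node"));
                           }
                       } finally {
                           rs.close();
                           ps.close();
                       }
                       return nodes;
                   }
               }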

              Adrian

              • 4. Re: Cache loader questions
                mircea.markus

                 Hi, you are right, my bad :(

                 The thing is that the cache loader itself knows how to load data in batches, but the calling code does not use this. I've updated http://jira.jboss.com/jira/browse/JBCACHE-1221.