3 Replies Latest reply on Apr 4, 2006 8:11 AM by Manik Surtani

    JBCACHE-525 and JBCACHE-526

    Elias Ross Master

      I wrote a JDBM CacheLoader, primarily for cheap companies who don't want to pay for Berkeley DB, and also because I was bored.

      The JdbmCacheLoader is now checked into CVS. It seems to perform a bit worse than the Berkeley DB loader for "machine gunning" inserts and deletes, and is more comparable for grouped change sets.

      Since I plan on "machine gunning" a lot of small changes, I took a look at the Async wrapper. It basically applied single changes one at a time in a separate thread. I made some improvements to it so I could batch changes, and also optimize operations such as "put", which actually does a "get" to return the old value. Although I haven't considered all the implications :-) performance is way up.

      I had the following points for discussion:

      There are a couple of related changes that floated into CVS, which I hoped Manik Surtani would want to comment on.

      How is prepare(TX, List, one_phase = true) different from put()? The JDBCCacheLoader should have a way to apply multiple operations in a single DB transaction... Is that what's happening?

      How should the default AsyncCacheLoader behave? I prefer having the settings I want for my application as the default. :-)

      Since "Object AsyncCL.put" and "remove" will return null if, via a configuration setting, you don't want them to return a value, could we create a constant such as CacheLoader.UNKNOWN or something? That way we wouldn't confuse the caller if we wanted to optimize ourselves.
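      To illustrate what I mean, here's a minimal sketch of the sentinel idea. The names and the anonymous-class trick are just illustrative, not a proposal for the actual CacheLoader API; the point is only that a distinguished UNKNOWN object lets the caller tell "no previous value" apart from "previous value was not fetched":

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a distinguished sentinel, distinct from null, returned
// when an async put() skips the read-back "get" of the old value.
public class SentinelSketch {
    public static final Object UNKNOWN = new Object() {
        public String toString() { return "UNKNOWN"; }
    };

    // Hypothetical async-style put(): stores the value without looking
    // up the old one, and returns the sentinel instead of null.
    static Object put(Map<String, Object> store, String key, Object value) {
        store.put(key, value);
        return UNKNOWN; // caller can tell this apart from "no old value"
    }

    public static void main(String[] args) {
        Map<String, Object> store = new HashMap<>();
        Object old = put(store, "k", "v");
        System.out.println(old == UNKNOWN); // sentinel, not a real old value
        System.out.println(old == null);    // distinguishable from null
    }
}
```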

        • 1. Re: JBCACHE-525 and JBCACHE-526
          Manik Surtani Master

          Hi Elias

          A number of points here, let's hope these don't get too incoherent. :-)

          1. Batching writes in berkeley db, jdbc and jdbm cache loaders

          I like the idea of trying to maintain an open connection to the underlying db. Perhaps this is something prepare() could do when calling modifications individually. So perhaps a private version of put() (_put()?) which assumes an open db connection, while the public put() would open one and then call the private method. Even if the connection comes from a pool (as in the case of JDBC cache loaders), it is much quicker to do a single lookup at the start of prepare().
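          To sketch what I mean (the class and method names here are made up, and the "connection" is a stand-in counter object rather than a real JDBC connection), the public put() pays one connection lookup per operation, while prepare() amortizes a single lookup over the whole batch:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the public/private split: put0() assumes an open connection,
// put() opens one per call, prepare() opens one for the whole batch.
public class BatchingSketch {
    static int connectionLookups = 0;       // counts how often we "open"
    final List<String> store = new ArrayList<>();

    Object openConnection() { connectionLookups++; return new Object(); }

    // Public API: one connection lookup per operation.
    public void put(String key, String value) {
        Object con = openConnection();
        put0(con, key, value);
    }

    // prepare(): a single lookup amortized over all modifications.
    public void prepare(List<String[]> mods) {
        Object con = openConnection();
        for (String[] m : mods) put0(con, m[0], m[1]);
    }

    // Private version assumes the connection is already open.
    private void put0(Object con, String key, String value) {
        store.add(key + "=" + value);
    }

    public static void main(String[] args) {
        BatchingSketch loader = new BatchingSketch();
        loader.put("a", "1");               // 1 lookup for 1 write
        loader.prepare(List.of(new String[]{"b", "2"},
                               new String[]{"c", "3"}));
        System.out.println(connectionLookups); // 2 lookups for 3 writes
        System.out.println(loader.store.size());
    }
}
```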

          2. JDBC Cache Loader: Multiple operations in single db transaction

          We did have a discussion thread on this, perhaps you'd like to participate on this here: http://www.jboss.com/index.html?module=bb&op=viewtopic&t=76590

          3. AsyncCacheLoader

          I presume the 'defaults' you speak of here pertain to batching writes. How does this work, do you queue up writes and trigger the writer process based on the queue size and queue age, whichever trigger fires first? I'd use a sensible default of an unlimited queue size and a queue age trigger of say 10 seconds.

          Could you please document this as a JIRA task, if you haven't already?

          4. CacheLoader.put() and remove() to return an UNKNOWN constant if configured to do so

          Let me look through the codebase and see where these return values are used in the first place. The CacheLoader should not be directly accessed by client code anyway.

          Cheers,
          Manik




          • 2. Re: JBCACHE-525 and JBCACHE-526
            Elias Ross Master

             

            "manik.surtani@jboss.com" wrote:
            Hi Elias

            A number of points here, let's hope these don't get too incoherent. :-)

            1. Batching writes in berkeley db, jdbc and jdbm cache loaders

            I like the idea of trying to maintain an open connection to the underlying db. Perhaps this is something prepare() could do when calling modifications individually. So perhaps a private version of put() (_put()?) which assumes an open db connection, while the public put() would open one and then call the private method. Even if the connection comes from a pool (as in the case of JDBC cache loaders), it is much quicker to do a single lookup at the start of prepare().


            http://jira.jboss.com/jira/browse/JBCACHE-529

            I think it would be a big win without too much work, actually. For the JDBM implementation, I created two versions of the modification methods (though I used 0 instead of _).

            "manik.surtani@jboss.com" wrote:

            2. JDBC Cache Loader: Multiple operations in single db transaction

            We did have a discussion thread on this, perhaps you'd like to participate on this here: http://www.jboss.com/index.html?module=bb&op=viewtopic&t=76590


            Same discussion point as 1. if I am not mistaken.

            "manik.surtani@jboss.com" wrote:

            3. AsyncCacheLoader

            I presume the 'defaults' you speak of here pertain to batching writes. How does this work, do you queue up writes and trigger the writer process based on the queue size and queue age, whichever trigger fires first? I'd use a sensible default of an unlimited queue size and a queue age trigger of say 10 seconds.

            Could you please document this as a JIRA task, if you haven't already?


            You can take a look at the code. Basically, the behavior is quite simple:
             private final List mods = new ArrayList(batchSize);
             ...
             private void run0() throws InterruptedException {
                 mods.clear();
                 Object o = queue.take();
                 addTaken(o); // appends to mods
                 while (mods.size() < batchSize)
                 {
                     o = queue.poll(pollWait);
                     if (o == null)
                         break;
                     addTaken(o);
                 }
                 // ... then apply the batched mods ...
             }


            I'm going to keep tracking this as http://jira.jboss.com/jira/browse/JBCACHE-526 .

            Basically, the algorithm keeps fetching data until poll(100 ms) returns null (nothing found) or the batch size is reached. There's no "age" per se, but unless something new shows up quickly, the loop breaks out and the writes are all applied. It would probably be better to poll for less time on each iteration, either by calculating the time remaining relative to when the loop started, or by (say) halving the poll time: the first poll is 100 ms, the second 50 ms, then 25, etc. That way, gathering a batch will take at most 200 ms.
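            Here's a rough, self-contained sketch of the halving-poll idea. DecayingPollSketch and gatherBatch are made-up names, not the actual AsyncCacheLoader code; the point is that the halved waits sum to under 200 ms no matter how large the batch is:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch: block for the first item, then poll with a halving timeout
// (100, 50, 25, ... ms), so the total wait is bounded below 200 ms.
public class DecayingPollSketch {
    static List<Object> gatherBatch(BlockingQueue<Object> queue, int batchSize)
            throws InterruptedException {
        List<Object> mods = new ArrayList<>();
        mods.add(queue.take());              // block until the first item
        long pollWait = 100;                 // ms; halves each iteration
        while (mods.size() < batchSize && pollWait > 0) {
            Object o = queue.poll(pollWait, TimeUnit.MILLISECONDS);
            if (o == null) break;            // queue drained; flush batch
            mods.add(o);
            pollWait /= 2;                   // 100 + 50 + 25 + ... < 200 ms
        }
        return mods;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Object> q = new LinkedBlockingQueue<>();
        for (int i = 0; i < 5; i++) q.add(i);
        List<Object> batch = gatherBatch(q, 10);
        System.out.println(batch.size());    // all 5 queued items gathered
    }
}
```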