5 Replies Latest reply on Apr 7, 2006 6:48 AM by Manik Surtani

    Batch processing in JBossCache

    Ben Wang Master

      I know we have discussed this before without finding a good solution, but I want to raise it here 1) in case something useful comes up, and 2) to save it for future reference.

      Currently, the only batching mechanism we provide is the transaction. E.g., tx.begin(), ..., tx.commit() will batch everything in between into a modification list. The problem is that it is a two-phase operation, meaning we have a prepare phase and a commit phase.

      So either 1) the user needs to do explicit transaction management, or 2) we do it implicitly (e.g., PojoCache and HTTP session replication), which is a bit awkward and is problematic with JTA interaction.

      What would be nice is batch operation semantics, e.g., batch.begin(), ..., batch.end(), where everything in between is also sent as a modification list. However, a batch differs from a tx in that it has only one phase: a prepare phase followed by a *local* commit. This could potentially speed things up.
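
      A minimal sketch of the batch semantics described above, purely for illustration. Modification, Replicator and Batch are hypothetical names, not existing JBossCache classes; the sketch only shows modifications being collected between begin() and end() and then shipped in a single round, with no separate commit message.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical record of one cache change; not a JBossCache class. */
final class Modification {
    final String fqn;
    final Object key;
    final Object value;

    Modification(String fqn, Object key, Object value) {
        this.fqn = fqn;
        this.key = key;
        this.value = value;
    }
}

/** Hypothetical transport: one call stands for one network round. */
interface Replicator {
    void replicate(List<Modification> modifications);
}

/** Collects modifications between begin() and end(), then ships them once. */
final class Batch {
    private final Replicator replicator;
    private List<Modification> pending; // null while no batch is open

    Batch(Replicator replicator) {
        this.replicator = replicator;
    }

    void begin() {
        if (pending != null) {
            throw new IllegalStateException("batch already started");
        }
        pending = new ArrayList<>();
    }

    void put(String fqn, Object key, Object value) {
        if (pending == null) {
            throw new IllegalStateException("no batch in progress");
        }
        pending.add(new Modification(fqn, key, value));
    }

    /** Single phase: send the whole list in one round, no second commit message. */
    void end() {
        if (pending == null) {
            throw new IllegalStateException("no batch in progress");
        }
        List<Modification> toSend = Collections.unmodifiableList(pending);
        pending = null;
        if (!toSend.isEmpty()) {
            replicator.replicate(toSend);
        }
    }
}
```

      With a tx, the same list would cost a prepare round plus a commit round; here the receiver would be expected to apply the list as soon as it arrives.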

      So how would we do it? I see two possibilities: 1) integrate it into the current TxInterceptor so that we distinguish between tx and batch there and, if it is a batch, do a *local* commit only; or 2) have a separate BatchInterceptor and make the two interceptors mutually exclusive, so only one mode is allowed per cache instance.

      Thoughts?

        • 1. Re: Batch processing in JBossCache
          Manik Surtani Master

          The TX Interceptor currently has a concept of wrapping calls in implicit tx's for optimistic locking.

          I.e., if there is no tx and optimistic locking is true, then implicitTx is set to true - and this causes the TxInterceptor to create a tx, process the call, and when returning, commit the tx.

          Not quite what you are looking for (I'd imagine batching would span several method invocations to be useful) but there may be some overlaps.
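
          Roughly, the implicit wrapping looks like the sketch below. This is not the actual TxInterceptor code: SimpleTxManager and ImplicitTxWrapper are made-up names standing in for the real tx manager and interceptor, and the rollback on failure is my assumption about what a reasonable wrapper would do.

```java
import java.util.function.Supplier;

/** Hypothetical, minimal stand-in for a transaction manager (not JTA). */
interface SimpleTxManager {
    boolean hasTransaction();
    void begin();
    void commit();
    void rollback();
}

/** Sketch of wrapping a single call in an implicit transaction. */
final class ImplicitTxWrapper {
    private final SimpleTxManager txManager;
    private final boolean optimisticLocking;

    ImplicitTxWrapper(SimpleTxManager txManager, boolean optimisticLocking) {
        this.txManager = txManager;
        this.optimisticLocking = optimisticLocking;
    }

    <T> T invoke(Supplier<T> call) {
        // Only wrap when no tx is running and optimistic locking is on.
        boolean implicitTx = optimisticLocking && !txManager.hasTransaction();
        if (!implicitTx) {
            return call.get();
        }
        txManager.begin();
        T result;
        try {
            result = call.get();
        } catch (RuntimeException e) {
            txManager.rollback(); // assumption: undo the implicit tx on failure
            throw e;
        }
        txManager.commit(); // commit as soon as the single call returns
        return result;
    }
}
```

          Note that the implicit tx covers exactly one invocation, which is why it does not give you batching on its own.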

          To be honest, I do think batching can be quite useful - I'm sure a lot of people use txs (unnecessarily weighty) when all they really need is a simple batching process. We could refactor out the batching currently done into a top-level construct. The Tx Interceptor would automatically set batching to true if a tx is used. Batching would also have to be enabled and disabled explicitly for this to work though:

          cache.beginBatch();
          cache.put(...);        // any number of calls, collected into the batch
          cache.commitBatch();


          If a tx is used, tx.commit() would just call cache.commitBatch(), and watch for errors to roll back on ...

          • 2. Re: Batch processing in JBossCache
            Bela Ban Master

            Yuck! An additional 2 methods? Aren't we fat enough yet? :-)

            • 3. Re: Batch processing in JBossCache
              Ben Wang Master

              Yes, I do think lots of folks don't need it to be transactional (i.e., a two-phase protocol). So I would certainly vote for that in my PojoCache impl. :-)

              What about the tx and batch interaction though? I mean an example like:

              tx.begin();
              ...
              cache.batchBegin();
              ...
              cache.batchCommit();
              ...
              tx.commit();

              Finally, to address Bela's question: currently we probably have something like 200 APIs, so adding 2 would be nothing. :-) But seriously, we can delegate the API out to a BatchManager (similar to TransactionManager). That way, we will have a cleaner interface.
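
              Something like the following is what I have in mind, as a rough sketch only. BatchManager, SketchCache and getBatchManager() are all hypothetical; nothing like this exists in the API today. The batch is bound to the calling thread here, on the assumption that it should behave the way a tx association does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical interface, used much like a TransactionManager would be. */
interface BatchManager {
    void begin();
    void commit();
}

/** Hypothetical cache facade: batching sits behind one accessor. */
final class SketchCache {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    // Batch bound to the calling thread; null when none is open.
    private final ThreadLocal<List<String[]>> pending = new ThreadLocal<>();

    private final BatchManager batchManager = new BatchManager() {
        @Override
        public void begin() {
            if (pending.get() != null) {
                throw new IllegalStateException("batch already started");
            }
            pending.set(new ArrayList<>());
        }

        @Override
        public void commit() {
            List<String[]> mods = pending.get();
            if (mods == null) {
                throw new IllegalStateException("no batch in progress");
            }
            pending.remove();
            for (String[] mod : mods) {
                data.put(mod[0], mod[1]); // apply the collected changes in order
            }
        }
    };

    BatchManager getBatchManager() {
        return batchManager;
    }

    void put(String key, String value) {
        List<String[]> mods = pending.get();
        if (mods != null) {
            mods.add(new String[] {key, value}); // defer until commit()
        } else {
            data.put(key, value);                // no batch: apply immediately
        }
    }

    /** Simplification: reads do not see changes still pending in a batch. */
    String get(String key) {
        return data.get(key);
    }
}
```

              The cache interface itself only grows by one accessor, and the begin/commit pair lives on the manager.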

              • 4. Re: Batch processing in JBossCache
                Bela Ban Master

                Can't we move this into an implementation of (future, JSR 107 compliant) org.jboss.cache.Cache ?
                I'm afraid that we will duplicate code for transactions, e.g. we need to maintain the modifications per 'batch', which we already do for transactions...

                • 5. Re: Batch processing in JBossCache
                  Manik Surtani Master

                  This is why I suggested that the batching mechanism be reused by the tx management mechanism.

                  I.e., if we detect a call within a tx and no batching is in progress, we internally call batch.begin().

                  At tx commit time, we'd call batch.commit().

                  Some other things to be aware of - if a batch process has started before a tx starts, we need to commit that batch process before starting a tx - and then start a new batch for the tx.
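
                  A small sketch of those rules, just to make the ordering concrete. TxBatchCoordinator is a made-up class that only records the order of events; it does not touch a real cache or tx manager, and it leaves open what an explicit batch commit nested inside a tx should mean.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical coordinator; records the order in which tx and batch events happen. */
final class TxBatchCoordinator {
    private final List<String> events = new ArrayList<>();
    private boolean txActive;
    private boolean batchActive;

    void beginBatch() {
        if (batchActive) {
            throw new IllegalStateException("batch already started");
        }
        batchActive = true;
        events.add("batch.begin");
    }

    void commitBatch() {
        if (!batchActive) {
            throw new IllegalStateException("no batch in progress");
        }
        batchActive = false;
        events.add("batch.commit");
    }

    void beginTx() {
        if (txActive) {
            throw new IllegalStateException("tx already started");
        }
        if (batchActive) {
            commitBatch(); // a batch opened before the tx is committed first
        }
        txActive = true;
        events.add("tx.begin");
        beginBatch();      // then a new batch is started for the tx
    }

    void onCacheCall(String description) {
        if (txActive && !batchActive) {
            beginBatch();  // call within a tx and no batching in progress
        }
        events.add(description);
    }

    void commitTx() {
        if (!txActive) {
            throw new IllegalStateException("no tx in progress");
        }
        if (batchActive) {
            commitBatch(); // tx commit delegates to the batch commit
        }
        txActive = false;
        events.add("tx.commit");
    }

    List<String> events() {
        return new ArrayList<>(events);
    }
}
```

                  Starting a batch, doing a put, then beginning and committing a tx with another put gives the order: batch.begin, put, batch.commit, tx.begin, batch.begin, put, batch.commit, tx.commit.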

                  Needs some thought either way.