4 Replies Latest reply on Feb 17, 2006 9:57 AM by hmesha

    JBCACHE-382, improve the performance of JDBCCacheLoader

    hmesha

      This forum entry to Start the discussion of this issue.


      This issue calls for the following 2 actions to improve the performance of JDBCCacheLoader when executing SQL statements

      1. During putObject (TreeCacheAop), we can try to optimize by batching the put, e.g., use put(fqn, map) in one shot.

      2. Investigate the feasibility of using stored procedure call during prepare phase to minimize network call.

      In #1, I think it would be more effective if we use the batch functionality of the statement object that was introduced in JDBC 2.0 inside JDBCCacheLoader. Since we always run inside a transaction, The integrity of the data will be protected and it'll benefit all the modules that use JDBCCacheLoader.

      In #2, Using stored procedure will improve the performance but it means that we have to write stored procedure for every single DBMS that we support.

      Let me hear your ideas on what this issue:)

      Cheers,

      Hany

        • 1. Re: JBCACHE-382, improve the performance of JDBCCacheLoader
          hmesha

          WRT using stored procedure:

          Stored procedure is DBMS vendor specific so if we use stored proc then we have to write stored procedure for all the databases that we support. Also, it's not a "write once run anywhere" solution.

          My suggestion is to make the stored procedure optional to use and pluggable through the configuration file as we currently do with the other variablies inside JDBCCacheLoader. Therefore, It'll be upto the user to write the stored procedure for the database they use.

          Any other ideas?

          • 2. Re: JBCACHE-382, improve the performance of JDBCCacheLoader
            manik

            I'm not too keen on the stored proc idea - not all DBs even support stored procs (MySQL < 5.0)

            Having putObject() call put(fqn,map) may help. If we are in a TX though, isn't this how it is done anyway - we call loader.prepare(modificationlist) on the cache loader when the tx is coompleting?



            • 3. Re: JBCACHE-382, improve the performance of JDBCCacheLoader

              Either stored procedure or batch update is needed on the cache loader side. This is not a problem on the POJO Cache side. Any cache loader user that is doing a lot of get/set within the same transaction will trigger this kind of problem.

              • 4. Re: JBCACHE-382, improve the performance of JDBCCacheLoader
                hmesha

                Through emails we have been discussing the use of batch facility of JDBC 2.0 with this issue. Below are the replies. Based on this discussion, I'm moving this issue to 1.4 release for further investigation.

                Hany

                Hany Mesha wrote:


                WRT this issue, there was 2 options to improve the performance.

                1) the use of Stored Proc. which is ruled out in the design forum thread http://www.jboss.com/index.html?module=bb&op=viewtopic&t=76590

                2) the use of batch update, I have tried several technique to enforce batch update on put() and I didn't succeed. I came to the conclusion that the updates are spread over multiple transactions making it impossible to do batch update and maintain working state of the objects.


                I don't see a better way to improve performance with the current CacheLoader design and my knowledge of JDBC programming. So I have run out of options at this point, any thoughts?

                P.S. If you're interested in seeing the impl of the batch update, please let me know and I can send you the class and its test case


                Bela Ban wrote:

                Maybe there is no solution then.

                Okay, to recapitulate, the loader gets a put(List modifications) and,
                for each modification M, sends a JDBC statement to the DB.

                What we want to bundle is the communication between loader and DB, but
                if we cannot use stored procs, then we may have to look into batch
                updates (available with JDBC 2.0):
                http://www.jguru.com/faq/view.jsp?EID=5079 (google "batch JDBC")

                This might be the solution. I don't mind requiring JDBC 2.0 as baseline
                for JDBCCacheLoader.
                Can you look into this and see what JDBC max version is supported, say,
                by Oracle, MySQL, MSSQL and PostgresQL ?

                Or, check whether JDBC 2.0 is supported by the driver, and ? if yes ? do
                batch updates, otherwise fall back to single statements.


                Ben Wang wrote:


                Yeah, if we can't find a good enough solution, then let's mark it and
                put it in later release. Maybe something better will come up. Or is
                using as Hibernate as a persistence plugin will help here? That is, if
                Hibernate can help, can we write a layter that works for both our native
                JDBC and Hibernate. We don't ship Hibernate as a default. But if a suer
                wants it, he will download it himself?

                I have cc?ed Steve again for this thread. Steve, care for a quick
                comment? :?)

                Thanks,

                ?Ben


                Steve Ebersole wrote:

                JDBC2??? Welcome to the 20th century; oh wait... ;)

                Most (all *real*) databases support JDBC3 (you know JDBC4 is already in
                public review, right...). All the ones you listed support JDBC3. Of
                course "support" varies in its definition, but the batch update api is
                supported by all of them.

                JDBC batch updates, by definition, cannot span transactions. They need
                to be done within the same transaction. Also, you can only compose
                batches made up of the same prepared statement (the different elements
                in the batch are just different sets of parameters); this in particular
                is a real weakness in the JDBC batch update api. Another option (if you
                need to batch together different statements) would be to use the
                dialect?specific statement grouping syntax (i.e., "BEGIN ... END" on
                Oracle) if such a thing is supported.

                Or, of course, we can talk about potential Hibernate usage here...


                Bela Ban worte:

                Steve just discovered what an idiot I am wrt JDBC... :?) Last time I
                used JDBC was 19XX...

                Oh, I see, if we can only batch together statements of the same kind
                (INSERT, UPDATE), then that feature
                may be useless to us.

                Hany: can you check what statements we actually use in the JDBCCacheLoader ?

                Hmm, actually looking at this, we're only using 2 kinds of statements:
                INSERTs and UPDATEs. Probably REMOVEs too.

                So, as an optimization, we could try to at least bundle statements of
                the same kind,
                e.g. I I I U U I U U U U R I U U

                could be bundled into 3 inserts, 2 updates, 1 insert, 4 updates, 1
                remove, 1 insert, 2 updates...

                If that's the case, then (Hany) I suggest you add these comments to the
                case, and move it to 1.4,marked as an optimization


                Bela Ban worte:

                Okay. Hany, I think you have enough information to get started on this.
                If we can find out whether
                addBatch() is supported for a given driver (maybe it'll throw an
                UnsupportedOperationException if not), we could
                make use of it, and simply fall back to multiple statements if this is
                not supported.
                Thanks for the clarification Steve !
                Cheers,


                Hany Mesha worte:

                First, let me thank Steve for a well done explanation.
                Now, Here's what I have done:

                I created JDBC2CacheLoader to experiment with the batch update facility. I mainly focused on put(List modifications). After all, it's the performance bottle neck reported on this issue. In put(List modifications) I test the driver for batch update support. If true, then I mount the update statements on a list, then at the end I call executeBatch() to perfrom the batch update.

                What I have found out is that the CacheLoader, by design, we do make flow control decisions between updates and for the reasons that Steve stated below that the batch of updates can only be applied to a narrow scope. If the scope is widen, the update statements collide with each other (see the attached files for more details)

                Running the test cases with batch update and without shows that we're not gaining any performance using batch in that limited scope.

                Therefore, as I said in the beginning of this email thread, CacheLoader design prevents us from using batch update effectively.

                Finally, I'm suggesting to push this issue to 1.4 for further investigation of a better solution. Perhaps, using Hibernate as Steve suggested.

                What you think?