8 Replies Latest reply on Mar 1, 2004 11:10 AM by wtff

    find queries read in PKs for the complete resultset

      hello,

      my problem is that I got a table with 500000 records in it and a CMP EJB which has a find method.
      The JBossCMP doc describes in detail all options for read-ahead, fetch-size, eager-and lazy load groups and all that.

      However, all these optimizations seem to apply only to the loading of individual records. A find query first reads in ALL PrimaryKey fields for the complete resultset, instead of just reading in the first few matches.

      So, if my ejb find method has two params which, lets say I set to '%', so that the method will match the complete resultset of 500000 records, then I cannot page through this resultset because JBoss tries to prefetch all 500000 PKs, even if I just try to iterate through the first 10 records of the returned collection.

      Can someone help me here?
      I assume that every person working with CMP on JBoss must have come by this problem for it is surely inacceptable to prefetch PKs for a complete resultsets in cases where the size of the resultset can vary and is not know a-priori.

      Thanks a lot!

        • 1. Re: find queries read in PKs for the complete resultset
          aloubyansky

          This is going to be fixed soon in 3.2.4 with lazy (upon request) result set loading.

          • 2. Re: find queries read in PKs for the complete resultset

            aahhhh, thanks a lot!!! This is very good news.

            This is a big load off for me, since I need to allow users to page through resultsets and lazy resultset loading is exactly what I depend on.

            • 3. Re: find queries read in PKs for the complete resultset
              mikea-xoba

              this is also related to a question i was about to ask, which is:

              "i know there is caching for entity beans based on their primary keys, but is there also caching based on previous executions of finder methods?"

              [correct me if i'm wrong, please]

              for instance, if i have an 'account' entity bean and i make the primary key = login_name, then when someone logs in using their login_name, i imagine jboss could certainly get a cached version of that person's account bean when the application executes findByPrimaryKey(login_name), if the cache was still valid.

              on the other hand, it may be better design to have a primary key of Integer in the account entity-bean, and have the login_name just as a normal CMP field, in which case one would instead need to execute a generic finder method to find the account bean with the given login, i.e., findByLoginName(login_name). [in that case, the account creation process would ensure no two accounts had the same login_name, and the advantage would be that users could then change login_name's if they wanted, over time].

              but if every login or every access of a user's account entity bean involved a finder method as opposed to a primary key lookup, performance would vary greatly depending on whether finder method results are cached. so, are they? i read all the $$-docs but don't remember if the answer is buried somewhere in there, and am not yet familiar enough with jboss' code to figure it out myself.

              thanks!
              mike

              • 4. Re: find queries read in PKs for the complete resultset
                pilhuhn

                Have a look at the various commit-options.
                From 3.2.2 on, you can also put a

                <dbindex/>
                tag on a column, which also helps you speed up lookups in finders.

                • 5. Re: find queries read in PKs for the complete resultset
                  mikea-xoba

                  thanks for the info, and i'm already well aware of the commit options and the dbindex xml-element. those certainly can improve performance.

                  but i believe those things just pertain to the database or primary-key caching itself, right, and not to the caching behaviour for finder methods in particular? for instance, as the other gentleman pointed out, he found the finder method was always loading in all his 1/2 million primary keys! what i'm specifically wondering is if caching for finder methods can be made just as efficient (i.e., all in-memory) as it is for caching by primary key.

                  basically what i'm imagining (in lieu of investigating the actual code) is that the jboss cache is fundamentally a big [in-memory] Map from recent primary keys to entity bean objects, obviating the need for serialization and database access for those keys in the Map.

                  so, "is there also a Map of recent finder-parameters to collections of entity bean objects?"

                  that would be the 'finder cache' i'm looking and hoping for.

                  thanks again, mike

                  • 6. Re: find queries read in PKs for the complete resultset
                    pilhuhn

                    Afaik, there is no such map, but there was discussion last year in april in Paris about it. I don't recall the outcome though.

                    Basically, issues are invalidation of found entries and the need to find all cached finder entries that might point to a changed object. And this perhaps even per finder.

                    • 7. Re: find queries read in PKs for the complete resultset
                      aloubyansky

                      By the spec each finder must query the database. The exception is findByPrimaryKey in case of commit option A (and JBoss-specific D) which looks in the cache first. If the instance is cached query is not executed.

                      • 8. Re: find queries read in PKs for the complete resultset

                        >> By the spec each finder must query the database.

                        Too sad. Is this really so? This is a very peculiar thing to write into a spec. Usually, spec statements tend to be less concrete, leaving vendors room for optimizations...
                        If the above statement is indeed in the spec, someone should change it. Resultset caching is too nice a thing to leave it out of J2EE. ;(

                        mikea-xoba:
                        >> as the other gentleman pointed out
                        *g the who? the what?!

                        Jboss does not cache resultsets but c-jdbc does.
                        C-JDBC is an open-source framework for db clustering and offers support for resultset caching as well.
                        For one thing you could use c-jdbc as an intermediate layer between jboss and your database to add resultset caching capabilities.

                        Furthermore, provided that the c-jdbc algorithm for resultset-caching is valid, it might serve as a prove-of-concept that such a thing is possible. Hmm, c-jdbc even deals with clustered dbs, but on the other side also makes the assumption that all db accesses happen through jdbc connections...