4 Replies Latest reply on Sep 22, 2004 3:05 AM by kburns

    Fast iterations on large numbers of cache objects

    kburns

      Hi,

      I have a large number of objects (1000) in my cache, all under the same node (/a/b) and all instances of the same class (A). Each of these objects contains a Vector of objects of class B. Part of my business logic requires calling a method of class B on every B instance contained in every A instance.

      I've implemented this as follows:

      1. query for all class A instances
      myTree.getChildrenNames("/a/b")

      2. for each FQN in the returned Set, get the object
      A obj = (A) myTree.getObject(FQN)

      3. for each instance of class A, iterate over the Vector of B's calling the method
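A minimal sketch of the three steps above, assuming a hypothetical `ObjectTree` interface that stands in for the two cache calls Ken names (`getChildrenNames()` and `getObject()`); `A`, `B` and `touch()` are placeholders for his classes and business method. The in-memory tree counts lookups so the access pattern is visible: one lookup for the names, then one more per child.

```java
import java.util.*;

// Hypothetical stand-in for the two cache calls used in steps 1 and 2.
interface ObjectTree {
    Set<String> getChildrenNames(String fqn);
    Object getObject(String fqn);
}

// Placeholder for class B; touch() represents "the method of class B".
class B {
    int calls;
    void touch() { calls++; }
}

// Placeholder for class A, holding a Vector of B's.
class A {
    final Vector<B> bs = new Vector<>();
}

// In-memory ObjectTree that counts how many lookups it serves.
class CountingTree implements ObjectTree {
    final Map<String, Object> children = new LinkedHashMap<>();
    int lookups;

    public Set<String> getChildrenNames(String fqn) {
        lookups++;
        return new LinkedHashSet<>(children.keySet());
    }

    public Object getObject(String fqn) {
        lookups++;
        // fqn is "parent/child"; keep only the last segment to find the child
        return children.get(fqn.substring(fqn.lastIndexOf('/') + 1));
    }
}

class NaiveIteration {
    // Steps 1-3: fetch the names, fetch each object by name, then iterate its B's.
    static void run(ObjectTree tree, String parent) {
        for (String name : tree.getChildrenNames(parent)) {   // step 1
            A obj = (A) tree.getObject(parent + "/" + name);  // step 2
            for (B b : obj.bs) {                               // step 3
                b.touch();
            }
        }
    }
}
```

With 1000 children this makes 1001 lookups; in the real cache each `getObject()` call also has to hand back a usable object, which is where Ken suspects the time goes.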

      When running this test, it takes around 4-5 seconds to complete. I really need this to be around 0.5 seconds. When I use a Vector in place of the cache, the test takes less than 100 ms, which is hardly surprising since the objects are already in memory.

      I assume that the creation of the objects (during the myTree.getObject() process) is what is taking the time.

      The best solution I can think of is to maintain two copies of the objects that I need to iterate over quickly: one copy in the cache and a separate copy in memory (a Vector). This solution is viable only because the fast iteration only reads the objects' data. When an object changes in the cache, I can update the Vector version through the TreeCacheListener.nodeModified() mechanism.
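A minimal sketch of this two-copy idea, assuming a hypothetical `NodeListener` interface with a `nodeModified(String fqn)` callback in place of the real `TreeCacheListener`, and a hypothetical `Snapshot` class holding only the data the fast loop reads. The mirror is refreshed only when the cache reports a change, so the hot loop never touches the cached objects.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical stand-in for a cache listener callback (not the real TreeCacheListener).
interface NodeListener {
    void nodeModified(String fqn);
}

// Hypothetical immutable copy of the data the fast loop needs.
final class Snapshot {
    final String fqn;
    final List<String> values;

    Snapshot(String fqn, List<String> values) {
        this.fqn = fqn;
        // defensive copy: later changes to the source list do not leak in
        this.values = Collections.unmodifiableList(new ArrayList<>(values));
    }
}

// Keeps a plain in-memory copy per node, refreshed from the cache on change.
class MirrorListener implements NodeListener {
    private final Map<String, Snapshot> mirror = new LinkedHashMap<>();
    private final Function<String, List<String>> loader; // reads current data from the cache

    MirrorListener(Function<String, List<String>> loader) {
        this.loader = loader;
    }

    public synchronized void nodeModified(String fqn) {
        List<String> current = loader.apply(fqn);
        if (current == null) {
            mirror.remove(fqn);                           // node is gone: drop its copy
        } else {
            mirror.put(fqn, new Snapshot(fqn, current)); // replace the stale copy
        }
    }

    // The fast read path: iterates plain copies only, never the cached objects.
    synchronized List<Snapshot> snapshots() {
        return new ArrayList<>(mirror.values());
    }
}
```

Because the mirror holds copies rather than the cached objects themselves, reads avoid whatever the cache does on access; the trade-off is that the mirror lags until the listener fires, which is acceptable here only because the fast iteration is read-only.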

      Has anyone else faced a similar issue?

      Have I overlooked a way to just use the cache only to do this fast iteration?

      Many thanks

      Ken

        • 1. Re: Fast iterations on large numbers of cache objects
          norbert

          Unless they are evicted, all your objects reside in memory. What takes the time is the way you access them.

          Looking up a thousand names first and then looking up a thousand objects by name is pretty much the slowest way to do this.

          This is like a paperboy who delivers one paper at a time and returns to his office to get the address of the next customer, instead of making use of the fact that all the customers live in houses neatly lined up in a row :-)

          Doing it this way will be much faster:

          Node myNode = myTree.get("/a/b");                 // look up the parent node once
          Map allOfMyNodesChildren = myNode.getChildren();  // all child nodes of /a/b
          Iterator it = allOfMyNodesChildren.entrySet().iterator();

          while(it.hasNext()) {
           ...
          }
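To make the difference concrete, here is a self-contained sketch that uses a hypothetical `SimpleNode` class in place of the cache's `Node`: the parent is looked up once and its children map is walked in a single pass, instead of one lookup per name.

```java
import java.util.*;

// Hypothetical stand-in for a cache node: a value plus a map of named children.
class SimpleNode {
    final Object value;
    final Map<String, SimpleNode> children = new LinkedHashMap<>();

    SimpleNode(Object value) { this.value = value; }

    Map<String, SimpleNode> getChildren() { return children; }
}

class BulkIteration {
    // One lookup for the parent, then a single pass over its children.
    // Returns how many children carry a non-null value.
    static int visitAll(SimpleNode parent) {
        int visited = 0;
        Iterator<Map.Entry<String, SimpleNode>> it = parent.getChildren().entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, SimpleNode> entry = it.next();
            Object value = entry.getValue().value; // the child's data, no second lookup by name
            if (value != null) {
                visited++;
            }
        }
        return visited;
    }
}
```

The entry already carries the child, so the loop body can work on it directly.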


          Even faster would be to implement a TreeCacheListener that stores references to all the objects in question in a Vector that you then iterate over. Keep in mind that, unless you explicitly call 'clone()', Java objects are shared by reference and not copied (Strings included; they are immutable, so sharing them is safe). Therefore it's pretty cheap (in terms of memory and CPU use) to keep a redundant list of references to your objects. All you have to make sure of is that you don't keep references to objects you want to be garbage-collected.
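A minimal sketch of such a reference list, assuming hypothetical `objectAdded`/`objectRemoved` callbacks (the real TreeCacheListener methods differ). It swaps the suggested Vector for a map keyed by FQN, so a reference can be dropped when its node is removed, which addresses the garbage-collection caveat.

```java
import java.util.*;

// Hypothetical listener callbacks; the real TreeCacheListener methods differ.
interface RefListener {
    void objectAdded(String fqn, Object obj);
    void objectRemoved(String fqn);
}

// Keeps references (not copies) to cached objects for fast iteration.
class ReferenceIndex implements RefListener {
    // LinkedHashMap keeps insertion order and lets us drop entries by FQN
    private final Map<String, Object> refs = new LinkedHashMap<>();

    public synchronized void objectAdded(String fqn, Object obj) {
        refs.put(fqn, obj); // stores the reference; the object itself is not copied
    }

    public synchronized void objectRemoved(String fqn) {
        refs.remove(fqn);   // release the reference so the object can be garbage-collected
    }

    // Copy of the reference list for iteration outside the lock.
    synchronized List<Object> snapshot() {
        return new ArrayList<>(refs.values());
    }
}
```

Each entry costs one reference plus a map entry, regardless of how large the referenced object is.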

          • 2. Re: Fast iterations on large numbers of cache objects
            kburns

            Hi Norbert,

            I tried your second suggestion of storing references to the objects in a Vector.

            Just to recap the setup:
            - 1000 objects of class A, each storing a Vector containing 10 objects of class B
            - Iterate over the 1000 objects (A), for each, calling a method on each of the B's

            Case 1:
            Store the 1000 objects in a TreeCacheAop
            -> Query (from the cache) takes about 2.5 seconds

            Case 2:
            Store the 1000 objects in a Vector only
            -> Query (from the Vector) takes about 20 ms

            Case 3:
            Store the 1000 objects in a TreeCacheAop and also "add" the object to a Vector
            -> Query (from the Vector, ignoring the cache) takes about 2.5 seconds

            Why does Case 3 take so much time (about the same as Case 1)? I don't understand the inner workings of the cache, but is it the CacheInterceptor that is slowing things down? It appears to me that all operations on the object are routed through the cache, whether I query the cache or the reference in my Vector.

            It looks like the only way to get this fast access is to maintain a separate copy of the object?

            Can anyone help?

            Regards

            Ken

            • 3. Re: Fast iterations on large numbers of cache objects

              Ken,

              If you are using TreeCacheAop, I am interested to see why it takes this long. Could you send me your setup? I'd prefer it as a JUnit test case so I can incorporate it into the test suite later. You will get the credit, of course. :-)

              To answer your Case 3 question: yes, if you declare both class A and class B *advisable*, then everything you do with them is intercepted. But the cost of AOP is upfront, that is, during putObject(). Afterwards, access should be relatively quick.
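Why the Vector in Case 3 does not help can be imitated in plain Java: if interception belongs to the object rather than to the lookup path, then every reference to it, including one held in a Vector, still goes through the interceptor. This sketch uses `java.lang.reflect.Proxy`, a different mechanism from the JBoss AOP instrumentation behind TreeCacheAop, and the names `Named`, `PlainNamed`, `CountingHandler` and `InterceptionDemo` are hypothetical.

```java
import java.lang.reflect.*;
import java.util.*;

// Hypothetical interface so a dynamic proxy can stand in for an advised object.
interface Named {
    String getName();
}

class PlainNamed implements Named {
    private final String name;
    PlainNamed(String name) { this.name = name; }
    public String getName() { return name; }
}

// Counts every call routed through the proxy, as an interceptor would see it.
class CountingHandler implements InvocationHandler {
    private final Object target;
    int intercepted;

    CountingHandler(Object target) { this.target = target; }

    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        intercepted++;
        return method.invoke(target, args); // forward to the real object
    }
}

class InterceptionDemo {
    // Wraps a plain object so every interface call goes through the handler.
    static Named advise(CountingHandler handler) {
        return (Named) Proxy.newProxyInstance(
                Named.class.getClassLoader(), new Class<?>[] { Named.class }, handler);
    }
}
```

Adding the advised object to a Vector copies only the reference, so calls made through the Vector are still counted; only a separate plain copy, such as `new PlainNamed(advised.getName())`, bypasses the handler.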

              -Ben

              • 4. Re: Fast iterations on large numbers of cache objects
                kburns

                Hi Ben,

                After some further investigation, I've gone back to the Person and Student classes in the release examples.

                It seems that the AOP interception happens when the objects are accessed, and that accessing objects nested inside other objects significantly increases access times. Here is the setup I used.

                1. Add the following lines into the "Person" class:

                Vector myHobbies = new Vector();

                public void addHobby(Hobby theHobby)
                {
                    myHobbies.add(theHobby);
                }

                public Vector getHobbyVector()
                {
                    return myHobbies;
                }


                2. Create a new class "Hobby"

                public class Hobby
                {
                    String name = null;

                    public Hobby(String theName)
                    {
                        name = theName;
                    }

                    public Hobby()
                    {
                    }

                    public String getName()
                    {
                        return name;
                    }

                    public void setName(String name)
                    {
                        this.name = name;
                    }

                    public String toString()
                    {
                        return "name=" + getName();
                    }
                }


                3. Create objects:
                1000 Students, each given 10 Hobby instances via the addHobby() method in the Person class.


                4. Add each Student instance to a TreeCacheAop instance and also to a Vector (myVector).

                5. Iterate:

                Do a query on the Vector (myVector)...

                Object[] array = myVector.toArray();
                Student a = null;
                String hobbyName;
                Vector hobVect = null;
                Object[] hobArray = null;
                for (int i = 0; i < array.length; i++)
                {
                    a = (Student) array[i];

                    // test 1: read the Student's own name
                    hobbyName = a.getName();

                    // test 2: fetch the Student's hobby Vector and copy it to an array
                    hobVect = a.getHobbyVector();
                    hobArray = hobVect.toArray();

                    // test 3: call getName() on each Hobby
                    for (int j = 0; j < hobArray.length; j++)
                    {
                        hobbyName = ((Hobby) hobArray[j]).getName();
                    }
                }

                Results:

                test 1: 100 ms
                test 2: 430 ms
                test 3: 530 ms

                It seems to me that the performance hit kicks in as soon as you access the Vector of objects (hobbies) under each Student?
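For the JUnit test case, a harness along these lines may help put a baseline next to the cached numbers. It is a sketch using hypothetical plain `PlainStudent`/`PlainHobby` stand-ins (not the release-example classes, and not placed in any cache), so it measures only the cost of the loop itself; running the same loop over the cached instances would show what the interception adds. Single-run timings are noisy because of JIT warm-up, so repeat each measurement a few times.

```java
import java.util.*;

// Hypothetical plain stand-ins for the Hobby and Student classes (no cache, no AOP).
class PlainHobby {
    private final String name;
    PlainHobby(String name) { this.name = name; }
    String getName() { return name; }
}

class PlainStudent {
    private final String name;
    private final Vector<PlainHobby> hobbies = new Vector<>();
    PlainStudent(String name) { this.name = name; }
    String getName() { return name; }
    Vector<PlainHobby> getHobbyVector() { return hobbies; }
}

class HobbyTiming {
    // Builds n students with k hobbies each, mirroring step 3 of the setup.
    static List<PlainStudent> build(int n, int k) {
        List<PlainStudent> students = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            PlainStudent s = new PlainStudent("student" + i);
            for (int j = 0; j < k; j++) {
                s.getHobbyVector().add(new PlainHobby("hobby" + j));
            }
            students.add(s);
        }
        return students;
    }

    // Test 3: read every hobby name of every student; returns how many were read.
    static int readHobbyNames(List<PlainStudent> students) {
        int count = 0;
        for (PlainStudent s : students) {
            for (PlainHobby h : s.getHobbyVector()) {
                if (h.getName() != null) count++;
            }
        }
        return count;
    }

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static double timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1e6;
    }
}
```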

                I'll try and put this into a JUnit test for you tomorrow.

                Hope the above sheds some light.

                Any ideas so far?

                Many thanks

                Ken