
    Possible optimization of CachedSetImpl

    genman

      I was thinking that, instead of using a simple integer tag for each POJO in a Set, you could incorporate the hash code (or part of it) along with a counter for objects that happen to have the same hash code but are not equal.

      For example, the Set.contains() method could be written as:

       public boolean contains(Object o)
       {
          int hash = o.hashCode();
          Collection<Object> children = node.getChildrenNames();
          for (int i = 0; i < children.size(); i++)
          {
             // Pack the hash code into the low 32 bits and the collision
             // counter into the high 32 bits; mask to avoid sign extension.
             Long key = Long.valueOf(((long) hash & 0xFFFFFFFFL) | ((long) i << 32));
             Object o2 = getNoUnmask(key); // cache lookup by composite key
             if (o2 == null)
                return false; // gap: no entry at this slot, so o is absent
             if (o.equals(o2))
                return true;
          }
          return false;
       }
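
      Later replies refer to this key construction as toLong(hashCode, i). A minimal sketch of such a helper, assuming the hash goes in the low 32 bits and the collision counter in the high 32 bits, might be:

       // Hypothetical helper: pack a 32-bit hash code and a collision counter
       // into a single Long key. Masking the hash avoids sign extension on
       // negative hash codes, which would otherwise clobber the counter bits.
       static Long toLong(int hashCode, int i)
       {
          return Long.valueOf(((long) hashCode & 0xFFFFFFFFL) | ((long) i << 32));
       }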
      


      Worst-case performance is O(N), when every element shares the same hash code; the typical case is O(1).

      Set.remove() would be more complicated; you'd have to actually shift entries around to close up any gaps.

      I could contribute this as a patch if it made sense.

      One thing I wonder, though: does it hurt to use Integer in an FQN instead of a String? I didn't see any problems during tests... Long.toString() would be fine, I guess, but less efficient.

        • 1. Re: Possible optimization of CachedSetImpl

          I have looked at your patch. It looks good. Just two comments here:

          1. In putNoMask and getNoUnmask, you chose to call putObject and getObject, respectively (instead of attach and detach). Any reason for that?

          2. On remove(), shifting the POJO has always been expensive. I have been pondering a way to get rid of that. Actually, it is OK to have gaps in between now, since you have used the hash code plus a counter. So for remove(), we can get the Node.getChildren() and iterate over those child Fqns for the whole set if needed. This way, we don't rely on a specific consecutive ordering. What do you think?

          -Ben

          • 2. Re: Possible optimization of CachedSetImpl
            genman

            1. I actually don't know the difference ... I may have copied that from the original implementation. The tests seemed to pass. What's the semantic difference?

            2. I'm not sure how this would work better?

            The need for offsetting the hash code is to deal with a typically rare edge condition. If the hash function is reasonably distributed, then the chance of a full hash code collision (and therefore the need for the hash code + 1 trick) is at best 1 in 2^32, and likely only a few orders of magnitude worse in practice.

            I think people expect hash collisions to happen more often, but that's because in ordinary hash tables, collisions come from evaluating (hash code % some small number).
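
            As a concrete illustration (not from the original post): full 32-bit collisions do exist, and the classic Java example below is exactly the case the counter handles:

             // "Aa" and "BB" are a well-known full hashCode() collision in Java:
             // both evaluate to 2112, yet the strings are not equal, so under
             // this scheme they would get keys (2112, i=0) and (2112, i=1).
             assert "Aa".hashCode() == "BB".hashCode(); // both 2112
             assert !"Aa".equals("BB");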

            • 3. Re: Possible optimization of CachedSetImpl

               

              "genman" wrote:
              1. I actually don't know the difference ... I may have copied that from the original implementation. The tests seemed to pass. What's the semantic difference?


              attach/detach/find will go through the proper PojoCache interceptors (e.g., tx lock and rollback). See pojocache-aop.xml for the corresponding stack.
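
              For illustration, a minimal sketch of the interceptor-aware calls, with signatures assumed from the snippets in this thread rather than checked against a specific release:

               // Hedged sketch: attach/detach/find run through the PojoCache
               // interceptor stack (tx lock, rollback -- see pojocache-aop.xml),
               // whereas the putObject/getObject calls in the patch may bypass it.
               Fqn fqn = AopUtil.constructFqn(getFqn(), key);
               pCache_.attach(fqn, pojo);       // interceptor-aware write
               Object o2 = pCache_.find(fqn);   // interceptor-aware read
               pCache_.detach(fqn);             // interceptor-aware removal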


              2. I'm not sure how this would work better?

              The need for offsetting the hash code is to deal with a typically rare edge condition. If the hash function is reasonably distributed, then the chance of a full hash code collision (and therefore the need for the hash code + 1 trick) is at best 1 in 2^32, and likely only a few orders of magnitude worse in practice.

              I think people expect hash collisions to happen more often, but that's because in ordinary hash tables, collisions come from evaluating (hash code % some small number).


              I am not talking about offsetting the hash code. I like your implementation! What I was referring to is the need to loop through the cache indices consecutively; e.g., for remove(), you have:
               for (int i = 0; i < size; i++)
               {
                  Object key = toLong(hashCode, i);
                  Object o2 = getNoUnmask(key);
                  if (o2 == null)
                     break; // no more entries with this hash code
                  if (removed)
                  {
                     // shift o2 down one slot to fill the gap left by the removal
                     pCache_.detach(AopUtil.constructFqn(getFqn(), key));
                     pCache_.attach(AopUtil.constructFqn(getFqn(), oldkey), o2);
                  }
                  if (o.equals(o2))
                  {
                     pCache_.detach(AopUtil.constructFqn(getFqn(), key));
                     removed = true;
                  }
                  oldkey = key;
               }
               return removed;
              

              Instead of index looping, I could have something like the following pseudocode:
              // This node is assumed to contain only set elements.
              Collection children = cache.getNode(getFqn()).getChildren();
              for (Object childFqn : children)
              {
                 // check whether o is present under this child Fqn
                 Object o2 = cache.find(childFqn);
                 if (o.equals(o2))
                 {
                    cache.detach(childFqn);
                    return true;
                 }
              }
              return false;
              

              This way, we don't maintain an ordered list of indices.

              BTW, the unit tests have some failures in the Set tests. I will probably need to look into where they go wrong.

              Thanks,

              -Ben

              • 4. Re: Possible optimization of CachedSetImpl
                genman

                I'll change putObject() to use attach().

                 As for the existing loop: it does go from 0..size, but in practice it will only execute once or twice. The "if (o2 == null) break;" statement is how it breaks out early.

                Sorry about the unit tests. I should have run the whole suite.

                • 5. Re: Possible optimization of CachedSetImpl

                   Well, the cost is more that of shifting the POJO at index (i+1) down to (i) with a detach and then an attach when I do a remove(o) where o resides at index (i).

                   So I am saying we don't have to shift, since a Set is neither ordered nor indexed.

                   BTW, just looking at it again, I don't think add(o) will prevent duplicates, since the key can be different. What do you think?
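
                   One possible shape for a duplicate-safe add(), sketched here under the assumption that the consecutive-index invariant from the patch holds and reusing its toLong/getNoUnmask helpers, is to probe the colliding keys first, exactly as contains() does, and only attach at the first gap:

                    public boolean add(Object o)
                    {
                       int hash = o.hashCode();
                       for (int i = 0; ; i++)
                       {
                          Object key = toLong(hash, i);
                          Object o2 = getNoUnmask(key);
                          if (o2 == null)
                          {
                             // first gap: o is not present, so attach it here
                             pCache_.attach(AopUtil.constructFqn(getFqn(), key), o);
                             return true;
                          }
                          if (o.equals(o2))
                             return false; // duplicate; leave the set unchanged
                       }
                    }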

                   Yeah, the whole suite is functionalPojoCacheTests. It'd certainly be nice to run through it once in a while. :-)

                  • 6. Re: Possible optimization of CachedSetImpl
                    genman

                     I don't believe remove ordinarily requires moving more than a single key, and when it does, only keys with the same hash code. Anyway, is there some sort of load test I can run to verify performance?

                    • 7. Re: Possible optimization of CachedSetImpl

                       In pojo.collection.CollectionTest there are some tests like Xtest... that Scott added a while ago to benchmark performance. They should belong in the performance directory (parallel to functional). It'd be nice if you could take these, enhance them, and check them into the corresponding performance directory! Thanks.
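
                       A rough sketch of the shape such a benchmark might take (the setup, path, and method name here are assumptions, not the actual CollectionTest code):

                        public void XtestSetAddPerf() throws Exception
                        {
                           // attach a set, then fetch the cache-backed proxy
                           cache_.attach("/test/set", new HashSet<String>());
                           Set<String> set = (Set<String>) cache_.find("/test/set");
                           long begin = System.currentTimeMillis();
                           for (int i = 0; i < 10000; i++)
                              set.add("element" + i);
                           long elapsed = System.currentTimeMillis() - begin;
                           System.out.println("10000 adds took " + elapsed + "ms");
                        }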

                      • 8. Re: Possible optimization of CachedSetImpl
                        genman

                        Done.

                         Set adds are now about twice as fast as List adds. Iteration is about the same for either.