4 Replies Latest reply on Dec 23, 2008 12:10 AM by kringdahl

    Recommendation for collections with large number of items -

    kringdahl

      We've been experimenting with ways to store large collection of items in the cache and have not been able to settle on the right way to do this. One of the major challenges is that collections are aspectized in the pojo cache. We've found that collections that have hundreds of items begin to suffer performance degradation. When collections get to a thousand or more, CPU consumption almost takes over the entire system and it appears that all of the time is being spent weaving the collection. The alternate approaches we have taken are:

      - Structure the tree cache such that items in the collection are all under a common node and then use the findAll capability to treat each item in a collection as an individual tree node. This eliminates collections all together but querying the entire collection is somewhat expensive
      - Use a Java class that implements Serializable (not annotating Replicable) having a single property which is a collection. This essentially forces the cache to treat the collection as a whole object and eliminates the weaving problem. This has proven to be better performing than the other approaches. But, it too has performance issues. Persisting a collection of 1000 small items takes around a minute or so. Compared to a weaved collection though, the weaved collection is almost impossible to persist at that size.

      So, my question to the JBoss Cache team is what is the recommendation to store collections of items in the pojo cache that could get into the hundreds or even thousands? Has there been any testing with respect to large collections of objects? Thanks in advance for any feedback.

        • 1. Re: Recommendation for collections with large number of item
          manik

          There hasn't been any specific testing or benchmarking for the best approach with which to store large collections, but I suspect the best may be something like the following:

          1. Create a facade implementation of your collection, let's say a List.
          2. The facade creates a node in the cache for the list, say /lists/mylist
          3. Each list item is stored as child nodes under the node in question, e.g., /lists/mylist/uuid
          4. /lists/mylist contains metadata about the collection, including a mapping of location in the list to uuid, as well as list size.

          This way, you get the fine grained replication needed as the only nodes changed when adding or removing elements are /lists/mylist and the specific sub-node added or removed. Retrieval and iteration should be efficient as well, since you just need to query /lists/mylist and each subnode in turn. Sizing operations would be fast as well. The only thing that would be slow is List.contains() but IMO that is tedious even in a JDK linked list or array list.

          Yes, this is a lot of extra hoops to jump through, but to gain performance with large collections, this may be the most efficient thing you can do.

          I'm sure others will appreciate it if you started a wiki page on this and posted results of your findings/experiments and maybe even code for such collection facades.

          HTH,
          Manik

          • 2. Re: Recommendation for collections with large number of item
            kringdahl

            Thanks Manik. We'll certainly be doing some experimentation here as it is a problem we need to address in the near term. When we've settled on an approach, I will post some results here.

            • 3. Re: Recommendation for collections with large number of item
              jason.greene

              Can you qualify what you mean by performance? It sounds like you are measuring full collection attach time, and not individual collection element operations.

              • 4. Re: Recommendation for collections with large number of item
                kringdahl

                Right, performance is a very general comment. But, for my purposes, I'll quantify performance as how quickly I can add or remove a single element from a collection of several hundred. That should be a relatively quick operation. But referencing my earlier post, using a native collection (as a List or Set) is a non-starter due to the massive amount of CPU used weaving the collection.