6 Replies. Latest reply: May 23, 2012 1:25 PM by dex chen

    What is the correct way to get all keySet of a cache

    dex chen Novice

      On the Cache interface, the Javadoc for both Set<K> keySet() and values() states:

      "This method should only be used for testing or debugging purposes such as to verify that the cache contains all the keys (values) entered. Any other use involving execution of this method on a production system is not recommended."

       

      However, I do not see any alternative for getting the key set of a cache. What is the correct way to do this?

       

      Note: I am using replication mode here.

        • 1. Re: What is the correct way to get all keySet of a cache
          Martin Gencur Novice

          Getting all keys via keySet() is not recommended because it is a dangerous method. It is not atomic and could produce inconsistent results, i.e. you could get an incomplete list if another thread on another node in the cluster adds new keys while you are retrieving the key set.

           

          The proper solution, IMO, is to store the key set as a separate cache entry:

           

          Set<String> keys = new HashSet<String>();

          keys.add("key1");

          ...

          keys.add("keyX");

           

          cache.put(KNOWN_KEYS, keys);

           

          Then you can atomically get the whole set of keys.
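
          A minimal sketch of how such an index entry could be kept up to date, assuming a plain ConcurrentMap stands in for the cache (Infinispan's Cache extends ConcurrentMap). The class name KnownKeysIndex, the helper putTracked and the reserved key string are hypothetical names for illustration, not Infinispan API:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Sketch only: keeps the set of known keys as one extra cache entry. */
final class KnownKeysIndex {
    // Hypothetical reserved key under which the index is stored.
    static final String KNOWN_KEYS = "__known_keys__";

    private KnownKeysIndex() {}

    /** Stores the value, then records its key in the index entry. */
    static void putTracked(ConcurrentMap<String, Object> cache, String key, Object value) {
        cache.put(key, value);
        while (true) {
            Object current = cache.get(KNOWN_KEYS);
            Set<String> updated = new HashSet<>();
            if (current != null) {
                updated.addAll(castKeys(current));
            }
            updated.add(key);
            // Publish an unmodifiable copy so readers never see a set that is being mutated.
            Set<String> frozen = Collections.unmodifiableSet(updated);
            boolean swapped = (current == null)
                    ? cache.putIfAbsent(KNOWN_KEYS, frozen) == null
                    : cache.replace(KNOWN_KEYS, current, frozen);
            if (swapped) {
                return; // nobody changed the index in between
            }
            // Otherwise another writer won the race: re-read and retry.
        }
    }

    /** Reads the whole index with a single get(). */
    static Set<String> knownKeys(ConcurrentMap<String, Object> cache) {
        Object current = cache.get(KNOWN_KEYS);
        return current == null ? Collections.<String>emptySet() : castKeys(current);
    }

    @SuppressWarnings("unchecked")
    private static Set<String> castKeys(Object o) {
        return (Set<String>) o;
    }
}
```

          Note that in this sketch every tracked put rewrites the whole index entry, so the cost of an update grows with the number of keys.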

          • 2. Re: What is the correct way to get all keySet of a cache
            dex chen Novice

            Thanks for the suggestion and the insight.

             

            The suggested approach does not scale. In my case, the number of keys could be up to 100K, so this would result in extra replication across nodes.

            I am aware of the potential inconsistencies and the lack of atomicity. That is one of the reasons I asked earlier about having a cache-wide lock or making a cache read-only.

             

            In my case, the cache is configured for synchronous replication, and I use a cluster-wide lock token to ensure that no cache items are added or deleted while I call cache.keySet().

            Do you see any other reasons why the keySet() method on the cache is not recommended?

             

            The comment in the source code does not state why the method is not recommended for production.

             

            By the same reasoning, it seems the size() method will not be reliable either.

             

            It seems to me that operations such as keySet(), values() and size() on a cache are so fundamental that they have to be supported.

            • 3. Re: What is the correct way to get all keySet of a cache
              Vladimir Rodionov Newbie

              Some caches can have millions or even billions of keys, and this API is not suitable for very large caches, IMO (it will take too long and too much RAM to collect all keys from the cluster), but you can try, of course. I think that instead of returning a Set<>, this API call should return an iterator over the keys and should also allow a Filter object to be specified.
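
              To make the idea concrete, here is a rough sketch of what such a call could look like, written against a plain java.util.Map rather than a real cache. FilteredKeys and its iterator method are hypothetical names, and java.util.function.Predicate stands in for the Filter object; nothing here is an existing Infinispan API:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Sketch only: lazily iterates over the keys that pass a filter. */
final class FilteredKeys {
    private FilteredKeys() {}

    static <K> Iterator<K> iterator(Map<K, ?> cache, Predicate<? super K> filter) {
        final Iterator<K> source = cache.keySet().iterator();
        return new Iterator<K>() {
            private K next;        // next matching key, valid only when ready is true
            private boolean ready; // whether a matching key has been buffered

            @Override
            public boolean hasNext() {
                // Advance the underlying iterator until a key passes the filter.
                while (!ready && source.hasNext()) {
                    K candidate = source.next();
                    if (filter.test(candidate)) {
                        next = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public K next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                ready = false;
                K result = next;
                next = null;
                return result;
            }
        };
    }
}
```

              A caller would walk the result with hasNext()/next(), for example FilteredKeys.iterator(cache, k -> k.startsWith("user:")), so no second collection holding all matching keys is ever built.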

              • 4. Re: What is the correct way to get all keySet of a cache
                dex chen Novice

                I agree it could take a long time to get all keys. Memory is a different issue. The question is how a user can get the key set of a cache reliably.

                • 5. Re: What is the correct way to get all keySet of a cache
                  Galder Zamarreño Master

                  With replication mode, keySet() is not problematic; you can use it at any time, really.

                   

                  With distribution mode though, it only gives you a local view of the keys present in the cache. IOW, it doesn't go and try to find all keys in the distributed cache, since that could be lengthy.
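
                  The difference can be illustrated with a toy model. This is not Infinispan code: ToyCluster is a hypothetical class that keeps one HashMap per node, and its distribution mode assumes a single owner per key chosen by hash, which is a simplification:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only: one in-memory map per "node", no networking. */
final class ToyCluster {
    private final List<Map<String, String>> nodes = new ArrayList<>();
    private final boolean replicated;

    ToyCluster(int nodeCount, boolean replicated) {
        this.replicated = replicated;
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    void put(String key, String value) {
        if (replicated) {
            // Replication: every node stores every entry.
            for (Map<String, String> node : nodes) {
                node.put(key, value);
            }
        } else {
            // Distribution (simplified): only the owning node stores the entry.
            int owner = Math.floorMod(key.hashCode(), nodes.size());
            nodes.get(owner).put(key, value);
        }
    }

    /** Keys visible on one node without contacting any other node. */
    Set<String> localKeySet(int node) {
        return new HashSet<>(nodes.get(node).keySet());
    }
}
```

                  In the replicated toy cluster, localKeySet on any node returns every key; in the distributed one, each node only reports the keys it owns, so a complete view needs the union over all nodes.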

                  • 6. Re: What is the correct way to get all keySet of a cache
                    dex chen Novice

                    Galder: that's what I wanted to get confirmed. Thanks.