17 Replies Latest reply on Aug 15, 2012 10:42 AM by Mircea Markus

    retrieving all values/keys in a cache

    Travis Camechis Newbie

      How do you retrieve all the values or keys in a distributed cache? It looks like none of the collection methods (keySet(), for example) are recommended for use in production. I would assume that at some point you would want to retrieve a list of all the keys/values so you know what exists in your cache. I may be missing something obvious, or a concept.

        • 1. retrieving all values/keys in a cache
          Mircea Markus Master

          In dist mode, keySet() returns only the keys local to your node. To get all the keys in the system, you'd need to call keySet() on each node and "merge" the results. One of the reasons we didn't want to implement aggregate operations that return all the data from the cluster is that doing so would mean migrating lots of data (potentially all the data in the system) to the node where the aggregate operation is called.

          To check whether a specific key is in the system, use cache.containsKey(); that's reliable.
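As a rough illustration of the "call keySet() on each node and merge" idea, here is a minimal sketch in plain Java. It does not use the Infinispan API: each `Map` is just a stand-in for the entries one node holds locally, and `ClusterKeys` and `mergeKeySets` are hypothetical names. How the per-node key sets actually reach the caller is left out.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ClusterKeys {
    // Each Map stands in for the entries one node holds locally. The same key
    // may show up on more than one node when backup copies are kept.
    static <K, V> Set<K> mergeKeySets(List<Map<K, V>> nodes) {
        Set<K> all = new HashSet<>();
        for (Map<K, V> node : nodes) {
            all.addAll(node.keySet()); // the Set drops keys already seen on another node
        }
        return all;
    }

    public static void main(String[] args) {
        Map<String, String> nodeA = Map.of("p1", "Alpha", "p2", "Beta");
        Map<String, String> nodeB = Map.of("p2", "Beta", "p3", "Gamma");
        System.out.println(mergeKeySets(List.of(nodeA, nodeB)));
    }
}
```

Note that the merged set ends up entirely on the calling node, which is exactly the data-migration cost described above.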

          • 2. retrieving all values/keys in a cache
            Travis Camechis Newbie

            Hmm, so there is no good way to perform a findAll? I mean, at some point, if your system is starting up, your client can't guess at what may be in the cache. Also, just to make sure I am clear: when we talk about the local node in dist mode, my setup would be an application server connecting to cache servers via the Hot Rod client. Would the application server be considered the local node?

            • 3. Re: retrieving all values/keys in a cache
              Sanne Grinovero Master

              It all depends on what you need the "findAll" for. Why should your application ever do that?

               

              If you want to count the elements in the cache, or apply some operation to each of them, you would be much safer using the Map/Reduce API: http://community.jboss.org/wiki/InfinispanDistributedExecutionFramework

              With a data grid, it's usually much better to send your operations to the nodes than to download terabytes of information from them, saturating the network and likely running out of memory.

               

              In practice, most applications work via keys: they know exactly what they are looking for, and don't need to see the full picture.
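To make the "send the operation to the nodes" point concrete, here is a minimal sketch in plain Java. It is not the Infinispan Map/Reduce or distributed execution API: the `Map` instances are stand-ins for each node's local data, `ShipTheOperation` and `sumOverNodes` are hypothetical names, and the sketch assumes each entry is held by exactly one node so nothing is counted twice.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class ShipTheOperation {
    // Applies `task` to each node's local data and combines only the small
    // per-node results, instead of copying every entry to the caller.
    static <K, V> long sumOverNodes(List<Map<K, V>> nodes, Function<Map<K, V>, Long> task) {
        long total = 0;
        for (Map<K, V> node : nodes) {
            total += task.apply(node); // in a real grid this step would run on the remote node
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> nodeA = Map.of("p1", 10, "p2", 20);
        Map<String, Integer> nodeB = Map.of("p3", 30);
        // Counting entries: each node returns a single number, not its data.
        System.out.println(sumOverNodes(List.of(nodeA, nodeB), m -> (long) m.size()));
    }
}
```

Only one `long` per node travels back to the caller, regardless of how many entries each node holds.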

              • 4. retrieving all values/keys in a cache
                Travis Camechis Newbie

                So say you have this scenario: a user logs in and you need to display all of their created projects. Would the correct answer be to create a cache that contains (key = user, value = list of all project IDs) and then access the project cache using all those keys? What happens if there is no user; how would you display all the projects?

                • 5. retrieving all values/keys in a cache
                  Mircea Markus Master

                  I think Sanne's idea with Map/Reduce should work for your scenario.

                  • 6. retrieving all values/keys in a cache
                    Travis Camechis Newbie

                    I definitely understand why you wouldn't want to perform the get-all, because in some scenarios, like you said, you could end up attempting to pull back tons of data. I am just trying to get my head wrapped around laying out the data. I am currently using a CRUD scaffolded web app generated by Spring Roo. All the app does is list all projects, delete a project, and edit a project. It's the "list projects" that's getting me. In the SQL world you would select * from projects.

                    • 7. retrieving all values/keys in a cache
                      Galder Zamarreño Master

                      One option would be for you to have a replicated cache with only the project names, and then contain the project information in a distributed cache. The replicated cache would give you the assurance that the local copy contains all projects in the cache, and individual project info could be retrieved on demand, maybe needing to go to a remote node but that would be handled by the distribution mode.
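A minimal sketch of that two-cache layout, in plain Java rather than the Infinispan API: the `Set` stands in for the small replicated cache of project names and the `Map` for the distributed cache of project details. `ProjectCatalog` and its methods are hypothetical names, and the two updates in `save` and `delete` are not atomic here, which a real setup would have to consider.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

final class ProjectCatalog {
    // Stand-in for the replicated cache: holds only project names and is
    // assumed to be fully present on every node.
    private final Set<String> projectNames = ConcurrentHashMap.newKeySet();
    // Stand-in for the distributed cache: full project details, fetched by name.
    private final Map<String, String> projectDetails = new ConcurrentHashMap<>();

    void save(String name, String details) {
        projectDetails.put(name, details); // store the data first
        projectNames.add(name);            // then make it visible in the index
    }

    void delete(String name) {
        projectNames.remove(name);         // hide it from listings first
        projectDetails.remove(name);
    }

    // "List projects" reads only the small local index, in sorted order.
    List<String> listProjects() {
        return new ArrayList<>(new TreeSet<>(projectNames));
    }

    // Details are fetched on demand, one key at a time.
    String load(String name) {
        return projectDetails.get(name);
    }
}
```

The listing never touches the detail cache, so the "select * from projects" screen costs only a read of the local name index.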

                       

                      Another possibility would be to use the query module (http://community.jboss.org/docs/DOC-14155) if you want to query particular projects at some point, e.g. select * where project.name is like....

                      • 8. retrieving all values/keys in a cache
                        Travis Camechis Newbie

                        Thanks for the reply. I think that makes sense. This is my first time playing with a data grid, so it is sometimes hard to get out of the RDBMS mindset and lay out data in the grid so that things are easy to find.

                         

                        On a side note, I thought I read somewhere that the query module does not work in distributed mode (via Hot Rod)? Is that correct? If not, does it work with any protocols in DIST mode?

                        • 9. retrieving all values/keys in a cache
                          Galder Zamarreño Master

                          Yeah, remote querying, as in clients sending queries to backend Hot Rod servers, is not there yet, but it's planned: https://issues.jboss.org/browse/ISPN-484

                           

                          Querying can only be done against an embedded cache, that is, a cache that lives locally. Distributed queries are also planned: https://issues.jboss.org/browse/ISPN-200

                          • 10. Re: retrieving all values/keys in a cache
                            Randall Hauch Master

                            There certainly are times when it's useful to get all of the keys in a cache, and using distributed execution or map-reduce would be a perfectly good way to do that. Unfortunately, IIUC distributed execution and map-reduce only work on clustered caches. Is that still the case in 5.1 and 5.2? For example, in 5.1.2 executing a DistributedCallable against a local cache produces the following exception:

                             

                            Caused by: java.lang.IllegalStateException: Can not use non-clustered cache for DefaultExecutorService
                                      at org.infinispan.distexec.DefaultExecutorService.ensureProperCacheState(DefaultExecutorService.java:499)
                                      at org.infinispan.distexec.DefaultExecutorService.<init>(DefaultExecutorService.java:139)
                                      at org.infinispan.distexec.DefaultExecutorService.<init>(DefaultExecutorService.java:117)
                                      at org.modeshape.jcr.InfinispanUtil.getAllKeys(InfinispanUtil.java:64)
                            

                             

                            If this is the case, how would one get the keys on a local cache? It seems like the only option is to use "keySet()", which is described as being unreliable.

                             

                            Ideally, it should be possible to use distributed execution or map-reduce against any cache, regardless of whether or not it was clustered. That way code written to use Infinispan doesn't have to use different logic based upon how Infinispan is configured.

                            • 11. Re: retrieving all values/keys in a cache
                              Mircea Markus Master

                              Randall Hauch wrote: "Ideally, it should be possible to use distributed execution or map-reduce against any cache, regardless of whether or not it was clustered. That way code written to use Infinispan doesn't have to use different logic based upon how Infinispan is configured."

                              +1. An email thread[1] was started today around the same issue, in case you want to add your comments there as well.

                               

                              [1] http://infinispan.markmail.org/search/#query:+page:1+mid:heodmmc3hicyvuok+state:results

                              • 12. Re: retrieving all values/keys in a cache
                                Randall Hauch Master

                                Any thoughts on how to reliably get all the keys on a local cache? Is "keySet()" the only option, and if so will it reliably get all the keys (even when a cache store is configured)?

                                 

                                Is it okay to log an enhancement in JIRA to enable using the distributed execution and map-reduce on any cache?

                                • 13. Re: retrieving all values/keys in a cache
                                  Mircea Markus Master

                                  Randall Hauch wrote: "Any thoughts on how to reliably get all the keys on a local cache? Is "keySet()" the only option, and if so will it reliably get all the keys (even when a cache store is configured)?"

                                  keySet() is reliable for local caches, but it doesn't read entries from the loader. MapReduce has the same loader limitation at the moment.
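To show what that limitation means in practice, here is a minimal sketch in plain Java, not the Infinispan API. One `Map` stands in for the entries currently in memory and another for the entries held by the cache store; `StoreAwareKeys` and `allKeys` are hypothetical names, and how a given store would enumerate its own keys is an assumption left out of the sketch.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class StoreAwareKeys {
    // Union of the keys held in memory and the keys held only by the store.
    static <K, V> Set<K> allKeys(Map<K, V> inMemory, Map<K, V> store) {
        Set<K> keys = new HashSet<>(inMemory.keySet());
        keys.addAll(store.keySet()); // adds entries that are no longer in memory
        return keys;
    }

    public static void main(String[] args) {
        Map<String, String> inMemory = Map.of("p1", "Alpha");
        Map<String, String> store = Map.of("p1", "Alpha", "p2", "Beta");
        System.out.println(inMemory.keySet());          // misses p2, which lives only in the store
        System.out.println(allKeys(inMemory, store));
    }
}
```

A key set taken from memory alone misses any entry that exists only in the store, which is the gap being described.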

                                  Randall Hauch wrote: "Is it okay to log an enhancement in JIRA to enable using the distributed execution and map-reduce on any cache?"

                                  Yes please, thank you!

                                  • 14. Re: retrieving all values/keys in a cache
                                    Randall Hauch Master

                                    Randall Hauch wrote: "Any thoughts on how to reliably get all the keys on a local cache? Is "keySet()" the only option, and if so will it reliably get all the keys (even when a cache store is configured)?"

                                    Mircea Markus wrote: "KeySet is reliable for the local caches but doesn't read stuff from the loader. MapReduce has same loader limitation ATM."

                                    Wow, those are serious limitations. Is this documented somewhere? If so, I must have completely missed it!

                                     

                                    In other words, there is no reliable way to get *all* of the keys from a local cache that is using a cache store. Is this correct?

                                     

                                    And when using MapReduce on a clustered cache, there's no guarantee that *all* entries are processed by the MapReduce invocation. Is this correct?
