10 Replies Latest reply on Dec 4, 2006 4:44 PM by genman

    Please vote for your favorite issues -- mine :-)

    genman


      There's two features I came up with solutions for:

      JBCACHE-880 - Add per node expiration

      This is essentially an eviction (or removal) policy that allows you to specify when a node is automatically evicted. It is simply activated by adding a policy to your configuration, and by setting a key on a Node. How and why would this feature be useful to you?

      JBCACHE-841 - Add Node sorting

      An example use would be to cache a phone directory, and then be able to iterate or select a subset of children, such as "G-I".

      The simple solution is described in the issue. Basically, you would be able to indicate the sorted order of nodes by supplying a Comparator.

        • 1. Re: Please vote for your favorite issues -- mine :-)
          belaban

          880: I don't think this is important as regions already do that (for a set of nodes though). What's the compelling use case for per-node eviction ? So a -1 from me

          841: I think we should solve this problem fundamentally in a different way. Manik and I thought about functors, simple functions which (following the Visitor pattern) iterate over the nodes, e.g. in in-order, depth first order. So think of it as a closure that's applied to all nodes of a given subtree. This way we could implement sorted lists, map-reduce, *query* and in general, have a generic mechanism to transform a tree.

          • 2. Re: Please vote for your favorite issues -- mine :-)
            genman

            880: I know you and Manik don't believe me, but this is a fundamental feature that most cache services provide as part of their basic API, i.e. put(k, v, ttl).

            The dev discussion explains this motivation in detail. I already gave about 3 examples (HTTP cache, session cache, geospatial cache), are these not compelling examples?

            841: I don't see how you can implement efficient queries (that is better than O(N)) without some sort of indexing, or in this case per-node sorting.

            • 3. Re: Please vote for your favorite issues -- mine :-)
              belaban

               

              "genman" wrote:
              880: I know you and Manik don't believe me, but this is a fundamental feature that most cache services provide as part of their basic API, i.e. put(k, v, ttl).

              The dev discussion explains this motivation in detail. I already gave about 3 examples (HTTP cache, session cache, geospatial cache), are these not compelling examples?


              Why can't they be implemented using regions.

              Elias, I'm not really against this, but I'm against adding API (or parameters to existing methods) for things that can be achieved in a different way.
              Manik: if we can do this using the Options parameter, then I'm fine with Elias' proposal, otherwise still a -1 from me.


              841: I don't see how you can implement efficient queries (that is better than O(N)) without some sort of indexing, or in this case per-node sorting.


              Okay, fine as long as we do *not* need to add the 2 proposed methods to the Node interface, but to the OrderedNode *class*. So I assume this requires another enum to let the Node factory know which type of node to create ? However, if you want to use an OrderedNode, you'd have to narrow Node to OrderedNode.

              • 4. Re: Please vote for your favorite issues -- mine :-)
                manik

                Re: 880:

                No, it's not that I am against this idea, just that I think that this should be a custom extenstion (something a user may implement) rather than a part of the core JBoss Cache product.

                Then again, there has been some talk about this on the forums and it is useful for some of our core use cases (inc. http session repl).

                Regarding implementation, Elias' impl using an 'expiry' attribute in the node's data map is a little hacky and I don't like the fact that it is not typed so there is no way of checking this, but it is easily customised and extended as a result - plus it has very little impact on the overall arch.

                So I'm leaning to a +1 for the feature, and a +-1 on the impl. I'd like to hear more from others (especially users on this forum) before I make a clear decision on this though.

                Thanks for providing a prototype on this though, Elias, I do appreciate that.

                • 5. Re: Please vote for your favorite issues -- mine :-)
                  manik

                  Re: 841

                  Since with 2.0.0, the node is now a top-level construct, I can see the value in adding comparators to the dataset. I'mnot averse to adding stuff to the API provided it is useful and cohesive to the core offering of JBoss Cache. Again, +1 on the idea.

                  Functors is a nice genericisation for applying transformations to the data in the cache, and is a useful hook for allowing people to do stuff to the data that we *cannot* foresee or is sufficiently domain-specific, but I think sorting is (a) fairly fundamental to many use-cases and (b) fairly core to any system that organises and stores data.

                  So +1 on the API.

                  Again, re: the implementation, I don't think JBoss Retro will support JDK6 -> 4 weaving for a while. And there is no way I'm baselining on 6 for some time to come.

                  Also, I need to be careful of feature-creep in 2.0.0, so I may defer this to 2.1.0. (Adding an API method at that stage will not break the existing 2.0.0 API)

                  Unless, that is, this feature (complete with comprehensive test suites and full regression against the existing test suite) is contributed in time for 2.0.0. ;-)

                  • 6. Re: Please vote for your favorite issues -- mine :-)
                    genman

                    Thanks for your support.

                    Regarding your suggestion about functors: I'm actually thinking that something akin to a materialized view would be almost better. I may have the SQL terminology wrong, but the idea is that you take a query and then map the results to a table. When the underlying data is changed, then the results are updated.

                    The idea is that once you get your initial results, you would attach node listeners to the tree (perhaps just a region) and the results would be updated. In a case of a sorted query, the sorted data would be updated then.

                    I think this would probably be the best of both having an ordered result and what you speak of. You'd effectively have O(1) performance. The only negative consequence is that performance would be less.

                    I guess the implementation would be something like this:

                    public CacheQuery {
                     void evaluate(Node n);
                     Node getResultRoot();
                     void close();
                    }
                    


                    The queries would run against a region, not the whole tree. The key would be the "close" method that detaches the listener from the region when done. The problem is, how do you deal with all the locking and things like node removals?

                    If you are curious, though, I actually got a POJO TreeMap implemented using the OrderedNode. I can send the patches to you to look at. Though I do agree that it would almost be better to have a query language. I tried getting a TreeSet done, but it's actually very difficult as it's almost impossible to write a comparator for a Map's value. (You can try as an exercise.) What I would end up doing is keeping two trees around.

                    As for per-node expiration (removal), I believe that my implementation is "hacky" but doesn't impact the rest of the core code. If there is sufficient interest, it might be best to integrate it as a more developed feature.


                    • 7. Re: Please vote for your favorite issues -- mine :-)
                      belaban

                      The view idea is interesting, and we should discuss it further and put it on the roadmap.
                      IMO it is an orthogonal issue from functors though, as functors are a generic way of applying a function to a subset of all nodes. Functors are not restricted to queries.

                      • 8. Re: Please vote for your favorite issues -- mine :-)
                        manik

                        I agree. Functors would apply some transformation to a node, not a query/selection.

                        Still, important and interesting ideas here. Could we perhaps create JIRA tasks for these (so they don't get lost) and target them for, say, 2.3.0 for now?

                        The 3 items here would be:

                        1) OrderedNode
                        2) Functors
                        3) Cache querying

                        And create wiki pages to make a note of ideas, etc. and link to these pages from the JIRA tasks?

                        • 9. Re: Please vote for your favorite issues -- mine :-)
                          genman

                          I do know that functors can be used for both querying and updating, but for cache products, the data that's stored is typically a "mirrored" set of data from another source (e.g. the database.) The need for updating a large number of nodes seems unlikely. Maybe for evictions?

                          There are existing issues for 1, 2, and 3 I believe. I don't think there is any single page for capturing all of these.

                          And thinking more broadly of my per-node, time-based eviction requirement, if regions were easily constructed ("cheap") it would work well enough to simply create a sub-region or something where I could specify exactly the configuration I needed for a particular node.

                          • 10. Re: Please vote for your favorite issues -- mine :-)
                            genman


                            One thought I had was, why couldn't Regions be attached to Nodes directly? I know, for instance, Log4J attaches Appenders to Categories ("Loggers").