7 Replies Latest reply on Apr 8, 2008 6:35 AM by manik

    Fqns containing just Strings

    manik

      We've spoken about this at length in the past, and got to the point where we'd be happy with Fqns containing just Strings and Java primitives.

      After discussing things further at JBoss World, we really should just limit Fqn elements to Strings. If people wish to use other types, they could implement their own Fqn subclasses that can encode/decode primitives, etc. into String representations.

      Here is a summary of the discussions - please feel free to comment.

      Purpose: partly to improve performance with Fqn usage in Maps and the calculation of Fqn hash codes, as well as frequent creation.

      1. Remove Fqn generics!
      This was introduced in JBC 2.0.0 and has added an unnecessary layer of complexity for little benefit as far as Fqns are concerned.

      2. Fqns should only contain String elements
      If this were to contain primitives, these would be encoded as Strings internally anyway, so may as well let users do this themselves
      Perhaps as a subclass of Fqn (but make sure we make some internal methods final)

      3. Change Node.name to be a String as well instead of an Object, since Fqns would only contain Strings.

      4. Fqn.fromString() should escape "invalid" characters such as "/" to disambiguate from a String containing such characters.

      5. All Fqn constructors to be private (or protected). Instantiation via factory methods only.
      Reduces unnecessary object construction if we maintain a weak hashmap of pre-known Fqns and we can reuse these since Fqns are immutable.
      Need to be careful of and test for mem leaks!

      6. Profile to see if a search-and-get or search-and-instantiate-and-put is actually faster (on average) than outright construction, to validate the above approach.

      7. Fqn factory method to allow for String varargs and Fqn + String vararg signatures.

      8. All escaping of Fqn strings happens internally.

      9. Fqns hold a ref to the String elements (ArrayList) as well as a complete String rep, for quick hashCode() calculations and equals() comparisons.

      10. Expose an "intern()" method akin to String.intern() with the same effects and caveats about permgen space.

      11. Always intern Fqn.ROOT and other internal Fqn constructs like /_BUDDY_BACKUP_/ by default.
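
A minimal sketch of how items 4, 5, 7, 8 and 9 might fit together, for illustration only. The class name StringFqn and its method names are hypothetical and are not the JBoss Cache API. The sketch reads item 4 as "fromString() understands escaped separators", and it rejects empty elements so that the escaped form stays unambiguous.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical String-only Fqn sketch; not the JBoss Cache Fqn class. */
final class StringFqn {
    static final StringFqn ROOT = new StringFqn(Collections.<String>emptyList());

    private final List<String> elements; // item 9: the element list...
    private final String encoded;        // ...plus the complete escaped String form
    private final int hash;              // computed once; safe because Fqns are immutable

    // Item 5: private constructor, instantiation via factory methods only.
    private StringFqn(List<String> elements) {
        for (String e : elements) {
            if (e == null || e.isEmpty()) {
                throw new IllegalArgumentException("elements must be non-empty Strings");
            }
        }
        this.elements = Collections.unmodifiableList(new ArrayList<>(elements));
        StringBuilder sb = new StringBuilder();
        for (String e : this.elements) {
            sb.append('/').append(escape(e));
        }
        this.encoded = this.elements.isEmpty() ? "/" : sb.toString();
        this.hash = encoded.hashCode();
    }

    // Item 7: String varargs, and Fqn + String varargs.
    static StringFqn fromElements(String... elements) {
        return new StringFqn(Arrays.asList(elements));
    }

    static StringFqn fromRelativeElements(StringFqn base, String... more) {
        List<String> all = new ArrayList<>(base.elements);
        all.addAll(Arrays.asList(more));
        return new StringFqn(all);
    }

    // Items 4 and 8: an unescaped '/' separates elements; "\/" is a literal slash.
    static StringFqn fromString(String s) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                cur.append(s.charAt(++i)); // take the escaped character literally
            } else if (c == '/') {
                if (cur.length() > 0) {
                    parts.add(cur.toString());
                    cur.setLength(0);
                }
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) {
            parts.add(cur.toString());
        }
        return new StringFqn(parts);
    }

    private static String escape(String e) {
        return e.replace("\\", "\\\\").replace("/", "\\/");
    }

    int size() { return elements.size(); }

    String get(int i) { return elements.get(i); }

    @Override public boolean equals(Object o) {
        // Comparing the cached escaped form avoids walking the element list.
        return o instanceof StringFqn && encoded.equals(((StringFqn) o).encoded);
    }

    @Override public int hashCode() { return hash; }

    @Override public String toString() { return encoded; }
}
```

With this shape, StringFqn.fromElements("a/b", "c") has two elements, while StringFqn.fromString("/a/b/c") has three. Whether the cached String actually beats element-wise comparison is exactly what item 6 would need to measure.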


        • 1. Re: Fqns containing just Strings
          brian.stansberry

          Please note that changing from Fqns that accept arbitrary element types to String-only Fqns is going to 100% break the Hibernate 2nd Level Cache use case, requiring a major rewrite. I'm not at all certain the proper semantics can be maintained. We talked about this at JBW and I thought there might be a workable solution, but as I think about it more, I'm not seeing one yet.

          The workaround we've discussed involves hashing the entity PK and treating a cache node as a hash bucket, with the PK as a key in the node's attribute map. This leads to the potential to have lock conflicts between transactions that by chance use entities whose PKs resolve to the same hash bucket. It also breaks the current putForExternalRead implementation, which fails fast if the node exists. It also makes invalidation more coarse grained, since a write to one key in a bucket invalidates the entire bucket on all remote peers.
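
To make the bucket collision concrete, here is a rough sketch of the hash-bucket workaround. It uses plain maps as a stand-in for cache nodes. The class BucketedEntityCache, the bucket count and the "/region/bucket" naming are all assumptions made for illustration, not the actual Hibernate/JBC integration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a cache whose nodes act as hash buckets. */
final class BucketedEntityCache {
    private final int buckets;
    // Node name (a String-only Fqn) -> that node's attribute map, keyed by entity PK.
    private final Map<String, Map<Object, Object>> nodes = new HashMap<>();

    BucketedEntityCache(int buckets) {
        this.buckets = buckets;
    }

    /** The PK only picks a bucket; the PK itself never appears in the Fqn. */
    String bucketFqn(String region, Object pk) {
        return "/" + region + "/" + Math.floorMod(pk.hashCode(), buckets);
    }

    void put(String region, Object pk, Object entity) {
        nodes.computeIfAbsent(bucketFqn(region, pk), k -> new HashMap<>()).put(pk, entity);
    }

    Object get(String region, Object pk) {
        Map<Object, Object> node = nodes.get(bucketFqn(region, pk));
        return node == null ? null : node.get(pk);
    }

    int nodeCount() {
        return nodes.size();
    }
}
```

With 16 buckets, Integer PKs 1 and 17 both land in /Person/1. Any per-node lock, fail-fast existence check or invalidation message therefore covers both entities, even though they are unrelated.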

          We've bounced around the idea of not storing the entity as a value in the hash bucket node's attribute map, but rather a GUID string. Hibernate/JBC integration would first find the GUID, and then use that to construct a String-only Fqn where it would store the real entity. This would deal with the putForExternalRead problem. With a lot of jujitsu in the Hibernate/JBC integration it *perhaps* could get around most of the potential for lock conflicts. It would certainly be complicated and prone to race conditions as multiple peers attempt to cache entities.

          But, as I think about it, the GUID approach absolutely won't work with invalidation, as the PK/GUID key/value pair will never be available on any remote peer. Each node will therefore cache a given entity under a different Fqn, and an update of the entity on one node will fail to invalidate the other caches.

          The other thing we've discussed is asking the Hibernate guys to come up with some infrastructure to deterministically convert an entity PK into a unique string. But,

          1) I don't think such magic exists, at least not for all types. For entities, we could add a requirement that all fields in the PK be primitive, which is not a major restriction, but is an obscure pain point.

          2) For query caching, the equivalent to the entity PK is an object that encapsulates the query string and all parameters. Deterministically converting that object into a unique string is another, more complex variant of the entity PK task. Lots of fun escaping stuff to distinguish our representation of things from random query content. Further, any entity field type used as a parameter that goes into the query needs to be "stringable", not just those that go into PKs. In practice, again not a major restriction, but an obscure pain point.

          3) Perhaps most significantly, I don't see providing such an API as being a high priority for the Hibernate team.


          Bottom line, please recognize that making whatever JBC release incorporates this switch usable for Hibernate will be a pain. At minimum it will take time and resources, and the effort will compete for resources with whatever other things are going on. And thus may take longer. At worst, it will introduce a problem that can't be solved effectively. Either way, until the new integration is done, that JBC release will not be usable by EJB3 or JBoss AS.

          • 2. Re: Fqns containing just Strings
            brian.stansberry

            Haven't thought hard about this, but my gut instinct tells me the perf benefits of String-only Fqns may be obtainable without requiring only strings. Like you said, create Fqns only via factory methods. Factory detects the string-only situation and optimizes based on that. If not string-only, no optimization. More sophisticated approach segments the Fqn into the typical String-only leading portion and the non-String trailing portion.

            • 3. Re: Fqns containing just Strings
              jason.greene

              After discussing this with Brian, I agree with his position that we should not force Fqns to be String only. The primary reason is that an Fqn is the definitive key of a node in the cache. Further, the node is the definitive notion of an "entity" in a domain model. While it makes things easier for us to restrict Fqns to strings, all we are really doing is pushing the problem onto users, and all of the possible solutions are not nearly as effective as the solution today.

              So I propose we look at this from a different perspective, accept that an Fqn must represent multiple Java types, and solve the remaining issues with that prerequisite in mind.

              The key issues are:


              1. Generic type information is wrong and leads to issues (wrong constructor is sometimes called)

                We can solve this problem by just removing the generic info. Generics is just not capable of representing an Fqn, since it has an unbounded number of components that are not guaranteed to represent the same type.

              2. Fqns and Strings are not reflexive (Fqn.fromString(fqn.toString()) is not guaranteed to work)

                This issue leads to a lot of confusion and misuse of the API. I believe the solution is to support reflexiveness by adding a getEncodedString(). This method would produce an encoded string that could be used with Fqn.fromString() to produce an equivalent fqn (Fqn.fromString(fqn.getEncodedString()).equals(fqn)). This, of course, means encoding type information for non-string types, and adding proper escaping. Special light-weight encoding would be used for known types, with encoded java serialization as the fallback. This means component types in an Fqn must all be Serializable.

                Fqn.toString() would remain the same (normal Java semantics), to allow for pretty printing.

              3. Performance in equals()

                We can introduce something similar to String.intern() with Fqns. This can be implemented by just having a hash map whose key and value is the single true Fqn instance. Equivalent but non-identical fqns will resolve correctly, due to the collections contract. However, identical fqns will resolve quickly since the object identity test short circuits the full comparison.

              4. Marshalling Fqns with user defined types (requires marshaling the region fqn first)

                This is solved by just having a clear requirement that regions can only use FQNs that are made up of types in the jboss cache classloader.

              5. Cacheloaders rely on Fqn.toString()

                This is solved by fixing the reflexiveness problem. They can use the encoded string instead of the pretty print string.


                So to summarize, I believe these are the changes we should make:

                • Remove generics from Fqn API
                • Enforce all Fqn components to be Serializable
                • Introduce Fqn.getEncodedString
                • Update cache loaders to use the encoded string
                • Update documentation


              • 4. Re: Fqns containing just Strings
                manik

                I see your point, Brian. The GUID approach really would be just a fudge.

                My main purpose in restricting Fqns is performance, so let's see if this can be achieved in other ways.

                But first, addressing Jason's suggestions:

                "jason.greene@jboss.com" wrote:
                After discussing this with Brian, I agree with his position, that we should not force Fqns to be String only. The primary reason is that an Fqn is the definitive key of a node in the cache. Further, the node is the definitive notion of an "entity" in a domain model. While it makes things easier for us to restrict Fqns to strings, all we are really doing is pushing the problem onto users, and all of the possible solutions are not nearly as effective as the solution today.


                Agreed. We would just be pushing the problem into the user space.

                "jason.greene@jboss.com" wrote:

                So to summarize, I believe these are the changes we should make:

                • Remove generics from Fqn API
                • Enforce all Fqn components to be Serializable
                • Introduce Fqn.getEncodedString
                • Update cache loaders to use the encoded string
                • Update documentation


                These all make sense. Re: generics, agreed, it was a bad idea introducing them for Fqns in the first place. Re: enforcing Fqn components to be serializable - I believe this currently is the case if the cache is replicated. Makes sense to enforce this in a more strict manner, using the Serializable interface instead of Object in Fqn.

                So now let's look at the performance of equals() and hashCode(). I'm assuming we're ok with just exposing factory methods on Fqn and keeping the constructors private? And then hold refs of all the Fqns internally, as Jason suggested? I presume we're looking at a weak ref hash map, to prevent mem leaks?
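
For reference, a minimal sketch of what such a weak ref pool could look like, assuming the pooled type has a stable equals() and hashCode(). The class name WeakInterner is hypothetical. The value is wrapped in a WeakReference because a WeakHashMap holds its values strongly; a map from an Fqn to itself would keep every key reachable and never release anything.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical canonicalizing pool; T must be immutable with stable equals/hashCode. */
final class WeakInterner<T> {
    // Weak keys plus weakly referenced values: an entry can be dropped once
    // nothing outside the pool references the canonical instance.
    private final Map<T, WeakReference<T>> pool = new WeakHashMap<>();

    /** Returns the canonical instance equal to candidate, registering it if absent. */
    synchronized T intern(T candidate) {
        WeakReference<T> ref = pool.get(candidate);
        T existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing;
        }
        pool.put(candidate, new WeakReference<>(candidate));
        return candidate;
    }

    synchronized int size() {
        return pool.size();
    }
}
```

Callers that always go through intern() can compare canonical instances by identity first. The single lock here is only for simplicity; whether the lookup is cheaper than outright construction is still the open profiling question.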






                • 5. Re: Fqns containing just Strings
                  genman

                  I would think you'd "intern" the components of the FQN rather than the FQN itself. FQNs by their nature are unique, and it's the name components that are frequently repeated. And then you wouldn't have to remove the constructors of FQN. And thanks to having the pool, "equals" could use object identity to compare elements. You also wouldn't have to intern() Strings and many primitive types (like Boolean and integers in a range), as there is already a pool for those.

                  For more storage efficiency, you could have an FQN hold a reference to its parent and just the leaf element rather than a list. The leaf element would be placed in the FQN component pool. You can also optimize hashCode computation by basing the cached hashcode off of the parent's hash XOR'd with the child's.
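
A rough sketch of the parent-plus-leaf layout, under assumptions: the class name LinkedFqn is hypothetical, the component pool is left out, and the parent's hash is multiplied by 31 before the XOR, because a plain XOR would give /a/b and /b/a the same hash.

```java
import java.util.Objects;

/** Hypothetical Fqn that stores only its parent and its leaf element. */
final class LinkedFqn {
    static final LinkedFqn ROOT = new LinkedFqn(null, null);

    private final LinkedFqn parent; // null only for ROOT
    private final Object leaf;      // null only for ROOT
    private final int size;
    private final int hash;         // cached; derived from the parent's cached hash

    private LinkedFqn(LinkedFqn parent, Object leaf) {
        this.parent = parent;
        this.leaf = leaf;
        this.size = (parent == null) ? 0 : parent.size + 1;
        // Constant-time hash: no walk over the ancestors is needed.
        this.hash = (parent == null) ? 0 : (31 * parent.hash) ^ leaf.hashCode();
    }

    LinkedFqn child(Object leaf) {
        return new LinkedFqn(this, Objects.requireNonNull(leaf));
    }

    LinkedFqn getParent() { return parent; }

    int size() { return size; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LinkedFqn)) return false;
        LinkedFqn a = this;
        LinkedFqn b = (LinkedFqn) o;
        if (a.size != b.size || a.hash != b.hash) return false;
        // Walk towards the root; a shared ancestor (at the latest ROOT) ends the walk.
        while (a != b) {
            if (!a.leaf.equals(b.leaf)) return false;
            a = a.parent;
            b = b.parent;
        }
        return true;
    }

    @Override public int hashCode() { return hash; }

    @Override public String toString() {
        if (parent == null) return "/";
        return (parent.parent == null ? "" : parent.toString()) + "/" + leaf;
    }
}
```

Creating a child is then a single small allocation, and siblings share their parent chain. The cost moves to anything that needs random access to elements, which now has to walk the chain.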

                  For encoding, you want the common case that "toString()" is the same as "toEncodedString".

                  And so there would be four cases:

                  1. Strings, encoded the same as before, except that you escape / as \/ and \ as \\.
                  2. Java primitives, which would be prefixed with \ and the type, e.g. \I123 for integer 123.
                  3. Types with a registered java.beans.PropertyEditor, which you'd prefix with \C, the class, and the String value, for instance \Cjava.net.URL"http://example.net"/, escaping internal quotes.
                  4. Unregistered, regular Serializable types, where you use the type and the value encoded as Base64 or similar.
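
A small sketch of the first two cases only (Strings and a couple of primitive wrappers), to show that the prefix scheme round-trips. The class FqnCodec is hypothetical, and the \Z tag for Boolean is an assumption borrowed from JVM type descriptors; only \I appears above. The PropertyEditor and Base64 cases are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical codec for Fqn elements: Strings, Integers and Booleans only. */
final class FqnCodec {
    static String encode(List<?> elements) {
        StringBuilder sb = new StringBuilder();
        for (Object e : elements) {
            sb.append('/');
            if (e instanceof String) {
                // Case 1: escape backslashes first, then slashes.
                sb.append(((String) e).replace("\\", "\\\\").replace("/", "\\/"));
            } else if (e instanceof Integer) {
                sb.append("\\I").append(e);   // case 2: backslash plus type tag
            } else if (e instanceof Boolean) {
                sb.append("\\Z").append(e);   // assumed tag, not from the thread
            } else {
                throw new IllegalArgumentException("unsupported element: " + e);
            }
        }
        return sb.toString();
    }

    static List<Object> decode(String encoded) {
        List<Object> out = new ArrayList<>();
        int n = encoded.length();
        int i = 0;
        while (i < n) {
            if (encoded.charAt(i) != '/') {
                throw new IllegalArgumentException("expected '/' at index " + i);
            }
            int start = ++i;
            // Scan to the next unescaped '/', skipping the character after each backslash.
            while (i < n && encoded.charAt(i) != '/') {
                i += (encoded.charAt(i) == '\\') ? 2 : 1;
            }
            i = Math.min(i, n);
            out.add(decodeElement(encoded.substring(start, i)));
        }
        return out;
    }

    private static Object decodeElement(String raw) {
        // An escaped String can only start with "\\" or "\/", so the tags are unambiguous.
        if (raw.startsWith("\\I")) return Integer.valueOf(raw.substring(2));
        if (raw.startsWith("\\Z")) return Boolean.valueOf(raw.substring(2));
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < raw.length(); j++) {
            char c = raw.charAt(j);
            if (c == '\\' && j + 1 < raw.length()) {
                sb.append(raw.charAt(++j));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For plain String elements without slashes or backslashes, the encoded form is the same as the pretty-printed one, which is the common case mentioned above. The Integer 123 and the String "\I5" stay distinct because the String's leading backslash is doubled.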

                  But there's probably a better system out there for encoding, and if you can locate it, go for it.

                  • 6. Re: Fqns containing just Strings
                    manik

                    Certainly some valid points here to consider.

                    • 7. Re: Fqns containing just Strings
                      manik

                      I'm surprised I haven't created a JIRA for this yet. :-)

                      http://jira.jboss.org/jira/browse/JBCACHE-1322