22 Replies Latest reply on Jan 14, 2008 10:24 AM by brian.stansberry

    Implicit marshalled values - a better way of handling region

    manik

      We had this discussion in Neuchatel, I'm bringing it up again here to solidify an approach.

      This pertains to JBCACHE-1231

      Currently, we place direct object references in the cache. If the object in the cache is in a defined region with its own class loader, we do the following when marshalling the node (and the reverse when unmarshalling) for replication, cache loading, etc.:

      1) serialize Fqn of the region
      2) use the region's registered class loader to serialize cached objects

      AFAIK - and please correct me if I am wrong - Brian, for HTTP sessions, your "cached objects" are MarshalledValue instances that only unmarshall when you do a MarshalledValue.get() as opposed to when the JBC marshaller unmarshalls.

      I presume this means we have 2 levels of marshalling/unmarshalling, which could be wasteful.

      How about we do this:

      1. In JBC I have my own "MarshalledValue" equivalent, which stores a reference to the object, a byte array and a class loader (all of which can be built lazily).
      2. When an object is put in cache (cache.put()) I create a MarshalledValue and store the object as well as the thread's class loader.
      3. When the object is requested (cache.get()) I inspect the object and if it is a MarshalledValue, I return the object. If the object is not present, I deserialize the byte stream, first with the associated class loader, and if one is not present, the calling thread's class loader.
      4. When replicating, the marshalled value serializes the object using the associated class loader. When receiving a replication event, the cache marshaller just creates a marshalled value with a null class loader and stores the byte array in the marshalled value, letting this unmarshall lazily when requested, as in 3. above.
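      A minimal sketch of steps 1 to 4 above, not the JBC implementation. The names LazyMarshalledValue, LoaderAwareInputStream, ofObject, ofBytes and serialize are hypothetical, and for brevity the sketch resolves classes with the calling thread's context class loader only, rather than storing a loader reference.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;

/** Hypothetical holder: keeps either the live object or its serialized bytes. */
final class LazyMarshalledValue {
    private Object instance; // live form, set on a local put or by the first get()
    private byte[] raw;      // serialized form, set when received from replication

    /** Local put: wrap the live object and defer serialization. */
    static LazyMarshalledValue ofObject(Object o) {
        LazyMarshalledValue v = new LazyMarshalledValue();
        v.instance = o;
        return v;
    }

    /** Replication receipt: keep the bytes and defer deserialization. */
    static LazyMarshalledValue ofBytes(byte[] bytes) {
        LazyMarshalledValue v = new LazyMarshalledValue();
        v.raw = bytes.clone();
        return v;
    }

    /** Returns the live object, deserializing on first access with the caller's TCCL. */
    synchronized Object get() throws IOException, ClassNotFoundException {
        if (instance == null && raw != null) {
            ClassLoader tccl = Thread.currentThread().getContextClassLoader();
            try (ObjectInputStream in =
                     new LoaderAwareInputStream(new ByteArrayInputStream(raw), tccl)) {
                instance = in.readObject();
            }
            raw = null; // drop the bytes once the object exists
        }
        return instance;
    }

    /** Returns the serialized form for replication, serializing on demand. */
    synchronized byte[] serialize() throws IOException {
        if (raw != null) {
            return raw.clone();
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(instance);
        }
        return buf.toByteArray();
    }

    synchronized boolean isDeserialized() {
        return instance != null;
    }
}

/** ObjectInputStream that resolves classes through a supplied class loader. */
final class LoaderAwareInputStream extends ObjectInputStream {
    private final ClassLoader loader;

    LoaderAwareInputStream(InputStream in, ClassLoader loader) throws IOException {
        super(in);
        this.loader = loader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        if (loader == null) {
            return super.resolveClass(desc);
        }
        try {
            return Class.forName(desc.getName(), false, loader);
        } catch (ClassNotFoundException e) {
            return super.resolveClass(desc); // default path also handles primitive descriptors
        }
    }
}
```

      A receiving node would call LazyMarshalledValue.ofBytes(...) and pay the deserialization cost only if something later calls get().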

      The impact of this is:

      * Gets rid of ugly Fqn hacks when marshalling streams
      * No double marshalling for HTTP sessions
      * Transparent classloader switching based on calling thread
      * Lazy deserialization of replication (== faster sync replication)

      The downsides:

      * May mean changing client code if the client already deals with MarshalledValues (such as HTTP session clustering)
      * Incompatible wire format with 2.0.0, but I am happy to break this as long as we provide an old, backward compatible marshaller.

      Thoughts?

        • 1. Re: Implicit marshalled values - a better way of handling re
          brian.stansberry

           


          AFAIK - and please correct me if I am wrong - Brian, for HTTP sessions, your "cached objects" are MarshalledValue instances that only unmarshall when you do a MarshalledValue.get() as opposed to when the JBC marshaller unmarshalls.

          I presume this means we have 2 levels of marshalling/unmarshalling, which could be wasteful.


          By default, region based marshalling is not enabled. You have to enable it if you want to use FIELD (which doesn't use MarshalledValue). But if you enabled it for FIELD and also had non-FIELD apps, yes, you'd have double marshalling.

          Plus, having to change the cache config to support FIELD is crappy.

          2. When an object is put in cache (cache.put()) I create a MarshalledValue and store the object as well as the thread's class loader.


          We have to be careful with the classloader ref, as leaking classloaders is a big no-no. It could be stored as a WeakReference, but need to think through how much that buys you versus not storing it at all and just using the TCCL. Particularly since on remote nodes the classloader ref will be null.

          I know in the standard AS use cases the read is always done with the correct TCCL.
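          To make the trade-off concrete, here is a small sketch of the WeakReference option with a TCCL fallback. LoaderChoice and resolve() are hypothetical names; on a remote node the remembered loader would simply be null and the fallback would always be taken.

```java
import java.lang.ref.WeakReference;

/** Hypothetical helper: remembers a class loader weakly so it cannot be leaked. */
final class LoaderChoice {
    private final WeakReference<ClassLoader> remembered;

    LoaderChoice(ClassLoader loaderAtPutTime) {
        // A WeakReference to null is legal; get() then always returns null.
        this.remembered = new WeakReference<>(loaderAtPutTime);
    }

    /** Returns the remembered loader if it is still reachable, else the caller's TCCL. */
    ClassLoader resolve() {
        ClassLoader cl = remembered.get();
        return cl != null ? cl : Thread.currentThread().getContextClassLoader();
    }
}
```

          If reads always happen with the correct TCCL, the weak reference adds nothing and can be dropped.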

          3. When the object is requested (cache.get()) I inspect the object and if it is a MarshalledValue, I return the object. If the object is not present, I deserialize the byte stream, first with the associated class loader, and if one is not present, the calling thread's class loader.


          ... and throw away the ref to the byte[] -- i.e. no double memory usage.


          I'll need to find my scribbled design notes from when I thought about shared HTTP sessions, but I know that one thing that came out of my thinking was a need to flush the type system by converting everything back to a byte[]. This would be done on a redeploy, where we want to preserve cached data but the types are no longer usable. The MarshalledValue impl I was thinking of would have a method for that. I was assuming the session management layer would walk the tree, get the marshalled values and invoke the "flush" method on them; if this is instead an internal detail of JBC there would need to be an API (probably on Node) to tell it to do the same thing.
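          A rough sketch of that "flush" walk, using toy types rather than the JBC Node API. ReleasableValue, ToyNode and flush() are hypothetical names, and class loader selection on the way back in is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical value holder that can be converted back to bytes on demand. */
final class ReleasableValue {
    private Object instance;
    private byte[] raw;

    ReleasableValue(Object o) {
        this.instance = o;
    }

    /** Serializes the live object and drops the reference to it (and so to its class). */
    synchronized void flush() throws IOException {
        if (instance == null) {
            return; // already in byte[] form (or empty)
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(instance);
        }
        raw = buf.toByteArray();
        instance = null;
    }

    /** Rebuilds the object on the next read; loader selection is omitted in this sketch. */
    synchronized Object get() throws IOException, ClassNotFoundException {
        if (instance == null && raw != null) {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
                instance = in.readObject();
            }
            raw = null;
        }
        return instance;
    }

    synchronized boolean holdsLiveObject() {
        return instance != null;
    }
}

/** Toy tree node standing in for a cache node. */
final class ToyNode {
    final Map<String, ReleasableValue> data = new HashMap<>();
    final Map<String, ToyNode> children = new HashMap<>();

    /** Walks this subtree and flushes every value back to its serialized form. */
    void flush() throws IOException {
        for (ReleasableValue v : data.values()) {
            v.flush();
        }
        for (ToyNode child : children.values()) {
            child.flush();
        }
    }
}
```

          On redeploy, the caller would invoke flush() on the subtree for the application, after which no cached value holds a reference to a class from the old deployment.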


          Other random thoughts...

          1) Shouldn't use the MarshalledValue for primitive types.
          2) For collections, should probably just use it rather than doing something like trying to determine the type of all collection elements.
          3) Perhaps it's a behavior that should be cache-wide configurable. So, if I'm storing a bunch of Sets whose elements are all Strings, I could turn this off.

          • 2. Re: Implicit marshalled values - a better way of handling re
            manik

             

            "bstansberry@jboss.com" wrote:

            AFAIK - and please correct me if I am wrong - Brian, for HTTP sessions, your "cached objects" are MarshalledValue instances that only unmarshall when you do a MarshalledValue.get() as opposed to when the JBC marshaller unmarshalls.

            I presume this means we have 2 levels of marshalling/unmarshalling, which could be wasteful.


            By default, region based marshalling is not enabled. You have to enable it if you want to use FIELD (which doesn't use MarshalledValue). But if you enabled it for FIELD and also had non-FIELD apps, yes, you'd have double marshalling.


            You're just talking about double classloader-specific marshalling. Even if region-based marshalling is off, there will still be double-marshalling, i.e., Pojo -> a byte[] in your MarshalledValue, which is put in the cache. The CacheMarshaller then marshalls your MarshalledValue into a (bigger) byte[] for replication. Or is there something clever in your MarshalledValue class that just passes the byte[] payload in writeExternal()?


            "bstansberry@jboss.com" wrote:

            Plus, having to change the cache config to support FIELD is crappy.


            Agreed.

            "bstansberry@jboss.com" wrote:

            2. When an object is put in cache (cache.put()) I create a MarshalledValue and store the object as well as the thread's class loader.


            We have to be careful with the classloader ref, as leaking classloaders is a big no-no. It could be stored as a WeakReference, but need to think through how much that buys you versus not storing it at all and just using the TCCL. Particularly since on remote nodes the classloader ref will be null.

            I know in the standard AS use cases the read is always done with the correct TCCL.


            True. Maybe just the TCCL approach would work then.

            "bstansberry@jboss.com" wrote:

            3. When the object is requested (cache.get()) I inspect the object and if it is a MarshalledValue, I return the object. If the object is not present, I deserialize the byte stream, first with the associated class loader, and if one is not present, the calling thread's class loader.


            ... and throw away the ref to the byte[] -- i.e. no double memory usage.


            Yes, definitely. Only one would exist at any time - the object ref or the byte array; never both.

            "bstansberry@jboss.com" wrote:

            I'll need to find my scribbled design notes from when I thought about shared HTTP sessions, but I know that one thing that came out of my thinking was a need to flush the type system by converting everything back to a byte[]. This would be done on a redeploy, where we want to preserve cached data but the types are no longer usable. The MarshalledValue impl I was thinking of would have a method for that. I was assuming the session management layer would walk the tree, get the marshalled values and invoke the "flush" method on them; if this is instead an internal detail of JBC there would need to be an API (probably on Node) to tell it to do the same thing.


            Hmm. Let me think about it, but I can't really see a way for this to happen automatically. It would have to be an API call.

            "bstansberry@jboss.com" wrote:

            1) Shouldn't use the MarshalledValue for primitive types.


            Or any other JDK objects - Dates, etc.

            "bstansberry@jboss.com" wrote:

            2) For collections, should probably just use it rather than doing something like trying to determine the type of all collection elements.
            3) Perhaps it's a behavior that should be cache-wide configurable. So, if I'm storing a bunch of Sets whose elements are all Strings, I could turn this off.


            Yes, there should always be a way to disable this and allow manual spinning of marshalled values. I'd enable it by default though just to allow for lazy unmarshalling.


            • 3. Re: Implicit marshalled values - a better way of handling re
              jason.greene

               

              "manik.surtani@jboss.com" wrote:

              about double classloader-specific marshalling. Even if region-based marshalling is off, there will still be double-marshalling, i.e., Pojo -> a byte[] in your MarshalledValue, which is put in the cache. The CacheMarshaller then marshalls your MarshalledValue into a (bigger) byte[] for replication. Or is there something clever in your MarshalledValue class that just passes the byte[] payload in writeExternal()?


              This is how it should be done if it's not already.


              True. Maybe just the TCCL approach would work then.


              Yes, this is the simplest, cleanest way.



              "bstansberry@jboss.com" wrote:

              I'll need to find my scribbled design notes from when I thought about shared HTTP sessions, but I know that one thing that came out of my thinking was a need to flush the type system by converting everything back to a byte[]. This would be done on a redeploy, where we want to preserve cached data but the types are no longer usable. The MarshalledValue impl I was thinking of would have a method for that. I was assuming the session management layer would walk the tree, get the marshalled values and invoke the "flush" method on them; if this is instead an internal detail of JBC there would need to be an API (probably on Node) to tell it to do the same thing.


              Hmm. Let me think about it, but I can't really see a way for this to happen automatically. It would have to be an API call.


              I think this is very useful. I think it needs a better name than flush though. Maybe releaseObjectReferences(fqn) or something like that.


              "bstansberry@jboss.com" wrote:

              2) For collections, should probably just use it rather than doing something like trying to determine the type of all collection elements.
              3) Perhaps it's a behavior that should be cache-wide configurable. So, if I'm storing a bunch of Sets whose elements are all Strings, I could turn this off.


              Yes, there should always be a way to disable this and allow manual spinning of marshalled values. I'd enable it by default though just to allow for lazy unmarshalling.


              I agree, this behavior should be the default. The overhead is minor (temporary double buffer).

              • 4. Re: Implicit marshalled values - a better way of handling re
                brian.stansberry

                 

                "manik.surtani@jboss.com" wrote:

                Or is there something clever in your MarshalledValue class that just passes the byte[] payload in writeExternal()?


                Here's what it does (I take no credit or blame; long predates me ;) )

                 public void writeExternal(ObjectOutput out) throws IOException
                 {
                    int length = serializedForm != null ? serializedForm.length : 0;
                    out.writeInt(length);
                    if( length > 0 )
                    {
                       out.write(serializedForm); // this is a byte[] created in the c'tor
                    }
                    out.writeInt(hashCode);
                 }


                The class is in the AS server module, org.jboss.invocation.MarshalledValue. It was originally written for use in remote invocations, and thus doesn't lazy-serialize. When we started using it for caching, that IMHO was a mistake; we should have written a version that has the behavior discussed on this thread.


                Or any other JDK objects - Dates, etc.


                Just have to be careful to exclude anything that can wrap a non-JDK type. Also can't use instanceof in type checking.
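                One way to read the instanceof caveat: a user subclass of a JDK type (say, a subclass of java.util.Date) passes an instanceof check but still needs the application's class loader. A sketch of an exact-class check follows; WrapPolicy, needsWrapping and the contents of the exclusion set are assumptions for illustration, not an agreed list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Date;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical policy: decides whether a value should be wrapped in a marshalled value. */
final class WrapPolicy {
    // Exact classes that are stored as-is. Containers such as collections are
    // deliberately absent because they can hold non-JDK types.
    private static final Set<Class<?>> UNWRAPPED =
        Collections.unmodifiableSet(new HashSet<Class<?>>(Arrays.asList(
            String.class, Boolean.class, Character.class, Byte.class, Short.class,
            Integer.class, Long.class, Float.class, Double.class, Date.class)));

    /** Compares the runtime class exactly, so subclasses of listed types are still wrapped. */
    static boolean needsWrapping(Object value) {
        return value != null && !UNWRAPPED.contains(value.getClass());
    }
}
```

                With this check a plain java.util.Date is stored directly, while an anonymous or user-defined subclass of Date is wrapped.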


                Re: "releaseObjectReferences(fqn)" as the method name, sounds good. The "flush" name was just me being lazy in my post. :)

                +1 as well to having this be the default behavior.

                • 5. Re: Implicit marshalled values - a better way of handling re
                  brian.stansberry

                  Twist on this I just remembered. With entity caching, user types can be part of the FQN as well. Primary key of the entity is the last element of the FQN. A user type in an FQN is of course generally possible; entity caching is just a particular example where I've had to handle it with region-based marshalling. (The region portion of the FQN is only strings).

                  • 6. Re: Implicit marshalled values - a better way of handling re
                    jason.greene

                     

                    "bstansberry@jboss.com" wrote:
                    Twist on this I just remembered. With entity caching, user types can be part of the FQN as well. Primary key of the entity is the last element of the FQN. A user type in an FQN is of course generally possible; entity caching is just a particular example where I've had to handle it with region-based marshalling. (The region portion of the FQN is only strings).


                    IMO we should start requiring that FQNs are only composed of serializable types that are on the cache classpath. POJO Cache does a very similar thing to entity caching, and it should also be changed not to do this. We should instead compute a value based off of the hash code. In my case, which should be possible with an entity cache, I can actually allow collisions, by simply using the key object as an attribute (key) on the node, instead of the last element of an fqn. The value of the key would point to an internal fqn, containing the data of the object. Alternatively, a sub fqn structure could be created using a unique value. In this alternative case, you would have to iterate all children, performing a comparison.

                    -Jason



                    • 7. Re: Implicit marshalled values - a better way of handling re
                      jason.greene

                       

                      "jason.greene@jboss.com" wrote:
                      In this alternative case, you would have to iterate all children, performing a comparison.


                      By iterating all children, I mean all children of the same hash code.

                      • 8. Re: Implicit marshalled values - a better way of handling re
                        manik

                         

                        "jason.greene@jboss.com" wrote:
                        "bstansberry@jboss.com" wrote:
                        Twist on this I just remembered. With entity caching, user types can be part of the FQN as well. Primary key of the entity is the last element of the FQN. A user type in an FQN is of course generally possible; entity caching is just a particular example where I've had to handle it with region-based marshalling. (The region portion of the FQN is only strings).


                        IMO we should start requiring that FQNs are only composed of serializable types that are on the cache classpath. POJO Cache does a very similar thing to entity caching, and it should also be changed not to do this. We should instead compute a value based off of the hash code. In my case, which should be possible with an entity cache, I can actually allow collisions, by simply using the key object as an attribute (key) on the node, instead of the last element of an fqn. The value of the key would point to an internal fqn, containing the data of the object. Alternatively, a sub fqn structure could be created using a unique value. In this alternative case, you would have to iterate all children, performing a comparison.

                        -Jason



                        +100. Hugely limiting in the way we have to support custom objects in Fqns right now. I understand why people don't want just Strings as Fqn elements, but we should restrict this to Strings + boxed primitives or something like that.

                        Again, though, not something we can do now (2.1.0). Perhaps 3.0.0?


                        • 9. Re: Implicit marshalled values - a better way of handling re
                          manik

                           

                          "manik.surtani@jboss.com" wrote:


                          +100. Hugely limiting in the way we have to support custom objects in Fqns right now.


                          And slow too. A large percentage of time spent marshalling calls is spent marshalling Fqns as individual elements so we can allow for custom objects.

                          I'd like to make this a priority for 3.x, to restrict Fqn elements to {String, byte, short, int, long, float, double, char, boolean}. Thoughts?



                          • 10. Re: Implicit marshalled values - a better way of handling re
                            manik

                            Regarding the marshalling of custom Fqn elements (for now/2.x), how about we apply the same MarshalledValue approach for Fqn elements that aren't in a set of JDK classes?

                            I propose that for the purpose of the MarshalledValue, we *just* use primitives as the set of elements/values that are not wrapped in a MarshalledValue.

                            The argument is that even though things like Strings and Dates don't need specific class loaders, should we bother with the cost of deserialization when, after replication, the objects may never be used and just get evicted after some time? After all, MarshalledValues provide a mechanism of lazy serialization/deserialization as well as the use of specific class loaders.

                            Thoughts?

                            • 11. Re: Implicit marshalled values - a better way of handling re
                              jason.greene

                               

                              "manik.surtani@jboss.com" wrote:

                              +100. Hugely limiting in the way we have to support custom objects in Fqns right now. I understand why people don't want just Strings as Fqn elements, but we should restrict this to Strings + boxed primitives or something like that.

                              Again, though, not something we can do now (2.1.0). Perhaps 3.0.0?


                              We should deprecate it in 2.1.0, but still allow it to work. It never worked like people expected anyway. Most importantly, we should change all of our code that does this sooner rather than later.

                              This is an example of how it could possibly work with entity caching:
                              /entities/[TYPE]/[HASHCODE]
                               KEY1 -> UUID1
                               KEY2 -> UUID2
                               KEY3 -> UUID3
                              /entities/[TYPE]/[HASHCODE]/[UUID1]
                               [FIELD_NAME] -> [FIELD_VALUE]
                               ...
                              /entities/[TYPE]/[HASHCODE]/[UUID2]
                               [FIELD_NAME] -> [FIELD_VALUE]
                               ...
                              /entities/[TYPE]/[HASHCODE]/[UUID3]
                               [FIELD_NAME] -> [FIELD_VALUE]
                               ...
                              


                              So a given entity requires 2 node lookups, no matter the number of collisions. There will, of course, be contention on the hashcode, but this is expected.
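                              The layout above can be sketched with plain maps standing in for cache nodes. EntityLayoutSketch and its methods are hypothetical, and Fqns are modelled as strings purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Toy model of the /entities/[TYPE]/[HASHCODE]/[UUID] layout; maps stand in for nodes. */
final class EntityLayoutSketch {
    // Bucket node: /entities/[TYPE]/[HASHCODE] with KEY -> UUID attributes.
    private final Map<String, Map<Object, String>> buckets = new HashMap<>();
    // Entity node: /entities/[TYPE]/[HASHCODE]/[UUID] with FIELD_NAME -> FIELD_VALUE.
    private final Map<String, Map<String, Object>> entities = new HashMap<>();

    private static String bucketFqn(String type, Object key) {
        return "/entities/" + type + "/" + key.hashCode();
    }

    void put(String type, Object key, Map<String, Object> fields) {
        String bucket = bucketFqn(type, key);
        // The key object is stored as an attribute key, never as an Fqn element.
        String uuid = buckets.computeIfAbsent(bucket, f -> new HashMap<>())
                             .computeIfAbsent(key, k -> UUID.randomUUID().toString());
        entities.put(bucket + "/" + uuid, new HashMap<>(fields));
    }

    Map<String, Object> get(String type, Object key) {
        String bucket = bucketFqn(type, key);
        Map<Object, String> keys = buckets.get(bucket);              // node lookup 1
        if (keys == null) {
            return null;
        }
        String uuid = keys.get(key);
        return uuid == null ? null : entities.get(bucket + "/" + uuid); // node lookup 2
    }

    int bucketCount() {
        return buckets.size();
    }
}
```

                              Two keys with the same hash code (for example the strings "Aa" and "BB") land in the same bucket node but resolve to different UUID children.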

                              • 12. Re: Implicit marshalled values - a better way of handling re
                                jason.greene

                                 

                                "manik.surtani@jboss.com" wrote:

                                I'd like to make this a priority for 3.x, to restrict Fqn elements to {String, byte, short, int, long, float, double, char, boolean}. Thoughts?


                                It should be anything that can be mapped to AND from a string. So that would be all primitives and wrapper types, and enums. We could support lazily loading an enum (it's just a string), or we could just tell people they have to convert them to and from strings themselves.

                                • 13. Re: Implicit marshalled values - a better way of handling re
                                  manik

                                  Wouldn't enums still be a problem in that they would require user type definitions and, hence, a class loader? I agree that they could easily be coded to and decoded from a String, but what of the case where there are 2 enums with the same values, e.g.,

                                  enum Type1{A, B, C}
                                  
                                  enum Type2{A, B, C}
                                  
                                  assert Type1.A.toString().equals(Type2.A.toString());
                                  




                                  • 14. Re: Implicit marshalled values - a better way of handling re
                                    jason.greene

                                    The serialization format uses the class desc followed by the value. We could do this as well, using an illegal identifier symbol for a delimiter (;) so your example would become:

                                    foo.package.Type1;A

                                    The classloading problems can be easily handled by lazily loading the type when that portion of the fqn is accessed as an object. When accessed as a string, the above form is returned.
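                                    A sketch of that encoding, assuming ';' as the delimiter as described above. EnumFqnCodec, encode and decode are hypothetical names; Type1 and Type2 mirror the enums from the earlier example.

```java
enum Type1 { A, B, C }

enum Type2 { A, B, C }

/** Hypothetical codec: stores an enum as "className;constantName" and loads it lazily. */
final class EnumFqnCodec {
    private static final char DELIMITER = ';'; // not legal in a Java identifier

    /** String form stored in the Fqn; no class loading is needed to compare or replicate it. */
    static String encode(Enum<?> value) {
        return value.getDeclaringClass().getName() + DELIMITER + value.name();
    }

    /** Loads the enum type only when the element is actually accessed as an object. */
    static Enum<?> decode(String encoded, ClassLoader loader) throws ClassNotFoundException {
        int sep = encoded.indexOf(DELIMITER);
        if (sep < 0) {
            throw new IllegalArgumentException("Not an encoded enum: " + encoded);
        }
        Class<?> type = Class.forName(encoded.substring(0, sep), true, loader);
        if (!type.isEnum()) {
            throw new IllegalArgumentException("Not an enum type: " + type.getName());
        }
        return constantOf(type, encoded.substring(sep + 1));
    }

    @SuppressWarnings({"unchecked", "rawtypes"})
    private static Enum<?> constantOf(Class<?> type, String name) {
        return Enum.valueOf((Class) type, name);
    }
}
```

                                    Because the class name is part of the string, Type1.A and Type2.A encode to different values even though their toString() results are equal.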

                                    However, there is nothing stopping developers from doing this themselves, so after thinking about it, I don't think it is worth the effort.
