28 Replies Latest reply on Mar 27, 2009 11:05 AM by dmlloyd

    Common marshalling infrastructure

    dmlloyd

      At Manik's request, I'm starting a thread here to address this topic. I've started a rough design of an API which forms a basis for a more pluggable form of marshalling than what the basic Java serialization infrastructure provides, and I'd like to outline some of the design ideas here, as well as solicit feedback as to the usefulness of this design to JBoss Cache or any other project for that matter.

      The requirements being addressed are as follows (some are from Manik's https://jira.jboss.org/jira/browse/JBCACHE-1336 and some are from JBoss Remoting):

      * Make streams lightweight enough that they can be discarded, and/or provide a mechanism by which a stream may be reused easily and safely (that is, not leak disastrously)
      * Allow a pluggable scheme for writing abbreviated class headers (for example, using a short magic number for commonly-used classes)
      * Allow a pluggable scheme for writing actual objects
      * Allow custom Externalizers for specific object types
      * Allow a pluggable scheme for object replacement/resolution
      * Make stream headers optional and/or pluggable
      * Make compatibility with JDK serialization possible but optional (from both an API and a wire format perspective)

      (Please add any more requirements that I missed).

      I've made a proof-of-concept (well, evidence-of-concept anyway) API that addresses these requirements which I'll step through. I'm listing the interface name and URL before the description to prevent the forums from messing up the formatting of the URLs.

      * MarshallerFactory (http://tinyurl.com/5z6gqj)

      The main interface used to make marshaller instances is MarshallerFactory. This factory would be configured with all the pluggable elements (which I'll go through below). Once it's fully configured, use the two create methods to create the actual Marshallers and Unmarshallers. The create operation would be as lightweight as possible by design.

      * Marshaller (http://tinyurl.com/5ompvf)
      * Unmarshaller (http://tinyurl.com/6y6euy)

      The two marshaller interfaces, Marshaller and Unmarshaller, are fairly straightforward. You would call the start() method with a stream to read from or write to, and call finish() when you're done. There are also methods to discard the class and instance caches, as described by the javadoc, to facilitate pooling if desired.
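
A minimal sketch of the start()/finish() lifecycle described above. The `ByteOutput` and `Marshaller` shapes here are hypothetical stand-ins, not the signatures behind the linked URLs, and `ToStringMarshaller` is a toy implementation that exists only to make the example runnable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-ins for the interfaces discussed above; the real signatures may differ.
interface ByteOutput {
    void write(int b) throws IOException;
}

interface Marshaller {
    void start(ByteOutput out) throws IOException;
    void writeObject(Object obj) throws IOException;
    void finish() throws IOException;
}

// Toy marshaller: writes each object's toString() as a length-prefixed UTF-8 string.
final class ToStringMarshaller implements Marshaller {
    private ByteOutput out;

    public void start(ByteOutput out) {
        if (this.out != null) throw new IllegalStateException("already started");
        this.out = out;
    }

    public void writeObject(Object obj) throws IOException {
        if (out == null) throw new IllegalStateException("not started");
        byte[] bytes = String.valueOf(obj).getBytes(StandardCharsets.UTF_8);
        if (bytes.length > 0xFFFF) throw new IOException("value too long");
        out.write(bytes.length >>> 8); // two-byte big-endian length
        out.write(bytes.length);
        for (byte b : bytes) out.write(b);
    }

    public void finish() {
        out = null; // detach from the stream so this instance can be reused
    }
}

final class LifecycleDemo {
    // start(), write, then finish() in a finally block so the marshaller is always detached.
    static byte[] marshal(Marshaller m, Object... objects) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        m.start(buf::write);
        try {
            for (Object o : objects) m.writeObject(o);
        } finally {
            m.finish();
        }
        return buf.toByteArray();
    }
}
```

Calling finish() in a finally block is what makes reuse safe: the same marshaller instance can be handed to the next caller without holding a reference to the previous stream.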

      * StreamHeader (http://tinyurl.com/5nhg5o)

      You can override the stream header by implementing StreamHeader and passing it in to the MarshallerFactory. In my opinion though, the default should be to write no header.

      * ClassMarshaller (http://tinyurl.com/64yr7d)
      * ClassMarshallerFactory (http://tinyurl.com/5j4vuy)

      Customize writing the class header or reading and resolving the class header by implementing ClassMarshaller and its factory interface, ClassMarshallerFactory. The implementation gets plugged into the corresponding setter in MarshallerFactory. Probably it would be sensible to provide a base implementation of this class that reads and writes the class name similar to Java serialization, which can be extended to provide e.g. magic number capability as described in the comments of JBCACHE-1336. Implementations could also write classloader information or whatever. This class would also be responsible for maintaining the class cache; the instance would be discarded when the corresponding discard method is called on the marshaller/unmarshaller.
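
To make the magic-number idea concrete, here is a rough sketch of a class table that abbreviates registered classes and falls back to the class name for everything else. `MagicNumberClassTable`, the tag values, and the wire layout are all assumptions made for illustration; this is not the ClassMarshaller interface itself, and it does not attempt JDK-compatible class descriptors.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: registered classes are written as a tag byte plus a short,
// all other classes as a tag byte plus the full class name.
final class MagicNumberClassTable {
    private static final int TAG_MAGIC = 0;
    private static final int TAG_NAME = 1;

    private final Map<Class<?>, Short> toMagic = new HashMap<>();
    private final Map<Short, Class<?>> fromMagic = new HashMap<>();

    void register(Class<?> type, short magic) {
        if (toMagic.containsKey(type) || fromMagic.containsKey(magic)) {
            throw new IllegalArgumentException("duplicate registration for " + type + " / " + magic);
        }
        toMagic.put(type, magic);
        fromMagic.put(magic, type);
    }

    void writeClass(DataOutputStream out, Class<?> type) throws IOException {
        Short magic = toMagic.get(type);
        if (magic != null) {
            out.writeByte(TAG_MAGIC);
            out.writeShort(magic);
        } else {
            out.writeByte(TAG_NAME);
            out.writeUTF(type.getName());
        }
    }

    // Note: Class.forName does not resolve primitive type names; a real
    // implementation would need to special-case them.
    Class<?> readClass(DataInputStream in, ClassLoader loader)
            throws IOException, ClassNotFoundException {
        int tag = in.readUnsignedByte();
        switch (tag) {
            case TAG_MAGIC: {
                short magic = in.readShort();
                Class<?> type = fromMagic.get(magic);
                if (type == null) throw new IOException("Unknown magic number " + magic);
                return type;
            }
            case TAG_NAME:
                return Class.forName(in.readUTF(), false, loader);
            default:
                throw new IOException("Unknown class tag " + tag);
        }
    }
}
```

Both ends of the stream have to agree on the registrations, which is one reason to keep the table in the factory configuration rather than in the stream.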

      * ObjectMarshaller (http://tinyurl.com/5dxhup)
      * ObjectMarshallerFactory (http://tinyurl.com/6dgmxy)

      The basic object writing mechanism is pluggable by means of the ObjectMarshaller interface and its factory. The default implementation would be to support the standard Java serialization semantics. Also this object is what implements the object instance cache.

      * Externalizer (http://tinyurl.com/5hofaj)

      It should be possible to plug in a custom externalizer for any class (for example, non-serializable classes can be forwarded by using a custom externalizer). The Externalizer interface can fulfill this function. Right now I just have it set up as a list that gets scanned. But after thinking about it, maybe the MarshallerFactory should keep a Map<Class<?>, Externalizer> or something?
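
A sketch of what the map-based lookup could look like. The `Externalizer` method signatures, the `ExternalizerRegistry` class, and the `Coordinate` example type are all hypothetical; the point is only that an exact-class map replaces the list scan with a single lookup and returns null when nothing is registered.

```java
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.HashMap;
import java.util.Map;

// Hypothetical shape of the Externalizer interface; the real one may differ.
interface Externalizer {
    void writeExternal(Object subject, ObjectOutput out) throws IOException;
    Object readExternal(Class<?> type, ObjectInput in) throws IOException, ClassNotFoundException;
}

// Exact-class registry: one hash lookup instead of scanning a list.
final class ExternalizerRegistry {
    private final Map<Class<?>, Externalizer> byClass = new HashMap<>();

    void register(Class<?> type, Externalizer externalizer) {
        byClass.put(type, externalizer);
    }

    // Returns the externalizer registered for exactly this class, or null if there is none.
    Externalizer lookup(Class<?> type) {
        return byClass.get(type);
    }
}

// Example of a class that is not Serializable but can still be forwarded.
final class Coordinate {
    final int x;
    final int y;

    Coordinate(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

final class CoordinateExternalizer implements Externalizer {
    public void writeExternal(Object subject, ObjectOutput out) throws IOException {
        Coordinate c = (Coordinate) subject;
        out.writeInt(c.x);
        out.writeInt(c.y);
    }

    public Object readExternal(Class<?> type, ObjectInput in) throws IOException {
        int x = in.readInt();
        int y = in.readInt();
        return new Coordinate(x, y);
    }
}
```

An exact-class map does not cover subclasses; whether a subclass should inherit its parent's externalizer is a separate design decision that this sketch leaves open.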

      * MarshallerObjectInputStream (http://tinyurl.com/6csfdg)
      * MarshallerObjectOutputStream (http://tinyurl.com/6e67oy)

      Also I created lightweight(ish) abstract versions of ObjectInputStream and ObjectOutputStream which will make it much easier to make marshallers that support the Java serialization API (instances can be passed in to custom readObject() and writeObject() methods for example). There is a security access check every time an Object*Stream is constructed which we can't really get around, which might in turn slow things down enough that it would be better to pool marshallers than rebuild them per-request. Time will tell.

      That's about it. You can browse around that directory or check out the whole works from http://anonsvn.jboss.org/repos/sandbox/david.lloyd/jboss-marshalling/trunk and look at it, and let me know what you think of the API.

        • 1. Re: Common marshalling infrastructure
          jason.greene

          This looks great to me. So essentially cache would define a ClassMarshaller which would use magic numbers, and also an Externalizer for the classes those numbers represent. Then it would use an ObjectMarshaller to actually build the byte array.

          • 2. Re: Common marshalling infrastructure
            dmlloyd

            I've changed the Externalizer in trunk so that there's an ExternalizerFactory which returns an Externalizer given a Class, or null if there is none.

            • 3. Re: Common marshalling infrastructure
              dmlloyd

               

              "jason.greene@jboss.com" wrote:
              This looks great to me. So essentially cache would define a ClassMarshaller which would use magic numbers, and also an Externalizer for the classes those numbers represent. Then it would use an ObjectMarshaller to actually build the byte array.


              Actually to build the byte array you could just implement ByteOutput and pass your implementation in to Marshaller.start().

              • 4. Re: Common marshalling infrastructure
                manik

                Looks good and well thought out. A couple of questions:

                * ByteInput and ByteOutput - could they be compatible with org.jboss.cache.io.ByteBuffer? Essentially this class is an extension of org.jgroups.util.Buffer, which provides direct access to a byte array. I'd hate to have to copy things around to use these new interfaces.

                * StreamHeader is useful for me since I need to add version bits to the stream such that the receiving end can use the appropriate marshaller implementation based on the version.

                * How are ObjectMarshallers selected? Based on the class info from the ClassMarshaller? Could this be ClassMarshaller + some token from the StreamHeader? (see previous point for reason why)

                * Pooling MOIS and MOOS - probably will have to happen due to the cost of constructing these, and if they are reused, why would you discard the Object and Class Marshallers from the Marshaller?

                * Reference counting. Do you propose any sort of ref counting, if I were to write the same instance to the stream multiple times?

                • 5. Re: Common marshalling infrastructure
                  manik

                   

                  "manik.surtani@jboss.com" wrote:

                  * ByteInput and ByteOutput - could they be compatible with org.jboss.cache.io.ByteBuffer? Essentially this class is an extension of org.jgroups.util.Buffer, which provides direct access to a byte array. I'd hate to have to copy things around to use these new interfaces.


                  Ignore this, I just noticed that ByteInput and ByteOutput are interfaces and I can implement stuff that talks to ByteBuffer. :-)
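
For illustration, an adapter of this kind could look roughly like the following. The `ByteInput` method name and its end-of-input convention are assumptions modelled on `InputStream.read()`, and the adapter wraps a plain `byte[]` region rather than the actual org.jboss.cache.io.ByteBuffer class.

```java
import java.io.IOException;

// Hypothetical shape of ByteInput; the real interface may use different method names.
interface ByteInput {
    // Returns the next byte as a value in 0..255, or -1 at end of input.
    int read() throws IOException;
}

// Reads directly from a region of an existing array; nothing is copied.
final class ArrayByteInput implements ByteInput {
    private final byte[] buf;
    private final int end;
    private int pos;

    ArrayByteInput(byte[] buf, int offset, int length) {
        if (offset < 0 || length < 0 || length > buf.length - offset) {
            throw new IndexOutOfBoundsException("offset=" + offset + ", length=" + length);
        }
        this.buf = buf;
        this.pos = offset;
        this.end = offset + length;
    }

    public int read() {
        return pos < end ? buf[pos++] & 0xff : -1;
    }
}
```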

                  • 6. Re: Common marshalling infrastructure
                    dmlloyd

                     

                    "manik.surtani@jboss.com" wrote:
                    * How are ObjectMarshallers selected? Based on the class info from the ClassMarshaller? Could this be ClassMarshaller + some token from the StreamHeader? (see previous point for reason why)


                    You'd configure your MarshallerFactory to use a specific ObjectMarshaller. The ObjectMarshaller is generally responsible for the wire format of the stream as well as object instance pooling. The current design expects you to select your marshaller before you start reading... I didn't think of the case where you want to somehow detect it. I guess the thing to do in this case is write a bit of code that reads the header, determines the marshaller to use, and then connects the stream to that marshaller? I'm not sure there's a clean way to integrate detection that wouldn't be more easily done outside of the framework. I'll think about it some more, or if you have any ideas...?

                    "manik.surtani@jboss.com" wrote:
                    * Pooling MOIS and MOOS - probably will have to happen due to the cost of constructing these, and if they are reused, why would you discard the Object and Class Marshallers from the Marshaller?


                    You want to dump your instance and class caches on every "session". The Object/ClassMarshallers should be very cheap to construct (basically building an empty hashmap on write, or an empty arraylist on read). The Marshallers *should* usually be decoupled from MOIS/MOOS; I would expect that an implementation would be able to implement pooling of these instances if needed. Note that even if you retain 100% JDK compatibility, if you never serialize an object that has readObject/writeObject methods, then you never even need to have a MOIS/MOOS at all, so that's the cheapest cost possible. :-) In other words, if you use Externalizable, Externalizer, and default serialization exclusively, then the cost is never incurred.

                    Also keep in mind that constructing a MOIS/MOOS is not *as* expensive as constructing a standard Object*Stream, because they use the alternate "build my own implementation" constructor which just nulls out most of the fields. However, there is still the issue of a security check on each construction, which may or may not be significant.
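
For reference, the "build my own implementation" route in the JDK is the protected no-argument `ObjectOutputStream` constructor together with `writeObjectOverride()`. The sketch below is not the MOOS class from the sandbox; `DelegatingObjectOutputStream` is a hypothetical name, it handles only Strings, and it overrides just a handful of methods.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;

// Hypothetical sketch of an ObjectOutputStream that supplies its own implementation.
final class DelegatingObjectOutputStream extends ObjectOutputStream {
    private final DataOutputStream out;

    DelegatingObjectOutputStream(OutputStream target) throws IOException {
        super(); // protected no-arg constructor: writes no stream header, sets up no buffers
        this.out = new DataOutputStream(target);
    }

    // writeObject() is final; after the no-arg constructor it forwards here.
    @Override
    protected void writeObjectOverride(Object obj) throws IOException {
        // Stand-in for a call into a real marshaller: this sketch supports Strings only.
        if (!(obj instanceof String)) throw new IOException("unsupported type in this sketch");
        out.writeUTF((String) obj);
    }

    // The superclass state is uninitialised, so every method that will be called
    // has to be overridden. Only a few are shown here.
    @Override
    public void writeInt(int v) throws IOException {
        out.writeInt(v);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
    }

    @Override
    public void flush() throws IOException {
        out.flush();
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

When a security manager is installed, the no-arg constructor performs a permission check, which is the per-construction cost mentioned above.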


                    • 7. Re: Common marshalling infrastructure
                      dmlloyd

                       

                      "manik.surtani@jboss.com" wrote:
                      * Reference counting. Do you propose any sort of ref counting, if I were to write the same instance to the stream multiple times?


                      No, I hadn't planned on it - what do you have in mind? The instance and class cache would take care of making sure that full class/instance data is written to the stream only one time, if that's what you're getting at. There's no reference counting involved or anything like that though.
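
To illustrate "written only one time", here is a rough sketch of an instance cache using back-references: an identity map on the write side and a list on the read side. The class names, tag values, and the restriction to Strings are assumptions made to keep the example short; this is not the ObjectMarshaller implementation.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, limited to Strings.
final class BackReferenceWriter {
    static final int TAG_NEW = 0; // full instance data follows
    static final int TAG_REF = 1; // index of a previously written instance follows

    // Identity-based: equal but distinct instances are written separately.
    private final Map<Object, Integer> seen = new IdentityHashMap<>();
    private final DataOutputStream out;

    BackReferenceWriter(DataOutputStream out) {
        this.out = out;
    }

    void writeString(String s) throws IOException {
        if (s == null) throw new NullPointerException("s");
        Integer index = seen.get(s);
        if (index != null) {
            out.writeByte(TAG_REF);
            out.writeInt(index);
        } else {
            seen.put(s, seen.size());
            out.writeByte(TAG_NEW);
            out.writeUTF(s);
        }
    }

    // Dropping the cache between sessions keeps old instances from leaking.
    void clearInstanceCache() {
        seen.clear();
    }
}

final class BackReferenceReader {
    private final List<String> seen = new ArrayList<>();
    private final DataInputStream in;

    BackReferenceReader(DataInputStream in) {
        this.in = in;
    }

    String readString() throws IOException {
        int tag = in.readUnsignedByte();
        if (tag == BackReferenceWriter.TAG_REF) {
            int index = in.readInt();
            if (index < 0 || index >= seen.size()) throw new IOException("Bad back-reference " + index);
            return seen.get(index);
        }
        if (tag != BackReferenceWriter.TAG_NEW) throw new IOException("Unknown tag " + tag);
        String s = in.readUTF();
        seen.add(s);
        return s;
    }
}
```

Writing the same instance twice costs the full data once and a five-byte reference the second time, and the reader hands back the same instance for both. No counts are kept anywhere.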

                      • 8. Re: Common marshalling infrastructure
                        manik

                        Re: ref counting, I understand about class data, but as long as instance data is also only written once then that is fine.

                        Re: version headers, perhaps I could swap ObjectMarshallers midway?

                        e.g.,

                        Map<Short, ObjectMarshaller> objectMarshallers = ... ;
                        ByteInput in = new ByteBufferByteInputAdapter(byteBuffer);
                        unmarshaller.start(in);
                        short versionId = (Short) unmarshaller.readObjectUnshared();
                        unmarshaller.setObjectMarshaller( objectMarshallers.get(versionId) );
                        // ...
                        // continue unmarshalling rest of my state
                        // ...
                        
                        



                        And when marshalling, how would object marshallers be selected? I'm guessing a preference for anything that uses magic numbers (how would you denote that an internal class - which *may* implement Externalizable as well - should be marshalled based on a magic number? Another marker interface, perhaps? Or a sub-interface to Externalizable - Marshallable?) And if this is not present then test for Externalizable, and then Serializable, etc., and finally falling back to an Externalizer?

                        • 9. Re: Common marshalling infrastructure
                          dmlloyd

                           

                          "manik.surtani@jboss.com" wrote:
                          Re: ref counting, I understand about class data, but as long as instance data is also only written once then that is fine.

                          Re: version headers, perhaps I could swap ObjectMarshallers midway?

                          e.g.,

                          Map<Short, ObjectMarshaller> objectMarshallers = ... ;
                          ByteInput in = new ByteBufferByteInputAdapter(byteBuffer);
                          unmarshaller.start(in);
                          short versionId = (Short) unmarshaller.readObjectUnshared();
                          unmarshaller.setObjectMarshaller( objectMarshallers.get(versionId) );
                          // ...
                          // continue unmarshalling rest of my state
                          // ...
                          
                          



                          Doing a readObject just to get a number could be overkill, maybe something simpler:

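                          Map<Short, MarshallerFactory> marshallers = ...;
                          ByteInput in = ...;
                          // read a two-byte, big-endian version number; mask in case get() returns a signed byte
                          int magic = (in.get() & 0xff) << 8 | (in.get() & 0xff);
                          // Short.valueOf has no int overload, so narrow explicitly
                          Unmarshaller unmarshaller = marshallers.get(Short.valueOf((short) magic)).createUnmarshaller();
                          // use unmarshaller to read your state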
                          


                          "manik.surtani@jboss.com" wrote:
                          And when marshalling, how would object marshallers be selected? I'm guessing a preference for anything that uses magic numbers (how would you denote that an internal class - which *may* implement Externalizable as well - should be marshalled based on a magic number? Another marker interface, perhaps? Or a sub-interface to Externalizable - Marshallable?) And if this is not present then test for Externalizable, and then Serializable, etc., and finally falling back to an Externalizer?


                          The default (I guess) ObjectMarshaller would look for an Externalizer for that object, and if it's found, it would use that, otherwise it would follow the rules for object serialization as they are specified by the JDK. A subclass might add a "tag" byte that says "this is an already-known instance" and return a preset instance.

                          The default ClassMarshaller would work similarly - it would write a class descriptor per the JDK spec. Similarly a subclass might write a tag byte and magic number for preset classes.

                          If the object is not a pre-known instance, an ObjectMarshaller or ClassMarshaller could write an "unknown" tag byte and would delegate to the default instance of that interface to take the default action.
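
A sketch of the tag-byte-plus-delegation idea for preset instances. `PresetInstanceCodec` and its tag values are hypothetical, and plain JDK `ObjectOutputStream`/`ObjectInputStream` calls stand in for "the default instance of that interface".

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: preset instances are written as a tag byte plus an id,
// everything else as an "unknown" tag followed by the default serialized form.
final class PresetInstanceCodec {
    private static final int TAG_PRESET = 0;
    private static final int TAG_UNKNOWN = 1;

    private final Map<Object, Integer> idsByInstance = new IdentityHashMap<>();
    private final List<Object> instancesById = new ArrayList<>();

    // Both ends of the stream must register the same presets in the same order.
    void addPreset(Object instance) {
        if (!idsByInstance.containsKey(instance)) {
            idsByInstance.put(instance, instancesById.size());
            instancesById.add(instance);
        }
    }

    void write(ObjectOutputStream out, Object obj) throws IOException {
        Integer id = idsByInstance.get(obj);
        if (id != null) {
            out.writeByte(TAG_PRESET);
            out.writeInt(id);
        } else {
            out.writeByte(TAG_UNKNOWN);
            out.writeObject(obj); // default action: standard JDK serialization
        }
    }

    Object read(ObjectInputStream in) throws IOException, ClassNotFoundException {
        int tag = in.readUnsignedByte();
        if (tag == TAG_PRESET) {
            int id = in.readInt();
            if (id < 0 || id >= instancesById.size()) throw new IOException("Unknown preset id " + id);
            return instancesById.get(id);
        }
        if (tag == TAG_UNKNOWN) {
            return in.readObject();
        }
        throw new IOException("Unknown tag " + tag);
    }
}
```

A side effect of this layout is that a preset instance never needs to be Serializable, since only its id crosses the wire.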


                          • 10. Re: Common marshalling infrastructure
                            manik

                            Ok, all sounds good so far.

                            • 11. Re: Common marshalling infrastructure
                              trustin

                              Would we need to read objects from a stream which contains objects serialized with different marshallers? (Read: Do we need unmarshaller auto-detection?)

                              • 12. Re: Common marshalling infrastructure
                                dmlloyd

                                I've been thinking about that - I think it is up to the user to handle detection (through an additional header, or some other mechanism, like the negotiation phase that Remoting 3 uses). The header written by the marshaller should be used to verify the stream's integrity and possibly select a version if the marshaller has multiple supported wire protocol versions.

                                • 13. Re: Common marshalling infrastructure
                                  trustin

                                  All sounds good to me, too. :)

                                  • 14. Re: Common marshalling infrastructure
                                    starksm64

                                    So where does the invocation target class loader fit in? I don't see it in the APIs.

                                    I have been talking to Ron about some current remoting class loading issues that arise from the invocation handler being the only one who knows the correct class loader for unmarshalling application specific classes:

                                    1. thread pool receives a remoting request.
                                    2. unmarshall just enough to understand the request, but don't unmarshall any invocation payload. Application specific data needs to be isolated outside of the remoting control structures to allow this to happen.
                                    3. dispatch the request to a handler.
                                    4. handler sets the TCL
                                    5. handler or its delegate unmarshalls application payload
                                    6. application does what it does.
                                    7. handler serializes application return/exception and then unsets the TCL
                                    8. remoting layer completes request control information
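
Steps 4, 5 and the TCL reset in step 7 can be sketched with plain JDK serialization as follows. `LoaderAwareObjectInputStream` and `PayloadHandler` are hypothetical names, and the payload is assumed to arrive as a still-serialized byte array that the remoting layer has not touched.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

// Resolves classes through an explicit loader instead of the caller's loader.
final class LoaderAwareObjectInputStream extends ObjectInputStream {
    private final ClassLoader loader;

    LoaderAwareObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
        super(in);
        this.loader = loader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            return Class.forName(desc.getName(), false, loader);
        } catch (ClassNotFoundException e) {
            return super.resolveClass(desc); // default resolution also covers primitive types
        }
    }
}

final class PayloadHandler {
    // Sets the TCL, unmarshals the application payload under it, then restores the old TCL.
    static Object unmarshalPayload(byte[] payload, ClassLoader appLoader)
            throws IOException, ClassNotFoundException {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(appLoader);
        try (ObjectInputStream in = new LoaderAwareObjectInputStream(
                new ByteArrayInputStream(payload), thread.getContextClassLoader())) {
            return in.readObject();
        } finally {
            thread.setContextClassLoader(previous); // always unset, even on failure
        }
    }
}
```

The key point is that the payload bytes stay opaque until the handler has established the class loader, so the thread pool never needs to see application classes.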
