Common marshalling infrastructure
dmlloyd Aug 6, 2008 12:13 AM
At Manik's request, I'm starting a thread here to address this topic. I've started a rough design of an API which forms a basis for a more pluggable form of marshalling than what the basic Java serialization infrastructure provides, and I'd like to outline some of the design ideas here, as well as solicit feedback as to the usefulness of this design to JBoss Cache or any other project for that matter.
The requirements being addressed are as follows (some are from Manik's https://jira.jboss.org/jira/browse/JBCACHE-1336 and some are from JBoss Remoting):
* Make streams lightweight enough that they can be discarded, and/or provide a mechanism by which a stream may be reused easily and safely (that is, not leak disastrously)
* Allow a pluggable scheme for writing abbreviated class headers (for example, using a short magic number for commonly-used classes)
* Allow a pluggable scheme for writing actual objects
* Allow custom Externalizers for specific object types
* Allow a pluggable scheme for object replacement/resolution
* Make stream headers optional and/or pluggable
* Make compatibility with JDK serialization possible but optional (from both an API and a wire format perspective)
(Please add any more requirements that I missed).
I've made a proof-of-concept (well, evidence-of-concept anyway) API that addresses these requirements which I'll step through. I'm listing the interface name and URL before the description to prevent the forums from messing up the formatting of the URLs.
* MarshallerFactory (http://tinyurl.com/5z6gqj)
The main interface used to make marshaller instances is MarshallerFactory. This factory would be configured with all the pluggable elements (which I'll go through below). Once it's fully configured, use the two create methods to create the actual Marshallers. The create operation would be as lightweight as possible by design.
* Marshaller (http://tinyurl.com/5ompvf)
* Unmarshaller (http://tinyurl.com/6y6euy)
The two marshaller interfaces, Marshaller and Unmarshaller, are fairly straightforward. You would call the start() method with a stream to read from or write to, and call finish() when you're done. There are also methods to discard the class and instance caches, as described by the javadoc, to facilitate pooling if desired.
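To make the start()/finish() lifecycle and the cache-discard idea a bit more concrete, here is a rough sketch. Everything in it is an assumption for illustration: SketchMarshaller, its writeObject() and clearInstanceCache() methods, and the one-byte tag format are made up and are not the signatures from the sandbox code. The toy implementation only handles Strings and writes a short back-reference when it sees the same instance twice.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical shape of a marshaller; the real interface may differ.
interface SketchMarshaller {
    void start(OutputStream target) throws IOException;
    void writeObject(Object obj) throws IOException;
    void finish() throws IOException;
    void clearInstanceCache();
}

// Toy implementation: Strings only, with an identity-based instance cache.
final class StringOnlyMarshaller implements SketchMarshaller {
    private final Map<Object, Integer> instanceCache = new IdentityHashMap<>();
    private DataOutputStream out;

    public void start(OutputStream target) {
        out = new DataOutputStream(target);
    }

    public void writeObject(Object obj) throws IOException {
        if (out == null) throw new IllegalStateException("start() has not been called");
        Integer seen = instanceCache.get(obj);
        if (seen != null) {
            out.writeByte(1);        // tag 1: back-reference to a cached instance
            out.writeInt(seen);
        } else {
            instanceCache.put(obj, instanceCache.size());
            out.writeByte(0);        // tag 0: new instance follows
            out.writeUTF((String) obj);
        }
    }

    public void finish() throws IOException {
        out.flush();
        out = null;                  // detach so the marshaller can be started again
    }

    public void clearInstanceCache() {
        instanceCache.clear();
    }
}

final class MarshallerLifecycleDemo {
    // Returns the number of bytes written in the first and second use.
    static int[] run() {
        try {
            SketchMarshaller m = new StringOnlyMarshaller();
            String s = "hello";

            ByteArrayOutputStream first = new ByteArrayOutputStream();
            m.start(first);
            m.writeObject(s);
            m.writeObject(s);        // same instance: written as a back-reference
            m.finish();

            m.clearInstanceCache();  // make the marshaller safe to reuse or pool

            ByteArrayOutputStream second = new ByteArrayOutputStream();
            m.start(second);
            m.writeObject(s);        // cache was cleared, so the full value is written
            m.finish();

            return new int[] { first.size(), second.size() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = run();
        System.out.println(sizes[0] + " bytes, then " + sizes[1] + " bytes");
    }
}
```

The point of clearing the cache between uses is that a pooled marshaller never emits a back-reference to something the new reader has not seen.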
* StreamHeader (http://tinyurl.com/5nhg5o)
You can override the stream header by implementing StreamHeader and passing it in to the MarshallerFactory. In my opinion though, the default should be to write no header.
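As a sketch of what a pluggable header could look like, assuming a hypothetical two-method interface (SketchStreamHeader, writeHeader() and readHeader() are illustrative names, not the ones in the sandbox code): one implementation writes nothing, and another writes the same magic number and version that JDK serialization uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectStreamConstants;
import java.io.OutputStream;
import java.io.StreamCorruptedException;
import java.io.UncheckedIOException;

// Hypothetical header plug-in; the real StreamHeader interface may differ.
interface SketchStreamHeader {
    void writeHeader(OutputStream out) throws IOException;
    void readHeader(InputStream in) throws IOException;
}

// The "write no header" default argued for above.
final class NoHeader implements SketchStreamHeader {
    public void writeHeader(OutputStream out) { }
    public void readHeader(InputStream in) { }
}

// Writes and verifies the JDK serialization magic number and version.
final class JdkStyleHeader implements SketchStreamHeader {
    public void writeHeader(OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeShort(ObjectStreamConstants.STREAM_MAGIC);
        data.writeShort(ObjectStreamConstants.STREAM_VERSION);
        data.flush();
    }

    public void readHeader(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readShort() != ObjectStreamConstants.STREAM_MAGIC
                || data.readShort() != ObjectStreamConstants.STREAM_VERSION) {
            throw new StreamCorruptedException("unexpected stream header");
        }
    }
}

final class StreamHeaderDemo {
    // Writes the header, reads it back to verify it, and returns its size in bytes.
    static int headerSize(SketchStreamHeader header) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            header.writeHeader(bytes);
            header.readHeader(new ByteArrayInputStream(bytes.toByteArray()));
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("no header: " + headerSize(new NoHeader()) + " bytes");
        System.out.println("JDK-style header: " + headerSize(new JdkStyleHeader()) + " bytes");
    }
}
```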
* ClassMarshaller (http://tinyurl.com/64yr7d)
* ClassMarshallerFactory (http://tinyurl.com/5j4vuy)
Customize writing the class header or reading and resolving the class header by implementing ClassMarshaller and its factory interface, ClassMarshallerFactory. The implementation gets plugged into the corresponding setter in MarshallerFactory. Probably it would be sensible to provide a base implementation of this class that reads and writes the class name similar to Java serialization, which can be extended to provide e.g. magic number capability as described in the comments of JBCACHE-1336. Implementations could also write classloader information or whatever. This class would also be responsible for maintaining the class cache; the instance would be discarded when the corresponding discard method is called on the marshaller/unmarshaller.
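Here is one way the magic number idea might look, purely as a sketch. SketchClassMarshaller, its two methods, and the tag-byte layout are assumptions made up for this example; the class cache that the real implementation would maintain is left out to keep it short. Well-known classes are written as a tag byte plus a one-byte id, and everything else falls back to the class name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class-header plug-in; the real ClassMarshaller may differ.
interface SketchClassMarshaller {
    void writeClass(DataOutputStream out, Class<?> clazz) throws IOException;
    Class<?> readClass(DataInputStream in) throws IOException, ClassNotFoundException;
}

final class MagicNumberClassMarshaller implements SketchClassMarshaller {
    private final Map<Class<?>, Integer> idsByClass = new HashMap<>();
    private final List<Class<?>> classesById = new ArrayList<>();

    // Both ends must be configured with the same classes in the same order.
    MagicNumberClassMarshaller(Class<?>... wellKnown) {
        if (wellKnown.length > 256) {
            throw new IllegalArgumentException("at most 256 well-known classes fit in one byte");
        }
        for (Class<?> c : wellKnown) {
            idsByClass.put(c, classesById.size());
            classesById.add(c);
        }
    }

    public void writeClass(DataOutputStream out, Class<?> clazz) throws IOException {
        Integer id = idsByClass.get(clazz);
        if (id != null) {
            out.writeByte(1);                 // tag 1: abbreviated header
            out.writeByte(id);
        } else {
            out.writeByte(0);                 // tag 0: full class name follows
            out.writeUTF(clazz.getName());
        }
    }

    public Class<?> readClass(DataInputStream in) throws IOException, ClassNotFoundException {
        int tag = in.readUnsignedByte();
        if (tag == 1) {
            return classesById.get(in.readUnsignedByte());
        }
        return Class.forName(in.readUTF());
    }
}

final class ClassHeaderDemo {
    private static byte[] encode(SketchClassMarshaller m, Class<?> c) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        m.writeClass(out, c);
        out.flush();
        return bytes.toByteArray();
    }

    static int encodedSize(SketchClassMarshaller m, Class<?> c) {
        try {
            return encode(m, c).length;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Class<?> roundTrip(SketchClassMarshaller m, Class<?> c) {
        try {
            return m.readClass(new DataInputStream(new ByteArrayInputStream(encode(m, c))));
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SketchClassMarshaller m = new MagicNumberClassMarshaller(String.class, Integer.class);
        System.out.println("String header: " + encodedSize(m, String.class) + " bytes");
        System.out.println("Date header: " + encodedSize(m, java.util.Date.class) + " bytes");
    }
}
```

A real implementation would presumably resolve the class name through a configurable classloader rather than Class.forName().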
* ObjectMarshaller (http://tinyurl.com/5dxhup)
* ObjectMarshallerFactory (http://tinyurl.com/6dgmxy)
The basic object writing mechanism is pluggable by means of the ObjectMarshaller interface and its factory. The default implementation would be to support the standard Java serialization semantics. Also this object is what implements the object instance cache.
* Externalizer (http://tinyurl.com/5hofaj)
It should be possible to plug in a custom externalizer for any class (for example, non-serializable classes can be forwarded by using a custom externalizer). The Externalizer interface can fulfill this function. Right now I just have it set up as a list that gets scanned. But after thinking about it, maybe the MarshallerFactory should keep a Map<Class<?>, Externalizer> or something?
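For what it's worth, the map-based lookup could be sketched roughly like this. SketchExternalizer, ExternalizerRegistry, Endpoint and SecureEndpoint are all hypothetical names for illustration, and walking up the superclass chain is just one possible policy; whether subclasses should inherit an externalizer is an open design question.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical write-side externalizer; the real Externalizer interface may differ.
interface SketchExternalizer {
    void writeExternal(Object subject, DataOutput out) throws IOException;
}

// A class that is deliberately not Serializable.
class Endpoint {
    final String host;
    final int port;

    Endpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }
}

class SecureEndpoint extends Endpoint {
    SecureEndpoint(String host, int port) {
        super(host, port);
    }
}

// Map-based registry, as an alternative to scanning a list of externalizers.
final class ExternalizerRegistry {
    private final Map<Class<?>, SketchExternalizer> byClass = new HashMap<>();

    void register(Class<?> type, SketchExternalizer externalizer) {
        byClass.put(type, externalizer);
    }

    // Returns the externalizer for the class or its nearest superclass, or null if none.
    SketchExternalizer lookup(Class<?> type) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            SketchExternalizer found = byClass.get(c);
            if (found != null) {
                return found;
            }
        }
        return null;
    }
}

final class ExternalizerDemo {
    // Externalizes a SecureEndpoint via the Endpoint externalizer; returns bytes written.
    static int run() {
        ExternalizerRegistry registry = new ExternalizerRegistry();
        registry.register(Endpoint.class, (subject, out) -> {
            Endpoint e = (Endpoint) subject;
            out.writeUTF(e.host);
            out.writeInt(e.port);
        });
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            SketchExternalizer ext = registry.lookup(SecureEndpoint.class);
            ext.writeExternal(new SecureEndpoint("localhost", 8443), new DataOutputStream(bytes));
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run() + " bytes written");
    }
}
```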
* MarshallerObjectInputStream (http://tinyurl.com/6csfdg)
* MarshallerObjectOutputStream (http://tinyurl.com/6e67oy)
Also I created lightweight(ish) abstract versions of ObjectInputStream and ObjectOutputStream which will make it much easier to make marshallers that support the Java serialization API (instances can be passed in to custom readObject() and writeObject() methods for example). There is a security access check every time an Object*Stream is constructed which we can't really get around, which might in turn slow things down enough that it would be better to pool marshallers than rebuild them per-request. Time will tell.
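To illustrate the general technique (not the actual MarshallerObjectOutputStream), here is a minimal sketch of a stream built on the protected no-argument ObjectOutputStream constructor. That constructor writes no stream header and routes writeObject() to writeObjectOverride(); since the superclass has no underlying stream in this mode, each data method you want to use has to be overridden to delegate. DelegatingObjectOutputStream is a made-up name, and the toy writeObjectOverride() just writes the object's string form.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

final class DelegatingObjectOutputStream extends ObjectOutputStream {
    private final DataOutputStream target;

    DelegatingObjectOutputStream(DataOutputStream target) throws IOException {
        super();  // protected constructor: no header written, writeObject() is redirected
        this.target = target;
    }

    @Override
    protected void writeObjectOverride(Object obj) throws IOException {
        // Toy stand-in: a real marshaller would dispatch to its own object-writing logic here.
        target.writeUTF(String.valueOf(obj));
    }

    // Only a few data methods are overridden in this sketch; a complete
    // version would delegate every write method in the same way.
    @Override
    public void writeInt(int value) throws IOException {
        target.writeInt(value);
    }

    @Override
    public void writeUTF(String value) throws IOException {
        target.writeUTF(value);
    }

    @Override
    public void flush() throws IOException {
        target.flush();
    }

    @Override
    public void close() throws IOException {
        target.close();
    }
}

final class DelegatingStreamDemo {
    // Writes an int and an object, then returns the number of bytes produced.
    static int run() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream oos = new DelegatingObjectOutputStream(new DataOutputStream(bytes));
            oos.writeInt(7);
            oos.writeObject("abc");  // final in ObjectOutputStream; calls writeObjectOverride
            oos.flush();
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run() + " bytes, with no JDK stream header");
    }
}
```

An instance like this can be handed to a class's private writeObject(ObjectOutputStream) method, which is what makes compatibility with the Java serialization API feasible.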
That's about it. You can browse around that directory or check out the whole works from http://anonsvn.jboss.org/repos/sandbox/david.lloyd/jboss-marshalling/trunk and look at it, and let me know what you think of the API.